Jun 2, 2023
Face privacy-enhancing technology (PET) is our most distinctive technology. No one has matured this technology enough for a real-world use case, and Regulo is the first company to build a production version of a face-based PET. Before I go into the technology, though, I want to explain how it came about.
My primary vision was to replace name-based systems with face-based systems. Name-based systems generate a lot of false positives, and a face-based system can address this problem. While working on this solution, I kept being reminded of a problem I had faced in a previous project. We were going through the onboarding journey at a large bank. The bank's business team ran a proof of concept (POC) and confirmed that they wanted to onboard us, and the procurement team started the process. It took them about 18 months to actually onboard us, because the integration required them to share personally identifiable information (PII), in this case their customers' mobile numbers. Now imagine asking a bank for its customers' facial data: it would take the bank at least 18 months to fulfill that request. I was not prepared for this, and it made me wonder whether there was a solution to the problem.
I spent a few months researching the existing technology and talking to face recognition experts. While some basic work had been done, none of the existing approaches was mature enough for the use case I was envisioning. That is when I reached out to my friend, Ravi Teja (we are perpetually confused about what to call each other!). He has done some really kick-ass work on a range of AI algorithms, and we had previously collaborated on a hair transplant AI use case. After reading a lot of research papers, we set out to build the family of algorithms needed to solve the problem. This is how we built the face-hashing technology.
The problem statement
The technology should transform a face into non-PII data, and the transformed data should be meaningless on its own. However, if someone else has the same person's face (in a different image, not the same one) and transforms it with the same algorithm, the two transformed outputs should be comparable. The accuracy of such a comparison should be close to that of conventional face recognition (~99%).
Why doesn't encryption work?
Typical encryption sounds like a fit, but it only keeps the data non-PII while it is in transit. The moment we want to compare two faces, they have to be decrypted. If Regulo does the comparison, we get access to the client's PII. If the bank or fintech does the comparison, they get access to our data. The latter may not sound like an issue, but it requires an on-prem deployment, which comes with its own technical and scalability challenges. In practice, most clients facing this problem opt for on-prem deployment.
Newer techniques such as homomorphic encryption, which allows computations to be performed directly on encrypted data, are being explored to overcome this problem, but at present nobody is working on homomorphic encryption for faces.
Face hashing
Regulo combines two concepts: face recognition and hashing. Hashing and tokenization work for text data because the same input always produces the same output. Two photos of the same face, however, are never pixel-for-pixel identical, and no existing algorithm achieves the same result for facial information. Therefore, we built a family of algorithms to solve the problem.
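To see why ordinary hashing does not carry over to faces, here is a small sketch using Python's standard SHA-256. The two four-number vectors are hypothetical stand-ins for features extracted from two photos of the same person. They differ by a tiny amount, yet their digests are unrelated, so comparing such hashes can only tell you whether two inputs were byte-for-byte identical.

```python
import hashlib
import struct


def sha256_of_vector(vec):
    """Serialize a list of floats as 8-byte doubles and return its SHA-256 hex digest."""
    data = struct.pack(f"{len(vec)}d", *vec)
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a, hex_b):
    """Count how many of the 256 digest bits differ between two hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical feature vectors for two photos of the same person:
# the second differs from the first only slightly, in its last value.
face_a = [0.12, -0.53, 0.88, 0.07]
face_b = [0.12, -0.53, 0.88, 0.0700001]

digest_a = sha256_of_vector(face_a)
digest_b = sha256_of_vector(face_b)

print(digest_a == sha256_of_vector(list(face_a)))  # True: identical input, identical hash
print(digest_a == digest_b)                        # False: near-identical input, unrelated hash
print(differing_bits(digest_a, digest_b))          # typically around half of the 256 bits
```

This exact-match behavior is what makes hashing useful for text such as mobile numbers, and it is also why it breaks for faces, where no two captures are identical.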
Hashing the face
We transform a face image into a hash image by adding significant noise to it. You will not be able to identify the person in the hashed image shown below. To be clear, the salt-and-pepper noise shown in the image is only an illustration; our algorithm applies a complex transformation to ensure that the process is irreversible.
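Regulo's transformation itself is not described in this post. As a rough illustration of the general idea of a lossy, keyed face transform, the sketch below swaps in a well-known technique, random-hyperplane locality-sensitive hashing (SimHash), applied to a face embedding rather than to the image. The embedding vectors, the 64-dimension size, the 32-bit hash length and the secret seed are all assumptions for illustration; in practice an embedding would come from a face recognition model.

```python
import random


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions.

    The seed acts as a shared secret: in this sketch, both parties must
    use the same seed for their hashes to be comparable.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def face_hash(embedding, hyperplanes):
    """Return an integer with one bit per hyperplane: 1 if the embedding lies on its positive side."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, embedding))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


# Hypothetical embeddings: photo_b is photo_a plus a little capture noise (same person),
# stranger is an unrelated random vector (a different person).
rng = random.Random(7)
dim, n_bits = 64, 32
planes = make_hyperplanes(dim, n_bits, seed=42)
photo_a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
photo_b = [x + rng.gauss(0.0, 0.1) for x in photo_a]
stranger = [rng.gauss(0.0, 1.0) for _ in range(dim)]

for label, emb in (("photo_a", photo_a), ("photo_b", photo_b), ("stranger", stranger)):
    print(label, format(face_hash(emb, planes), f"0{n_bits}b"))
```

Only the signs of 32 projections survive, so the magnitudes and most of the detail in the embedding are discarded. For random hyperplanes, two vectors at angle θ agree on each bit with probability 1 − θ/π, which is why photo_a and photo_b should share most bits while the stranger agrees on roughly half. The sketch makes no claim about how hard such a hash is to invert; that would need its own analysis.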
Face hash comparison
We transform two different images of the same person and compare the resulting hashes using a second algorithm. The result shows that the two images belong to the same person.
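For bit-string hashes such as those produced by a SimHash-style scheme, the comparison step can be sketched as counting the bit positions on which two hashes agree and treating high enough agreement as a match. The hash_similarity and same_person names, the 16-bit hashes and the 0.85 threshold are hypothetical; the post does not describe Regulo's comparison algorithm or its threshold.

```python
def hash_similarity(hash_a, hash_b, n_bits):
    """Fraction of the n_bits bit positions on which two integer hashes agree (1.0 = identical)."""
    mask = (1 << n_bits) - 1
    differing = bin((hash_a ^ hash_b) & mask).count("1")
    return 1.0 - differing / n_bits


def same_person(hash_a, hash_b, n_bits, threshold=0.85):
    """Hypothetical decision rule: report a match when similarity reaches the threshold."""
    return hash_similarity(hash_a, hash_b, n_bits) >= threshold


# Hypothetical 16-bit hashes: h1 and h2 differ in a single bit (two photos of one person),
# h3 is the bitwise complement of h1 (an unrelated face in this toy example).
h1 = 0b1011_0110_1100_0011
h2 = 0b1011_0110_1100_0111
h3 = 0b0100_1001_0011_1100

print(hash_similarity(h1, h2, 16), same_person(h1, h2, 16))  # 0.9375 True
print(hash_similarity(h1, h3, 16), same_person(h1, h3, 16))  # 0.0 False
```

The threshold is where the false-positive versus false-negative trade-off lives: raising it rejects more impostors but also misses more genuine pairs.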
The balancing act
The transformation adds noise to preserve privacy, but that noise also reduces the accuracy of the comparison. As of the end of May 2023, our algorithm performs comparably to a conventional face comparison algorithm. We are still optimizing both the process and its accuracy.
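The trade-off can be illustrated with a toy simulation. The sketch below uses synthetic random vectors in place of face embeddings and plain Gaussian noise in place of Regulo's transformation (both assumptions, not the actual method). It measures how the average cosine similarity between two noised photos of the same synthetic person falls as the noise level sigma grows. As same-person scores drift toward the near-zero similarity of unrelated vectors, any fixed match threshold starts rejecting genuine pairs. The printed numbers describe this toy setup only, not Regulo's results.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def add_noise(vec, sigma, rng):
    """Stand-in privacy transform: add independent Gaussian noise with std sigma to each component."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def mean_same_person_similarity(sigma, dim=64, trials=200, seed=0):
    """Average cosine similarity between two noised 'photos' of the same synthetic person."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        identity = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Two captures of one person: the same identity plus small capture variation.
        photo_a = [x + rng.gauss(0.0, 0.1) for x in identity]
        photo_b = [x + rng.gauss(0.0, 0.1) for x in identity]
        total += cosine(add_noise(photo_a, sigma, rng), add_noise(photo_b, sigma, rng))
    return total / trials


# In this setup the expected score is roughly 1 / (1.01 + sigma**2),
# so it slides from about 0.99 toward 0, where unrelated random vectors sit.
for sigma in (0.0, 0.5, 1.0, 2.0):
    print(f"sigma={sigma}: mean same-person similarity {mean_same_person_similarity(sigma):.3f}")
```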
How will this be used in production?
We have developed two APIs for use by prospective clients:
Face hashing API: The API takes a face image as input and returns a face hash.
Face hash comparison API: The API takes two face hashes as input and returns a comparison score.
If you plan to use our fraud screening API, first transform the face into a face hash using the face hashing API, then call the fraud API with that hash. Our fraud faces are stored as face hashes, so the fraud API compares your face hash against our face hash database and returns the results. The typical fraud hit rate, that is, the share of screened faces that match a record in our fraud database, is less than 1%.
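Screening a submitted face hash against a database of fraud hashes could look roughly like the sketch below: compare the query with every stored fraud hash, return those above a threshold and, across many screened customers, report the share with at least one hit (the hit rate). The function names, record IDs, 16-bit hashes and 0.85 threshold are hypothetical and do not describe Regulo's actual API.

```python
def hash_similarity(hash_a, hash_b, n_bits):
    """Fraction of bit positions on which two integer face hashes agree."""
    differing = bin((hash_a ^ hash_b) & ((1 << n_bits) - 1)).count("1")
    return 1.0 - differing / n_bits


def screen_against_fraud_db(query_hash, fraud_db, n_bits, threshold=0.85):
    """Return (record_id, score) for every stored fraud hash close enough to the query, best first."""
    hits = []
    for record_id, stored_hash in fraud_db.items():
        score = hash_similarity(query_hash, stored_hash, n_bits)
        if score >= threshold:
            hits.append((record_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def hit_rate(screening_results):
    """Share of screened customers with at least one fraud hit."""
    if not screening_results:
        return 0.0
    return sum(1 for hits in screening_results if hits) / len(screening_results)


# Hypothetical fraud database: the server stores face hashes, not faces.
fraud_db = {
    "fraud-001": 0b1011_0110_1100_0011,
    "fraud-002": 0b0000_1111_0000_1111,
}

# Hashes submitted by a client that hashed the faces on its own side first.
customer_hashes = [0b1011_0110_1100_0111, 0b0101_0101_0101_0101]

results = [screen_against_fraud_db(h, fraud_db, 16) for h in customer_hashes]
print(results)            # [[('fraud-001', 0.9375)], []]
print(hit_rate(results))  # 0.5
```

The linear scan keeps the sketch short; a large hash database would likely call for an indexing scheme instead. The high hit rate here is an artifact of the tiny toy data.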
Which use cases does this work well for?
Use cases where the two datasets being compared have little overlap work best. When the overlap between two face hash datasets is high, the owner of one dataset can work out to whom a matching face hash belongs, since it already knows the identities behind its own records.
Typically, negative use cases such as fraud, sanctions, and adverse media checks have low overlap with customer datasets.
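The overlap argument can be made concrete with a small sketch. A bank holds named customer records with their face hashes and receives face hashes from another party. Every incoming hash that matches one of the bank's records can be linked to a known customer, so the share of linked hashes measures how much the comparison reveals. The exposure function, record names, 16-bit hashes and threshold are hypothetical.

```python
def hash_similarity(hash_a, hash_b, n_bits):
    """Fraction of bit positions on which two integer face hashes agree."""
    differing = bin((hash_a ^ hash_b) & ((1 << n_bits) - 1)).count("1")
    return 1.0 - differing / n_bits


def exposure(own_records, other_hashes, n_bits, threshold=0.85):
    """Share of the other party's hashes that the owner can link to one of its own named records."""
    if not other_hashes:
        return 0.0
    linked = sum(
        1
        for other in other_hashes
        if any(hash_similarity(other, own, n_bits) >= threshold for own in own_records.values())
    )
    return linked / len(other_hashes)


# Hypothetical bank records: the bank knows who each hash belongs to.
bank_customers = {
    "customer-A": 0b1011_0110_1100_0011,
    "customer-B": 0b0000_1111_0000_1111,
}

# High overlap: another customer list containing the same two people.
partner_customers = [0b1011_0110_1100_0111, 0b0000_1111_0000_1111]
# Low overlap: a fraud list where only one entry is also a bank customer.
fraud_list = [
    0b0101_0101_0101_0101,
    0b1111_0000_1111_0000,
    0b0011_0011_0011_0011,
    0b1011_0110_1100_0111,
]

print(exposure(bank_customers, partner_customers, 16))  # 1.0
print(exposure(bank_customers, fraud_list, 16))         # 0.25
```

Comparing two customer lists links nearly every hash to a known customer, so hashing protects little. Against a fraud or sanctions list, only the genuine hits are revealed, and surfacing those is the purpose of the screening.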
The technology is far from mature, and we continue to work on the algorithms to improve performance and accuracy. If you have any questions or want to know more, please feel free to reach out to me on LinkedIn.
Regards,
Ravi