May 1, 2023
I have been in the AI space for over 6 years now, tracking the progress of computer vision (especially face recognition) and NLP technologies. While I am not an expert, I have been working on applying these technologies to real-world use cases. In general, I've observed a roughly 5-year gap between a technology maturing and its application in the real world.
For instance, ChatGPT gained prominence this year, but BERT-based NLP algorithms had already surpassed human performance back in 2018. The SuperGLUE benchmark leaderboard showcases seven algorithms that currently perform better than the human baseline across ten test datasets.
Over the years, I have received numerous queries about the maturity of face recognition itself. Some common questions are as follows:
How do you ensure the quality of captured faces?
How do you handle changes in facial appearance due to aging?
How do you deal with different orientations and varying face quality?
Before I answer these questions, let's briefly discuss the history of Computer Vision (CV), of which face recognition is a subset.
CV showed early signs of progress through AlexNet in 2012, achieving a remarkable 10.8% improvement in accuracy. This breakthrough popularized terms like convolutional neural networks (CNNs) and graphics processing units (GPUs). Katie Huang wrote a detailed article on Medium documenting the history of face recognition (a picture of the same is below). By 2015, we witnessed real-world implementations of face recognition in applications like Google Photos and Facebook's auto-tagging feature.
Face biometrics and matching technologies have since undergone significant advancements, reaching a level of maturity that enables highly accurate and reliable identification. Even developers with a basic understanding of CV can now develop and deploy face recognition algorithms with an accuracy comparable to the best global algorithms, differing only by a small margin.
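To make this concrete, here is a minimal sketch of how most modern face matching works under the hood: a model maps each face to an embedding vector, and two faces are declared a match when the vectors are similar enough. The embeddings and the 0.6 threshold below are toy values I've made up for illustration; real systems use embeddings from a trained network and a threshold tuned on evaluation data.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(emb1: np.ndarray, emb2: np.ndarray, threshold: float = 0.6) -> bool:
    """Declare a match when similarity clears the chosen threshold."""
    return cosine_similarity(emb1, emb2) >= threshold

# Toy 4-dimensional embeddings standing in for real model outputs.
same_person_a    = np.array([0.90, 0.10, 0.20, 0.30])
same_person_b    = np.array([0.85, 0.15, 0.25, 0.30])
different_person = np.array([-0.20, 0.90, 0.10, -0.40])

print(is_match(same_person_a, same_person_b))    # similar vectors -> True
print(is_match(same_person_a, different_person)) # dissimilar vectors -> False
```

The threshold is the whole game: raise it and you reject more impostors but also more genuine users; lower it and the reverse happens. That trade-off is exactly what the NIST evaluations below quantify.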
To illustrate this progress, let's delve into the results of the National Institute of Standards and Technology's (NIST) face recognition evaluations. NIST conducts rigorous assessments of face recognition algorithms and systems, offering valuable insights into their performance. The NIST Face Recognition Vendor Test (FRVT) is a benchmark that assesses the accuracy and efficiency of various face recognition technologies across seven test datasets.
These datasets cover a wide range of scenarios, as listed below:
Visa images - face images submitted with visa applications; includes over 100,000 images from 100 countries.
Application images - typically captured by digital applications; includes over 1,000,000 images from over 100 countries.
Application images with head yaw - typically captured by digital applications, but the subject is not looking at the camera (yaw angles of 25-85 degrees to the camera); includes over 100,000 images from 100 countries.
Border crossing images - images of passengers taken by an immigration officer with a handheld camera. These pictures present multiple challenges, such as overexposure, blur, and subject movement; includes over 1,000,000 images from over 100 countries.
Mugshot images - over 1,000,000 images from the USA.
Kiosk images - over 1,000,000 images captured at kiosks. Tight cropping and subjects looking down are some of the challenges.
Wild images - photojournalism-style images; includes over 100,000 images.
These datasets encompass challenges such as variability in angles, image quality, and backgrounds as shown in the images above. The remarkable aspect is that different players in the market have achieved impressive results despite such variability.
Results of NIST's evaluation of face recognition algorithms:
The latest NIST report, published in April 2023, included evaluations of 501 developers worldwide. The results highlight the trade-off between false positives (impostors being accepted) and false negatives (genuine users being rejected). Each algorithm is operated at a fixed, extremely low False Match Rate (FMR) threshold, and developers are then ranked by the resulting False Non-Match Rate (FNMR).
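The evaluation protocol described above can be sketched in a few lines: pick the score threshold that yields the target FMR on impostor comparisons, then measure the FNMR on genuine comparisons at that same threshold. The Gaussian score distributions below are synthetic stand-ins I generated for illustration, not NIST data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic similarity scores: genuine pairs (same person) score high,
# impostor pairs (different people) score low.
genuine_scores = rng.normal(loc=0.8, scale=0.1, size=10_000)
impostor_scores = rng.normal(loc=0.2, scale=0.1, size=10_000)

def fnmr_at_fmr(genuine, impostor, target_fmr):
    """Find the threshold that yields the target false match rate on
    impostor scores, then report the false non-match rate on genuine
    scores at that threshold."""
    # Threshold = the (1 - target_fmr) quantile of the impostor scores.
    threshold = np.quantile(impostor, 1.0 - target_fmr)
    fnmr = float(np.mean(genuine < threshold))
    return threshold, fnmr

thr, fnmr = fnmr_at_fmr(genuine_scores, impostor_scores, target_fmr=1e-3)
print(f"threshold={thr:.3f}, FNMR={fnmr:.4%}")
```

Because the threshold is pinned by the FMR target, FNMR is the only number left to compare developers on, which is exactly how the FRVT rankings below are read.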
Considering the VISA dataset as an example (image above), the results indicate the following:
A fixed false positive rate (FMR) of 1 in a million, i.e., 0.0001%.
The lowest false negative rate (FNMR) is 0.38%, meaning the top-ranked algorithm generates less than 1% false negatives while keeping false positives almost negligible.
The highest false negative rate (FNMR) is 7.32%, i.e., approximately 7% false negatives at the same negligible false positive rate. This developer is ranked 415th out of 501.
Although the difference may seem significant, it's important to note that NIST pins the false positive rate at a very low level; at the looser thresholds used in most deployments, rejections drop further. In the real world, a 7% rejection rate is not very high.
In summary, even a developer ranked 415th has achieved results that are reliable in real-world scenarios. Therefore, you can be confident that whichever face recognition vendor you choose, their model will provide good results.
If you have any questions or want to know more, please feel free to reach out to me on LinkedIn.
Thanks for reading,
Ravi