Face Recognition App Development Using Deep Learning

How Deep Learning Can Modernize Face Recognition Software

Serhii Maksymenko, Data Science Solution Architect

Face recognition technology appears in a different light today. Its use cases range from crime detection to the identification of genetic diseases.

While governments across the world have been investing in facial recognition systems, some US cities like Oakland, Somerville, and Portland, have banned it due to civil rights and privacy concerns.

What is it – a time bomb or a technological breakthrough? This article explains what face recognition is from a technology perspective and how deep learning increases its capabilities. Only by understanding how face recognition technology works from the inside out is it possible to grasp what it is capable of.

Updated 06/09/2020: Masked face detection and recognition

How Does Facial Recognition Work?

The computer algorithm behind facial recognition software works a bit like human visual recognition. But while people store visual data in the brain and recall it automatically when needed, a computer has to request data from a database and match it against the captured image to identify a human face.

In a nutshell, a computerized system equipped with a camera detects and identifies a human face and extracts facial features such as the distance between the eyes, the length of the nose, and the shape of the forehead and cheekbones. Then, the system recognizes the face and matches it to images stored in a database.
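In schematic form, the detect–extract–match pipeline can be illustrated with hand-crafted geometric features. This is a plain-Python toy sketch: production systems use learned embeddings rather than raw landmark distances, and all names and coordinate values here are illustrative, not any real library's API.

```python
import math

def feature_vector(landmarks):
    """Turn detected landmark coordinates into simple geometric features.
    `landmarks` maps landmark names to (x, y) pixel coordinates."""
    return [
        math.dist(landmarks["left_eye"], landmarks["right_eye"]),     # inter-eye distance
        math.dist(landmarks["nose_top"], landmarks["nose_tip"]),      # nose length
        math.dist(landmarks["left_cheek"], landmarks["right_cheek"]), # face width
    ]

def matches(a, b, tolerance=5.0):
    # Two faces "match" when every geometric feature agrees within a tolerance.
    return all(abs(x - y) <= tolerance
               for x, y in zip(feature_vector(a), feature_vector(b)))

# Toy landmark coordinates in pixels (illustrative values only).
face = {
    "left_eye": (30.0, 40.0), "right_eye": (70.0, 40.0),
    "nose_top": (50.0, 45.0), "nose_tip": (50.0, 70.0),
    "left_cheek": (20.0, 80.0), "right_cheek": (80.0, 80.0),
}
same_person = dict(face, right_eye=(72.0, 40.0))    # small measurement jitter
other_person = dict(face, right_eye=(100.0, 40.0))  # very different eye spacing
```

Real systems replace these three hand-picked distances with a learned feature vector, but the matching step, comparing vectors under a tolerance, stays conceptually the same.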

However, traditional face recognition technology is not perfect yet. It has both strengths and weaknesses:

Strengths

  • Contactless biometric identification
  • Data processing in under one second
  • Compatibility with most cameras
  • Ease of integration

Weaknesses

  • Twins and racial bias
  • Data privacy issues
  • Presentation attacks (PA)
  • Low accuracy in poor lighting conditions

Realizing the weaknesses of face recognition systems, data scientists went further. By applying traditional computer vision techniques and deep learning algorithms, they fine-tuned face recognition systems to prevent attacks and enhance accuracy. That is how face anti-spoofing technology came about.

How Deep Learning Upgrades Face Recognition Software

Deep learning is one of the newest ways to improve face recognition technology. The idea is to extract face embeddings from images containing faces. Such embeddings are unique to each face, and training a deep neural network is the most effective way to produce them.

Depending on a task and timeframes, there are two common methods to use deep learning for face recognition systems: 

Use pre-trained models such as dlib, DeepFace, FaceNet, and others. This method takes less time and effort because pre-trained models already come with a set of algorithms for face recognition purposes. We can also fine-tune pre-trained models to avoid bias and make the face recognition system work properly.

Develop a neural network from scratch. This method is suitable for complex face recognition systems with multi-purpose functionality. It takes more time and effort and requires millions of images in the training dataset, unlike a pre-trained model, which requires only thousands of images in the case of transfer learning.

But if the facial recognition system requires unique features, building from scratch may be the optimal choice in the long run. The key points to pay attention to are:

  • The correct selection of CNN architecture and loss function
  • Inference time optimization 
  • The power of the hardware

It’s recommended to use convolutional neural networks (CNN) when developing a network architecture as they have proven to be effective in image recognition and classification tasks. In order to get expected results, it’s better to use a generally accepted neural network architecture as a basis, for example, ResNet or EfficientNet. 

When training a neural network for face recognition software development purposes, we should minimize errors in most cases. Here it's crucial to consider the loss functions used to calculate the error between real and predicted output. The most commonly used functions in facial recognition systems are triplet loss and AM-Softmax.

  • The triplet loss function implies having three images of two different people. Two images – anchor and positive – belong to one person, and the third – negative – belongs to another person. Network parameters are learned so as to bring images of the same person closer together in the feature space while pushing different people apart.
  • The AM-Softmax function is one of the most recent modifications of the standard softmax function, which utilizes a particular regularization based on an additive margin. It achieves better separability of classes and therefore improves face recognition accuracy.
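As a minimal illustration of the first of these, the triplet loss for a single (anchor, positive, negative) triple can be computed as follows. This is a plain-Python sketch with toy 3-D vectors standing in for real high-dimensional embeddings; the margin value is illustrative.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize embeddings where the anchor is not at least `margin`
    closer to the positive (same person) than to the negative
    (different person)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 3-D embeddings: anchor and positive belong to the same person.
anchor   = [0.10, 0.20, 0.30]
positive = [0.12, 0.21, 0.29]
negative = [0.90, 0.10, 0.50]
loss = triplet_loss(anchor, positive, negative)
```

During training, this value is averaged over many triplets and minimized by gradient descent, which is what pulls same-person embeddings together and pushes different-person embeddings apart.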

There are also several approaches to improve a neural network. In facial recognition systems, the most interesting are knowledge distillation, transfer learning, quantization, and depth-separable convolutions.

  • Knowledge distillation involves two networks of different sizes, where a large network teaches its own smaller variation. The key value is that after training, the smaller network works faster than the large one while giving nearly the same result.
  • The transfer learning approach improves accuracy by training the whole network, or only certain layers, on a specific dataset. For example, if the face recognition system has race bias issues, we can take a particular set of images, let's say, pictures of Chinese people, and train the network on it to reach higher accuracy.
  • The quantization approach improves a neural network to reach higher processing speed. By approximating a network that uses floating-point numbers with one that uses low-bit-width numbers, we can reduce the memory size and the number of computations.
  • Depthwise separable convolutions are a class of layers that allow building CNNs with far fewer parameters than standard convolutions. Thanks to the small number of computations, they can make a facial recognition system suitable for mobile vision applications.
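To make the parameter savings of depthwise separable convolutions concrete, here is a back-of-the-envelope comparison in plain Python. The layer sizes are a hypothetical MobileNet-style example, and bias terms are ignored for simplicity.

```python
def standard_conv_params(k, c_in, c_out):
    # A standard convolution learns one k x k filter per (input, output) channel pair.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel.
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    return k * k * c_in + c_in * c_out

# A hypothetical 3x3 layer going from 128 to 256 channels.
std = standard_conv_params(3, 128, 256)        # 294,912 parameters
sep = depthwise_separable_params(3, 128, 256)  # 33,920 parameters
ratio = std / sep                              # roughly 8.7x fewer parameters
```

The same arithmetic applies to the computation count per pixel, which is why this layer type dominates mobile-oriented architectures.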

A key requirement of deep learning technologies is high-powered hardware. When using deep neural networks for face recognition software development, the goal is not only to enhance recognition accuracy but also to reduce response time. That is why a GPU, for example, is better suited to deep learning-powered face recognition systems than a CPU.

How We Implemented a Deep Learning-Powered Face Recognition App

When developing Big Brother (a demo camera app) at MobiDev, we aimed to create biometric verification software with real-time video streaming. A local console app for Ubuntu and Raspbian, Big Brother is written in Golang and configured with a Local Camera ID and Camera Reader type via a JSON config file. This video describes how Big Brother works in practice:

From the inside, the Big Brother app's working cycle comprises:

1. Face detection 

The app detects faces in a video stream. Once a face is captured, the image is cropped and sent to the back end via an HTTP form-data request. The back end API saves the image to the local file system and adds a record to the Detection Log with a personID.

The back end utilizes Golang and MongoDB Collections to store employee data. All API requests are based on RESTful API.

2. Instant face recognition

The back end has a background worker that finds new unclassified records and uses dlib to calculate a 128-dimensional descriptor vector of face features. Whenever a vector is calculated, it is compared with the reference face images by computing the Euclidean distance to each feature vector of each Person in the database to find a match.

If the Euclidean distance to the detected person is less than 0.6, the worker sets that personID in the Detection Log and marks the record as classified. If the distance exceeds 0.6, it creates a new Person and writes its ID to the log.
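Although the actual back end is written in Golang, the matching rule itself is language-agnostic. Here is a plain-Python sketch of it, with toy 2-D vectors standing in for the real 128-dimensional dlib descriptors; the names `classify` and `known_people` are illustrative, not the app's actual API.

```python
import math

MATCH_THRESHOLD = 0.6  # distance below which two descriptors count as the same person

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(descriptor, known_people):
    """Return the personID of the closest known descriptor, or None if no
    known person is within the threshold (meaning a new Person should be
    created). `known_people` maps personID -> reference descriptor."""
    best_id, best_dist = None, float("inf")
    for person_id, ref in known_people.items():
        dist = euclidean(descriptor, ref)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist < MATCH_THRESHOLD else None

# Toy reference database with two known people.
known_people = {"person-1": [0.0, 0.0], "person-2": [1.0, 1.0]}
```

The 0.6 threshold follows the value used by the worker above; in practice it is tuned against the accepted false-match rate.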

3. Follow-up actions: alerting, grant access, and other

Images of an unidentified person are sent to the corresponding manager with notifications via chatbots in messengers. In the Big Brother app, we used Microsoft Bot Framework and Python-based Errbot, which allowed us to implement the alert chatbot within five days.

Image of the unidentified person is sent as a chatbot notification

Afterward, these records can be managed via the Admin Panel, which stores photos with IDs in the database. The face recognition software works in real time and performs recognition tasks instantly. Using Golang and MongoDB Collections for employee data storage, we populated the ID database with 200 entries.

Here is how Big Brother face recognition app is designed:

Structure of the face recognition app

In the case of scaling up to 10,000 entries, we would recommend improving the face recognition system in order to keep recognition speed high on the back end. One of the optimal ways is parallelization: by setting up a load balancer and building several web workers, we can ensure that the back end works properly and the entire system runs at optimal speed.

Other Deep Learning-Based Recognition Use Cases

Face recognition is not the only task where deep learning-based software development can enhance performance. Other examples include:

Masked face detection and recognition

Since COVID-19 made people in many countries wear face masks, facial recognition technology has become more advanced. Using deep learning algorithms based on convolutional neural networks, cameras can now recognize faces covered with masks. Data science engineers utilize algorithms such as face-eye-based multi-granularity and periocular recognition models to enhance the facial recognition system's capabilities. By identifying face features such as the forehead, face contour, ocular and periocular details, eyebrows, eyes, and cheekbones, these models allow recognition of masked faces with up to 95% accuracy.

A good example of such a system is the face recognition technology created by a Chinese company. The system consists of two algorithms: deep learning-based face recognition and infrared thermal imaging temperature measurement. When people in face masks stand in front of the camera, the system extracts facial features and compares them with existing images in the database. At the same time, the infrared mechanism measures body temperature, thus detecting people with abnormal temperatures.

Defects detection

In the last couple of years, manufacturers have been using AI-based visual inspection for defect detection. The development of deep learning algorithms allows such systems to detect the tiniest scratches and cracks automatically, eliminating the human factor.

Body abnormalities detection

Israel-based company Aidoc developed a deep learning-powered solution for radiology. By analyzing medical images, the system detects abnormalities in the chest, c-spine, head, and abdomen.

Speaker identification

Speaker identification technology created by the Phonexia company identifies speakers using a metric learning approach. The system recognizes speakers by voice, producing mathematical models of human speech called voiceprints. These voiceprints are stored in a database, and when a person speaks, the technology identifies their unique voiceprint.

Emotions recognition

Recognition of human emotions is a doable task today. By tracking the movements of a face via camera, emotion recognition technology categorizes human emotions. The deep learning algorithm identifies landmark points of a human face, detects a neutral facial expression, and measures deviations from it to recognize more positive or negative expressions.

Actions recognition

The Visual One company, a provider of Nest Cams, powered its product with AI. By utilizing deep learning techniques, they fine-tuned Nest Cams to recognize not only different objects like people, pets, and cars, but also actions. The set of actions to be recognized is customizable and selected by the user. For example, a camera can recognize a cat scratching the door or a kid playing with the stove.

To summarize, deep neural networks are a powerful tool for mankind. And only humans decide what technological future comes next.
