Designing Custom Face Detection & Recognition Software

June 04, 2019


Over the last few months, one of our teams has been designing a solution that performs biometric identification on real-time video camera streams. The system workflow had to comprise three main stages: detection of a face, near-instant recognition, and a follow-up action (e.g. granting access to authorized personnel or sending an alert otherwise).


The result is a research project that we called Big Brother. Its all-seeing eye uses computer vision to analyze the frames of a stream and detect faces. See how it works in the video below.





The potential of this solution can be fully realized in your own custom product, so feel free to contact us with your ideas. Meanwhile, we'll reveal a few technical details about Big Brother's workflow and components.



Face detection and recognition workflow



The entire process is triggered by a camera application installed on a device that is connected to a camera. This application was written in Golang as a local console app for Ubuntu and Raspbian. Upon launch, it must be configured using a JSON config file that specifies the Local Camera ID and the Camera Reader type.
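
To illustrate the configuration step, here is a minimal sketch of how such a JSON config could be loaded in Go. The article does not show the real file layout, so the key names local_camera_id and camera_reader_type, the integer type of the camera ID, and the "opencv" value are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// CameraConfig mirrors the two settings the article mentions.
// The JSON key names are hypothetical; the real file layout is not shown.
type CameraConfig struct {
	LocalCameraID    int    `json:"local_camera_id"`
	CameraReaderType string `json:"camera_reader_type"`
}

// parseConfig decodes the config, rejecting unknown keys and a missing reader type.
func parseConfig(raw string) (CameraConfig, error) {
	var cfg CameraConfig
	dec := json.NewDecoder(strings.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&cfg); err != nil {
		return cfg, fmt.Errorf("decode config: %w", err)
	}
	if cfg.CameraReaderType == "" {
		return cfg, errors.New("camera_reader_type must be set")
	}
	return cfg, nil
}

func main() {
	// "opencv" is a placeholder reader type chosen for the example.
	cfg, err := parseConfig(`{"local_camera_id": 0, "camera_reader_type": "opencv"}`)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("camera %d via %s reader\n", cfg.LocalCameraID, cfg.CameraReaderType)
}
```

Rejecting unknown keys is a design choice for the sketch: a typo in the config file fails loudly at launch instead of silently falling back to a default camera.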


The camera app detects faces in the stream using a deep neural network and computer vision. During our research, we found that the two best options for detection were a Caffe face-tracking model and TensorFlow object detection models. Both are available through the OpenCV library, and both worked well in our tests.



The camera app detects faces in the stream using a deep neural network and computer vision



When a face is captured, the image is cropped and sent to the back end via an HTTP form-data request. The back end API saves the image to the local file system and adds a record with a personID to the Detection Log.
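
The upload step can be sketched with Go's standard library alone. This is only an illustration under stated assumptions: the form field names camera_id and image, the file name face.jpg, and the 201 response are hypothetical, and a throwaway local test server stands in for the real back end.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
)

// uploadFace posts a cropped face image as multipart form data.
// The field names "camera_id" and "image" are hypothetical.
func uploadFace(url, cameraID string, jpeg []byte) (int, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)
	if err := w.WriteField("camera_id", cameraID); err != nil {
		return 0, err
	}
	part, err := w.CreateFormFile("image", "face.jpg")
	if err != nil {
		return 0, err
	}
	if _, err := part.Write(jpeg); err != nil {
		return 0, err
	}
	// Close writes the terminating boundary; it must happen before sending.
	if err := w.Close(); err != nil {
		return 0, err
	}
	resp, err := http.Post(url, w.FormDataContentType(), &body)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// roundTrip sends the image to a local stand-in server and reports the
// status code and the number of image bytes the server received.
func roundTrip(jpeg []byte) (status, received int, err error) {
	got := make(chan int, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, r *http.Request) {
		file, _, ferr := r.FormFile("image")
		if ferr != nil {
			got <- -1
			http.Error(rw, ferr.Error(), http.StatusBadRequest)
			return
		}
		defer file.Close()
		data, _ := io.ReadAll(file)
		got <- len(data)
		rw.WriteHeader(http.StatusCreated)
	}))
	defer srv.Close()

	status, err = uploadFace(srv.URL, "0", jpeg)
	if err != nil {
		return 0, 0, err
	}
	return status, <-got, nil
}

func main() {
	status, n, err := roundTrip([]byte{0xFF, 0xD8, 0xFF, 0xD9})
	fmt.Println(status, n, err)
}
```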


The back end has a background worker that finds records with "classified = false" and uses Dlib to calculate the 128-dimensional descriptor vector of face features. Once a vector is calculated, the worker compares it with the existing entries (each Person can have multiple reference face images) by calculating the Euclidean distance to every feature vector of every Person in the database, and looks for a match.



For each detected face, the system builds a 128-dimensional descriptor vector of its features.



This is what it looks like in the code: each point index is properly defined in Dlib and represents various parts of a face.



Points represent various parts of a face



If the Euclidean distance to a known person is less than 0.6, the worker writes that person's personID to the Detection Log record and marks the record as classified. Otherwise, it creates a new Person of unknown type and writes the new personID to the log.
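
The matching rule can be sketched in a few lines of Go. This is an illustration, not the project's code: the Person and DetectionRecord types are hypothetical in-memory stand-ins for the database documents, picking the closest reference vector is an assumption (the article only says a match is found), and so are the unknown-N ID scheme and storing the new face as the unknown person's first reference.

```go
package main

import (
	"fmt"
	"math"
)

// matchThreshold is the 0.6 distance cut-off quoted in the article.
const matchThreshold = 0.6

// Descriptor is a 128-dimensional face feature vector.
type Descriptor [128]float32

// Person and DetectionRecord are hypothetical stand-ins for the stored
// documents; the real schema is not shown in the article.
type Person struct {
	ID         string
	Type       string // "known" or "unknown"
	References []Descriptor
}

type DetectionRecord struct {
	PersonID   string
	Classified bool
}

// distance returns the Euclidean distance between two descriptors.
func distance(a, b Descriptor) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i]) - float64(b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

// closestPerson scans every reference vector of every person and returns the
// nearest one, plus whether it is close enough to count as a match.
func closestPerson(face Descriptor, people []Person) (id string, dist float64, ok bool) {
	dist = math.Inf(1)
	for _, p := range people {
		for _, ref := range p.References {
			if d := distance(face, ref); d < dist {
				id, dist = p.ID, d
			}
		}
	}
	return id, dist, dist < matchThreshold
}

// classify fills in the record. When nobody matches, it appends a new Person
// of unknown type (keeping the face as its first reference is an assumption).
func classify(rec *DetectionRecord, face Descriptor, people []Person) []Person {
	id, _, ok := closestPerson(face, people)
	if !ok {
		id = fmt.Sprintf("unknown-%d", len(people)+1)
		people = append(people, Person{ID: id, Type: "unknown", References: []Descriptor{face}})
	}
	rec.PersonID = id
	rec.Classified = true
	return people
}

func main() {
	var ref, face Descriptor
	ref[0], face[0] = 0.5, 0.2
	people := []Person{{ID: "alice", Type: "known", References: []Descriptor{ref}}}

	var rec DetectionRecord
	people = classify(&rec, face, people)
	fmt.Println(rec.PersonID, rec.Classified, len(people))
}
```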


Images of an unidentified person can be sent to the responsible manager as notifications; in our case, we implemented chatbots in messengers. The manager can then grant access remotely or take other actions.



Image of the unidentified person can be sent to any messenger as a chatbot notification.



Note: in our experience, the simplest alert chatbots took 2-5 days to implement. We built two examples, one with Microsoft Bot Framework and one with the Python-based Errbot.


Afterwards, these records can be managed via the Admin Panel, which stores photos with IDs in the database. A database of employee pictures can be prepared and loaded into the system beforehand.


Note: the database used in our case includes nearly 200 entries. At this size, everything works in real time, and recognition is near-instant. But what if we need to scale to dozens of cameras, or to a database of more than 10,000 entries? Naturally, this will slow down recognition on the back end. Our team's answer is parallelization: we can set up a load balancer and run several web workers simultaneously. Each worker takes a chunk of the database to search for matches, so results arrive faster.
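
The chunking idea can be sketched as follows. Note the swap: the article describes separate web workers behind a load balancer, while this example uses goroutines inside one process as a stand-in, simply to show how the database is split and how per-chunk results are merged. The entry and match types and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sync"
)

// entry is a hypothetical database row: one reference vector of one person.
type entry struct {
	PersonID string
	Vector   []float64
}

// match is the best candidate found in a chunk.
type match struct {
	PersonID string
	Distance float64
}

// euclidean assumes both vectors have the same length.
func euclidean(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// searchChunk returns the nearest entry within one slice of the database.
func searchChunk(face []float64, chunk []entry) match {
	best := match{Distance: math.Inf(1)}
	for _, e := range chunk {
		if d := euclidean(face, e.Vector); d < best.Distance {
			best = match{PersonID: e.PersonID, Distance: d}
		}
	}
	return best
}

// parallelSearch splits db into roughly equal chunks, searches them
// concurrently, and merges the per-chunk winners into one result.
func parallelSearch(face []float64, db []entry, workers int) match {
	best := match{Distance: math.Inf(1)}
	if len(db) == 0 {
		return best
	}
	if workers < 1 {
		workers = 1
	}
	size := (len(db) + workers - 1) / workers // chunk size, rounded up
	chunks := (len(db) + size - 1) / size
	results := make([]match, chunks)

	var wg sync.WaitGroup
	for i := 0; i < chunks; i++ {
		start, end := i*size, (i+1)*size
		if end > len(db) {
			end = len(db)
		}
		wg.Add(1)
		go func(i int, chunk []entry) {
			defer wg.Done()
			results[i] = searchChunk(face, chunk) // each goroutine owns one slot
		}(i, db[start:end])
	}
	wg.Wait()

	for _, r := range results {
		if r.Distance < best.Distance {
			best = r
		}
	}
	return best
}

func main() {
	db := []entry{
		{PersonID: "a", Vector: []float64{0}},
		{PersonID: "b", Vector: []float64{1}},
		{PersonID: "c", Vector: []float64{2}},
	}
	fmt.Println(parallelSearch([]float64{1.2}, db, 2).PersonID)
}
```

The merged result is the same as a single sequential scan; only the wall-clock time changes, which is why the same 0.6 threshold can be applied to the final distance afterwards.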


To sum things up, here is the structure of Big Brother and the technologies we applied.



The back end utilizes Golang and MongoDB Collections to store employee data



Note: the back end utilizes Golang and MongoDB Collections to store employee data. The API follows REST conventions. The system can be tested on regular workstations.



Revealing the potential of Big Brother



This solution can store and manage large amounts of data. With this data and well-defined business needs at hand, Data Science models can be created and trained to produce business insights, for example to analyze customer behavior and predict demand in retail. We already have experience in applying Data Science to demand forecasting, which you can check in our ERP development case study.


Note: Big Brother is not limited to face recognition. We can develop solutions that detect and identify other objects. This is a semantic segmentation problem in computer vision: depending on the segmentation classes (object type, amount, accuracy, etc.), we can create and train a model that recognizes different objects.






Anti-spoofing measures to boost security of face recognition



Face recognition can be enhanced with auxiliary security measures. After all, a facial image is the easiest biometric identifier to obtain, far easier than a fingerprint or a retina scan. This makes it vital to add anti-spoofing measures so that no one can gain access by showing a picture of a face.


Here we can apply special algorithms based on a combination of data science, computer vision and deep learning:


1. Detection of eye movement: for example, a person blinks 15-30 times per minute, so detecting blinks lets us check face liveness.

2. Texture analysis: we can extract hand-crafted features and detect the difference between a real face and a fake one. A fake typically shows less fine detail, along with noise and other image artifacts. With these features at hand, we can apply support-vector machines for binary classification.

3. Active camera flash: a flash makes it easy to tell a real face with complex geometry from an image on a flat surface.

4. Challenge-response protocol: give instructions to the user (e.g. wink, smile, say something, or turn their head) and, again, see whether the face is real.

5. These methods can be combined. For example, we can use both challenge-response and texture analysis to counteract both photo and video spoofing.
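
As an illustration of the first measure, here is a minimal sketch of a blink-rate liveness check. It assumes that an upstream eye-state detector (not shown, and hypothetical here) already yields one "eyes closed" flag per video frame. The minimum rate is a parameter; the example uses 15 per minute, the lower bound quoted above, although a real deployment would have to tune it.

```go
package main

import "fmt"

// countBlinks counts closed-to-open transitions in a per-frame eye-state
// signal. Producing that signal is the job of an upstream detector.
func countBlinks(eyesClosed []bool) int {
	blinks := 0
	for i := 1; i < len(eyesClosed); i++ {
		if eyesClosed[i-1] && !eyesClosed[i] {
			blinks++
		}
	}
	return blinks
}

// blinksPerMinute converts the blink count to a rate, given the frame rate.
func blinksPerMinute(eyesClosed []bool, fps float64) float64 {
	if len(eyesClosed) == 0 || fps <= 0 {
		return 0
	}
	minutes := float64(len(eyesClosed)) / fps / 60
	return float64(countBlinks(eyesClosed)) / minutes
}

// looksAlive reports whether the observed blink rate reaches the minimum.
// A printed photo never blinks, so its rate is zero and it fails the check.
func looksAlive(eyesClosed []bool, fps, minPerMinute float64) bool {
	return blinksPerMinute(eyesClosed, fps) >= minPerMinute
}

func main() {
	// One minute of video at 10 frames per second, eyes closed for a
	// single frame every three seconds.
	frames := make([]bool, 600)
	for i := 15; i < len(frames); i += 30 {
		frames[i] = true
	}
	fmt.Println(blinksPerMinute(frames, 10), looksAlive(frames, 10, 15))

	photo := make([]bool, 600) // eyes never close
	fmt.Println(looksAlive(photo, 10, 15))
}
```

On its own this check is defeated by a replayed video of a blinking face, which is why the list above suggests combining it with challenge-response or texture analysis.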


The Big Brother project allowed us to work out algorithms and mechanisms that will speed up the development of custom solutions for face-based authentication. Feel free to contact us with any questions.



Contact us today!

