xeric facial recognition whitepaper

14

Upload: sunny-tan-msc-mciiscm-ccsmp-pmp-pmi-rmp

Post on 11-Apr-2017

204 views

Category:

Technology


0 download

TRANSCRIPT

Facial recognition is both a science as well as an art. While there are various theories available on getting it correct, it depends on real situation on ground to achieve the intended result. Apart from quality of enrolled and probe images, it is relatively important to consider other non-technical aspect which could be equally important. This guide does not gives you a special formula to cover all situation, but to provide su�cient guidance to achieve optimum result. Many of it might be very minor that one could easily missed out and spend much time and resources to eventually rectify it.

Introduction

Once those essential considerations are being taken care during design and installation, you will be able to:

• Set and manage the appropriate level of expectations with customers.

• Design and proposal the most suitable solution that deals with real onsite conditions.

• Achieved the highest probability of intercept with best quality face indexes with given conditions.

What can be achieved?

This guide will illustrate and explain various key considerations that are critical to achieving optimum result with Xeric. There are few fundamental principles on CCTV that will still be applicable while using Xeric. On top of those, we will also need to consider quality of images, camera installations, lighting, etc. We will also be introduction COMAH concept, which represents few important critical of consideration while designing the solution onsite.

These factors are inter-related and they should not be considered individually. Instead, they must be examined holistically onsite to know how each impact the other and the entire solution as a whole.

Key Considerations

Xeric Facial Recognition Whitepaper Page 1

In this section, we will �rst exam the quality of images and they are broadly classi�ed into 2 categories. Enrolled images refer to those images that are being imported into Xeric as watchlist database. These images formed the watchlist which are stored on Xeric and being used to make comparison against those live images. Images that are being captured by cameras are usually being referred to as probe image or live images.

Quality di�erences between enrolled and probe images will a�ect Xeric’s accuracy. Facial recognition is fundamentally pattern matching algorithm at work. It evolves from matching algorithm using simple geometric models to sophisicated mathematical computations and matching processes. Thus the di�erences does contributed to overall performance of Xeric. In this section, we will examine recommended quality of both enrolled and probe image.

Image Quality

Enrolled images are those images that are being imported into Xeric watchlist database. They formed the database which probe images will be used to search against it for similar faces. There are few ways to obtain images for importing into Xeric, following are some of those common means:-

1. Passport Photographs / Mugshots Usually these photos will comply to a standard, such as ICAO, to ensure those images are of good quality. It is recommended to have a higher resolution image as high resolution images are able to resize to input it into the database. 2. Live video feeds Xeric will indexed all the faces that are correctly presented in front of cameras. These indexed images often provide the best form of enroll images as they are obtained from a source which will also be used to capture probe images. 3. Images from Identity card, newspaper, magazine There might be requirement for some user, such as law enforcement, to provide images that are not of high quality. They can be images from an identity card, faces that appear in newspaper or magazine. If the photo are of decent size and quality, these images can still be used as enrolled images.

Identity Card such as one on the left can be used to extract the photo of the person as enroll image. However, the quality of the photo play a part in the accurancy.

This image is an example of passport photo or mugshot that can be used as enrolled image.

Live video feed, such as above, can be used to input into Xeric and extract face images which can be used as enrolled image.

Enrolled Images

Xeric Facial Recognition Whitepaper Page 2

Probe images are images that are being used to compare or match against those enrolled images in the database. Typically, a probe image are being obtained using live video feed when a face are being presented in front of camera. Xeric’s algorithm provide face tracking and face extraction, and putting the face as an indexed image within Xeric. All indexed images are being searched against watchlist database to determine if there are any match. Apart from using live video feed, facial image or recorded video can also be input into Xeric to compare against watchlist database.

You will noticed that di�erences between enrolled images and probe images are in the way that they are being used. Both image types need to ful�ll certain minimum requirement in order to provide optimal performance. This is due to fundamentally, facial recognition are pattern matching between 2 images and similarity scores are being provided. Thus the higher the level of similarity between 2 images, the higher the accurancy of match.

Good lighting level is essential to capture details on a face clearly for analysis purpose to achieve the aim of facial recognition. A good lighting level should evenly lit a face, clearly show facial details of a person. It should balance between having an image that is either too bright or too dark, where details are lost or hidden. This a�ects the level of accuracy of facial recognition engine.

Photo 1 below shows an example of Good Lighting. The face is evenly lit without any burnout or shadow zone. There is a distinct di�erent between foreground, which is the face, and background, which could be a wall. Ideally, all photos should be enrolled or indexed as such to provide optimal performance.

Photo 2 below shows an example of Overexposed image. There are no or very little shadow areas and many parts of the face are being burned out or with glare. As compared to Photo 1, the glare erased the boundary of spectacles and other details around the cheeks, nose and forehead area.

Photo 3 below shows another extreme contrast to overexposed, Underexposed. An underexposed image does not have su�cient lighting and thus a lot of details are being hidden.

This image on the left shows an outcome of a successful match. The left most image is the indexed image from a live video feed. This image is being matched against a watchlist database, which contains this enrolled image. This enrolled image is the same person close to 20 years ago.

Photo 1 - Good Lighting Photo 2 - Overexposed Photo 3 - Underexposed

Probe Images

Lighting Level

Xeric Facial Recognition Whitepaper Page 3

Camera is the sensor that captures facial image for facial recognition engine to perform its analytic task. Customer might want a CCTV camera to perform multiple role, whereby saving the number of cameras that they need to install. Many existing cameras are not suitable for facial recognition due to resolution, �eld of view (FoV) and camera angle. We will be covering more on these aspect at later section.

The most versatile camera type that we always recommend is the box camera, which enable one to �t with a telephoto lens. This provide the �exibility of installation as well as getting the required pixel between eyes and FoV needed for facial recognition. Xeric do not limit itself from using just box camera but match the type of camera based on customer’s requirement. In extreme cases, Xeric uses webcam or built-in camera on laptop to perform its role.

Camera Selection

Photo 2 and Photo 3 could be the resultant of improper lighting condition or camera settings. In most cases, IP cameras are able to cater to relatively wide range of lighting. It is highly recommended to leave the exposure setting of the camera to the camera itself. One might just indicator a speci�c zone or area which need to be properly exposed to cater for various lighting conditions. Lighting, apart from �xed lighting, could also mean sun rise, sun set, overcast, re�ection, etc. In most cases, lighting level of an image falls between Good Lighting condition and both extreme of over and under exposed image.

Resolution can be de�ned as level of visual detail or clarity in an image. In term of CCTV terminology, it is de�ning the horizontal and vertical pixel count of a camera. Take for an example, a typical IP camera could be 720p, 1080p or 3.1 megapixels or 5.2 megapixel. It is also further divided into whether the aspect ratio is 4:3 or 16:9. In a typical 720p 16:9 format, one will get 1280 pixels horizontally and 720 pixels vertically.

So what is a pixel? Pixel means “Picture Element” and they are the smallest unit of element that makes up of a picture. Translating this into 720p resolution, we will be 1280 picture elements laying horizontally and 720 picture elements laying vertically to form an image. The table below shows di�erent resolution level of a CCTV camera.

How does resolution a�ects facial recognition? Xeric required a minimum of 30 pixels between eyes, recommended 120, to perform accurate facial recognition. A high-resolution camera is able to produce higher pixel count, which equate to index and match more people as compared to camera with lower resolution within a common FoV.

Resolution, together with frame per second, activites level and duration of activities determine your network bandwidth and required storage. In Xeric’s case, Xeric do not really required huge amount of storage as Xeric do not need to store videos. Images are being processed at the front end and sending those processed images of a single person required approximately 10kB per image. By default, Xeric captured 5 images per person which make it 50kB per person.

The Resolution Table on the next page illustrate to you various common resolution that you encounter, the number of pixels and their common term. We will also show you the relationship between resolution, frame rate, duration and computing the video size.

Resolution

Xeric Facial Recognition Whitepaper Page 4

Field of View (FoV) refers to an area that can be seen through a CCTV camera, which is in direct relationship to the lens attached. The 2 images below are example of same scene taken with 2 di�erent horizontal FoV.

From the look of it, the di�erences in FoV do not seem to be very signi�cant. Image on the left is approximately 4 feet wider than image on the right. However, this slight di�erence does make signi�cant di�erences if this image is suitable for facial recognition. When we look at the image on the left, we are achieving approximately 32 pixels between eyes on a 720p image. This image just managed to meet the minimum requirement for facial recognition. Using same camera setting and by just having a more narrow FoV, we are able to achieve 65 pixels between eyes. This is illustrated in the below images.

Wide Field of View Narrow Field of View

Field of View

Xeric Facial Recognition Whitepaper Page 5

The images above further illustrate the di�erences of using a suitable lens for getting the appropriate FoV. While a camera is looking at a particular scene, a varifocal len will be able to bring your zone of focus “closer” to you.

Using this principal, facial recognition requires a narrow FoV where we are able to achieve good pixel count between eyes. Apart from good pixel count, it is important to get the image to be in good focus, which is easier to achieve in narrower FoV. By controlling the pixel count between eyes, we are also able to de�ne our “capture zone” where faces of people within the zone will be indexed.

There is another term which one will need to take note as this a�ects your object focus, Depth of Field (DoF). DoF is a zone de�nes within a minimum distance and maximum distance away from the lens where object within it will be in focus. This zone is a�ected by iris setting of the camera, which deals with the amount of light that enters the lens.

You can see from the previous example that having an appropriate FoV increases pixels between eyes. This can be achieved using a varifocal lens, that gives you the �exibility to zoom into your Area of Interest (AOI). Many people mixed up between optical zoom and digital zoom and while they enable you to focus on your AOI, it does not bring the same result.

Optical zoom is achieved using a physical zoom lens. There is no change in resolution and you achieved the same pixel count both horizontally and vertically. Digital zoom is achieved using in-camera processing to enlarge the AOI and cropping away those edges. While you can maintain the same aspect ratio, this AOI enlarges the pixels and reduces both resolution and image quality.

3.0mm lens

10.0mm lens

40.0mm lens

Resolution + Field of View

Xeric Facial Recognition Whitepaper Page 6

Using same settings of the images above, what would be the result of using a higher resolution camera? You must have gotten it correct! Yes, you can expect to have higher pixel between eyes even with wide FoV. However, look at the pixel between eyes when appropriate FoV is being achieved, a whopping 99 pixel between eyes! You can see the illustration at the images below.

A good rule of thumb to follow is to �t the face into a box estimating 800 x 600 as shown in the image below. While using a 1080p camera, the FoV should only �t approximately 3-4 people within the view. These recommendations enable you to obtain optimal pixel count between eyes to enable Xeric to achieve its best performance. However, there are other factors at work, which include enrolled images and COMAH. We will cover what is COMAH at later section, which presents to you �eld issues and you might be facing too.

Xeric Facial Recognition Whitepaper Page 7

Having a good facial recognition engine, good lighting and good camera con�guration only solve part of the equation of good facial recognition. If your camera is not being installed at appropriate position, your rate of indexed and accuracy will be a�ected. This section provides recommendations to physically install your cameras at position that increases probability for facial recognition.

The best place to place a camera is directly in front of the person at eye level. This gives you direct frontal face image that is the best for facial recognition. Realistically, this is often not possible without obstructing human movement. On another aspect, this could be a covert operation which general public cannot be alarmed and will be non-cooperative. Before looking at how cameras could be mounted, let’s see what are the limitations that we are addressing.

Facial recognition works fundamentally by comparing both images and provide the result on their similarity. Probe image that looks similar to enroll image yield better result as compared to those images that looked di�erent. Pose di�erences played a signi�cant part in matching same person in di�erent images that looked at di�erent direction. Laboratory studies show that when templating error is taken out of consideration, most facial recognition engine give similar performance in terms of accuracy. Xeric’s engine taken this into consideration and automatically perform pose correction on probe images prior to matching them against enrolled images. What’s even better is that current JPEG image database can also be pose corrected before enrolling them, ensuring legacy databases can be use.

Even with good algorithm, it is important to take Pitching, Rolling and Yawing of faces into consideration when planning out where to position facial cameras. This ensures o�set of each axis are kept within tolerance in order for algorithm to accept and index a face before rejecting it. The images below illustrate to you what are the di�erences between pitch, roll and yaw.

Pitch Down Pitch Up

Roll Right Roll Left

Yaw Right Yaw Left

Camera Installation

Angle of Capture

Pitch, Roll & Yaw

Xeric Facial Recognition Whitepaper Page 8

The best position to capture face for a facial recognition camera is right in front of the person. However, this is not possible in most cases where people could be moving around freely and without obstruction. Another alternative is to mount the camera at a higher position, achieving approximately 20% Vertical Angle of Incidence. There is a simple formula that you can use to determine the recommended capture distance when given mounting height of camera.

Example 1:You have a ceiling height of 3.5m and you need to determine how far away from the camera to setup a chokepoint. A typical Asian Male has his eye level at approximately 1.5m. Applying these information into the formula gets the following:-

(3.5m - 1.5m) / 0.2 = 10m

In this example, you can setup the chokepoint approximately 10m away from camera mounting point.

Example 2:You will need to setup a chokepoint 8m away from camera mounting point and the ceiling height is 3.5m. How can you apply this information to the formula?

Restructuring the above formula brings you to (CD * 0.2) + EL = CH(8.0m * 0.2) + 1.5m = 3.1m

Recommended mounting height is 3.1m, which is slightly lower the ceiling. You can consider having a dropdown pole to bring camera height to 3.1m for best performance.

Position for mounting the camera put your understanding of FoV, Resolution and Pixels between eyes at test. A good installation takes into consideration all of the factors to determine appropriate camera, lens, location and height.

(CH - EL) / 0.2 = CDWhere CH = Ceiling Height, CD = Capture Distance and EL = Eye Level

Placement of camera must always take consideration of achieving frontal facial view as much as possible. However, in real situation, this is hard to achieve especially with covert cameras and uncooperative people. It is thus important to consider COMAH for each camera in relation to overall objectives of your deployment. The next section provides a rule of thumb and simple formula to calculate where could you mount your cameras.

Camera Mounting Formula

Xeric Facial Recognition Whitepaper Page 9

Performance of facial recognition system involved much more than having a very accurate engine. Primary function of facial recognition is to capture facial image of human for matching against database. In this case, one need to take into considerations both human and environmental aspect to optimise opportunity and quality to achieve accuracy. In order to do that, it is recommended to consider COMAH (read “coma”). COMAH stands for Chokepoint, Obstruction, Movement, Attraction and Human.

Obstruction could be a permanent or temporary object placement that impedes or prevent passage. Obstruction in our context could mean a pillar, wall, plants or people who could be standing between our subject and camera. Besides looking at the �oor plan to determine placement of camera, it is even more critical to visit the site.

Visiting the site during di�erent time of the day, di�erent days of the week enable one to know if there could possibility of temporary obstruction. Talking to customer on their future plan of the area able us to know potential permanent obstruction. Obstruction might divert people away from the area or reduces probability of getting facial images of subject.

Chokepoint can be de�ned as a narrow passage where one need to pass through to reach their destination. A chokepoint can be a gantry, metal detector or gate, etc. as it depends on customer’s requirement. Capturing of faces in this zone provide many advantages:-

1. Walking speed of people will be reduced to increase probabilities of getting image on people looking straight and level. Stationary subject or subject with slower pace reduces motion blur without need to increase frame rate. Higher frame rate could resulted in higher storage requirement which equals to higher cost for customer.

2. Infrastructure can be enhanced over this region to provide evenly di�used lighting to improve quality of images. Uneven lighting, backlighting and natural lighting could reduces accuracy of facial recognition.

3. This zone enable camera to focus on a speci�c wide to achieve optimal pixel count between eyes. Taking this into consideration, the setup might require only a 720p or 1080p camera. Comparing this to wide FoV that might require much higher MegaPixel camera and resulted in high processing requirement.

Chokepoint

Obstruction

COMAH

Xeric Facial Recognition Whitepaper Page 10

Having camera looking over an area with more people increases probability of getting their faces as compared to an area with lesser crowd. Studying movement of people in that area is important to position camera at the most appropriate location. Usual peak movement occur before working & school hours, after working & school hours, major event as well as holidays. In some circumstances, people might be taking short cut to save on time or to avoid a detour.

In order to know all these, it is important to visit the site over varies period to get better understanding on these matters. Depending on the requirement, changes to site could be proposed to create natural �ow of human to where you want them to be.

Attraction could be anything that draws attention of people by looking at the attraction or physically moving towards it. Subject could already be in chokepoint and something draws their view away from the front and look towards it. Depending on the angle of pitch, roll and yaw, accuracy of match could be lower. It could also be falling out of the limit and no face being indexed.

However, we could be using attraction to our advantage. Instead of looking towards another zone or place, we could “encourage” subject to be looking straight in front of them. There could be some interesting features that enable them to look directly ahead, capturing their best pose. One point to note is that this attraction needs to be constantly changing to keep the subject engaged or interested to continue looking at it.

Besides being an art and science, Facial Recognition also deals with culture and human behaviour. Without taking this aspect into consideration, e�ectiveness of capturing faces of people could be limiting.

In some countries, ladies might need to wear veil and men might need to keep their beard for religious reason. However, with increase in global travel, similar situation could happen anywhere when people move around. Human behave di�erently when they are approaching a swinging door, escalator, elevator, obstruction, etc. Prior to �xing camera directly in front of their walking path according to placement formula, it is important to study how do they move to determine the capture area. Another bigger phenomenon is what we known as “Mobile Zombie”, people who keep staring down to their mobile phone. In order to capture their faces, you might need to think of creative ways to capture their attention and lift their head.

Movement

Attraction

Human

Xeric Facial Recognition Whitepaper Page 11

While it is important to have an accurate, e�ective and e�cient facial recognition engine, it is equally important to put other factors into consideration. A physically challenging installation with comprehensive considerations could achieve high probability of indexed and high accuracy. A relative simple installation without much consideration could put one in situation wondering why performance is not up to par.

Below summaries all the major considerations that had been discussed in this whitepaper:

1. Facial Recognition is matching 2 images together to product the result. Quality di�erences between Enrolled and Probe image has to be as close as possible. 2. Lighting on both enrolled and probe image a�ects clarity of facial details to achieve high accuracy. 3. Consideration on camera type, resolution and lens are important. Depending on each physical requirement and its a�ect on storage system, di�erent combination of camera + resolution + lens are being used. It is not recommended to stick to a permanent con�guration for every scenario. 4. Chokepoint, Obstruction, Movement, Attraction and Human a�ects various considerations. Height of camera, location of mounting, etc. are being a�ected by COMAH to give the best facial angle, evenly distributed lights and maximum frontal facing path. 5. Most importantly, it is highly recommended to test all your assumption onsite to eliminate those issues that cannot be uncovered during tabletop planning.

This whitepaper is never meant to be exhausive and not a “must follow” rule. This is a guideline based on situations we saw on the �eld and experienced that we accumulated after spending many nights working on installations. This whitepaper aims to enable you to leverage on what we have already know and jump-start to provide an e�ective, e�cient and accuracy solution for your customer.

Conclusion

Sunny Tan | MSc, MCiiSCM, CCSMPhttps://sg.linkedin.com/in/sunnytan30May 2016

Xeric Facial Recognition Whitepaper Page 12

Info-Technologies & Trade Pte Ltd81 Ubi Avenue 4,#07-22 UB.OneSingapore [email protected]