DRAFT Real-time UAS Video Analytics
In Mission Exploitation of ISR Image and Full Motion Video Content
Gregory B. Pepus
February 22, 2012
Version 1.2
Globally, public sector customers are spending more than $6.5 billion annually to deploy Unmanned Aerial Systems (UAS), which are a combination of Unmanned Aerial Vehicles (UAVs) and associated ground station technology and personnel. Flex Analytics and its partner piXlogic offer the means to automatically exploit UAV (and other) collected image and full motion video in real-time, both during and after the mission, to produce actionable intelligence.
Contents
Synopsis
UAS ISR Background
piXlogic's Image and Video Analytics
Other Capabilities
Classification, Matching and Tagging
Text in the Image
piXserve for UAS and Manned Aircraft ISR Missions
Concept of Operations for UAS Image/Video Alerting
UAS Bandwidth Considerations for ISR Video and Imagery
Summary
Synopsis
On a global basis, military, police, emergency and other public service providers are spending in excess of $6.5 billion annually to deploy a range of Unmanned Aerial Systems (UAS), which are a combination of Unmanned Aerial Vehicles (UAVs) and associated ground station communications, computer technology and personnel. The single most important use for UAS is the Intelligence, Surveillance, and Reconnaissance (ISR) mission.
UAS ISR Background
The principal means by which ISR is carried out via UAS is through Full Motion Video (FMV) and/or still image collection, as seen in Figure 1. In practice the ISR function is used for immediate mission support; unfortunately, most of the collected surveillance material (over 99%) remains unexploited and unexploitable in any large-scale automated sense, and this is a global problem.
Figure 1 – UAS ISR – UAV Based Image and Full Motion Video Sensors
Collected image and video content is certainly not used at any great scale for real-time generation of in-mission actionable intelligence, beyond the "immediate" limited surveillance needs of the current mission while flying the aircraft. This is because UAS have not, as a rule, included the analytical technology, either in the bird or on the ground, necessary to support automated near-real-time exploitation of the incredible volume of video and image material being collected.
Globally, customers are buying the air platforms, cameras and storage facilities to generate, view and store this information, but have, in practice, spent almost nothing on automated exploitation of that information, particularly in near real-time during actual mission operations. The limited amount of video and image exploitation that is done is carried out by people, in an attempt to analyze less than 1% of the overall image and full motion video content collected via UAVs, as shown in Figure 2. The principal problem is that humans have only a limited ability to actively analyze video before there are significant decreases in concentration, object recognition and cognitive performance.
Figure 2 – UAS Ground Control Systems
Why don't customers usually consider automated image and video analytical systems? Most don't know the art of the possible, and those that have experience with related technologies have found those systems to be highly inaccurate, producing unreliable analytical results. However, the need is real and ever growing. It is compounded because most UAS have multiple cameras, and individual humans cannot split their focus across multiple video feeds in a detailed way and effectively capture actionable intelligence from those feeds for a sustained period.
Fortunately, the state of the art for automated exploitation and analysis of image and video content is now highly reliable, and the technology is most definitely ready for prime time. Flex Analytics LLC and its partner piXlogic, Inc. offer automated image and video technology solutions that are immediately available to help the government analyze image and video content in order to produce actionable intelligence during and after UAS mission operations. These solutions allow accurate, automated targeting and tagging of mission-specific objects of interest, leading to immediate actionable intelligence.
Additionally, state-of-the-art analytic solutions for ISR FMV and imagery allow customers to dramatically reduce the amount of stored content, because analytics identify the images and/or video content that contain important material and record only that information in a searchable index. This approach could lead to a 25% to 50% or higher reduction in the amount of video and/or image content retained, immediately saving customers millions of dollars in storage costs while allowing complete and immediate exploitation of collected content.
piXlogic's Image and Video Analytics
piXlogic Inc. offers piXserve, an image and video search and analytics platform that automatically indexes the objects within an image or frame of video. Think "Google for image and video," where no manual intervention is required to automatically tag objects of interest for immediate actionable alerting, search and analysis. Using image processing, search and analysis algorithms that work similarly to human vision, piXserve is able to identify specific objects at the pixel level in image and video content. For example, a user can provide an image of a minivan and the software will match it to all occurrences of the minivan found in the current corpus of image or video content, as shown in Figure 3.
Figure 3 – Searching for an Object (e.g. Minivan) And Getting Results Back
piXserve identifies and then describes in an index the object's shape, and immediately attempts both to classify that shape and to match it against a specific library of objects of interest, to get not only a classification but also a specific match. When the software identifies an object it creates a mathematical, geometric description of that object, recognizing details such as edges, texture, color, lighting and many other factors; those facets of information are inserted into the index and made searchable.
When it is able to classify an object or match it to a specific library item, it then tags1 the object in the search index. Once indexed, piXserve allows image and video content to be searched using an image of the object of interest (or something similar), a keyword describing that object, or other search criteria described later in this document. Figure 4 shows how the software might be used to find a ship of interest by searching using parts of that ship which lead to a match. For example, the software might identify key markings, flags2, text on the object or other particular features and apply metadata to that image or video, which is stored in the searchable index.
1 Tags refer to the process of automatically, without human intervention, applying labels to objects in image or video content.
2 The flags are too small in this image for the software to accurately identify.
Figure 4 – Finding an Object (callouts: Metadata, Flag, Mark, Ship Line, Ship Name)
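The pipeline described above (reduce an object to numbers describing its shape, color and texture, then compare those numbers across images) can be illustrated with a deliberately simplified descriptor. The sketch below is not piXserve's algorithm; it is a generic stand-in that reduces an image to a normalized color histogram and scores two images with cosine similarity.

```python
from math import sqrt

def color_histogram(pixels, bins=4):
    """Reduce an image (a list of (r, g, b) tuples, values 0-255) to a
    normalized color histogram: a crude stand-in for the richer
    edge/texture/lighting descriptors discussed above."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def similarity(hist_a, hist_b):
    """Cosine similarity between two descriptors (1.0 = identical)."""
    dot = sum(a * b for a, b in zip(hist_a, hist_b))
    norm = sqrt(sum(a * a for a in hist_a)) * sqrt(sum(b * b for b in hist_b))
    return dot / norm if norm else 0.0

# Two mostly-red synthetic "images" score as a match; a blue one does not.
red_1 = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
red_2 = [(240, 20, 20)] * 85 + [(20, 250, 20)] * 15
blue = [(10, 10, 250)] * 100
```

A real system combines many such descriptors and a spatial index; the point here is only that "search by example image" reduces to comparing numeric signatures rather than raw pixels.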
Other Capabilities
The software also has the capability to do facial biometrics to National Institute of Standards and Technology (NIST) specifications. NIST has a full facial biometrics program and a range of data and testing material to support biometric facial recognition software capabilities. NIST has held several industry competitions, and piXserve has developed its facial biometrics based on the data and requirements put forward in NIST's Facial Recognition Grand Challenge (FRGC)3.
Additionally, the software has the ability to recognize text in image or video. It is able to identify not only the language but also the text string in an image. This capability is not subject to the technical limitations of optical character recognition (OCR) because it uses different technology: piXserve's ability to identify and recognize specific shapes, applied to text in more than 28 languages including Chinese, Korean, Russian and a range of Western languages.
Classification, Matching and Tagging
As mentioned above, the software has the ability to generally classify an object using a concept piXlogic calls a notion4. This capability allows piXserve to identify the broad class of things to which an identified object belongs. For example, piXserve is able to classify things like sky, sea, tree, person, building, window and many other classes of objects.
3 http://www.nist.gov/itl/iad/ig/frgc.cfm
4 Notions are essentially an “ontology” broadly identifying a class to which an object belongs – i.e. that green thing
is a tree, the blue stuff is sea or sky etc.
Figure 5 – Notions for Identified Objects of Interest (Tree, Person/Face, Sea/Sky, Building, Window)
When the software recognizes a specific class of object it is able to tag that object with a keyword. So, as shown in Figure 5, the software would tag pictures of trees, people, faces, sea, sky, buildings and windows accordingly.
However, the software can go beyond that. For example, if you were trying to find a specific vehicle in a UAV video, you could develop a library5 containing a few example pictures of that vehicle from different angles. The software would use those examples to match all instances of that vehicle in that video and tag the vehicle according to the library image's filename. In this case the examples of the vehicle might have the filenames Minivan,1, Minivan,2, Minivan,3 etc.
Figure 6 – Small Library of a Minivan (example images Minivan,1; Minivan,2; Minivan,3)
For example, as shown in Figure 6, we could create a library of a few images of a minivan. When the video of interest is then processed (indexed) by piXserve, it attempts to label all instances of the minivan that occur in that video as Minivan. Then, instead of searching with a picture of a minivan, users would simply use piXserve's keyword search function, typing in the word "Minivan" and getting a list of all search results back, as shown in Figure 7.
The same applies to faces: given a small library of images of a person, the software will tag all occurrences of that person in incoming new images or video, and facial recognition in this case approaches 98% accuracy on average based on NIST standards.
5 Library in piXlogic parlance is a folder with images to be matched, whereby the filenames are used to provide tagging of matches in the incoming corpus of image or video content.
Figure 7 – Finding the Minivan Using Keywords/Results Shown to Right
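The library convention above (filenames provide the tags applied to matches) implies a simple mapping from a library folder's contents to searchable keywords. A minimal sketch of that mapping follows; the exact separator piXserve expects between label and index is an assumption here, based on the "Minivan,1" examples in this document.

```python
import re
from collections import defaultdict

def tags_from_library(filenames):
    """Derive search tags from library filenames: 'Minivan,1.jpg',
    'Minivan,2.jpg', ... all map to the single tag 'Minivan'.
    Returns {tag: [filenames that contribute example images]}."""
    tags = defaultdict(list)
    for name in filenames:
        stem = name.rsplit('.', 1)[0]           # drop the extension
        tag = re.sub(r'\s*,\s*\d+$', '', stem)  # drop a trailing ',n' index
        tags[tag].append(name)
    return dict(tags)
```

For example, `tags_from_library(["Minivan,1.jpg", "Minivan,2.jpg"])` yields one tag, "Minivan", backed by both example images, which is exactly the behavior the keyword search in Figure 7 relies on.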
Text in the Image
piXserve, as mentioned above, has the ability to recognize text in the image. Unlike OCR technology, which is highly sensitive to page color, font color and font type, piXserve can identify text in an image under almost any condition of foreground or background color or font. When an image or video is indexed with the alphanumeric option set, users can search an index of image or video content using the text-in-image option. This allows video to be searched by its banners (e.g. a news broadcast), or by text on any object such as a street sign, a building name, characters on a t-shirt, license tags, or writing on a ship, truck, train or plane, as shown in Figure 8.
Figure 8 – Finding Text on the Image (examples: Pacific Line, Maryland, Free Meals United Way, China Shipping, N65482)
piXserve for UAS and Manned Aircraft ISR Missions
Implementing a piXserve solution for UAS and manned aircraft ISR missions makes a tremendous amount of sense. piXserve can be used in different contexts to help produce actionable intelligence either during or after the mission, based on several different approaches used during UAS and/or manned ISR missions. These approaches include:
1) All image and video content is collected in real-time from the aircraft, because communications links are robust enough to handle the higher bandwidth demands of image or video content;
2) A targeted subset of image and video content is collected in real-time from the aircraft, because communications links are robust enough to handle some, but not all, of the higher bandwidth demands of image or video content;
3) No image or video content is collected in real-time from the aircraft, because communications links are not robust enough to handle the higher bandwidth demands of image or video content.
In approach 1 the communications links are robust enough that image and video content can be transferred from the vehicle to the ground station in near real-time. Server infrastructure at the ground station can process incoming video so that alerts for targeted information of interest are generated, producing real actionable intelligence from video and image content during the mission.
In approach 2 the communications links are not robust enough to handle the immediate download of all image and/or video content to the ground station. However, in aircraft large enough to handle a payload of about 7 lbs, a server the size of a laptop can run the piXserve software, allowing important targeted image or video content to be downloaded to the ground station for immediate action.
In approach 3 the communications links are not robust enough to handle the immediate download of any image and/or video content to the ground station. However, in aircraft large enough to handle a payload of about 7 lbs, a server the size of a laptop can run the piXserve software to filter results, tag the image and video, and allow a text message about a specific object of interest at a specific geo-coordinate to be sent to the ground station so that immediate action can be taken, as demonstrated in Figure 9.
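In approach 3, the only thing crossing the link is a short text alert naming the object and its geo-coordinate. A minimal sketch of such a payload is below; the field names, rounding and 256-byte budget are illustrative assumptions, not a piXserve message format.

```python
import json

def make_alert(label, lat, lon, utc, confidence):
    """Build a compact, single-line alert suitable for a very
    low-bandwidth link.  Field names are illustrative only."""
    msg = {
        "obj": label,
        "lat": round(lat, 5),   # five decimals is roughly 1 m precision
        "lon": round(lon, 5),
        "t": utc,               # e.g. "2012-02-22T14:03:09Z"
        "conf": round(confidence, 2),
    }
    line = json.dumps(msg, separators=(",", ":"))  # no whitespace padding
    assert len(line.encode()) < 256, "keep alerts tiny for weak links"
    return line

alert = make_alert("Minivan", 39.35371, -85.95141, "2012-02-22T14:03:09Z", 0.97)
```

Even a few hundred bytes per detection fits comfortably on links that could never carry the video itself, which is the whole premise of approach 3.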
Figure 9 – UAS Enabled to Give Real-time Actionable Intel from ISR Data (UAS → Ground Computing Systems → Actionable Intel)
In all of the aforementioned approaches, piXserve has the ability to manage and change specific target objects on the fly, even over very low-bandwidth links, so that image and/or video content alerting can be changed dynamically as needed. That is, new objects of interest can be uploaded to the server for automatic tagging during the actual mission timespan.
Concept of Operations for UAS Image/Video Alerting
In a typical scenario where UAS ISR is the prime mission objective and piXserve is being used to automatically identify priority intelligence requirements (PIRs), piXserve can be deployed on larger UAVs via a very low-power, light-weight, compact micro-server stack that allows image and video processing on the aircraft itself, as shown in Figure 10.
Prior to mission initiation, a list of PIRs would be compiled and source images matching the items of interest would be gathered by the mission planners. These source images, which piXlogic calls 2D non-transformable objects (i.e. objects that have a fixed 2D shape), would have their file names changed to whatever taxonomic label the government wanted to tag incoming images or video with, and would be placed into the mission's library folder on the server residing on the UAV.
Figure 10 – Micro-server Running piXserve Aboard UAV
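The pre-mission step above (rename each PIR source image to the desired taxonomic label and drop it into the mission library folder) can be sketched as follows. The `<label>,<n>` suffix mirrors the Minivan examples earlier in this document; confirm the exact naming convention against the deployed piXserve version.

```python
from pathlib import Path
import shutil
import tempfile

def stage_pir_library(source_images, label, library_dir):
    """Copy PIR source images into the mission library folder, renaming
    each to '<label>,<n>.<ext>' so that matches get tagged with <label>."""
    library_dir = Path(library_dir)
    library_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for n, src in enumerate(source_images, start=1):
        src = Path(src)
        dest = library_dir / f"{label},{n}{src.suffix}"
        shutil.copy2(src, dest)  # preserves timestamps alongside contents
        staged.append(dest)
    return staged

# Example: stage two hypothetical source photos under the tag "TechnicalTruck".
work = Path(tempfile.mkdtemp())
for name in ("photo_a.jpg", "photo_b.jpg"):
    (work / name).write_bytes(b"\xff\xd8fake-jpeg")
staged = stage_pir_library(sorted(work.glob("photo_*.jpg")),
                           "TechnicalTruck", work / "library")
```

Keeping the original source images untouched and copying renamed versions into a per-mission folder means the same PIR imagery can be re-labeled for a later mission without any loss.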
During operations, as video was collected from the UAS ISR imaging and video sensors, piXserve would
dynamically match the 2D non-transformable objects (i.e. the list of image files of things we want to
find) in the mission library and tag the image and video content appropriately. Alerts would be
automatically sent to the ground station immediately indicating to ground personnel that a PIR of
interest was just identified by the UAS and that some action could be taken as shown in Figure 11.
Figure 11 – UAS – Sending Alerts from piXserve During ISR Mission Operations
UAS Bandwidth Considerations for ISR Video and Imagery
In some low-bandwidth situations, where the UAS doesn't have sufficient bandwidth to "normally" transfer actionable video or imagery to the ground station, another technology that Flex Analytics has available is a product called FLUME. FLUME, by Saratoga Data Systems Inc., is a file transfer protocol that allows data to be transferred over poor-quality, low-bandwidth links at very high speeds with high reliability.
In a recent demonstration for Special Operations Command (SOCOM) Technical Network Test-bed (TNT) at the Army Urban Warfare Training Center in Muscatatuck, Indiana, Flex Analytics and its partners, including Cloud Front Group, Saratoga Data Systems Inc. and piXlogic, demonstrated piXserve in combination with FLUME as part of the overall UAS package. The goal was to emulate piXserve and FLUME running on the UAV and transferring video clips of PIRs of interest to the ground station over poor-quality communications links, so that commanders on the ground could take advantage of immediately actionable intelligence, as shown below in Figure 12.
Figure 12 - TNT CONOPS, Demonstration for SOCOM - piXserve and FLUME as Part of a UAS
During the test, piXserve successfully alerted on pre-mission identified PIRs, which resulted in small 6Mb to 8Mb clips of the objects of interest being transferred to the ground station over 2Mb/sec links with relatively high error rates. Ordinarily, using standard TCP/IP file transfer protocols (FTP or SCP6), video clips of 6Mb to 8Mb in size would take about 7.5 minutes over a typical, error-prone 2Mb/sec link. However, in combination with FLUME, the 6Mb to 8Mb video clip data was transferred in under a minute.
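The slowdown described above is typical of TCP on lossy links: the standard Mathis et al. approximation (throughput ≈ MSS / (RTT · √p)) caps the effective rate well below a link's nominal speed once packet loss appears. The back-of-envelope calculation below uses assumed segment size, round-trip time and loss rate (illustrative figures, not measurements from the SOCOM test) to show the effect.

```python
from math import sqrt

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation of steady-state TCP throughput
    (bits/sec) on a link with random packet loss: MSS / (RTT * sqrt(p))."""
    return (mss_bytes * 8) / (rtt_s * sqrt(loss_rate))

# Illustrative assumptions: 1460-byte segments, 500 ms satellite-class
# round-trip time, 1% packet loss.
rate = tcp_throughput_bps(1460, 0.5, 0.01)  # bits/sec
clip_bits = 6 * 8 * 1_000_000               # a 6-megabyte clip, as an example
seconds = clip_bits / rate
```

Under these assumptions the effective rate is roughly 0.23 Mb/sec, about a tenth of the nominal 2 Mb/sec link speed, so the clip takes minutes rather than seconds. Closing that gap on error-prone links is exactly what a loss-tolerant transfer protocol such as FLUME is designed to do.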
As a result, the test for SOCOM showing piXserve and FLUME as part of a UAS was highly successful. It clearly demonstrated that piXserve and FLUME could:
1. Identify PIRs of interest accurately
2. Transfer targeted video clips, rather than the entire video stream, to the ground
3. Provide fast and accurate data transfer over low-bandwidth, error-prone comms links
4. Allow commanders on the ground to receive accurate, timely information about multiple PIRs, thereby providing immediately actionable intelligence
Overall, piXserve and FLUME met and exceeded all mission objectives.
Summary
Today, the means exists for UAS customers to acquire software technology that would allow them to automatically index, and make searchable and analyzable, all UAS-collected image and video content. The means exists immediately to employ that technology at the edge of the collection envelope, where the UAV aircraft are deploying their video and image sensors on ISR duties in combat, law-enforcement, rescue, intelligence and other settings.
6 Another file transfer option is SCP, the Secure Copy Protocol, which transfers files over a secured (SSH) connection.
Government and civilian UAS customers are collecting petabytes of video and image content (truly big data). In the past they didn't have the means to reasonably, cost-effectively and reliably exploit that information. However, technology immediately available from Flex Analytics and its partner piXlogic can change that situation. piXserve, the image and video search and analytics software, can be used to help the government find mission objects of interest, resulting in immediately actionable intelligence, right on the aircraft and/or at the ground station.