DRAFT Real-time UAS Video Analytics
In Mission Exploitation of ISR Image and Full Motion Video Content
Gregory B. Pepus
February 22, 2012
Version 1.2
Globally, public sector customers are spending more than $6.5 billion annually to deploy Unmanned Aerial Systems (UAS), which are a combination of Unmanned Aerial Vehicles (UAVs) and associated ground station technology and personnel. Flex Analytics and its partner piXlogic offer the means to automatically exploit UAV (and other) collected image and full motion video in real-time, both during and after the mission, to produce actionable intelligence.
Contents
Synopsis
UAS ISR Background
piXlogic's Image and Video Analytics
Other Capabilities
Classification, Matching and Tagging
Text in the Image
piXserve for UAS and Manned Aircraft ISR Missions
Concept of Operations for UAS Image/Video Alerting
UAS Bandwidth Considerations for ISR Video and Imagery
Summary
Synopsis
On a global basis, military, police, emergency and other public service providers are spending in excess of $6.5 billion annually to deploy a range of Unmanned Aerial Systems (UAS), which are a combination of Unmanned Aerial Vehicles (UAVs) and associated ground station communications, computer technology and personnel. The single most important use for UAS is the Intelligence, Surveillance, and Reconnaissance (ISR) mission.
UAS ISR Background
The principal means by which ISR is carried out via UAS is through Full Motion Video (FMV) and/or still image collection, as seen in Figure 1. In practice the ISR function is used for immediate mission support; unfortunately, most of the collected surveillance material (over 99%) remains unexploited and unexploitable in any large-scale automated sense, and this is a global problem.
Figure 1 – UAS ISR – UAV Based Image and Full Motion Video Sensors
Collected image and video content is certainly not used at any great scale for real-time generation of in-mission actionable intelligence, beyond the "immediate" limited surveillance needs of the current mission while flying the aircraft. This is because UAS have not, as a rule, included the analytical technology, either in the bird or on the ground, necessary to support automated near-real-time exploitation of the incredible volume of video and image material being collected.
Globally, customers are buying the air platforms, cameras and storage facilities to generate, view and store this information, but have, in practice, spent almost nothing on automated exploitation of that information, particularly in near real-time during actual mission operations. The limited amount of video and image exploitation that is done is carried out by people, in an attempt to analyze less than 1% of the overall image and full motion video content collected via UAVs, as shown in Figure 2. The principal problem is that humans have only a limited ability to actively analyze video before there are significant decreases in concentration, object recognition and cognitive performance.
Figure 2 – UAS Ground Control Systems
Why don't customers usually consider automated image and video analytical systems? Most don't know the art of the possible, and those that have experience with related technologies have found those systems to be highly inaccurate, producing unreliable analytical results. However, the need is real and ever growing. It is compounded because most UAS have multiple cameras, and individual humans cannot split their focus across multiple video feeds in a detailed way and effectively capture actionable intelligence from those feeds for a sustained period.
Fortunately, the state of the art for automated exploitation and analysis of image and video content is now highly reliable, and the technology is most definitely ready for prime time. Flex Analytics LLC and its partner piXlogic, Inc. offer automated image and video technology solutions that are immediately available to help the government analyze image and video content in order to produce actionable intelligence during and after UAS mission operations. These solutions allow accurate, automated targeting and tagging of mission-specific objects of interest, leading to immediate actionable intelligence.
Additionally, state-of-the-art analytic solutions for ISR FMV and imagery allow customers to dramatically reduce the amount of stored content, because analytics identify the images and/or video content that contain important material and record only that information in a searchable index. This approach could lead to a 25% to 50% or higher reduction in the amount of video and/or image content retained, immediately saving customers millions of dollars in storage costs while allowing complete and immediate exploitation of collected content.
piXlogic's Image and Video Analytics
piXlogic Inc. offers piXserve, an image and video search and analytics platform that automatically indexes the objects within an image or frame of video. Think "Google for image and video," where no manual intervention is required to automatically tag objects of interest for immediate actionable alerting, search and analysis. Using image processing, search and analysis algorithms that work similarly to human vision, piXserve is able to identify specific objects at the pixel level in image and video content. For example, a user can provide an image of a minivan and the software will match it to all occurrences of the minivan found in the current corpus of image or video content, as shown in Figure 3.
Figure 3 – Searching for an Object (e.g. Minivan) And Getting Results Back
piXserve identifies and then describes in an index the object's shape, and immediately attempts both to classify that shape and to match it against a specific library of objects of interest, to get not only a classification but also a specific match. When the software identifies an object it creates a mathematical, geometric description of that object, recognizing details such as edges, texture, color, lighting and many other factors; those facets of information are inserted into the index and made searchable.
When it is able to classify an object or match it to a specific library item, it then tags1 the object in the search index. Once indexed, piXserve allows image and video content to be searched using an image of the object of interest (or something similar), a keyword describing that object, or other search criteria described later in this document. Figure 4 shows how the software might be used to find a ship of interest by searching using parts of that ship which lead to a match. For example, the software might identify key markings, flags2, text on the object or other particular features and apply metadata to that image or video, which is stored in the searchable index.
1 Tags refer to the process of automatically, without human intervention, applying labels to objects in image or video content.
2 The flags are too small in this image for the software to accurately identify.
Figure 4 – Finding an Object (callouts: Metadata, Flag, Mark, Ship Line, Ship Name)
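The pipeline described above (reduce an object to numbers describing its shape, color and texture, then compare those numbers across images) can be illustrated with a deliberately simplified descriptor. The sketch below is not piXserve's algorithm; it is a generic stand-in that reduces an image to a normalized color histogram and scores two images with cosine similarity.

```python
from math import sqrt

def color_histogram(pixels, bins=4):
    """Reduce an image (a list of (r, g, b) tuples, values 0-255) to a
    normalized color histogram: a crude stand-in for the richer
    edge/texture/lighting descriptors discussed above."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def similarity(hist_a, hist_b):
    """Cosine similarity between two descriptors (1.0 = identical)."""
    dot = sum(a * b for a, b in zip(hist_a, hist_b))
    norm = sqrt(sum(a * a for a in hist_a)) * sqrt(sum(b * b for b in hist_b))
    return dot / norm if norm else 0.0

# Two mostly-red synthetic "images" score as a match; a blue one does not.
red_1 = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
red_2 = [(240, 20, 20)] * 85 + [(20, 250, 20)] * 15
blue = [(10, 10, 250)] * 100
```

A real system combines many such descriptors and a spatial index; the point here is only that "search by example image" reduces to comparing numeric signatures rather than raw pixels.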
Other Capabilities
The software also has the capability to do facial biometrics to National Institute of Standards and Technology (NIST) specifications. NIST has a full facial biometrics program and a range of data and testing material to support biometric facial recognition software capabilities. NIST has held several industry competitions, and piXserve has developed its facial biometrics based on the data and requirements put forward in NIST's Facial Recognition Grand Challenge (FRGC)3.
Additionally, the software has the ability to recognize text in image or video. It is able to identify not only the language but also the text string in an image. This capability is not subject to the technical limitations of optical character recognition (OCR) because it uses different technology: piXserve's ability to identify and recognize specific shapes, applied to text in more than 28 languages including Chinese, Korean, Russian and a range of Western languages.
Classification, Matching and Tagging
As mentioned above, the software has the ability to generally classify an object using a concept piXlogic calls a notion4. This capability allows piXserve to identify the broad class of things to which an identified object belongs. For example, piXserve is able to classify things like sky, sea, tree, person, building, window and many other classes of objects.
3 http://www.nist.gov/itl/iad/ig/frgc.cfm
4 Notions are essentially an “ontology” broadly identifying a class to which an object belongs – i.e. that green thing
is a tree, the blue stuff is sea or sky etc.
Figure 5 – Notions for Identified Objects of Interest (Tree, Person/Face, Sea/Sky, Building, Window)
When the software recognizes a specific class of object it is able to tag that object with a keyword. So, as shown in Figure 5, the software would tag pictures of trees, people, faces, sea, sky, buildings and windows accordingly.
However, the software can go beyond that. For example, if you were trying to find a specific vehicle in a UAV video, you could develop a library5 containing a few example pictures of that vehicle from different angles. The software would use those examples to match all instances of that vehicle in that video and tag the vehicle according to the library image's filename. In this case the examples of the vehicle might have the filenames Minivan,1, Minivan,2, Minivan,3 etc.
Figure 6 – Small Library of a Minivan (example images Minivan,1; Minivan,2; Minivan,3)
For example, as shown in Figure 6, we could create a library of a few images of a minivan. When the video of interest is then processed (indexed) by piXserve, it attempts to label all instances of the minivan that occur in that video as Minivan. Then, instead of searching with a picture of a minivan, users would simply use piXserve's keyword search function, typing in the word "Minivan" and getting a list of all search results back, as shown in Figure 7.
The same applies to faces: given a small library of images of a person, the software will tag all occurrences of that person in incoming new images or video, and facial recognition in this case approaches 98% accuracy on average based on NIST standards.
5 Library in piXlogic parlance is a folder with images to be matched, whereby the filenames are used to provide tagging of matches in the incoming corpus of image or video content.
Figure 7 – Finding the Minivan Using Keywords/Results Shown to Right
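The library convention above (filenames provide the tags applied to matches) implies a simple mapping from a library folder's contents to searchable keywords. A minimal sketch of that mapping follows; the exact separator piXserve expects between label and index is an assumption here, based on the "Minivan,1" examples in this document.

```python
import re
from collections import defaultdict

def tags_from_library(filenames):
    """Derive search tags from library filenames: 'Minivan,1.jpg',
    'Minivan,2.jpg', ... all map to the single tag 'Minivan'.
    Returns {tag: [filenames that contribute example images]}."""
    tags = defaultdict(list)
    for name in filenames:
        stem = name.rsplit('.', 1)[0]           # drop the extension
        tag = re.sub(r'\s*,\s*\d+$', '', stem)  # drop a trailing ',n' index
        tags[tag].append(name)
    return dict(tags)
```

For example, `tags_from_library(["Minivan,1.jpg", "Minivan,2.jpg"])` yields one tag, "Minivan", backed by both example images, which is exactly the behavior the keyword search in Figure 7 relies on.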
Text in the Image
piXserve, as mentioned above, has the ability to recognize text in the image. Unlike OCR technology, which is highly sensitive to page color, font color and font type, piXserve can identify text in an image under almost any condition of foreground or background color or font. When an image or video is indexed with the alphanumeric option set, users can search an index of image or video content using the text-in-image option. This allows video to be searched by its banners (e.g. a news broadcast), or by text on any object such as a street sign, a building name, characters on a t-shirt, license tags, or writing on a ship, truck, train or plane, as shown in Figure 8.
Figure 8 – Finding Text on the Image (examples: Pacific Line, Maryland, Free Meals United Way, China Shipping, N65482)
piXserve for UAS and Manned Aircraft ISR Missions
Implementing a piXserve solution for UAS and manned aircraft ISR missions makes a tremendous amount of sense. piXserve can be used in different contexts to help produce actionable intelligence either during or after the mission, based on several different approaches used during UAS and/or manned ISR missions. These approaches include:
1) All image and video content is collected in real-time from the aircraft, because communications links are robust enough to handle the higher bandwidth demands of image or video content;
2) A targeted subset of image and video content is collected in real-time from the aircraft, because communications links are robust enough to handle some, but not all, of the higher bandwidth demands of image or video content;
3) No image or video content is collected in real-time from the aircraft, because communications links are not robust enough to handle the higher bandwidth demands of image or video content.
In approach 1 the communications links are robust enough that image and video content can be transferred from the vehicle to the ground station in near real-time. Server infrastructure at the ground station can process incoming video so that alerts for targeted information of interest are generated, producing real actionable intelligence from video and image content during the mission.
In approach 2 the communications links are not robust enough to handle the immediate download of all image and/or video content to the ground station. However, in aircraft large enough to handle a payload of about 7 lbs, a server the size of a laptop can run the piXserve software, allowing important targeted image or video content to be downloaded to the ground station for immediate action.
In approach 3 the communications links are not robust enough to handle the immediate download of any image and/or video content to the ground station. However, in aircraft large enough to handle a payload of about 7 lbs, a server the size of a laptop can run the piXserve software to filter results, tag the image and video, and allow a text message about a specific object of interest at a specific geo-coordinate to be sent to the ground station so that immediate action can be taken, as demonstrated in Figure 9.
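In approach 3, the only thing crossing the link is a short text alert naming the object and its geo-coordinate. A minimal sketch of such a payload is below; the field names, rounding and 256-byte budget are illustrative assumptions, not a piXserve message format.

```python
import json

def make_alert(label, lat, lon, utc, confidence):
    """Build a compact, single-line alert suitable for a very
    low-bandwidth link.  Field names are illustrative only."""
    msg = {
        "obj": label,
        "lat": round(lat, 5),   # five decimals is roughly 1 m precision
        "lon": round(lon, 5),
        "t": utc,               # e.g. "2012-02-22T14:03:09Z"
        "conf": round(confidence, 2),
    }
    line = json.dumps(msg, separators=(",", ":"))  # no whitespace padding
    assert len(line.encode()) < 256, "keep alerts tiny for weak links"
    return line

alert = make_alert("Minivan", 39.35371, -85.95141, "2012-02-22T14:03:09Z", 0.97)
```

Even a few hundred bytes per detection fits comfortably on links that could never carry the video itself, which is the whole premise of approach 3.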
Figure 9 – UAS Enabled to Give Real-time Actionable Intel from ISR Data (UAS → Ground Computing Systems → Actionable Intel)
In all of the aforementioned approaches, piXserve has the ability to manage and change specific target objects on the fly, even over very low-bandwidth links, so that image and/or video content alerting can be changed dynamically as needed. That is, new objects of interest can be uploaded to the server for automatic tagging during the actual mission timespan.
Concept of Operations for UAS Image/Video Alerting
In a typical scenario where UAS ISR is the prime mission objective and piXserve is being used to automatically identify priority intelligence requirements (PIRs), piXserve can be deployed on larger UAVs via a very low-power, light-weight, compact micro-server stack that allows image and video processing on the aircraft itself, as shown in Figure 10.
Prior to mission initiation, a list of PIRs would be compiled and source images matching the items of interest would be gathered by the mission planners. These source images, which piXlogic calls 2D non-transformable objects (i.e. objects that have a fixed 2D shape), would have their file names changed to whatever taxonomic label the government wanted to tag incoming images or video with, and would be placed into the mission's library folder on the server residing on the UAV.
Figure 10 – Micro-server Running piXserve Aboard UAV
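The pre-mission step above (rename each PIR source image to the desired taxonomic label and drop it into the mission library folder) can be sketched as follows. The `<label>,<n>` suffix mirrors the Minivan examples earlier in this document; confirm the exact naming convention against the deployed piXserve version.

```python
from pathlib import Path
import shutil
import tempfile

def stage_pir_library(source_images, label, library_dir):
    """Copy PIR source images into the mission library folder, renaming
    each to '<label>,<n>.<ext>' so that matches get tagged with <label>."""
    library_dir = Path(library_dir)
    library_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for n, src in enumerate(source_images, start=1):
        src = Path(src)
        dest = library_dir / f"{label},{n}{src.suffix}"
        shutil.copy2(src, dest)  # preserves timestamps alongside contents
        staged.append(dest)
    return staged

# Example: stage two hypothetical source photos under the tag "TechnicalTruck".
work = Path(tempfile.mkdtemp())
for name in ("photo_a.jpg", "photo_b.jpg"):
    (work / name).write_bytes(b"\xff\xd8fake-jpeg")
staged = stage_pir_library(sorted(work.glob("photo_*.jpg")),
                           "TechnicalTruck", work / "library")
```

Keeping the original source images untouched and copying renamed versions into a per-mission folder means the same PIR imagery can be re-labeled for a later mission without any loss.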
During operations, as video was collected from the UAS ISR imaging and video sensors, piXserve would
dynamically match the 2D non-transformable objects (i.e. the list of image files of things we want to
find) in the mission library and tag the image and video content appropriately. Alerts would be
automatically sent to the ground station immediately indicating to ground personnel that a PIR of
interest was just identified by the UAS and that some action could be taken as shown in Figure 11.
Figure 11 – UAS – Sending Alerts from piXserve During ISR Mission Operations
UAS Bandwidth Considerations for ISR Video and Imagery
In some low-bandwidth situations, where the UAS doesn't have sufficient bandwidth to "normally" transfer actionable video or imagery to the ground station, another technology that Flex Analytics has available is a product called FLUME. FLUME, by Saratoga Data Systems Inc., is a file transfer protocol that allows data to be transferred over poor-quality, low-bandwidth links at very high speeds with high reliability.
In a recent demonstration for Special Operations Command (SOCOM) Technical Network Test-bed (TNT) at the Army Urban Warfare Training Center in Muscatatuck, Indiana, Flex Analytics and its partners, including Cloud Front Group, Saratoga Data Systems Inc. and piXlogic, demonstrated piXserve in combination with FLUME as part of the overall UAS package. The goal was to emulate piXserve and FLUME running on the UAV and transferring video clips of PIRs of interest to the ground station over poor-quality communications links, so that commanders on the ground could take advantage of immediately actionable intelligence, as shown below in Figure 12.
Figure 12 - TNT CONOPS, Demonstration for SOCOM - piXserve and FLUME as Part of a UAS
During the test, piXserve successfully alerted on pre-mission identified PIRs, which resulted in small 6Mb to 8Mb clips of the objects of interest being transferred to the ground station over 2Mb/sec links with relatively high error rates. Ordinarily, using standard TCP/IP file transfer protocols (FTP or SCP6), video clips of 6Mb to 8Mb in size would take about 7.5 minutes over a typical, error-prone 2Mb/sec link. However, in combination with FLUME, the 6Mb to 8Mb video clip data was transferred in under a minute.
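The slowdown described above is typical of TCP on lossy links: the standard Mathis et al. approximation (throughput ≈ MSS / (RTT · √p)) caps the effective rate well below a link's nominal speed once packet loss appears. The back-of-envelope calculation below uses assumed segment size, round-trip time and loss rate (illustrative figures, not measurements from the SOCOM test) to show the effect.

```python
from math import sqrt

def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Mathis et al. approximation of steady-state TCP throughput
    (bits/sec) on a link with random packet loss: MSS / (RTT * sqrt(p))."""
    return (mss_bytes * 8) / (rtt_s * sqrt(loss_rate))

# Illustrative assumptions: 1460-byte segments, 500 ms satellite-class
# round-trip time, 1% packet loss.
rate = tcp_throughput_bps(1460, 0.5, 0.01)  # bits/sec
clip_bits = 6 * 8 * 1_000_000               # a 6-megabyte clip, as an example
seconds = clip_bits / rate
```

Under these assumptions the effective rate is roughly 0.23 Mb/sec, about a tenth of the nominal 2 Mb/sec link speed, so the clip takes minutes rather than seconds. Closing that gap on error-prone links is exactly what a loss-tolerant transfer protocol such as FLUME is designed to do.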
As a result, the test for SOCOM showing piXserve and FLUME as part of a UAS was highly successful. It clearly demonstrated that piXserve and FLUME could:
1. Identify PIRs of interest accurately
2. Transfer targeted video clips, rather than the entire video stream, to the ground
3. Provide fast and accurate data transfer over low-bandwidth, error-prone comms links
4. Allow commanders on the ground to receive accurate, timely information about multiple PIRs, thereby providing immediately actionable intelligence
Overall, piXserve and FLUME met and exceeded all mission objectives.
Summary
Today, the means exists for UAS customers to acquire software technology that would allow them to automatically index, and make searchable and analyzable, all UAS-collected image and video content. The means exists immediately to employ that technology at the edge of the collection envelope, where the UAV aircraft are deploying their video and image sensors on ISR duties in combat, law-enforcement, rescue, intelligence and other settings.
6 Another file transfer option is SCP, the Secure Copy Protocol, which transfers files over a secured (SSH) connection.
Government and civilian UAS customers are collecting petabytes of video and image content (truly big data). In the past they didn't have the means to reasonably, cost-effectively and reliably exploit that information. However, technology immediately available from Flex Analytics and its partner piXlogic can change that situation. piXserve, the image and video search and analytics software, can be used to help the government find mission objects of interest, resulting in immediately actionable intelligence, right on the aircraft and/or at the ground station.