
Page 1: Image Recognition WhitePaper-Low res - R Systems …analytics.rsystems.com/wp-content/uploads/2017/09/... · Image Recognition Revolution & Applications Uncover Valuable Insights

SEP. 2017

On average, over 300 million images are uploaded to Facebook daily.

Source: Zephoria

R Systems Inc.

W H I T E P A P E R

THOUGHT LEADERSHIP

Image Recognition Revolution & Applications

Uncover Valuable Insights from Image Conversations


Table of Contents

1. Introduction ............................................................................................................................... 1

1.1. Exponential Growth of Image and Video .............................................................. 1

1.2. Statistics ............................................................................................................................ 2

2. Image Recognition .................................................................................................................. 3

3. Recent Innovations .................................................................................................................. 4

3.1. Approaches ..................................................................................................................... 4

3.2. Deep Neural Networks ................................................................................................ 5

4. Applications ............................................................................................................................... 8

4.1. Information Organization ........................................................................................... 8

4.2. Industrial Automation and Inspection ................................................................... 9

4.3. Detecting Events ........................................................................................................... 9

4.4. Human-Computer Interaction ............................................................................... 10

4.5. Modeling Objects and Environments ................................................................ 10

4.6. Navigation ..................................................................................................................... 10

4.7. Marketing, Sales, Customer Experience and Advertising.............................. 11

4.8. Weak AI vs. Strong AI................................................................................................... 11

Thought Leadership Whitepaper | analytics.rsystems.com


Data, and unstructured data in particular, has been growing at a very fast pace since the mid-2000s. Eighty percent of all data generated is unstructured multimedia content, which fails to get focus in organizations' big data initiatives. A good portion of this multimedia content is images and videos1. Readily available smart wireless devices, along with the rising popularity of sharing images and videos over the internet, have contributed significantly to the massive growth of this type of content. Images and videos now reflect a good portion of human knowledge, interactions and conversations. Today, this immense body of image and video data and the increase in image sharing, echoing the old saying "a picture is worth a thousand words"2, have sparked a significant opportunity to create new use cases, applications and products. For decades, the processing, understanding and recognition of images has been a major technical challenge in AI and Machine Learning (ML), and it remains a challenge. In the last decade, however, there have been some breakthroughs.

The ease with which people now use their smartphone cameras to enrich their communication with businesses (retailers, financial institutions, vendors, medical providers, insurance companies, etc.), for example via email, chat or blog, has also brought acceptance for images and

1. Introduction

1.1. Exponential Growth of Image and Video

1 of 13

1 Video is a continuum of images; though more challenging to store, process and understand, it builds on top of image recognition and understanding techniques and capabilities.

2 "A picture is worth a thousand words" - but only when the story is best told visually rather than in writing or speech, and the picture is well conceived and designed to portray what is to be conveyed.

videos in communication by companies in different verticals and has motivated them to invest in this area. Images and videos require much greater storage and bandwidth capacity and heightened security and privacy standards. This compounds the common problems of unstructured data growth, such as rising data protection costs, increasing infrastructure complexity, and data consumption growing faster than the IT storage footprint. For many of these applications, the automatic understanding of images and videos will provide new business opportunities in terms of augmenting and enhancing the customer experience.

The fact that creating and sharing images is easier than ever before is not the only reason image recognition is becoming popular. Images are more impactful than text, as they are often more engaging, and they are more likely to be shared and reshared. People use images and videos to capture their special moments, but images have also evolved into a means of communication: the preferred way of communicating for "Gen Z" is thought to be images, whereas for millennials it has been text.


1.2. Statistics

By 2016, YouTube already had one billion daily mobile users, with around 300 hours of video uploaded to its site every minute. Source: Statistic Brain

2 of 13

Let us look at some statistics that highlight the reasons for the great surge of interest in image recognition in recent years. First, the image recognition market is estimated to expand from US$15.95 billion in 2016 to US$38.92 billion by 2021, at a CAGR of 19.5% between 2016 and 20213. Facebook is the largest image sharing site on the Internet, and images represent the largest source of data usage on Facebook. On average, more than 300 million images are uploaded to its site daily. This volume may be shocking at first, but taking into account the number of active users (2 billion monthly as of 7/26/2017) who access Facebook via their smartphones daily (1.15 billion daily mobile active users as of 2/1/2017), coupled with people's dependency on these devices as their camera, the numbers make sense. As of 9/1/2016, YouTube had one billion daily mobile users, with 300 hours of video uploaded to its site per minute4.
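As a quick sanity check on the market figures above, the implied growth rate can be recomputed from the two endpoints. This is a minimal sketch; the dollar amounts are the ones quoted in the text.

```python
# Verify the quoted projection: US$15.95B (2016) -> US$38.92B (2021).
# CAGR = (end / start) ** (1 / years) - 1
start, end, years = 15.95, 38.92, 5

cagr = (end / start) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")  # ~19.5%, matching the quoted rate

# Conversely, compounding the 2016 figure at 19.5% for five years
# lands very close to the 2021 projection.
projected = start * (1 + 0.195) ** years
print(f"Projected 2021 market: US${projected:.2f}B")
```

The round trip confirms that the quoted 19.5% CAGR and the two market-size figures are mutually consistent.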

[Figure 1 chart: "Daily Number of Photos Shared on Select Platforms, Global, 2005 - 2015". Y-axis: # of Photos Shared Per Day (MM), 0 to 3500. Platforms: Snapchat, Facebook Messenger, Instagram, WhatsApp, Facebook.]

Source: Snapchat, company disclosed information, KPCB estimates. Note: Snapchat data includes images and video; Snapchat stories are a compilation of images and videos. WhatsApp data estimated based on the average number of photos shared disclosed in Q1:15 and Q1:16. Instagram data per Instagram press release. Messenger data per Facebook (~9.5B photos per month). Facebook shares ~2B photos per day across Facebook, Instagram, Messenger and WhatsApp (2015). (KPCB Internet Trends 2016, page 90)

Figure 1: Shows the exponential growth of photo sharing across some popular platforms from 2005 to 2015.

3 Zephoria Digital Marketing, "The Top 20 Valuable Facebook Statistics", https://zephoria.com/top-15-valuable-facebook-statistics/, Last accessed: 8/31/2017.

4 Statistic Brain, "YouTube Company Statistics", http://www.statistic-brain.com/youtube-statistics/, Last accessed: 8/31/2017.


3 of 13

2. Image Recognition

The objective of image recognition is to recognize and identify objects and people in images and to understand the context. Image recognition falls under machine perception, which is a part of machine learning (ML) and artificial intelligence (AI). Human beings have a multitude of senses; the five traditionally recognized ones are smell (olfaction), hearing (audition), taste (gustation), touch (somatosensation) and sight (vision). Building intelligent robots requires some ability to comprehend the surrounding environment and interact with it through vision, speech and touch, in addition to some level of locomotion and reasoning, in a form similar to humans. For decades, industrial robots


[Figure 2 charts: "Facebook Daily Video Views, Global, Q3:14 - Q3:15" and "Snapchat Daily Video Views, Global, Q4:14 - Q1:16". Y-axis: Video Views per Day (B), 0 to 10.]

Source: Facebook, Snapchat. Q2:15 Facebook video views data based on KPCB estimate. Facebook video views represent any video shown onscreen for >3 seconds (including autoplay). Snapchat video views counted instantaneously on load. (KPCB Internet Trends 2016)

have had very limited capabilities in some of these areas, in the context of the specific controlled automation tasks they were built to accomplish. For many industrial automation tasks, leveraging other types of sensors (infrared, range sensors, magnetic, ultrasound, etc.) as proxies for vision simplified the identification of and interaction with the objects of interest in a controlled environment. With advances in image recognition, cameras could eventually replace many of these sensors in many automation applications.

When we talk about human visual perception, we mean the ability to easily interpret the surrounding environment using light in the visible spectrum reflected by the objects in the environment. The recent surge of interest in image recognition focuses on this type of sensory input. For example, driverless cars require significantly improved visual processing and recognition capabilities of this kind, in addition to many other critical sensory inputs, to make the right decisions.

Machine perception, in general, focuses on imitating what the human brain does effortlessly: making sense of sensory inputs, specifically vision, hearing and touch. The visual cortex is the part of the cerebral cortex that processes visual information from the eyes. Vision develops rapidly in the early stages of life and serves as the base for the development of cognition, action, communication and interaction with the environment. Our brains are wired for visual communication: we process visuals faster, remember them longer and respond to them more strongly emotionally. Humans reportedly process visual inputs 60,000 times faster than text. Children at an early age can learn to recognize many different objects visually. This ability has puzzled scientists for a long time; specifically, the challenge has been to devise the kinds of computational algorithms that would be required to replicate it in a machine. Surprisingly, a little child knows nothing about the inner workings of, say, a car or its components, yet can effortlessly identify a car in a scene or an image anywhere he or she sees it. How do children learn this? They learn through examples. The concept of "learning by example" is fundamental to AI/ML in general and to machine perception in particular. Artificial neural networks5 (hereinafter ANNs) are the most popular systems for imitating learning in a machine. ANNs progressively improve their performance (i.e., they "learn") by considering examples of what they need to learn (called training), mostly without task-specific programming. Inspired by neurons in the brain, ANN architectures are organized in many layers, where each layer of neurons may perform a different kind of transformation on its inputs.
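The "learning by example" idea can be illustrated with a toy single-neuron network trained purely on labeled examples. This is a minimal NumPy sketch, not a production model; the task (learning the logical OR function) and all parameters are made up for illustration.

```python
import numpy as np

# Toy "learning by example": a single neuron learns the OR function
# from labeled examples alone, with no task-specific rules coded in.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])  # OR labels

rng = np.random.default_rng(0)
w = rng.normal(size=2)  # weights start random; training shapes them
b = 0.0
lr = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient-descent training: each pass nudges the weights so the
# neuron's outputs move closer to the example labels.
for _ in range(2000):
    p = sigmoid(X @ w + b)
    grad = p - y                      # gradient of cross-entropy loss
    w -= lr * (X.T @ grad) / len(X)
    b -= lr * grad.mean()

preds = (sigmoid(X @ w + b) > 0.5).astype(int)
print(preds)  # [0 1 1 1] -- the OR function, learned from examples
```

A deep network stacks many such layers of neurons, but the principle is the same: the mapping is learned from examples rather than programmed.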


Figure 2: Depicts the quarterly average daily video views growth across two popular platforms between 2014 and 2015.


4 of 13


5 Historically, traditional ANNs have been used for a variety of tasks, including computer vision, intelligent character recognition, fraud detection, speech recognition, machine translation, social network filtering, playing board and video games, medical diagnosis and many other domains.


3. Recent Innovations

3.1. Approaches

Image recognition has a long history, and related or synonymous fields exist under different names: computer vision, object recognition, machine vision, scene understanding, image understanding, image classification and image analysis. Computer (or machine) vision in general covers recognition as one subpart while also being concerned with image reorganization and reconstruction. At a high level, there are two different technical approaches to solving image recognition tasks.

The focus of the first approach, which we call traditional image recognition, is on finding and extracting human-engineered features (such as edges, corners and color) from images to help classify objects. While the human brain is exceedingly good at classifying objects (an ability developed in the early years of life), it is not clear which features the brain actually leverages for visual processing. Since the 1980s and 1990s, traditional image recognition approaches have generally worked by extracting a family of features from images, practically hand-coded through years of experimentation and analysis. A learning algorithm is then used to recognize the objects in the images based on these human-engineered features.
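The traditional pipeline can be sketched in a few lines: a hand-engineered feature is computed from each image and handed to a simple classifier. This is a minimal NumPy sketch under assumptions of our own choosing; the feature here is a plain intensity histogram, whereas real systems used richer hand-coded descriptors such as edges, corners or HOG.

```python
import numpy as np

# Traditional pipeline sketch: hand-engineered features -> classifier.
def histogram_features(img, bins=8):
    """Reduce an image to a fixed-length, normalized intensity histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()  # normalize so images of any size compare

# Two synthetic "classes" of 16x16 grayscale images: dark vs. bright.
rng = np.random.default_rng(1)
dark = [rng.uniform(0.0, 0.4, (16, 16)) for _ in range(20)]
bright = [rng.uniform(0.6, 1.0, (16, 16)) for _ in range(20)]

X = np.array([histogram_features(im) for im in dark + bright])
y = np.array([0] * 20 + [1] * 20)

# A simple learning algorithm on top of the engineered features:
# nearest-centroid classification in feature space.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def classify(img):
    f = histogram_features(img)
    return int(np.argmin(np.linalg.norm(centroids - f, axis=1)))

print(classify(rng.uniform(0.0, 0.4, (16, 16))))   # 0 (dark)
print(classify(rng.uniform(0.6, 1.0, (16, 16))))   # 1 (bright)
```

The key point is that the feature (the histogram) is designed by a human; only the final decision rule is learned from the data.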

In the second approach, the goal is still the same: to extract features that help identify objects in an image. However, instead of human-engineered features, it leverages an automated procedure to "learn" these salient features from raw image pixel data. The learning takes place using a very large number of images. ANN models, especially deep neural networks, have revolutionized this approach in recent years. Deep neural nets, as the name implies, are ANNs with potentially many more layers of neurons, in which each layer is connected to the next (not necessarily fully connected) and capable of learning a higher-level representation (features) of the input images. The idea has been around for a long time; however, the huge image datasets6 and enormous processing power it requires only became available in the last decade. This has made the approach feasible to the point of creating a revolution in computer vision. When deep neural nets are used, the learning is referred to as Deep Learning.



5 of 13


3.2. Deep Neural Networks

6 A simple algorithm leveraging large number of training data can perform better than a fancy algorithm using a small number of training data. The di�erence between the two approaches highlights this shift in thinking and the higher importance of data.7 Quartz Media, The data that transformed AI research—and possibly the world, https://qz.com/1034972/the-data-that-changed-the-direc-tion-of-ai-research-and-possibly-the-world/, Last accessed 8/31/20178 CNNs can have many layers each specialized at a particular task. At a high-level, these tasks can be categorized as “feature detectors”, “dimensionality reduction” and “classi�cation.”

Neural networks today can have a few thousand to a few million neurons with millions of connections10. CNNs can only deal with �xed-size input and output, which means they can learn �xed mappings with no notion of time. Another family of the ANNs referred to as Recurrent Neural Networks11 (RNNs) are suited for learning sequences of inputs. These have found various uses for image and video captioning in addition to the machine translation, natural language processing (NLP), sentiment analysis and speech recognition. At ImageNet Large Scale Visual Recognition Challenge (ILSVRC) competition in 2012, a group of

researchers produced a CNN model (referred to as AlexNet12) which visibly provided a signi�cantly superior performance with 85% accuracy on ImageNet (improved performance accuracy by 10.8 percentage point equivalent 41% error improvement rate) compared to the best existing model using the traditional approach. This was a turning point in the history of image recognition and beginning of a promising future in this �eld. This achievement shifted the focus away from the traditional image recognition

approach to the new approach with deep neural nets. At the ILSVRC competition in the year 2013, all the participants (including winners) had their solutions and algorithms based on deep learning based techniques. At ILSVRC competition in 2015, multiple CNN based algorithms surpassed the human recognition rate of 95% (5% error rate). In the year 2017, 29 out of 38 participants surpassed the human recognition rate of 95% with the best at 97.3% (See Figure 4).


6 of 13Thought Leadership Whitepaper | analytics.rsystems.com

11 Andrej Karpathy, "The Unreasonable Effectiveness of Recurrent Neural Networks", http://karpathy.github.io/2015/05/21/rnn-effectiveness/, Last accessed 8/31/2017.

ImageNet7 and Pascal VOC are two large, open, labeled image data sets available for research and exploration. ImageNet was launched in 2009 by computer scientists from Stanford and Princeton universities with an initial 80,000 tagged images. It is known for its annual visual recognition challenge (ILSVRC), where participants from academia and industry compete for the best image recognition performance on ImageNet data. By 2016, ImageNet had grown to more than 14 million tagged images, freely available for machine learning purposes. Pascal VOC, sponsored by several universities in the UK, has fewer images but richer image annotations.

The best performing deep neural nets for image recognition are referred to as "convolutional neural networks" (hereinafter CNNs). Compared to traditional multi-layer ANNs, CNNs possess a few special properties8 enabling them to automatically learn the relevant features. Starting from the original raw image, a CNN applies a set of different transformations or filters9 to the image, learning a more compact representation with each transformation. By the end of training, the CNN has learned a set of more abstract features to represent the images. These features are then used as inputs to a classification algorithm, typically a fully connected ANN right before the output layer, to recognize the images. Figure 3 depicts a deep neural network with multiple layers for face recognition; each layer has learned a denser, more condensed representation of the input images.
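The filter-then-classify pipeline described above can be illustrated with a minimal NumPy sketch. This is not a trained CNN: the edge filter is hand-set and the classifier weights are random stand-ins, purely to show how "feature detection" (convolution), "dimensionality reduction" (pooling) and "classification" (a fully connected layer) chain together.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a filter over the image, producing a feature map ("feature detection")."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Downsample by taking the max of each size x size block ("dimensionality reduction")."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A hand-set vertical-edge filter, the kind of feature early CNN layers learn on their own.
edge_filter = np.array([[1.0, -1.0], [1.0, -1.0]])

image = np.random.rand(8, 8)  # stand-in for a raw grayscale image
features = max_pool(np.maximum(conv2d(image, edge_filter), 0))  # conv -> ReLU -> pool
flat = features.flatten()     # the abstract features fed to the classifier

# "Classification": one fully connected layer with random (untrained) weights.
num_classes = 3
weights = np.random.rand(num_classes, flat.size)
scores = weights @ flat
predicted_class = int(np.argmax(scores))
```

In a real CNN the filter values and classifier weights are learned jointly by backpropagation, and many filters are stacked per layer; the data flow, however, is exactly this.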

Neural networks today can have from a few thousand to a few million neurons and millions of connections10. CNNs deal only with fixed-size input and output, which means they learn fixed mappings with no notion of time. Another family of ANNs, Recurrent Neural Networks11 (RNNs), is suited to learning sequences of inputs. RNNs have found uses in image and video captioning, as well as in machine translation, natural language processing (NLP), sentiment analysis and speech recognition.

9 There is a large body of knowledge in the image processing and signal processing fields about the variety of these transformations/filters.
10 We can think of them as similar to a worm brain in terms of the number of neurons and connections compared to a human brain, although they can already do much more interesting things.

At the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, a group of researchers produced a CNN model (referred to as AlexNet12) that delivered significantly superior performance: 85% accuracy on ImageNet, a 10.8 percentage point improvement (equivalent to a roughly 41% reduction in error rate) over the best existing model using the traditional approach. This was a turning point in the history of image recognition and the beginning of a promising future for the field. The achievement shifted the focus away from the traditional image recognition approach to the new approach with deep neural nets. At ILSVRC 2013, all participants (including the winners) based their solutions and algorithms on deep learning techniques. At ILSVRC 2015, multiple CNN-based algorithms surpassed the human recognition rate of 95% (5% error rate). In 2017, 29 out of 38 participants surpassed the human recognition rate, with the best at 97.3% (see Figure 4).
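The difference from a CNN's fixed mapping can be made concrete with a toy recurrent step. The weights below are random and untrained (illustrative only): the point is that the same fixed-size cell is applied once per time step, carrying a hidden state forward, so a sequence of any length collapses into one fixed-size summary vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy RNN cell: the hidden state h carries context from earlier inputs,
# which is what lets RNNs handle sequences (captions, sentences, audio frames).
input_size, hidden_size = 4, 8
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (recurrence)

def rnn_step(x, h):
    """One time step: the new state depends on the current input AND the previous state."""
    return np.tanh(W_xh @ x + W_hh @ h)

# Process a sequence of arbitrary length with the same fixed-size weights.
sequence = [rng.normal(size=input_size) for _ in range(5)]
h = np.zeros(hidden_size)
for x in sequence:
    h = rnn_step(x, h)

# h now summarizes the whole 5-step sequence in a fixed-size vector.
```

In captioning systems, a vector like `h` (or, conversely, a CNN's image features) seeds the recurrence, and the cell's weights are learned by backpropagation through time.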

Figure 3: This picture conceptually illustrates a deep neural net with five layers of neurons trained for face recognition. The middle layers of neurons have progressively learned denser representations or features (edges by the second layer, full faces by the last internal layer) from the raw input images.



Figure 4: ImageNet Large Scale Visual Recognition Challenge results from 2010 to 2017 (the final year of the competition).



12 AlexNet was designed by the SuperVision group, consisting of Alex Krizhevsky, Geoffrey Hinton and Ilya Sutskever.

Chart annotations (Figure 4): In the competition's first year (2010), teams had varying success, and every team got at least 25% wrong. In 2012, the first team to use deep learning was the only one to get its error rate below 25%. The following year, nearly every team got 25% or fewer wrong. In 2017, 29 of 38 teams got less than 5% wrong. (Vertical axis: percentage wrong, from perfect to 100%; horizontal axis: 2010 through 2017.)


4. Applications

The technical advancements in image recognition in recent years have created massive new business opportunities across many verticals, from automotive to advertising. In particular, they have fueled an online visual revolution. Legacy use cases will also benefit from these advancements. Below we list a number of these applications; the list is not comprehensive.

4.1. Information Organization

An obvious new application of image recognition, made possible by the recent high accuracy achievements, is automatic tagging/labeling of images based on their content, for indexing databases of images and image sequences. Automatic indexing makes a much larger set of images available for search: using a few descriptive words, one can easily locate and choose images of interest (image search). Websites with massive visual databases, such as stock photography and video sites, are noteworthy beneficiaries. A useful variation is "search by image", in which one submits an example image to an app or online site and similar-looking images are returned for selection (for example, Google search by image).
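One common way to implement "search by image" is to compare images by the similarity of their feature vectors. The sketch below assumes hypothetical, precomputed embeddings (in practice these would come from a CNN's last hidden layer, as described in Section 3.2); the file names and vector values are invented for illustration.

```python
import numpy as np

# Hypothetical precomputed feature vectors for a small image database.
database = {
    "beach_sunset.jpg": np.array([0.9, 0.1, 0.0, 0.2]),
    "city_night.jpg":   np.array([0.1, 0.8, 0.3, 0.0]),
    "beach_day.jpg":    np.array([0.8, 0.2, 0.1, 0.3]),
}

def cosine_similarity(a, b):
    """Similarity of two feature vectors, independent of their magnitudes."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_by_image(query_vec, db, top_k=2):
    """Return the top_k database images most similar to the query image's features."""
    ranked = sorted(db, key=lambda name: cosine_similarity(query_vec, db[name]),
                    reverse=True)
    return ranked[:top_k]

query = np.array([0.85, 0.15, 0.05, 0.25])  # features of the submitted example image
results = search_by_image(query, database)  # the two beach images outrank the city image
```

Production systems index millions of such vectors with approximate nearest-neighbor structures rather than a linear scan, but the ranking principle is the same.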

With the enormous popularity of mobile devices and media cloud services, there is unprecedented growth in personal photo collections. One popular use case is managing and organizing the increasing number of personal photos with the help of automatic tagging software. Image recognition techniques are also used to identify multiple elements in an image, such as objects, activities, logos and background scenes (Figure 5). This provides an intelligent way to automatically caption images, which by itself opens the door to a slew of new business use cases.

Identifying faces in a scene (face recognition) is something humans do easily, but training computers to do the same has been challenging. There have been various breakthroughs in this field in recent years, and face recognition is now part of many mobile and online applications, e.g. tag suggestions based on face recognition in Facebook.

Figure 5: Image recognition is used to automatically caption images or frames of videos.

Humans process visual inputs 60,000x faster than text


4.2. Industrial Automation and Inspection

For three decades, industrial processes have benefited from limited forms of image recognition, typically in controlled environments. Automobile manufacturing and automatic electronic assembly (for printed circuit boards) are two notable examples. One general but popular application is industrial quality control, where image recognition automatically inspects final products or parts for defects. Another example is identifying the position and orientation of objects to be picked up by industrial robots during assembly. Optical (digital) sorting is another popular application, where image recognition separates produce (for example, fruits) of different grades and removes foreign material and defects from production lines. Image recognition also has many use cases in agriculture, such as automatic irrigation, pest control, autonomous selective harvesting and crop health monitoring. The recent advances in image recognition will drastically improve all of these business uses.



4.3. Detecting events

Image recognition has many applications in visual surveillance and security. Efficient processing of video images provides a wealth of information to identify and categorize events of interest. In the future, image recognition enabled cameras (intelligent cameras) can replace many sensor types. For example, intelligent cameras can replace infrared sensors for motion detection and magnetic sensors for monitoring door open/close status. For some critical Internet of Things (IoT) applications, such cameras need only extract interesting events from the video and communicate them to a central server (or the cloud). Instead of continuously streaming video, an intelligent camera can submit the full video captured for a preset period before and after an event as evidence, making efficient use of storage and bandwidth. As the algorithms get more efficient and processing power increases, more image recognition capabilities can be embedded in the cameras themselves.
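The "preset period before and after the event" behavior described above amounts to a rolling buffer. The sketch below is a simplified, hypothetical version of that logic (the function name and frame counts are invented): the camera holds only the last few frames in memory, and an event flushes that buffer plus a fixed number of subsequent frames into the submitted clip.

```python
from collections import deque

def capture_event_clip(frames, event_index, pre=3, post=2):
    """Return the frames from `pre` before the event through `post` after it."""
    buffer = deque(maxlen=pre)      # rolling pre-event buffer; old frames fall off
    clip = []
    recording_left = 0
    for i, frame in enumerate(frames):
        if i == event_index:        # event detected by the recognition model
            clip.extend(buffer)     # flush the buffered pre-event frames
            clip.append(frame)
            recording_left = post
        elif recording_left > 0:    # still inside the post-event window
            clip.append(frame)
            recording_left -= 1
        else:
            buffer.append(frame)    # normal operation: buffer, don't stream
    return clip

frames = list(range(10))            # stand-in for a video stream
clip = capture_event_clip(frames, event_index=5)  # -> [2, 3, 4, 5, 6, 7]
```

Only the six frames around the event leave the camera; the other four are discarded, which is exactly the storage and bandwidth saving the text describes.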

Image recognition techniques can also be used to count objects, such as cars or people, in images. This capability can be used in traffic management and in sizing crowds. Such information is of great value for detecting relevant events, such as a traffic jam or the number of people inside or outside a specific location (e.g. a store).
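A much-simplified sketch of the counting step: assume a detector has already produced a binary mask marking object pixels (real systems use trained object detectors; this mask and scene are invented for illustration). Counting then reduces to counting connected blobs, here via an iterative flood fill.

```python
import numpy as np

def count_objects(mask):
    """Count connected blobs of 1s in a binary detection mask via flood fill."""
    mask = mask.copy()
    h, w = mask.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j]:
                count += 1
                stack = [(i, j)]            # erase this whole blob
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y, x]:
                        mask[y, x] = 0
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return count

# Binary detection mask with three separate blobs (e.g. three cars in a lot).
scene = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
])
num_cars = count_objects(scene)  # -> 3
```

Libraries such as SciPy provide this as a labeled connected-components routine; the loop above just makes the idea explicit.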

Face recognition is instrumental in security and surveillance applications, from identifying bad actors to granting access and permissions. As another example, face recognition can identify loyal and valuable customers when they enter a store, alerting the staff to attend to them with special service.


13 Bernard Marr, "How AI And Deep Learning Are Now Used To Diagnose Cancer", Forbes.com, May 16, 2017.


One of the most prominent applications of image recognition will be in the health industry, for medical and biomedical image analysis. Traditionally, diagnosis of diseases such as cancer and heart disease has relied on examination of X-rays and scans to spot early warning signs13. Image recognition will not only assist physicians in these cases but, given a large and diverse set of examples to train on, could surpass the diagnosis of any single physician. Google's image search and Facebook's facial recognition may seem like simple use cases compared to this. In some countries, such as China, lung cancer claims thousands of lives each year, due in large part to significant air pollution. With a shortage of qualified specialists, doctors bear a huge burden, examining far more scans than they can handle, which results in mistakes and misdiagnoses.

4.4. Human-Computer Interaction

Improving human-computer interaction has always been of great interest to researchers and software companies. Combined with the speech recognition and NLP breakthroughs of recent years (Alexa, Cortana, Siri, etc., built with RNNs), image recognition (specifically gesture recognition, face recognition and eye/head/hand tracking) can reinvent the way we communicate with our computers, devices, cars and appliances. Image recognition combined with advances in virtual and augmented reality will continue to revolutionize the gaming industry, too.

4.5. Modeling objects and environments

Image recognition based systems have been widely used in astronomy and outer-space applications for classifying stars and galaxies in captured astronomical images. Aerial photography, for military, weather, research and commercial purposes, is another beneficiary of the recent image recognition advancements.

4.6. Navigation

Autonomous vehicles and mobile robots have long been of great interest to the military. Intelligent mobile robots equipped with advanced image recognition capabilities also have many commercial (e.g. service industry) and personal uses. The best-known recent application of advanced image recognition is in self-driving cars and automotive driver assistance. An autonomous vehicle relies on tens of algorithms processing data from a variety of sensors and cameras to make sense of its surroundings and navigate. Recent advances in image recognition have revolutionized this field to the point that fully autonomous driving could become a reality within the next decade.


4.7. Marketing, Sales, Customer Experience and Advertising

Unstructured text and image conversations are not simply changing the way we communicate with each other; they are also changing the way brands and vendors communicate with consumers. When it comes to uncovering valuable insights from these conversations and targeting the right audience, text analysis is only half of the story. Understanding the content of images and videos in the context of the text and other information (e.g. location) is the other half, and can bring great business value in customer service, sales and marketing. Extracting salient insights from images, for example brand logos in socially shared pictures, is important for advertising, marketing and sales. People can take a picture of a product of interest (for example a medicine, a shoe or electronic equipment) and submit it to an ecommerce site for ordering or a price check. Image recognition capable of understanding logos, printed text on the product and the product category could ideally find the exact product match, or at least a close one. This makes it possible to order a product, check its price, learn more about it and read its reviews simply from an image taken on a mobile device, at the moment the person is most interested.

4.8. Weak AI vs. Strong AI

There are many other applications that directly benefit from recent progress in image recognition, from systems that help the visually impaired to ones streamlining waste haulage. With significant recent advances in both speech and image recognition, and in AI in general, we can build software and machines that are much smarter. However, they are far from what we refer to as human intelligence. Current AI is referred to as narrow AI or weak AI, the only form of artificial intelligence we have achieved so far. By definition, narrow AI is good at performing a single task, such as playing chess or Go, recommending products to buy, making predictions (fraud, sales, etc.) or giving weather forecasts. Image recognition, speech recognition, self-driving cars (a combination of several narrow AIs), translation systems and natural language processing are still narrow AI, even if the recent advances seem like breakthroughs. Human-level AI, or strong AI, is the type of AI that mimics advanced human understanding and reasoning. Strong AI is, and has always been, elusive.


The image recognition market is estimated to grow from US$15.95 billion in 2016 to US$38.92 billion by 2021, at a CAGR of 19.5% between 2016 and 2021.

Source: Zephoria

REFERENCES:

[1] Zephoria Digital Marketing, “The Top 20 Valuable Facebook Statistics”, https://zephoria.com/top-15-valuable-facebook-statistics/, Last accessed: 8/31/2017.

[2] Statistic Brain, “YouTube Company Statistics”, http://www.statistic-brain.com/youtube-statistics/, Last accessed: 8/31/2017.

[3] Quartz Media, “The data that transformed AI research—and possibly the world”, https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/, Last accessed: 8/31/2017.

[4] Andrej Karpathy, “The Unreasonable Effectiveness of Recurrent Neural Networks”, http://karpathy.github.io/2015/05/21/rnn-effectiveness/, Last accessed: 8/31/2017.

[5] Bernard Marr, “How AI And Deep Learning Are Now Used To Diagnose Cancer”, Forbes.com, May 16, 2017.


ABOUT R SYSTEMS

R Systems Analytics helps clients uncover actionable insights to drive competitive advantage and capture business value. We help organizations integrate and operationalize data analytics solutions, enabling them to gain visibility into previously opaque or hard-to-measure processes. This empowers our clients to make smarter business decisions.

Our team of data experts, consultants and data scientists leverages proven analytics methodologies, tools and best practices to define the right analytics solutions for you, solving complex business challenges and specific use cases and driving future growth.

Executive Contacts:

Khosrow Hassibi, PhD
Chief Data Scientist
[email protected]

Jeff Johnstone
Director – Client Services, Analytics
[email protected]

© 2016 R Systems International Limited. All Rights Reserved. All content/information presented here is the exclusive property of R Systems International Ltd. The content/information contained here is correct at the time of publishing. No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from R Systems International Ltd. Unauthorized use of the content/information appearing here may violate copyright, trademark and other applicable laws and could result in criminal or civil penalties.
