
eVision White Paper

A New Vision for Internet Search

A Technical White Paper from eVision

1 South 450 Summit Ave., Suite 210

Oakbrook Terrace, IL 60181
Ph: 630.932.8920 ■ Fax: 630.932.8936

www.evisionglobal.com

Unleashing the Power of Visual eBusiness

A T E C H N I C A L W H I T E P A P E R

A New Vision for Internet Search

Printed: July, 2001

Information in this guide is subject to change without notice and does not constitute a commitment on the part of eVision LLC. It is supplied on an "as is" basis without any warranty of any kind, either explicit or implied.

Information may be changed or updated in this guide at any time.

Mailing Address

eVision LLC
1 South 450 Summit Ave., Suite 210
Oakbrook Terrace, IL 60181

© eVision, LLC Technologies

Contents

Why Should You Care About Visual Search?
We Live in a Visual Society
What Lies Beneath
The Role of Internet Search
Search is Key to Website Success
The Importance of Effective Searching
What's Wrong with Text Search?
The Role of Visual Search
An Introduction to Content-Based Information Retrieval
Visual Characteristics
The Current State of CBIR Technology
Challenges Facing CBIR
The Next Step for CBIR
The eVision Technology Solution
What Are Objects?
Object-Based Analysis and Search
Scale, Rotation, and Aspect Independence
Cluster-Based Indexing Drives eVision
Formulating a Visual Query
Constructing a Visual Vocabulary
Automated Visual Meta Tagging of Content
Visual Behavior Tracking
Applying eVision
Visual Search Engines and Applications
Original Equipment Manufacturers
Media Asset Management
High-Margin Verticals
The Future of Search
eVision: Next Generation Visual Search Technology
References
About eVision


Why Should You Care About Visual Search?

We Live in a Visual Society
• 2,700: Number of photos taken each second, worldwide
• 3 billion: Number of rolls of film used each year
• 80 billion: Number of new images created each year
• Over 1 billion: Number of images on corporate nets and the Internet related to commercial transactions in 2001
• 10 billion: Estimated number of images on corporate nets and the Internet related to commercial transactions in 2003

What Lies Beneath
• 20: Number of hours lost on average each week by professionals searching for media
• 42: Percentage of content duplicated in the media asset management space
• 35-90: Percentage of failed attempts to find specific content in a media asset management system
• 1 billion: Amount in dollars that the medical imaging industry loses each year due to missing or duplicated patient radiology information
• 13 billion: Amount in dollars that online businesses lost in 2000 due to frustrated customers not finding what they were looking for
• 25-30 billion: Estimated amount in dollars that global corporations lose each year due to brand counterfeiting, unauthorized use, and misuse of brands and logos
• 11 billion: Estimated amount in dollars of the total available market for visual search solutions in 2005

Sources: US Department of Commerce; Creative Good; Future Image, Inc.; UC Berkeley Study; Gistics; McKenna Group; Health Care Financing Administration

(Figure: growth in billions of images online, 1998-2003, with key statistics: 35% of search attempts fail; 42% of content is duplicated; 20 hours/week are wasted in searches; 7 million new images are added to the web daily; only 5% of images are indexed; the rush to index favors quantity over quality; 56% of text search attempts fail.)


The Role of Internet Search

The Internet has become the information backbone of the world economy. It is one of the first places that people go to get answers to questions, communicate with others, and make purchases.

Unfortunately, because of the growing popularity and amount of information available within the World Wide Web and other elements of the Internet, individuals and organizations are quickly becoming overwhelmed when trying to manage and assimilate its content. For example, even many individual web sites are now composed of tens of thousands of pages that are often filled not only with text, but also with a combination of images and sounds.

While the Internet�s promise of massive content is now a reality, it is virtually useless without a powerful way to enable users to intelligently search and locate the information for which they are looking.

Search is Key to Website Success

eCommerce managers and knowledge portal facilitators consider Internet search to be a crucial feature for their websites. In a June 2000 report by Paul Hagen of Forrester Research Inc., 90% of those interviewed stated that search is "extremely important or very important". From manufacturing to apparel to retail, the volume of products and complexity of product offerings make the search results page one of the most-visited spots on a website.

The Importance of Effective Searching

While search is considered critical to a successful website, how effective are the existing search mechanisms of most sites? Forrester found that while most companies (53%) feel that their search engine is "very useful", they do not routinely measure the success of those engines. Even more alarming is the fact that when website managers actually did "look under the covers" of their site's search experience, they discovered many problems. These problems included simple queries that did not perform as expected and customers who became frustrated with inappropriate responses and hard-to-use interfaces.

For a search to be effective, the results returned by that search must be relevant, accurate, and useful. As website content grows exponentially in both quantity and types of information (text, images, and other "rich media"), existing search engines are less and less equipped to provide users with the kind of results that ensure satisfaction and repeat visits.


What's Wrong with Text Search?

The central difficulty with Internet search today is that it is text-based. Searching a web site using text only can sometimes be a difficult task.

For example, if we want to buy a pair of shoes from an eCommerce site, we would start by typing the word "shoes" in that site's search prompt. A text-based search like this typically returns numerous results that require us to spend more time navigating through the subsequent menus and prompts to narrow that search to the particular shoe in which we are interested. Our frustration is compounded by the fact that we can picture in our heads exactly what shoe we want, but are unable to translate that picture into commands that bring us to the right page.

There are several fundamental problems commonly associated with text search:

• Text search is language-specific and context-specific. When we search by text, we must choose a language in which to specify the search. Even within a given language, there are many ways to specify (or attempt to specify) a request or object of desire.

• Text search is highly error-prone. Typographical errors result in erroneous results or an empty result set.

• Text is cumbersome. Search by text inevitably means that a website visitor must know about the keywords used by that site, or master a complex syntax for specifying non-trivial searches.

When using text for search, visitors to a website must adapt their entries to the constraints of the software running that website. This requirement often leads to a visitor's frustration, wasted time, and the decreased likelihood of repeat visits and purchases. Technology clearly needs to "catch up" with the rich content found in today's web sites.


The Role of Visual Search

It is remarkable that despite its short lifespan and many growing pains, the Internet has been able to deliver even partially on its promise of universal access to information. Equally remarkable, though, is the fact that despite numerous advances in the technology, we remain limited in our ability to seek information based on anything but text.

The limitations imposed by a text-only search exist in stark contrast to some of the Web's most important and advanced features, such as hypertext linking and embedded rich media. For example, while connected to the Internet, we can appreciate a photo exhibit, watch a movie, or listen to a concert right on our home or work computers. However, to find and use these amazing online resources of the Web, we all still rely on the text entered during our initial search (such as the title of a movie or name of a conductor). If we don't enter the exact name or title of the item for which we are searching, more than likely our search will fail or return inaccurate results.

In response to this challenge, research organizations and commercial software companies have developed tools that allow a visual search of images and other forms of digital content. With visual search, Internet users can specify their needs and make selections based on images, rather than text.

The independence from text provided by visual search offers advantages far beyond the Internet:

• A graphics designer quickly and easily finds just the right image for a brochure.

• A doctor recommends more appropriate treatment based on a visual analysis of similar CAT scans.

• A fashion designer quickly locates similar clothing designs, color usage, etc. when developing a new line of clothing.

In these cases and in hundreds more, the search for relevant information is performed not on keywords associated with an image, but on the image itself.


An Introduction to Content-Based Information Retrieval

Because digitized images consist of arrays of pixel intensities with no inherent meaning, the image databases that contain them are generally unstructured. Content-Based Information Retrieval (CBIR) is the process of searching for and retrieving images from an unstructured database based on information extracted from the content of those images.

When searching a database for visual images, CBIR systems base their retrievals on the content of an image, and not on external tags, such as file name, captions, headings, keywords attached as metatags, etc. This focus on content, rather than manually defined external tags, provides CBIR systems with the potential to be qualitatively more effective for image searches than any other type of search.

CBIR techniques can be applied equally to both images and video (a toy shot-detection sketch follows this list). These techniques are used to:

• break up long videos into individual shots
• extract still keyframes summarizing the content of each shot
• search for video clips containing specified types of movement
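The shot-segmentation step can be illustrated with a small sketch. The following Python fragment is an illustrative example only: the function name, the L1 histogram distance, and the 0.4 threshold are assumptions, not eVision's method. It detects shot boundaries by comparing color histograms of consecutive frames and keeps the middle frame of each shot as its keyframe.

import numpy as np

def detect_shots(frame_histograms, threshold=0.4):
    """Naive shot-boundary detection over a list of per-frame color histograms.

    A new shot starts wherever consecutive histograms differ by more than
    `threshold` in L1 distance; the middle frame of each shot is kept as
    its keyframe. Illustrative choices only.
    """
    boundaries = [0]
    for t in range(1, len(frame_histograms)):
        if np.abs(frame_histograms[t] - frame_histograms[t - 1]).sum() > threshold:
            boundaries.append(t)                      # large change: start of a new shot
    boundaries.append(len(frame_histograms))

    shots = [(boundaries[i], boundaries[i + 1]) for i in range(len(boundaries) - 1)]
    keyframes = [(start + end) // 2 for start, end in shots]
    return shots, keyframes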

While the user requirements for CBIR searching can vary considerably, we can generally characterize image queries into three levels of abstraction:

• Primitive features query. A search based on color, texture, or shape. A typical query based on primitive features might be: "Find me all the images with a red ball."

• Logical feature query. A search based on a feature derived through logical inference of the objects in the image. A sample query based on logical features might be: "Find me all the images of the Taj Mahal."

• Abstract attributes query. A search in which high-level reasoning is applied to an image. A query based on abstract attributes might be: "Find me all the images of happy children."

CBIR systems currently operate effectively only at the Primitive query level. Nowadays, most users require at least some Logical feature queries, and in years to come, Abstract searches are expected to become commonplace.


Visual Characteristics

CBIR performs searches based on certain characteristics of an image's content, the most common of which are color, texture, and shape. The following sections describe each of these in more detail.

Color

Color matching is typically based on computing a color histogram that shows the proportion of pixels for each color within an image. The following figure shows a 256-bin global color histogram of an image. At search time, we can either specify the desired proportion of each color, or submit an example image from which the color histogram is calculated. The matching program then retrieves images that match the color histogram of the query.

Figure 1 – Left: initial image; right: 256-bin global color histogram of that image
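As a concrete sketch of the histogram approach, the following Python fragment builds a 256-bin global color histogram by quantizing the red, green, and blue channels to 8 x 8 x 4 levels and compares two histograms with histogram intersection. The binning scheme and the intersection measure are illustrative assumptions, not eVision's implementation.

import numpy as np

def color_histogram(image):
    """256-bin global color histogram of an RGB uint8 image.

    Channels are quantized to 8 x 8 x 4 levels (R, G, B) so every pixel
    falls into one of 256 bins; the result is normalized to sum to 1.
    """
    r = image[..., 0] // 32          # 8 levels
    g = image[..., 1] // 32          # 8 levels
    b = image[..., 2] // 64          # 4 levels
    bins = (r * 8 + g) * 4 + b       # bin index in [0, 255]
    hist = np.bincount(bins.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color distributions."""
    return float(np.minimum(h1, h2).sum())

# Usage sketch: rank database images against a query image's histogram.
# query_hist = color_histogram(query_image)
# scores = [histogram_similarity(query_hist, color_histogram(img)) for img in database]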

Texture

Texture is a more complex image characteristic to describe. A variety of techniques have been used to classify and match textures within images. The techniques yielding the best results rely on statistical methods that examine the relative brightness of selected pairs of pixels from each image. These statistics are then used to calculate measures of image texture such as periodicity, directionality, and randomness. Texture queries can be formulated in a similar manner to color queries.
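A minimal example of such a statistical texture measure is a gray-level co-occurrence matrix, which counts how often pairs of quantized pixel intensities occur next to each other and summarizes the result with statistics such as contrast, energy, and entropy. The sketch below is a generic illustration of this family of techniques, not the specific feature set used by eVision or any other CBIR system.

import numpy as np

def cooccurrence_features(gray, levels=32):
    """Simple texture signature from a gray-level co-occurrence matrix.

    `gray` is a 2-D uint8 array. Horizontal neighbour pairs are counted,
    the matrix is normalized, and three summary statistics are returned.
    """
    q = (gray.astype(np.float64) / 256.0 * levels).astype(int).clip(0, levels - 1)
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (left, right), 1.0)               # count neighbour pairs
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    contrast = float(((i - j) ** 2 * glcm).sum())      # local intensity variation
    energy = float((glcm ** 2).sum())                  # uniformity of the texture
    nonzero = glcm[glcm > 0]
    entropy = float(-(nonzero * np.log2(nonzero)).sum())  # randomness
    return np.array([contrast, energy, entropy])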

Shape

Humans use shape to characterize objects. Shape is the most obvious requirement for primitive searches. Two types of shape features are currently used for CBIR: global features such as aspect ratio and local features such as edges. We can formulate shape queries by either identifying an example image or using a user-drawn sketch.
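The two kinds of shape features can be sketched together: a global feature (the bounding-box aspect ratio of an object mask) and a local one (a histogram of edge orientations along the object boundary). The descriptor below is a toy example; the binary-mask input and the particular features are assumptions made for illustration.

import numpy as np

def shape_signature(mask, n_bins=8):
    """Toy shape descriptor for a non-empty binary object mask (True = object).

    Combines the bounding-box aspect ratio with a normalized histogram of
    edge orientations along the mask boundary.
    """
    ys, xs = np.nonzero(mask)                          # assumes at least one object pixel
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    aspect_ratio = height / width

    gy, gx = np.gradient(mask.astype(float))           # non-zero only near the boundary
    magnitude = np.hypot(gx, gy)
    angles = np.arctan2(gy, gx)[magnitude > 0]
    orientation_hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    orientation_hist = orientation_hist / max(orientation_hist.sum(), 1)

    return np.concatenate([[aspect_ratio], orientation_hist])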

Every CBIR system utilizes these features to varying extents to improve the accuracy and relevancy of its searches. However, even these three features are not enough to support the needs of image users.


The Current State of CBIR Technology

CBIR systems are gaining a foothold in the commercial marketplace. Prime application areas for CBIR include crime prevention (fingerprint and face recognition), intellectual property (trademark registration), journalism and advertising (video asset management), and Web searching. All of these applications employ image processing techniques to automatically extract different visual features such as color, texture, shape, structure, etc.

These and other first-generation CBIR solutions are attempting to overcome one or more of the three most common limitations of manual, annotation-based retrieval:

• manual annotation of images and videos

• subjective interpretation of the content of an image

• limited linguistic ability of the annotator

Despite these existing limitations, CBIR is a fast-developing technology with considerable potential that will become more and more prevalent in future search systems.

Challenges Facing CBIR

The challenges that CBIR systems currently face fall into two main categories: fundamental technology and user adoption.

Fundamental Technology

All currently available CBIR systems suffer from one main disadvantage: they do not recognize the inherent nature of an image as a "collection of objects". Because of this, these systems frequently generate inaccurate retrievals from image databases.

To overcome this disadvantage, the segmentation of an image into relevant object regions is critical. The accurate characterization of an image's visual features, such as color, texture, shape, and spatial location, depends on good segmentation. Additionally, the identification of object regions within an image allows us to narrow our searches to a specific part of an image, instead of being limited to an image as a whole.


The other two main technical challenges faced by CBIR systems are related to improving the areas of speed and accuracy.

• Accuracy. The accuracy of searches initiated within CBIR systems depends on:

- using the best visual features to characterize the objects within an image

- finding appropriate similarity metrics that are relevant to people's visual and cognitive systems

- associating linguistic concepts with object configurations; this association is needed because the majority of human interaction with visual data still occurs at the linguistic level.

CBIR systems must include this technical functionality to produce meaningful and accurate search results.

Figure 2 – An image containing various objects: a person, a car, a house, a tree, two clouds, a water stream, a grass patch, ground, and sky. Each object belongs to a different class, is distinct, and has a unique illumination distribution. This illumination distribution is modeled using customizable functions.

• Speed. The speed of search retrieval is another technical challenge confronting CBIR systems. As the number of visual assets within a database grows, searching through all of those assets becomes increasingly time-intensive. To make CBIR solutions scalable to large image and video collections, efficient multi-dimensional indexing techniques are essential.


User Adoption

The user interface of a CBIR solution is another major obstacle to widespread adoption. User interfaces for content-based searching must be intuitive, easy to use, and enable a user to:

• quickly create or compose a visual query to start a visual search

• select the appropriate visual search options to obtain relevant and desired results

The Next Step for CBIR

While CBIR solutions have the potential to revolutionize the area of search and retrieval, no single solution to date is capable of overcoming the obstacles inherent in their implementation. Until now. Until eVision.

The eVision Technology Solution

eVision was founded by a group of visual search experts with the goal of addressing and resolving the key challenges to the adoption of CBIR technology. To achieve this goal, the company offers breakthrough technologies in the areas of visual indexing, search, and retrieval. The following features of its technology distinguish eVision as a leader in the visual search industry and give it a competitive edge in the marketplace:

1. Automatic segmentation of images and videos into object regions and generation of "signatures" that capture their visual content

2. Ability to manage large collections of images with advanced similarity indexing

3. Unique concept of a Visual Vocabulary™ that cuts through language barriers and provides the foundation for highly usable and intuitive user interfaces

4. Support for Visual Meta Tagging, which reduces the effort involved in the asset ingestion process by automating the tagging of visual assets

5. Implementation of Visual Behavior Tracking, which provides an automated feedback mechanism that further refines the search process

With these features, eVision can deploy visual search and retrieval in ways that can dramatically improve and transform both end-user experiences and fundamental business flow. Additionally, eVision's premier solution (eVe™) is designed with a very specific objective in mind: to streamline the adoption of eVision technology into a wide range of existing industries and applications.


What Are Objects?

The core principle that differentiates eVision's solution from all existing commercial CBIR solutions is:

An image is a collection of objects.

Most people consider digital images to be nothing more than a conglomeration of pixels that form a recognizable pattern. For content-based retrieval, eVision has a different view: a digital image consists of a number of visual groups formed by visually similar pixels. eVision refers to these visual groups as objects, and any given image contains a number of objects that appear together to define that image.

eVision breaks new ground in CBIR by bringing unsupervised segmentation to the commercial world. Segmentation is the process by which an image is divided into object regions, with each region identifying a similar set of objects (see the above figure for an example). Once these object regions are identified, features such as color, texture, and shape are extracted from each of the object regions. Using this technique, the eVe technology can identify specific objects within an image, and enable users to accurately search for both a whole image and for parts (or objects) of an image.
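eVision's segmentation algorithm is proprietary, but the general idea of grouping visually similar pixels into regions can be sketched with a simple stand-in such as k-means clustering on pixel colors. The following fragment is only an illustration of what a label map of object regions looks like; it is not eVision's method, and the region count, iteration count, and color-only features are assumptions.

import numpy as np

def segment_image(image, n_regions=4, n_iter=20, seed=0):
    """Toy unsupervised segmentation: cluster pixels by color with k-means.

    Returns a label map assigning each pixel of an RGB image to one of
    `n_regions` visual groups, from which per-region features could then
    be extracted.
    """
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), n_regions, replace=False)]

    for _ in range(n_iter):
        # assign each pixel to its nearest cluster center
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each center as the mean color of its members
        for k in range(n_regions):
            if np.any(labels == k):
                centers[k] = pixels[labels == k].mean(axis=0)

    return labels.reshape(h, w)    # region label per pixel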

The benefits of object recognition within images and the ability to perform partial image searches are far-reaching and unprecedented in the search industry.

Object-Based Analysis and Search

Segmentation of an image into relevant objects is very important to content-based image retrieval. eVision's image analysis automatically segments each image into regions, which correspond approximately to objects or parts of objects in an image. The following figure shows some samples of how an image is segmented into objects and what each of the color, texture, shape, and object signatures represents.

Note that a texture can only be represented mathematically as a numerical array. It cannot be independently visualized as in the case of a color patch. The figure shows texture as a patterned patch of the object.

Figure 3 – Segmentation of an image (columns: image, object map, objects, and the color, texture, shape, and object signatures)


Scale, Rotation, and Aspect Independence

• Scale. The term "scale" refers to the size at which an object appears in an image. eVision's object analysis normalizes visual signatures to compensate for the different sizes of objects and images. While this approach does not make the search function within eVe completely independent of scale, it does generate accurate results for a wide range of image sizes.

• Rotation. Rotation refers to the angle at which an object appears in an image. An object may rotate about an axis perpendicular to the image plane (in-plane rotation) or about an axis lying within the image plane. Currently, eVision's technology handles rotations about the axis perpendicular to the image plane. Rotations about axes within the image plane "hide" parts of the object that are currently exposed and expose new parts that were previously hidden; unless different views of the object are available, these kinds of rotations cannot be handled from a single view.

• Aspect. Aspect refers to the aspect ratio of the image, which is the ratio of its height to its width. The signatures extracted from object regions are normalized to compensate for variations. As with scale, eVision's search technology is not completely independent of aspect ratio, but it can handle a wide range of aspects (a rough normalization sketch follows this list).
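As a rough illustration of this kind of normalization (an assumption about one possible approach, not eVision's actual technique), the sketch below crops an object region and resamples it onto a fixed-size grid before features are extracted, so that the resulting signature is largely insensitive to the object's original scale and aspect ratio.

import numpy as np

def normalized_region(image, mask, size=64):
    """Crop an object region and resample it onto a `size` x `size` grid.

    Uses nearest-neighbour sampling on an RGB image and a non-empty binary
    mask, so that features computed from the result are largely independent
    of the object's original scale and aspect ratio. Illustrative only.
    """
    ys, xs = np.nonzero(mask)                          # assumes at least one object pixel
    crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    rows = np.linspace(0, crop.shape[0] - 1, size).astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).astype(int)
    return crop[np.ix_(rows, cols)]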

Cluster-Based Indexing Drives eVision

A search that returns images similar to an initial image is considered a "similarity search". To enable similarity searches, images within a database are distilled into multidimensional feature vectors. A similarity search is then the process of finding the nearest matches in the feature space based on distance metrics that reflect similarity.

For small collections of a few hundred images or less, a full search of all the existing feature vectors is a viable solution. When there are thousands upon thousands of images in a database, there is a need for indexing the visual signatures to avoid a linear search or a sequential full search. Within CBIR literature, this is popularly known as "similarity indexing".

The following figure illustrates the concept behind similarity indexing. The visual signatures of images within a database are grouped, or "clustered", based on a visual similarity metric. Clustering generates similarity groups based on the similarities and differences between visual signatures. The centroid of each cluster is the cluster representative. The members within each group closest to the centroid are included within the Visual Vocabulary for the database.

Figure 4 – Similarity indexing: visual signatures (marked "x") are grouped into clusters, each represented by its cluster centroid.
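The figure's idea can be expressed as a small two-stage search. Assuming the signatures have already been clustered (for example with k-means) into centroids and per-signature cluster labels, a query first visits only the closest clusters and then ranks their members, avoiding a full linear scan. The function below is an illustrative sketch, not eVision's proprietary indexing scheme; the probe count and Euclidean distance are assumptions.

import numpy as np

def similarity_search(query, signatures, centroids, labels, n_probe=2, top_k=10):
    """Two-stage similarity search over clustered visual signatures.

    `signatures` is an (N, D) array, `centroids` a (K, D) array of cluster
    centroids, and `labels` the cluster id of each signature. Only the
    `n_probe` clusters whose centroids are closest to the query are
    scanned, so the search avoids a sequential pass over the whole database.
    """
    nearest_clusters = np.linalg.norm(centroids - query, axis=1).argsort()[:n_probe]
    candidates = np.where(np.isin(labels, nearest_clusters))[0]
    dists = np.linalg.norm(signatures[candidates] - query, axis=1)
    return candidates[dists.argsort()[:top_k]]         # indices of the best matches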


eVision has devoted considerable resources to both optimizing linear/traditional indexing and developing new, proprietary indexing methods. The results are shown in the following figures. The first figure shows the faster search time provided by eVe over competitors Virage and Convera. The second figure illustrates the scalability of eVe's proprietary indexing scheme: the search time remains essentially flat even as the number of images in the database increases from 1,000 up to 20,000.

Figure 5 – eVe search times vs. the competition (5)

Figure 6 – eVe search times for linear and indexed search (12)


Formulating a Visual Query

How users initiate a visual query is key to providing a rich and meaningful search experience. To provide this type of experience, eVe provides numerous ways to select an image upon which to base a visual search, including:

• Using an image stored locally on disk, on a network, or on the Internet
• Using an image from the Visual Vocabulary
• Selecting an image from the results of a previous search
• Using a text query to locate an initial image
• Scanning in an image
• Capturing an image using a digital camera

This flexibility provides users with the ability to select the method of initiating a visual search that best works for them, and thus serves as an important building block in the development of intuitive user interfaces.

After selecting the visual signal on which to base a search, eVe provides users with five properties to use when determining the criteria for that search. These criteria include text, color, texture, shape, and object. The following graphic illustrates how eVe segments an image using these properties.
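One simple way to picture how these properties might be combined (an illustrative assumption, not a description of eVe's internal query model) is a weighted sum of per-property distances, where setting a weight to zero switches that property off for the current query.

import numpy as np

def combined_distance(query_sig, image_sig, weights):
    """Weighted sum of per-property distances (e.g. color, texture, shape, object).

    `query_sig` and `image_sig` map property names to NumPy feature vectors;
    `weights` maps the same names to non-negative weights. A weight of 0
    disables a property for this particular query. Hypothetical scheme.
    """
    total = 0.0
    for prop, weight in weights.items():
        if weight > 0:
            total += weight * np.linalg.norm(query_sig[prop] - image_sig[prop])
    return total

# Usage sketch: emphasize color and shape, ignore texture for this query.
# weights = {"color": 0.5, "texture": 0.0, "shape": 0.3, "object": 0.2}
# ranked = sorted(range(len(db)), key=lambda i: combined_distance(q, db[i], weights))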

Figure 7 – Visual search enhanced by object mapping/segmentation

Figure 7 illustrates a Lung CT scan analyzed using eVe. The object map consists of 3 separate segments that are clearly shown with the colors red, green, and blue. The visual representations of the signatures for the four different dimensions, color, texture, shape, and object, are also shown.

Properties of eVe's object mapping (from the figure):
• Automatic
• Spatially sensitive
• Enables object-based search
• Independent of aspect, rotation, scale, and language
• Takes fewer clicks to reach the "right answer"


Constructing a Visual Vocabulary

Central to the concept of visual search is the need for a Visual Vocabulary. Just as keywords are used to retrieve the documents or pages that contain them, a Visual Vocabulary enables users to use visual elements as queries and retrieve images and videos that contain, or are visually similar to, those elements. In other words, a Visual Vocabulary allows a user to base an entire search on an image rather than on text. This is a fundamental change to the way in which people currently initiate visual searches.

Consider an online retail application where consumers shop for clothing and accessories. The Visual Vocabulary items within this application contain picture elements corresponding to hand bags, shoes, sweaters, hats, etc. Each of these picture elements has corresponding color and texture vocabulary swatches, where applicable. The following figure shows a few examples of these vocabulary items.

Figure 8 – Constructing a Visual Vocabulary

We can then initiate a visual search using one or more of the visual vocabulary items as a basis for that search. If, for example, we wish to purchase a particular style and color of handbag, we can simply select a handbag image from the vocabulary list, set our color criterion, and query the system to retrieve more handbags that are visually similar to the selected vocabulary item. Once the matching handbags from the seller�s database are retrieved, we can then select the handbag we want to buy.

A key advantage of the eVision technology is that it can automatically generate a representative Visual Vocabulary for a large number of images. An expert user can add or delete items from this set to gear it to a specific application such as retail, education, medical, brand logos, etc.
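Following the cluster-based indexing described earlier, one way to sketch automatic vocabulary generation is to take, from each similarity cluster, the few images whose signatures lie closest to the cluster centroid. The fragment below is a minimal illustration under that assumption; it is not eVision's implementation.

import numpy as np

def build_visual_vocabulary(signatures, centroids, labels, per_cluster=3):
    """Pick, for each similarity cluster, the images closest to its centroid.

    Returns a list of image indices forming a candidate Visual Vocabulary,
    which an expert user could then prune or extend for a specific
    application. Illustrative sketch only.
    """
    vocabulary = []
    for k in range(len(centroids)):
        members = np.where(labels == k)[0]
        if len(members) == 0:
            continue
        dists = np.linalg.norm(signatures[members] - centroids[k], axis=1)
        vocabulary.extend(members[dists.argsort()[:per_cluster]].tolist())
    return vocabulary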

Automated Visual Meta Tagging of Content

Current Methods for Meta Tagging

Meta tagging is the process of associating, or "tagging", an ingested image with information that is unique to that image. This information can then be used to retrieve the image at a later time.

Unfortunately, current meta tagging processes suffer from two main problems:

• Images must be tagged with meta information individually, resulting in a very labor-intensive process as the number of images increases.


• The accuracy of meta tags depends on the person performing the tagging, which increases the chance of errors. For example, if a person lacks domain knowledge for a particular asset, the tags for that asset might not represent it accurately.

Both problems make finding an asset based on its meta tags prone to inaccuracy.

Visual Meta Tagging

Visual Meta Tagging is a process that alleviates some of the problems prevalent in the current methods for meta tagging.

When eVe analyzes images, it can group them into different "clusters" based on visual similarity. Users can then associate meta tags with all the images within a cluster at the same time. For example, if eVe generates a cluster of images containing yellow cars, users can apply the meta tags "car" and "yellow" to all the images within that cluster simultaneously.
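Cluster-level tagging itself is simple to sketch: once images have been grouped into visually similar clusters, the same tags can be applied to every member in one operation. The helper below is a minimal illustration; the data structures are assumptions, not eVe's API.

def tag_cluster(image_tags, cluster_members, tags):
    """Apply the same meta tags to every image in a visual-similarity cluster.

    `image_tags` maps image ids to a set of tags and is updated in place;
    `cluster_members` lists the image ids in one cluster.
    """
    for image_id in cluster_members:
        image_tags.setdefault(image_id, set()).update(tags)

# Usage sketch:
# image_tags = {}
# tag_cluster(image_tags, yellow_car_cluster, ["car", "yellow"])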

The process of Visual Meta Tagging:
• reduces the amount of time it takes to tag assets
• decreases the chance for errors to occur
• helps clear backlogged assets and bring them online faster

Figure 9 – Visual Meta Tagging example (panels: visual grouping; select and tag)


Visual Behavior Tracking

Visual behavior tracking is the process by which an eVe-enabled system can "learn" about users' behavior. Consider a retail clothing web site enabled for visual search. Whenever a user performs a search, the path the user takes to arrive at a positive outcome (a purchase) is remembered by the system. As more users utilize the system, more information is collected about their buying habits. From this collected data, patterns of behavior are inferred. These patterns can then be applied to new shoppers so that the number of clicks required to arrive at a positive outcome is reduced. This is an iterative process that improves with time and with the number of shoppers.

For example, assume that 70% of the users shopping for a sweater select a blue cardigan. When a new user shops for a sweater, the system can now display the blue cardigan as highly relevant, thereby cutting down on the number of possible clicks a user must make.
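A toy version of this feedback loop can be sketched as follows: record which item each shopper ultimately purchased for a given query, then boost frequently purchased items when ranking results for later shoppers. The class below is an illustrative simplification; a real system would track complete click paths and far richer context.

from collections import Counter

class BehaviorTracker:
    """Counts purchases per query item and re-ranks results accordingly."""

    def __init__(self):
        self.purchases = {}                            # query item -> Counter of purchased items

    def record_purchase(self, query_item, purchased_item):
        self.purchases.setdefault(query_item, Counter())[purchased_item] += 1

    def rerank(self, query_item, results):
        counts = self.purchases.get(query_item, Counter())
        # items purchased most often for this query float to the top
        return sorted(results, key=lambda item: -counts[item])

# Usage sketch: after most "sweater" shoppers buy the blue cardigan,
# rerank("sweater", results) places the blue cardigan first for new shoppers.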


Applying eVision

Over the past decade we have seen businesses rely more and more on the Internet to provide new channels for customer acquisition and retention. Portals and websites are painstakingly designed with these customers in mind. Yet many businesses have ignored the cognitive patterns, identified centuries ago, that shape how people think, inquire, and learn.

The value of eVision lies not in replacing the text-based search and drill-down capabilities that users already accept, but in enhancing and complementing those methods by offering an alternative to individuals who prefer a more visual approach.

eVision's fresh and unique approaches offer the possibility of transforming the way many markets and industries get their work done: visual search engines (the web and corporate nets), embedded technologies (original equipment manufacturers), media asset management, and high-margin verticals that use visual media for transactions (medical, brand management, security). The following chart summarizes eVision's four distinct market segments:

Search Engines
Examples: Google, AltaVista, Yahoo, Inktomi, Lycos, and visual e-business
Value: More accurate, faster searches; directly connect search results to e-commerce product transactions
Market: $1.4B

OEMs
Examples: Operating systems, browsers, chips, wireless, Sony, Olympus, Apple, etc.
Value: Embedded visual search; competitive platform differentiation
Market: $500M

Media Asset Management
Examples: Software and services; Artesia, North Plains, Leo Burnett
Value: Superior media asset indexing, management, and distribution; superior ROI
Market: $500M

High-Margin Verticals
Examples: Medical, e-tailers, auction sites, supply chain
Value: Improved medical diagnostics, better-informed customers, surfers transformed into visual communities

Figure 10 – eVision's market segments


Visual Search Engines and Applications

Currently, users of the Web and corporate nets comprise a visual society that is limited by text-search solutions. In today's marketplace, visual media drives both online and offline sales processes. eVision's technology can augment text-based search engines, content management vendors, and system integrators that are targeting the following end users:

• Fortune 2000 corporate sites that provide marketing and digital brand management; business communications with customers, investors, partners, and staff; and employee training and education

• B2B sites (industry-sponsored vertical eBusinesses)

• Auction sites and information portals

Consider how eVision's visual search could transform an auction site's experience: Sandra decides to sell a Waterford crystal sculpture. Instead of simply offering a text description or static image, she takes advantage of a "True Image" service at the auction portal that analyzes and indexes the image. This index information is then made available to potential buyers, who can now use a Visual Vocabulary of crystal structures to search out pieces such as Sandra's to add to their collection. This allows buyers to identify all relevant items much more quickly and effectively, resulting in superior matches between buyers and sellers, more efficient transactions, and increased revenue to the auction portal.

Information portals that offer visual search can also make this capability available to their customers as an additional licensed component.

Original Equipment Manufacturers

Visual search and retrieval solutions can be embedded in operating systems, web browsers, and eventually on an ASIC chip. When integrated with an operating system, eVe natively enables search and retrieval of visual media online. eVe on an ASIC chip integrates image acquisition with search and retrieval for fast and accurate searching. The eVe chip then becomes part of digital cameras, scanners, camcorders, medical imaging equipment, satellites, etc., generating searchable signatures automatically at the time images and videos are captured.

Some target customers include:

• Operating system vendors

• Browser vendors

• Hardware vendors


Media Asset Management

The Media Asset Management market includes businesses where media assets represent products for sale or are used to enable catalog sale of products. eVision's opportunity in the Media Asset Management market is to provide a visual search solution that can be integrated with existing vendor products to manage media assets, specifically images and videos. Some areas where visual search is indispensable are:

• Media creators and users: creative professionals on corporate nets, the Internet, and stock houses

• Entertainment

• Advertising

• Marketing communications and brand management

• Broadcasting: interactive tele-media (online video news and sports web channels)

• Publishing

• Government (NIH, NIST, NIMA, CIA, digital libraries, earth and space sciences)

• Education and training

• Security: face recognition

• Manufacturing: machine vision

Consider how eVision's visual search could transform a creative professional's experience: Bill is responsible for creating a new vacation brochure for a travel company. First, he develops the concept for the brochure. Then he needs to find images that convey that concept to his customers. To find the appropriate images, he can either look through several hundred pages' worth of images in books and magazines, or go online to search image databases (local or Internet) using text keywords or phrases. According to documented research, it is estimated that using these methods he will spend, on average, about 10 hours searching for the desired images. This significantly impacts Bill's productivity.

With an eVe-enabled system, Bill's search can be sped up tremendously. By submitting a representative sample image on which to search, he can ask the system to find all the images that are similar to it. The system will then find and display the images that are visually similar to Bill's initial image, from which he can choose the one that matches the concept for the travel brochure. If he does not have an example image with which to initiate the search, Bill can either start with text or use a custom Visual Vocabulary developed for the image database he is searching.

With eVe's content-based search, the images that users are looking for can be found within a matter of minutes, not hours.


High-Margin Verticals

Medical diagnostics

The opportunity for eVision technology within the medical diagnostics field includes radiology picture archiving and communication systems (PACS) and web radiology. Medical imaging is critical to patient diagnosis and care across a broad range of health-care procedures and disease states. Moreover, an increasing number of medical images are produced in digital format, including MRIs and CAT scans. It is estimated that the health-care sector spends $70 billion annually conducting radiology studies, of which 10% is spent on storing, handling, and transcribing this information. In addition, the Health Care Financing Administration estimates that approximately 10% ($1 billion) of those expenditures are incurred due to missing or duplicated patient information.

eVe provides an automated and visual content-based method for classification, search, and retrieval of radiology assets such as X-Rays, MRI, CAT scans, mammography scans, etc. This capability, combined with the ability to add contextual medical domain knowledge, creates a powerful and objective computer-aided diagnostic solution.

Brand management

eVision technology can play an important role in safeguarding the global brands and corporate identities of Global 2000 corporations on the Internet. Online brand counterfeiting, unauthorized use, and misuse of brands and logos are estimated to cost global corporations $25 to $30 billion a year. Based on criteria set by the brand owner, eVe enables proactive scouring of the Web or corporate networks to locate and report unauthorized or improper use of popular brand-related visual media (logos, images, videos, and graphics). This ability enables organizations to find rogue sites that are potentially diverting consumer attention and revenues.


The Future of Search

eVision's technology not only offers new and improved solutions to many different industries, it also establishes a strong technical foundation for future requirements in search. eVision's algorithms and processes are not specific to images; they can be applied to any digital pattern. As the demand for search technology progresses into areas such as smell, touch, and even emotion (truly abstract searches), eVision will be well positioned to provide the engine and intelligence for those searches.

Figure 11 – The future of search technology (chart: the progression of search technology from text engines such as AltaVista, Inktomi, Google, Autonomy, and Mercado, through Convera, Virage, and IBM, to eVision at the state of the art)


eVision: Next Generation Visual Search Technology

The Internet has become the information backbone of the world economy. It is a worldwide gathering place for consumers and information seekers. And while the amount of products and information on the Internet increases exponentially each year, the ability of people to accurately and rapidly access what they want has not been able to keep pace. Without powerful search capabilities to retrieve relevant information and find the products people want to buy, the massive content of the Web is virtually useless.

The central difficulty with Internet search today is that it is based on text search. Language barriers, incorrect meta tagging, inconsistent keyword usage, typographical errors, and a host of other problems often make text-based searching frustrating to Internet users.

A much more powerful way to mine the Internet for information is to analyze the actual content of rich media, including images and sounds. Content-based information retrieval, or CBIR, offers the potential for visual search, by which Internet users can specify their needs and make selections based on images, rather than text. Current CBIR applications have taken the first, halting steps to liberating Internet users from text, but they all suffer from serious limitations such as a lack of sophisticated analysis and poor performance.

eVision overcomes the obstacles of today's CBIR systems.

Key advantages of eVe's visual search:
• Easier and more intuitive than text search
• Cognitively 200x faster than text
• Multi-dimensional and more accurate
• Allows search refinement ("zero in")
• Transcends cultural and language barriers
• Drives e-commerce (consumers find what they want)

(Diagram: a query image submitted against a media database returns visually similar search results.)


eVision breaks new ground in CBIR by offering the following capabilities:

1. Automatic segmentation of images and videos into object regions and generation of signatures that capture their visual content

2. Ability to manage large collections of images with advanced similarity indexing

3. Unique concept of a Visual Vocabulary that cuts through language barriers and provides the foundation for highly-usable and intuitive user interfaces

4. Visual Meta Tagging that reduces the effort involved in the asset ingestion process by automating the tagging of visual assets

5. Patented processes that can be applied to any digital signature

Visual search offers powerful improvements over text search: people naturally interact and communicate with images and sound. Yet there are many more senses through which humans understand and change the world, including taste, smell, and even emotion. The next breakthrough in search technologies will allow computer users to explore the world of the Internet by utilizing the full range of sensory experiences ("Show me all the images containing happy faces.").

eVision's ground-breaking technology not only enhances visual search today, it leads the way to full sensory search in the near future.


References

1. Biederman, I. (1987). "Recognition-by-components: a theory of human image understanding." Psychological Review, 94(2), 115-147.

2. Chan, Y., et al. (1999). "Building systems to block pornography." Presented at CIR-99: The Challenge of Image Retrieval, Newcastle upon Tyne, February 25-26, 1999.

3. Del Bimbo, A. (1999). Visual Information Retrieval. Morgan Kaufmann, pp. 1-270.

4. Eakins, J. P., and Graham, M. E. (1999). Content-Based Image Retrieval: A Report to the JISC Technology Applications Programme. Institute for Image Data Research, Newcastle upon Tyne, pp. 1-63.

5. eVision Benchmark Study (2001).

6. Gupta, A., et al. (1996). "The Virage image search engine: an open framework for image management." In Storage and Retrieval for Image and Video Databases IV, Proc. SPIE 2670, pp. 76-87.

7. Huang, T., et al. (1997). "Multimedia Analysis and Retrieval System (MARS) project." In Digital Image Access and Retrieval: 1996 Clinic on Library Applications of Data Processing (Heidorn, P. B., and Sandore, B., eds.), 101-117. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign.

8. Liu, F., and Picard, R. W. (1996). "Periodicity, directionality and randomness: Wold features for image modeling and retrieval." IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(7), 722-733.

9. Ma, W. Y., and Manjunath, B. S. (1997). "Netra: a toolbox for navigating large image databases." Proc. IEEE International Conference on Image Processing (ICIP97), 1, 568-571.

10. Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, E., Petkovic, D., Yanker, P., Faloutsos, C., and Taubin, G. (1993). "The QBIC project: querying images by content using color, texture and shape." In Proceedings of Storage and Retrieval for Image and Video Databases, San Jose, California, USA, SPIE.

11. Smith, J. R., and Chang, S. F. (1997). "Querying by color regions using the VisualSEEk content-based visual query system." In Intelligent Multimedia Information Retrieval (Maybury, M. T., ed.). AAAI Press, Menlo Park, CA, 23-.

12. University of Manchester Benchmark Study (2001).


About eVision

eVision was founded in 1999 by a group of experts who have dedicated more than ten years of research to visual search technology. eVision is guided by a seasoned management team comprised of engineers and business executives with a successful background in the creation and delivery of cutting-edge technology products to the evolving marketplace. The chief architect of the visual search technology, Dr. Srinivas Sista, has ten years of R&D experience focused specifically on digital communication, image processing, and pattern recognition.

eVision's industry-leading advisory board is comprised of accomplished professionals from a range of high-technology industries. This team uses its business acumen to gather customer, partner, and industry feedback that helps build and implement the company's streamlined business strategy.

For additional information, please contact:

Kasu Sista, VP Technology
Srinivas Sista, VP Research and Development
Mat Malladi, CEO
Peter Giegoldt, VP Operations

www.evisionglobal.com