interest in the potential of digital images has increased enormously over the last few years


Upload: richin-varghese

Post on 02-May-2017


Interest in the potential of digital images has increased enormously over the last few years, fuelled at least in part by the rapid growth of imaging on the World-Wide Web. Users in many professional fields are exploiting the opportunities offered by the ability to access and manipulate remotely stored images in all kinds of new and exciting ways. However, they are also discovering that the process of locating a desired image in a large and varied collection can be a source of considerable frustration. The problems of image retrieval are becoming widely recognized, and the search for solutions is an increasingly active area of research and development.

Content-Based Image Retrieval (CBIR) is one of the most popular and fastest-growing research areas in digital image processing. The most important task of this project is to bridge the information gap between a drawing and a picture, which is supported by our own preprocessing transformation. The goal is to develop a content-based associative search engine. "Content-based" means that the search analyzes the actual contents of the image rather than metadata such as keywords, tags, or descriptions associated with the image. The term 'content' in this context may refer to colors, shapes, textures, or any other information that can be derived from the image itself. CBIR, also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is a technique for retrieving images on the basis of automatically derived features such as color, texture and shape. The retrieval results are grouped by color for better clarity. In our system the search can be iterated: the user can query again starting from the current results, which increases precision.

In content-based image retrieval systems, the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. The feature vectors of the images in the database form a feature database. To retrieve images, users provide the retrieval system with example images or sketched figures. The system then converts these examples into its internal representation of feature vectors. The similarities or distances between the feature vector of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed with the aid of an indexing scheme. The indexing scheme provides an efficient way to search the image database.
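
The query-by-example flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the three-element vectors, the image names and the function names `euclidean` and `retrieve` are hypothetical, and a plain linear scan stands in for the indexing scheme, which the text does not specify.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query_vector, feature_db, n=3):
    """Return the n database entries closest to the query, nearest first."""
    ranked = sorted(
        (euclidean(query_vector, vec), name) for name, vec in feature_db.items()
    )
    return [(name, dist) for dist, name in ranked[:n]]

# Hypothetical feature database: image name -> feature vector.
feature_db = {
    "beach.jpg": [0.9, 0.8, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "sunset.jpg": [0.9, 0.3, 0.1],
}

# The query vector would normally be extracted from the example image or sketch.
results = retrieve([0.8, 0.7, 0.1], feature_db, n=2)
for name, dist in results:
    print(f"{name}: distance {dist:.3f}")
```

A real system would replace the linear scan with an index so that not every stored vector has to be compared with the query.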

1. INTRODUCTION

1.1 Need For The New System

Most of the available image search tools, such as Google Images and Yahoo! Image Search, are based on textual annotation of images. In these tools, images are annotated with keywords, manually or automatically, and then retrieved using text-based search methods. The performance of these systems is not satisfactory. Text-based image retrieval has many disadvantages: manual annotation is not always accurate, is often unavailable, and is impractical for a large database; the surrounding text may not describe the image; human perception is subjective; and too much responsibility is placed on the end user.

Text-based image search engines index images using the words associated with the images. Depending on whether the indexing is done automatically or manually, image search engines adopting this approach may be further classified into two categories: Web image search engines and collection-based search engines. Web image search engines collect images embedded in Web pages from other sites on the Internet and index them using text automatically derived from the containing Web pages. Most commercial image search engines fall into this category. In contrast, collection-based search engines index image collections using keywords assigned by human indexers. Digital libraries and commercial stock photo providers are good examples of this kind of search engine.

However, text-based image retrieval also faces many challenges. One major problem is that the task of describing image content is highly subjective. The perspective of the textual description given by an annotator may differ from the perspective of a user. A picture can mean different things to different people. It can also mean different things to the same person at different times. Furthermore, even with the same view, the words used to describe the content can vary from one person to another. In other words, there can be a variety of inconsistencies between users' textual queries and image annotations or descriptions.

A picture is worth a thousand words, so retrieval systems that use images as queries were developed. One of these is sketch-based image retrieval (SBIR), which is based on a free-hand sketch. The performance of this approach was not satisfactory, because it was not possible to derive low-level features from the sketches. To handle the informational gap between a sketch and a colored image, a modification of SBIR was developed that makes efficient search possible. The CBIR system described here is thus a modification of sketch-based image retrieval. The goal is to develop a content-based associative search engine. The retrieval results are grouped by color for better clarity. The search can be iterated by querying again from the current results, which increases precision. The scope of this project is to review the current state of the art in content-based image retrieval (CBIR), a technique for retrieving images on the basis of automatically derived features such as color, texture and shape. CBIR is one of the most popular and fastest-growing research areas in digital image processing, and its goal is to extract the visual content of an image automatically. Our purpose is to develop a content-based image retrieval system that can retrieve images from frequently used databases using sketches. The user has a drawing area in which to draw the sketches that form the basis of the retrieval method.

In contrast to the text-based approach, CBIR operates on a totally different principle, retrieving stored images from a collection by comparing features automatically extracted from the images themselves. The most common features are mathematical measures of color, texture or shape. A typical system allows users to formulate queries by submitting an example of the type of image being sought, though some offer alternatives such as selection from a palette or sketch input. The system then identifies those stored images whose feature values match those of the query most closely, and displays thumbnails of these images on the screen. Image retrieval does not entail solving the general image understanding problem. It may be sufficient that a retrieval system presents similar images, similar in some user-defined sense.

The ideal CBIR system from a user perspective would involve what is referred to as semantic retrieval, where the user makes a request like "find pictures of dogs" or even "find pictures of Abraham Lincoln". This type of open-ended task is very difficult for computers to perform: pictures of Chihuahuas and Great Danes look very different, and Lincoln may not always be facing the camera or in the same pose. Current CBIR systems therefore generally make use of lower-level features like texture, color, and shape, although some systems take advantage of very common higher-level features like faces. Not every CBIR system is generic. Some systems are designed for a specific domain; for example, shape matching can be used for finding parts inside a CAD-CAM database.

1.2 Detailed Problem Definition

In text-based image retrieval, images are manually annotated with keywords and then retrieved using text-based search methods. This older approach has many disadvantages: manual annotation is not always accurate, is often unavailable, is time-consuming, and is impractical for a large database; the surrounding text may not describe the image; human perception is subjective; and too much responsibility is placed on the end user. Mistakes in the text, such as spelling errors and spelling variants, also reduce the efficiency of the retrieval system. Text-based image retrieval or indexing is also known as concept-based image indexing or description-based image indexing.

Content-based image retrieval (CBIR) is one of the most popular and fastest-growing research areas in digital image processing. CBIR is a technique for retrieving images on the basis of automatically derived features such as color, texture and shape, and its goal is to extract this visual content automatically. Our purpose is to develop a content-based image retrieval system that can retrieve images from frequently used databases using sketches. The user has a drawing area in which to draw the sketches that form the basis of the retrieval method. In content-based image retrieval systems, the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. CBIR is the mainstay of current image retrieval systems. In general, it represents an image by a set of low-level visual features such as color, texture, and shape, and retrieval is based on computing the similarity between the user’s query and the stored images. To improve the results, relevance feedback techniques were incorporated into CBIR so that more precise results can be obtained by taking the user’s feedback into account. Once the feature vectors are ready, retrieval can start: based on the feature vectors and the sample image, the retrieval subsystem produces the response list and presents it to the user through the display subsystem (GUI). The images and the necessary mechanisms for subsequent processing are provided. Thus, the retrieval process is highly interactive.
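
The report does not say which relevance feedback method is used. As one well-known possibility, the sketch below uses Rocchio's formula, which moves the query vector towards the results the user marked as relevant and away from those marked as not relevant. The function name, the weights `alpha`, `beta`, `gamma` and the sample vectors are assumptions made for illustration only.

```python
def rocchio_update(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined query vector from user-marked result vectors."""

    def centroid(vectors):
        # Mean vector of a group; a zero vector if the group is empty.
        if not vectors:
            return [0.0] * len(query)
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    rel_c = centroid(relevant)
    non_c = centroid(non_relevant)
    return [alpha * q + beta * r - gamma * s for q, r, s in zip(query, rel_c, non_c)]

# Hypothetical two-dimensional feature vectors.
new_query = rocchio_update(
    [0.5, 0.5],
    relevant=[[1.0, 0.0], [0.8, 0.2]],
    non_relevant=[[0.0, 1.0]],
)
print(new_query)
```

The refined vector would then be submitted as the next query, which is one way to realise the iterative, interactive search described in this section.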

1.3 Viability Of The System

Content-based image retrieval (CBIR) is one of the most popular and fastest-growing research areas in digital image processing. CBIR is a technique for retrieving images on the basis of automatically derived features such as color, texture and shape, and its goal is to extract this visual content automatically. Once the feature vectors are ready, retrieval can start: based on the feature vectors and the sample image, the retrieval subsystem produces the response list and presents it to the user through the display subsystem (GUI). The images and the necessary mechanisms for subsequent processing are provided. The number of results to show in the user interface is an important aspect; it depends on the resolution of the monitor. In the system the candidate results are classified, and the resulting clusters are displayed. Thus the retrieval process is highly interactive.

"Content-based" means that the search analyzes the actual contents of the image. The term 'content' in this context may refer to colors, shapes, textures, or any other information that can be derived from the image itself. Without the ability to examine image content, searches must rely on metadata such as captions or keywords. Such metadata must be generated by a human and stored alongside each image in the database. Problems with traditional methods of image indexing have led to the rise of interest in techniques for retrieving images on the basis of automatically derived features such as color, texture and shape, a technology now generally referred to as Content-Based Image Retrieval (CBIR). CBIR systems allow users to formulate queries by submitting an example of the type of image being sought, though some offer alternatives such as selection from a palette or sketch input. The system then identifies those stored images whose feature values match those of the query most closely, and displays thumbnails of these images on the screen. Users needing to retrieve images from a collection come from a variety of domains, including crime prevention, medicine, architecture, fashion and publishing.

This project implements a CBIR system that is based on the low-level features of an image, which can be described more comprehensively than high-level features, while still offering the user a higher level of retrieval. Users want a friendly environment in which they can use the system easily and effectively without going into the finer details of how it works. To create such a user-friendly platform, we have designed a graphical user interface in which users can select the method to be used for image retrieval, and can switch to a different method if the result does not meet their requirements.

1.4 Presently Available Systems For The Same

1.4.1. Scope of the project

The scope of this project is to review the current state of the art in content-based

image retrieval (CBIR), a technique for retrieving images on the basis of automatically-

derived features such as color, texture and shape. Our findings are based both on a review of

the relevant literature and on discussions with researchers in the field.

The need to find a desired image from a collection is shared by many professional

groups, including journalists, design engineers and art historians. While the requirements of

image users can vary considerably, it can be useful to characterize image queries into three

levels of abstraction: primitive features such as color or shape, logical features such as the

identity of objects shown and abstract attributes such as the significance of the scenes

depicted. While CBIR systems currently operate effectively only at the lowest of these

levels, most users demand higher levels of retrieval. The goal of CBIR is to extract visual

content of an image automatically, like color, texture, or shape.

1.5 Future Prospects

The software should be able to adapt to change. Our system is designed to accommodate future changes without much modification. The program is coded in a structured manner so that further enhancements can be added.

Development and study continue towards further improvements in the design and performance of content-based image retrieval systems. This project analyzes only color, shape and texture; information about object location is discarded. As a result, images retrieved by the methods above may not be semantically related even though they share a similar color distribution.

Future enhancements could include:

• improved methods for Web searching, allowing users to identify images of interest in remote sites by a variety of image and textual cues;

• improved video retrieval techniques, including automatic segmentation, query-by-motion facilities, and integration of sound and video searching;

• better user interaction, including improved techniques for image browsing and exploiting user feedback;

• automatic or semi-automatic methods of capturing image semantics for retrieval.

2. ANALYSIS

2.1. PROJECT MANAGEMENT

Project management is the discipline of planning, organizing, securing, and managing

resources to achieve specific goals. A project is a temporary endeavor with a defined

beginning and end (usually time-constrained) undertaken to meet unique goals and

objectives, typically to bring about beneficial change or added value.

The primary challenge of project management is to achieve all of the project goals and

objectives while honoring the preconceived constraints. Typical constraints are scope, time,

and budget. The secondary and more ambitious challenge is to optimize the allocation of

necessary inputs and integrate them to meet pre-defined objectives.

Modules

The CBIR project includes four modules:

• User Module

• Indexing Module

• Searching Module

• Displaying of resultant

User Module

In this module, the user can upload images and perform basic image editing operations such as cropping, drawing, and zooming in and out. The file chooser only allows the user to upload images with jpg and png extensions. The uploaded image is set into an image icon on the software platform. To crop an image, the user specifies the crop values and adjusts the crop slip. The cropped image is then saved to a folder for later use. The user can also draw a sketch in the drawing area provided, perform the basic operations on it, and save it to a folder. The main function performed here is uploading the query image.
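
The extension check performed by the file chooser can be illustrated as follows. This is a minimal sketch, not the project's code: the names `ALLOWED_EXTENSIONS` and `is_allowed_image` are hypothetical, and the case-insensitive comparison is an assumption.

```python
from pathlib import Path

# Only the two extensions named in the text are accepted.
ALLOWED_EXTENSIONS = {".jpg", ".png"}

def is_allowed_image(filename):
    """True if the file name ends in an accepted extension (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS

for candidate in ["query.JPG", "sketch.png", "notes.txt", "archive.jpg.zip"]:
    print(candidate, is_allowed_image(candidate))
```

Only the final suffix is inspected, so a name such as `archive.jpg.zip` is rejected.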

Indexing Module

A number of indexing schemes use classification codes rather than keywords or subject descriptors to describe image content, as these can give a greater degree of language independence and show concept hierarchies more clearly. In this system, indexing consists of numbering the images in the selected directory, extracting their features, and storing the features in an array list. A document builder is created according to the extracted features of the images. The indexing operation is performed using two parallel threads. The indexing phase includes:

Preprocessing: The image is first processed in order to extract the features that describe its contents. The processing involves filtering, normalization, segmentation, and object identification. The output of this stage is a set of significant regions and objects.

Feature extraction: Features such as shape, texture and color are used to describe the content of the image. A feature is defined as a function of one or more measurements, each of which specifies some quantifiable property of an object, and is computed such that it quantifies some significant characteristic of the object. The extracted features can be classified as follows:

• General features: Application-independent features such as color, texture, and shape. According to the abstraction level, they can be further divided into pixel-level features, local features and global features.

• Domain-specific features: Application-dependent features such as human faces, fingerprints, and conceptual features. These features are often a synthesis of low-level features for a specific domain.

In content-based image retrieval, images are automatically indexed by generating a feature vector (stored as an index in a feature database) describing the content of the image. The similarity between the feature vectors of the query and the directory images is measured to retrieve matching images. The directory containing the images is indexed before searching in order to make the comparison easier and faster. If the specified directory is empty, the required images can be downloaded from the web using the Flickr download feature of the CBIR system. Low-level features can be extracted directly from the original images, and the CBIR system uses these extracted low-level features.
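
The indexing steps described in this module (number the images in a directory, extract features, store them in a list, use two parallel threads) can be sketched as below. This is an illustration only: `extract_features` is a placeholder that computes simple byte statistics rather than a real image descriptor, and all names are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def extract_features(path):
    """Placeholder extractor (assumption): mean byte value and file size."""
    with open(path, "rb") as fh:
        data = fh.read()
    if not data:
        return [0.0, 0.0]
    return [sum(data) / len(data), float(len(data))]

def index_directory(directory, workers=2):
    """Number the jpg/png files in a directory and extract features in parallel."""
    names = sorted(
        f for f in os.listdir(directory) if f.lower().endswith((".jpg", ".png"))
    )
    paths = [os.path.join(directory, f) for f in names]
    # Two worker threads, as in the text; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vectors = list(pool.map(extract_features, paths))
    return [
        {"id": i, "path": p, "features": v}
        for i, (p, v) in enumerate(zip(paths, vectors))
    ]

# Demonstration on a temporary directory with stand-in files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name, payload in [("a.jpg", b"\x00\x02"), ("b.png", b"\x04"), ("notes.txt", b"x")]:
        with open(os.path.join(demo_dir, name), "wb") as fh:
            fh.write(payload)
    index = index_directory(demo_dir)

for entry in index:
    print(entry["id"], os.path.basename(entry["path"]), entry["features"])
```

Building the index once, before any search, means that a query only needs its own feature vector to be computed at search time.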

Searching Module

After indexing, searching begins with the upload of the query image. This image is preprocessed and its feature vector is generated. Next, the user selects the algorithm to use for searching. The algorithms used in this project are:

CEDD

This algorithm uses a low-level feature that is extracted from the images and can be used for indexing and retrieval. The feature is called the “Color and Edge Directivity Descriptor” (CEDD) and incorporates color and texture information in a single histogram. The CEDD size is limited to 54 bytes per image, making the descriptor suitable for use in large image databases. One of the most important attributes of CEDD is the low computational power needed for its extraction in comparison with most MPEG-7 descriptors.

FCTH

This algorithm extracts a low-level feature that combines color and texture information in one histogram. The feature is named FCTH (Fuzzy Color and Texture Histogram) and results from the combination of three fuzzy systems. The FCTH size is limited to 72 bytes per image, making the descriptor suitable for use in large image databases. The feature is appropriate for accurately retrieving images even in the presence of distortions such as deformation, noise and smoothing. It has been tested on a large number of images selected from proprietary image databases or randomly retrieved from popular search engines.
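
The 54-byte and 72-byte figures follow from the histogram layouts. As an assumption taken from the published descriptors rather than from this report, CEDD has 144 bins and FCTH has 192 bins, each quantised to 3 bits. The sketch below checks the arithmetic and adds the Tanimoto coefficient, which is commonly used to compare such histograms. The function names are hypothetical.

```python
def descriptor_size_bytes(bins, bits_per_bin=3):
    """Storage needed for a histogram quantised to a fixed number of bits per bin."""
    total_bits = bins * bits_per_bin
    return (total_bits + 7) // 8  # round up to whole bytes

def tanimoto(a, b):
    """Tanimoto coefficient: 1.0 for identical histograms, 0.0 for no overlap."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 1.0 if denom == 0 else dot / denom

CEDD_BINS = 144  # assumption: 6 texture areas x 24 colours
FCTH_BINS = 192  # assumption: 8 texture areas x 24 colours

print("CEDD bytes:", descriptor_size_bytes(CEDD_BINS))
print("FCTH bytes:", descriptor_size_bytes(FCTH_BINS))
print("similarity:", tanimoto([1, 0, 2], [1, 0, 2]))
```

With 3 bits per bin, 144 bins occupy 432 bits and 192 bins occupy 576 bits, which matches the sizes quoted in the text.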

EDGE HISTOGRAM

The edge histogram captures the spatial distribution of edges. It is useful for image matching even if the texture itself is not homogeneous. The image is partitioned into 4 x 4 sub-images, and a local edge histogram with five edge types is computed for each sub-image.
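
A simplified version of this descriptor can be sketched as follows. The sketch assumes a greyscale image given as a list of rows, classifies each 2 x 2 pixel block with the five filters commonly associated with the MPEG-7 edge histogram, and counts the result in the matching cell of a 4 x 4 grid. The threshold value and all names are assumptions, and a full implementation would average larger image blocks before filtering.

```python
import math

SQRT2 = math.sqrt(2.0)
# 2x2 filter coefficients: (top-left, top-right, bottom-left, bottom-right).
EDGE_FILTERS = {
    "vertical": (1.0, -1.0, 1.0, -1.0),
    "horizontal": (1.0, 1.0, -1.0, -1.0),
    "diagonal_45": (SQRT2, 0.0, 0.0, -SQRT2),
    "diagonal_135": (0.0, SQRT2, -SQRT2, 0.0),
    "non_directional": (2.0, -2.0, -2.0, 2.0),
}
EDGE_TYPES = list(EDGE_FILTERS)

def edge_histogram(gray, grid=4, threshold=11.0):
    """Return grid*grid local histograms, each counting the five edge types."""
    height, width = len(gray), len(gray[0])
    hist = [[0] * len(EDGE_TYPES) for _ in range(grid * grid)]
    for y in range(0, height - 1, 2):
        for x in range(0, width - 1, 2):
            block = (gray[y][x], gray[y][x + 1], gray[y + 1][x], gray[y + 1][x + 1])
            strengths = [
                abs(sum(p * c for p, c in zip(block, EDGE_FILTERS[name])))
                for name in EDGE_TYPES
            ]
            best = max(range(len(EDGE_TYPES)), key=strengths.__getitem__)
            # Blocks with a weak response are treated as containing no edge.
            if strengths[best] >= threshold:
                cell = (y * grid // height) * grid + (x * grid // width)
                hist[cell][best] += 1
    return hist

# Test pattern: alternating dark and bright columns, i.e. vertical edges everywhere.
stripes = [[255 if x % 2 else 0 for x in range(8)] for _ in range(8)]
hist = edge_histogram(stripes)
print(hist[0])
```

Concatenating the 16 local histograms gives an 80-value descriptor that can be compared between images in the same way as any other feature vector.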

RGB COLOR HISTOGRAM

The RGB model uses three color components: red, green and blue. An RGB histogram has the advantage of being easy to extract, and it is one of the most common methods of image retrieval. The method analyzes the RGB color values of the image in order to find suitable color intervals for the histogram.
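
A minimal sketch of an RGB histogram and one common way to compare two histograms is given below. It assumes fixed, equally wide color intervals (four per channel) rather than optimised ones, uses histogram intersection as the similarity measure, and all names and sample pixels are hypothetical.

```python
def rgb_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram of (r, g, b) tuples with values 0..255."""
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel value to its interval index.
        r_bin = r * bins_per_channel // 256
        g_bin = g * bins_per_channel // 256
        b_bin = b * bins_per_channel // 256
        hist[(r_bin * bins_per_channel + g_bin) * bins_per_channel + b_bin] += 1.0
        count += 1
    return [h / count for h in hist] if count else hist

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalised histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical four-pixel images.
red = rgb_histogram([(255, 0, 0)] * 4)
mostly_red = rgb_histogram([(250, 10, 5)] * 3 + [(0, 0, 255)])
blue = rgb_histogram([(0, 0, 255)] * 4)

print(histogram_intersection(red, mostly_red))
print(histogram_intersection(red, blue))
```

Fewer intervals make the comparison more tolerant of small color shifts, while more intervals distinguish colors more finely at the cost of a longer vector.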

TAMURA

This method uses the Tamura texture features of an image, which include roughness, contrast, directionality and linearity (line-likeness). The texture properties of the image are extracted by texture analysis and passed to fuzzy clustering, where the extracted features are stored for later comparison.
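
Of the Tamura features, contrast has a compact closed form: the standard deviation of the grey levels divided by the fourth root of their kurtosis. The sketch below assumes the image is already available as a flat list of grey values. The function name is hypothetical and the code is not taken from the project.

```python
def tamura_contrast(gray_values):
    """Tamura contrast: sigma / kurtosis**0.25 over a flat list of grey levels."""
    n = len(gray_values)
    mean = sum(gray_values) / n
    variance = sum((v - mean) ** 2 for v in gray_values) / n
    if variance == 0:
        return 0.0  # a uniform image has no contrast
    fourth_moment = sum((v - mean) ** 4 for v in gray_values) / n
    kurtosis = fourth_moment / variance ** 2
    return variance ** 0.5 / kurtosis ** 0.25

high = tamura_contrast([0, 255] * 8)    # black and white pixels
low = tamura_contrast([120, 135] * 8)   # two similar greys
flat = tamura_contrast([100] * 16)      # uniform image
print(high, low, flat)
```

The other Tamura features need neighbourhood analysis of the image and are not shown here.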

Displaying of resultant

Because drawings are the basis of the retrieval, a drawing surface is provided on which they can be produced. A folder containing the collection of images to be searched is also needed and must be set before the search. For a large result set, a systematic arrangement of the search results makes them much easier to survey, so the system provides one. The methods in our system cannot work without parameters, and therefore the interface also provides a way to set them.

The number of results to show in the user interface is an important aspect. At first sight, the first n results can be displayed, where n is chosen so that they fit conveniently in the user interface. This number depends on the resolution of the monitor and on the number of results requested by the user. In our system the candidate results are classified, and the resulting clusters are displayed, which makes the result set more ordered and transparent. By default the results are displayed by relevance, but false positives can occur and worsen the retrieval results. If the results are regrouped according to some criterion, the number of false-positive results decreases, and the results are perceived as better. Since color-based clustering was the best solution for us, we chose the k-means clustering method, which is well suited to this purpose.
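
Color-based grouping of results with k-means can be sketched as follows. The sketch assumes each result image is summarised by its mean RGB color, uses the first k distinct points as starting centroids so that the outcome is repeatable, and is not the project's implementation. All names and sample colors are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers; returns (centroids, labels)."""
    # Deterministic start for the sketch: the first k distinct points.
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels

# Hypothetical mean colors of four retrieved images: two reddish, two bluish.
mean_colors = [(250, 10, 10), (10, 10, 240), (240, 20, 5), (5, 15, 250)]
centroids, labels = kmeans(mean_colors, k=2)
print(labels)
print(centroids)
```

Results that share a label would then be displayed together, so that the reddish and bluish images appear as two separate groups.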

The search results are displayed in a table in order of priority, together with the location of each image. The images are ordered by similarity to the query image, with the most similar image displayed first. The number of search results can be set by the user.

2.2. REQUIREMENT ANALYSIS

Requirement analysis is the process of analyzing requirements with the goal of improving or modifying them. It is an important phase of any application development. It mainly covers the analysis of the existing system and its features, as well as of the proposed system. By analyzing the advantages and disadvantages of the existing system, the proposed system can be designed to avoid its complexities, limitations and disadvantages. The requirements of the new system are defined and the requirements of the desired software product are extracted during this phase.

To design a new system we need the requirements and a description of the system. Based on the business scenario, the Software Requirement Specification document is prepared in this phase. The purpose of this document is to specify the functional requirements of the software that is to be built. These specifications are intended to guide the group through the development process. The document explains all the processes, activities, relationships and other organizational objectives.

Requirement analysis is done in order to understand the problem that the software system is to solve. For example, the problem could be automating an existing manual process, developing a completely new automated system, or a combination of the two. For large systems that have a large number of features and need to perform many different tasks, understanding the requirements of the system is a major task. The emphasis in requirement analysis is on identifying what is needed from the system and not how the system will achieve its goal. This task is complicated by the fact that there are often at least two parties involved in software development, a client and a developer.

There are two major activities in this phase: problem understanding (or analysis) and requirement specification. In problem analysis the analyst has to understand the problem and its context. Such analysis typically requires a thorough understanding of the existing system, a part of which must be automated. With the analysis of the current system the analyst can understand the reason for automation and what effects the automated system might have.

The goal of this activity is to understand the requirements of the new system to be developed. Requirement analysis establishes the user’s requirements within the framework of the organization’s objectives and the environment in which the system is to be installed. Consideration is given to the user’s resources as well as finances. The user’s requirements were identified using the following techniques.

Pre-defined Questions

A standardized question format allows analysts to collect information about the various aspects of the system from a large number of persons, and can yield more reliable data than other techniques.

Interview

Analysts use interviews to collect information from individuals or from groups. The

respondents are generally current users of the existing system.

Record Inspection

In record inspection, analysts examine information that has been recorded about the system and the users. Record inspection can be performed at the beginning of the study as an introduction, or later in the study as a basis for comparing actual operations with what the records indicate should be happening.

2.2.1. FEASIBILITY STUDY

It is both necessary and prudent to evaluate the feasibility of the project at the earliest possible time. Feasibility and risk analysis are related in many ways: if project risk is great, the feasibility considerations listed below become all the more important. The development of a computer-based system is likely to be plagued by a scarcity of resources. The three essential considerations in feasibility analysis are technical, economic, and operational (or behavioral) feasibility.

Technical Feasibility

Technical feasibility includes the study of function, performance and constraints, and of the hardware and software factors that may affect the ability to achieve an acceptable system. It centers on the existing computer system and the extent to which it can support the proposed addition. For example, if the current computer is operating at 80% capacity, then running another application could overload the system or require additional hardware. This involves financial considerations to accommodate the additional enhancements. Here we do not require any extra hardware, so the system is technically feasible.

Economic Feasibility

Economic analysis, more commonly known as cost/benefit analysis, is the most frequently used method for evaluating the effectiveness of a candidate system. The procedure is to determine the benefits and savings that are expected from the candidate system and compare them with the costs and benefits of the existing system. If the comparison is favorable, a decision is made to design and implement the system; otherwise, further justification or alterations of the proposed system will have to be made if it is to have a chance of being approved. This system reduces operating cost in terms of time by automating the search process. The chance of errors is minimized and the benefits to the organization are greater. Hence, this system is economically feasible.

Operational Feasibility

People are inherently resistant to change, and computers have been known to facilitate change. An estimate should be made of how user staff are likely to react to the development of a computerized system. Computer installations have something to do with turnover, transfers and changes in job status. The introduction of a candidate system requires special effort to educate, sell and train the staff in the new way of conducting business.

The candidate system was found to be technically, economically and behaviorally feasible. The system was developed to be user friendly, needs little training, and improves the working environment. The justification of any capital outlay is that it will increase profit, reduce expenditures, or improve the quality of a service or goods, which in turn may be expected to provide increased profits. Disregarding the initial expenses, the candidate system was assessed to be feasible in all ways.

As the system is economically, technically and operationally feasible, it is judged to be feasible.

2.2.2. EXISTING SYSTEM

First, a detailed study of the existing systems was performed. Today there are numerous systems for image retrieval that use different methods. The existing systems studied were Automatic Image Annotation and Retrieval using Cross Media Relevance Models; Concept Based Query Expansion; Query System Bridging the Semantic Gap for Large Image Databases; Ontology-Based Query Expansion Widget for Information Retrieval; and Detecting Image Purpose in World-Wide Web Documents. The existing systems have several shortcomings:

Manual annotation is not always accurate.

Manual annotation is not available.

Manual image annotation is time consuming.

Manual annotation is impossible for a large DB.

Surrounding text may not describe the image.

Problem of image annotation.

Problem of human perception.

Subjectivity of human perception.

Too much responsibility on the end-user.

During the analysis phase we collected a lot of information about the existing system from various sites and documents, communicated with

the users, identified the shortcomings of the existing system, and noted down their suggestions and comments. Keeping the above shortcomings in mind, we decided to develop the software “Sketch4Match – Content-based Image Retrieval System Using Sketches”, which is user friendly and supports efficient retrieval of images even from a large collection.

3.1 SOFTWARE REQUIREMENT SPECIFICATION

The SRS includes the following :

The purpose and scope of CBIR project.

A literature review on the project which gave an overview of the content based

image retrieval system.

A glossary, to get better familiarity with the terms used.

GUI - Graphical User Interface

CBIR - Content Based Image Retrieval

SBIR - Sketch Based Image Retrieval

JFC - Java Foundation Classes

FCTH - Fuzzy Color And Texture Histogram

CEDD - Color and Edge Directivity Descriptor

JMF - Java Media Framework

References, mentioned clearly in appropriate format.

An overview of document, which mentioned the hardware and software

requirements, system environment, user interface, functional and non-functional

requirements.

An overall description of the project which consisted of system environment

(DFDs), functional requirement specification (use case diagrams), user interface

specification (screenshots), non-functional requirements (performance parameters,

design constraints, standard compliance, database requirements).

Requirement specification which included the external interface

System requirements- hardware and software configurations.

3.1.1. DATAFLOW DIAGRAMS (DFD)

A dataflow diagram (DFD) is a graphical technique that depicts information flow and the transforms that are applied as data move from input to output. The DFD is also known as a Data Flow Graph or Bubble Chart. DFDs can be partitioned into levels that represent increasing information flow and functional detail. The DFD can also be regarded as the starting point of the design phase, functionally decomposing the requirements specification down to the lowest level of detail.

A level 0 DFD, also called a functional system model, a fundamental system model or a context-level DFD, represents the entire software element as a single bubble with input and output data indicated by incoming and outgoing arrows, respectively. Additional processes and information flow paths are represented in the level 1 DFD. Each of the processes represented at level 1 is a subfunction of the overall system depicted in the context model. Any process that is complex at level 1 is further decomposed into subfunctions at the next level, i.e. level 2.

A data flow diagram is a means of representing a system at any level of detail with a graphic network of symbols showing data flows, data stores, data processes and data sources. The purpose of a data flow diagram is to provide a systematic bridge between users and system developers. The diagrams are the basis of structured system analysis. A DFD describes what data flows rather than how it is processed, so it does not depend on hardware, software, data structure or file organization.

Components of Data Flow Diagram

Four symbols are used in drawing Data Flow Diagrams:

Entities

External entities represent the sources of data that enter the system or the recipients of data

that leave the system.

Process

Processes represent activities in which data is manipulated by being stored or transformed in some way. A process is drawn as a circle and shows the transformation or change applied to the data.

Databases

Databases represent storage of data within the system.

Data flow

A data flow shows the flow of information from its source to its destination. A line represents a data flow, with an arrowhead showing the direction of flow.

Level 0

Level 1 (Indexing)

Level 1 (Searching)

Level 2

5. CODING

SYSTEM IMPLEMENTATION

The term implementation has different meanings, ranging from the conversion of a basic application to the complete replacement of a computer system. The procedures, however, are virtually the same. Implementation includes all those activities that take place to convert from the old system to the new.

The new system may be totally new, replacing an existing manual or automated system, or it may be a major modification to an existing system. The method of implementation and the time scale to be adopted are determined initially. The system is tested properly and, at the same time, the users are trained in the new procedures. Proper implementation is essential to provide a reliable system that meets the organization's requirements.

Successful and efficient utilization of the system can be achieved only through proper implementation of the system in the organization. The implementation phase is as important as the other phases such as analysis, design, coding and testing. It involves:

Careful planning

Investigation of the system and its constraints

Designing the methods to achieve the changeover

Training the staff in the changed phase

Ensuring the user has understood and accepted the changes

Getting complete feedback during the test run and ensuring everything is perfect for the final changeover.

IMAGE UPLOADING

In image uploading, the user can upload images and perform some basic image editing operations such as cropping, drawing, and zooming in and out. The file chooser only allows the user to upload images with jpg and png extensions. The uploaded image is set into an image icon on the software platform. The user can crop an image by specifying the crop values and by adjusting the crop slip. The cropped image is later uploaded to a folder for further use. The user can also draw a sketch in the drawing area provided, perform the basic operations on it, and upload it to a folder. The main function performed here is query image uploading.
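The extension restriction described above could be expressed as a small check like the one below. This is only a sketch: the class name `ImageFileFilter` and the method `isAccepted` are hypothetical, and the report does not show how its file chooser is actually configured.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: accepts only the file types the uploader allows. */
final class ImageFileFilter {
    // Extensions named in the text; nothing else (e.g. "jpeg") is assumed.
    private static final Set<String> ALLOWED = Set.of("jpg", "png");

    /** Returns true if fileName ends in an allowed extension (case-insensitive). */
    static boolean isAccepted(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ALLOWED.contains(ext);
    }
}
```

In a Swing interface the same rule would normally be attached to the file chooser as a filter, so that other file types are never offered to the user.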

FOLDER INDEXING

A number of indexing schemes use classification codes rather than keywords or subject descriptors to describe image content, as these can give a greater degree of language independence and show concept hierarchies more clearly. In this system, indexing consists of numbering the images in the selected directory, extracting their features, and storing the features in an array list. A document builder is created according to the extracted features of the images. The indexing operation is performed using two parallel threads. The indexing phase includes:

Preprocessing: The image is first processed in order to extract the features, which describe

its contents. The processing involves filtering, normalization, segmentation, and object

identification. The output of this stage is a set of significant regions and objects.

Feature extraction: Features such as shape, texture, color, etc. are used to describe the

content of the image. Image features can be classified into primitives. The feature is defined

as a function of one or more measurements, each of which specifies some quantifiable

property of an object, and is computed such that it quantifies some significant

characteristics of the object. Features extracted are:

• General features: Application-independent features such as color, texture, and shape. According to the abstraction level, they can be further divided into pixel-level features, local features and global features.

• Domain-specific features: Application dependent features such as human faces,

fingerprints, and conceptual features. These features are often a synthesis of low-level

features for a specific domain.

In content-based image retrieval, images are automatically indexed by generating a feature vector (stored as an index in feature databases) describing the content of the image. The similarity between the feature vectors of the query image and the directory images is measured to retrieve matching images. The directory containing the images is indexed before searching in order to make the comparison easier and faster. If the specified directory is empty, the required images can be downloaded from the web using the Flickr Download feature of CBIR. Low-level features can be extracted directly from the original images, and CBIR uses these low-level features.
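The two-thread indexing step described in this section might be organised roughly as follows. This is a sketch under stated assumptions: `ParallelIndexer` and `meanColor` are hypothetical names, images are taken to be arrays of packed 0xRRGGBB pixels already in memory, and the mean colour stands in for whatever descriptors the real system extracts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical indexer: one feature vector per image, computed on two threads. */
final class ParallelIndexer {

    /** Stand-in feature: mean red, green and blue of packed 0xRRGGBB pixels. */
    static double[] meanColor(int[] pixels) {
        double r = 0, g = 0, b = 0;
        for (int p : pixels) {
            r += (p >> 16) & 0xFF;
            g += (p >> 8) & 0xFF;
            b += p & 0xFF;
        }
        int n = Math.max(pixels.length, 1); // avoid dividing by zero
        return new double[] {r / n, g / n, b / n};
    }

    /** Indexes every image and returns the feature vectors in input order. */
    static List<double[]> index(List<int[]> images) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // two workers
        try {
            List<Future<double[]>> pending = new ArrayList<>();
            for (int[] image : images) {
                Callable<double[]> task = () -> meanColor(image);
                pending.add(pool.submit(task));
            }
            List<double[]> index = new ArrayList<>();
            for (Future<double[]> f : pending) {
                try {
                    index.add(f.get()); // blocks until that image is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("indexing interrupted", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("indexing failed", e);
                }
            }
            return index;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the index aligned with the image numbering even though the two workers may finish out of order.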

IMAGE SEARCHING

DISPLAY OF SEARCH RESULTS

Because drawings are the basis of the retrieval, a drawing surface is provided on which they can be produced. A folder containing the collection of images to search is also needed, and it must be set before the search. For a large result set, a systematic arrangement of the search results makes them much easier to overview, so the system provides one. The methods in our system cannot work without parameters, so an opportunity is provided to set these as well.

The number of results to show in the user interface is an important aspect. At first sight, the first n results can be displayed, as many as can conveniently be placed in the user interface. This number depends on the resolution of the monitor and on the number of results entered by the user. In our system the possible results are classified, and the obtained clusters are displayed, so the solution set is more ordered and transparent. By default the results are displayed by relevance, but false-positive results can occur, which worsen the retrieval results. If the results are reclassified according to some criterion, the number of false-positive results decreases and the user's perception of the results improves. Since color-based clustering is the best solution for us, we chose the k-means clustering method, which is well suited to this purpose.
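A minimal sketch of k-means over colour features is given below to make the grouping step concrete. It is not the project's own implementation: the class `ColorKMeans`, the choice of one mean-RGB point per result image, and the seeding of centroids with the first k points are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical k-means over color features (e.g. mean RGB per result image). */
final class ColorKMeans {

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Returns the cluster label (0..k-1) of every point. Assumes points.length >= k. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone(); // assumption: seed with the first k points
        }
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (sqDist(points[i], centroids[c]) < sqDist(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // no label moved, so the clustering has converged
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[labels[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return labels;
    }
}
```

With two reddish and two bluish results and k = 2, the reddish images end up sharing one label and the bluish images the other, which is the grouping the result display relies on.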

The search results are displayed in a tabular format according to their priority, each with its location details. The images are displayed in order of similarity to the query image, with the most similar image first. The number of search results can be set by the user.

Content Based Image Retrieval System

CBIR is the automatic retrieval of images from a large collection on the basis of color and shape features. The term is widely used to describe the process of retrieving desired images from a large collection on the basis of features (such as colour, texture and

shape) that can be automatically extracted from the images themselves. The features used for retrieval can be either primitive or semantic, but the extraction process must be predominantly automatic. Retrieval of images by manually-assigned keywords is definitely not CBIR as the term is generally understood, even if the keywords describe image content. CBIR differs from classical information retrieval in that image databases are essentially unstructured, since digitized images consist purely of arrays of pixel intensities, with no inherent meaning. One of the key issues with any kind of image processing is the need to extract useful information from the raw data (such as recognizing the presence of particular shapes or textures) before any kind of reasoning about the image’s contents is possible. Image databases thus differ fundamentally from text databases, where the raw material (words stored as ASCII character strings) has already been logically structured by the author. There is no equivalent of level 1 retrieval (retrieval by primitive features such as color, texture or shape) in a text database.

CBIR draws many of its methods from the field of image processing and computer

vision, and is regarded by some as a subset of that field. It differs from these fields

principally through its emphasis on the retrieval of images with desired characteristics from

a collection of significant size. Image processing covers a much wider field, including

image enhancement, compression, transmission, and interpretation. While there are

grey areas (such as object recognition by feature analysis), the distinction between

mainstream image analysis and CBIR is usually fairly clear-cut. An example may make this

clear. Many police forces now use automatic face recognition systems. Such systems may

be used in one of two ways. Firstly, the image in front of the camera may be compared with

a single individual’s database record to verify his or her identity. In this case, only two

images are matched, a process few observers would call CBIR. Secondly, the entire database

may be searched to find the most closely matching images. This is a genuine example of

CBIR.

Fig: 3.3 Block Diagram of CBIR System

CBIR is the process of retrieving desired images from a large collection on the basis of features (such as color, texture and shape) that can be automatically extracted from the images themselves. The features used for retrieval can be either primitive or semantic, but the extraction process must be predominantly automatic.

In content-based image retrieval systems, the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. The feature vectors of the images in the database form a feature database. To retrieve images, users provide the retrieval system with example images or sketched figures. The system then converts these examples into its internal representation of feature vectors. The similarities or distances between the feature vector of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed with the aid of an indexing scheme. The indexing scheme provides an efficient way to search the image database.
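The distance calculation and ranking described above can be sketched as follows. The class `FeatureRanker` is hypothetical, Euclidean distance is an assumed choice of metric, and the search is a plain linear scan, so it illustrates the comparison step only and not an indexing scheme.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical ranker: orders indexed images by distance to the query vector. */
final class FeatureRanker {

    static double euclidean(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    /** Returns the indices of the n database vectors closest to the query, nearest first. */
    static List<Integer> topN(double[] query, List<double[]> database, int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < database.size(); i++) {
            order.add(i);
        }
        // Sort image indices by the distance of their feature vector to the query.
        order.sort(Comparator.comparingDouble(i -> euclidean(query, database.get(i))));
        return new ArrayList<>(order.subList(0, Math.min(n, order.size())));
    }
}
```

The returned indices refer to positions in the feature database, so the interface can map them back to image files and show the nearest matches first.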

The Global Structure Of The System

The system building blocks include a preprocessing subsystem, which eliminates the problems caused by the diversity of images. Using the feature vector generating subsystem, the image can be represented by numbers with respect to a given property. Based on the feature vectors and the sample image, the retrieval subsystem provides the response list for the user through the displaying subsystem (GUI).

Fig:The global structure of the system

Content-based retrieval as a process can be divided into two main phases. The first is the construction phase, in which the data of the preprocessed images is stored in the form of feature vectors. This is the off-line part of the program, which carries out the computation-intensive tasks that have to be done before the program is actually used. The other phase is the retrieval process, which is the on-line unit of the program. First the user draws a sketch or loads an image. When the drawing has been finished or the appropriate representative has been loaded, the retrieval process is started. The query image is first preprocessed. After that the feature vector is generated, and then the retrieval subsystem executes a search in the previously indexed database. The search produces a result set, which appears in the user interface in a systematic form. Based on the result set, we can retrieve again using another descriptor of a different nature.

Fig: The Data Flow Model Of The System

CBIR is a technology that in principle helps organize digital image archives according to their visual content. This system distinguishes the different regions present in an image based on their similarity in color, pattern, texture, shape, etc. and decides the similarity between two images by reckoning the closeness of these different regions. The CBIR approach is much closer to how we humans distinguish images. Thus, we overcome the difficulties present in text-based image retrieval, because low-level image features can be automatically extracted from the images by using CBIR and to some extent they describe the image in more detail than the text-based approach. Image classification or categorization has often been treated as a preprocessing step for speeding up image retrieval in large databases and improving accuracy, or for performing automatic image annotation. While image clustering inherently depends on a similarity measure, image categorization has been performed by varied methods that neither require nor make use of similarity metrics. Image

categorization is often followed by a step of similarity measurement, restricted to those images in a large database that belong to the same visual class as predicted for the query. In such cases, the retrieval process is intertwined, whereby categorization and similarity matching steps together form the retrieval process.

CBIR Techniques

In contrast to the text-based approach, CBIR operates on a totally different principle, retrieving stored images from a collection by comparing features automatically extracted from the images themselves. The commonest features used are mathematical measures of color, texture or shape; hence virtually all current CBIR systems, whether commercial or experimental, operate at level 1. A typical system allows users to formulate queries by submitting an example of the type of image being sought, though some offer alternatives such as selection from a palette or sketch input. The system then identifies those stored images whose feature values match those of the query most closely, and displays thumbnails of these images on the screen. Some of the more commonly used types of feature used for image retrieval are described below.

Color retrieval

Several methods for retrieving images on the basis of color similarity have been

described in the literature, but most are variations on the same basic idea. Each image added

to the collection is analyzed to compute a color histogram which shows the proportion of

pixels of each color within the image. The color histogram for each image is then stored in

the database. At search time, the user can either specify the desired proportion of each color

(75% olive green and 25% red, for example), or submit an example image from which a

color histogram is calculated. Either way, the matching process then retrieves those images

whose color histograms match those of the query most closely. The matching technique

most commonly used, histogram intersection, was first developed by Swain and Ballard

[1991]. Variants of this technique are now used in a high proportion of current CBIR

systems. Methods of improving on Swain and Ballard’s original technique include the use

of cumulative color histograms, combining histogram intersection with some element of

spatial matching, and the use of region-based color querying. The results from some of

these systems can look quite impressive.
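To make the histogram matching concrete, the sketch below builds a normalized colour histogram and compares two histograms by intersection (the sum of bin-wise minima). The class name `ColorHistogram`, the 4 x 4 x 4 binning and the packed 0xRRGGBB pixel format are assumptions for illustration; real systems typically use finer bins or a different colour space.

```java
/** Hypothetical color-histogram matcher using histogram intersection. */
final class ColorHistogram {
    static final int BINS_PER_CHANNEL = 4; // assumption: 4 x 4 x 4 = 64 bins

    /** Builds a normalized histogram from packed 0xRRGGBB pixels. */
    static double[] of(int[] pixels) {
        int b = BINS_PER_CHANNEL;
        double[] hist = new double[b * b * b];
        for (int p : pixels) {
            int r = ((p >> 16) & 0xFF) * b / 256;
            int g = ((p >> 8) & 0xFF) * b / 256;
            int bl = (p & 0xFF) * b / 256;
            hist[(r * b + g) * b + bl] += 1.0;
        }
        for (int i = 0; i < hist.length; i++) {
            hist[i] /= Math.max(pixels.length, 1); // proportion of pixels per bin
        }
        return hist;
    }

    /** Sum of bin-wise minima: 1.0 for identical normalized histograms, 0.0 for disjoint ones. */
    static double intersection(double[] h1, double[] h2) {
        double sum = 0;
        for (int i = 0; i < h1.length; i++) {
            sum += Math.min(h1[i], h2[i]);
        }
        return sum;
    }
}
```

An image that is 75% red and 25% blue compared with one that is 25% red and 75% blue scores 0.5, since each of the two occupied bins contributes its smaller share of 0.25.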

Texture retrieval

The ability to retrieve images on the basis of texture similarity may not seem very useful.

But the ability to match on texture similarity can often be useful in distinguishing between

areas of images with similar color (such as sky and sea, or leaves and grass). A variety of

techniques has been used for measuring texture similarity; the best-established rely on

comparing values of what are known as second-order statistics calculated from query and

stored images. Essentially, these calculate the relative brightness of selected pairs of pixels

from each image. From these it is possible to calculate measures of image texture such as

the degree of contrast, coarseness, directionality and regularity, or periodicity, directionality

and randomness. Alternative methods of texture analysis for retrieval include the use of

Gabor filters and fractals. Texture queries can be formulated in a similar manner to color

queries, by selecting examples of desired textures from a palette, or by supplying an

example query image. The system then retrieves images with texture measures most similar

in value to the query. A recent extension of the technique is the texture thesaurus, which

retrieves textured regions in images on the basis of similarity to automatically-derived code

words representing important classes of texture within the collection.
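One common second-order statistic of the kind mentioned above is the contrast of a gray-level co-occurrence matrix, which tallies how often each pair of brightness values occurs in neighbouring pixels. The sketch below is illustrative only: `TextureContrast` is a hypothetical name, only horizontally adjacent pairs are counted, and the image is assumed to be already quantized to a small number of gray levels.

```java
/** Hypothetical texture measure: contrast from a gray-level co-occurrence matrix. */
final class TextureContrast {

    /**
     * Counts horizontally adjacent gray-level pairs (pixel, right neighbour) and
     * returns the sum of p(i, j) * (i - j)^2. Gray values must lie in [0, levels).
     */
    static double contrast(int[][] gray, int levels) {
        long[][] cooc = new long[levels][levels];
        long pairs = 0;
        for (int[] row : gray) {
            for (int x = 0; x + 1 < row.length; x++) {
                cooc[row[x]][row[x + 1]]++;
                pairs++;
            }
        }
        if (pairs == 0) {
            return 0.0; // image too narrow to contain any pair
        }
        double contrast = 0;
        for (int i = 0; i < levels; i++) {
            for (int j = 0; j < levels; j++) {
                double p = (double) cooc[i][j] / pairs; // relative frequency of the pair
                contrast += p * (i - j) * (i - j);
            }
        }
        return contrast;
    }
}
```

A flat region gives a contrast of zero, while a region that alternates between dark and bright pixels gives a large value, which is how sky and sea of similar colour can be told apart.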

Shape retrieval

The ability to retrieve by shape is perhaps the most obvious requirement at the

primitive level. Unlike texture, shape is a fairly well-defined concept – and there is

considerable evidence that natural objects are primarily recognized by their shape. A

number of features characteristic of object shape (but independent of size or orientation) are

computed for every object identified within each stored image. Queries are then answered

by computing the same set of features for the query image, and retrieving those stored

images whose features most closely match those of the query. Two main types of shape

feature are commonly used – global features such as aspect ratio, circularity and moment

invariants and local features such as sets of consecutive boundary segments. Alternative

methods proposed for shape matching have included elastic deformation of templates,

comparison of directional histograms of edges extracted from the image, and shocks,

skeletal representations of object shape that can be compared using graph matching

techniques. Queries to shape retrieval systems are formulated either by identifying an

example image to act as the query, or as a user-drawn sketch.
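Two of the global shape features named above, circularity and aspect ratio, can be computed as in the following sketch. It assumes the object outline is already available as a polygon (vertex arrays `xs`, `ys`), which the text does not specify, and `ShapeFeatures` is a hypothetical name. Circularity is independent of size and orientation; the bounding-box aspect ratio used here is independent of size only.

```java
/** Hypothetical global shape features for an object outline given as a polygon. */
final class ShapeFeatures {

    /** Polygon area by the shoelace formula; xs[i], ys[i] are the vertices in order. */
    static double area(double[] xs, double[] ys) {
        double twice = 0;
        for (int i = 0, n = xs.length; i < n; i++) {
            int j = (i + 1) % n;
            twice += xs[i] * ys[j] - xs[j] * ys[i];
        }
        return Math.abs(twice) / 2.0;
    }

    /** Sum of the edge lengths of the closed polygon. */
    static double perimeter(double[] xs, double[] ys) {
        double p = 0;
        for (int i = 0, n = xs.length; i < n; i++) {
            int j = (i + 1) % n;
            p += Math.hypot(xs[j] - xs[i], ys[j] - ys[i]);
        }
        return p;
    }

    /** 4 * pi * area / perimeter^2: approaches 1.0 for a circle, smaller for less compact shapes. */
    static double circularity(double[] xs, double[] ys) {
        double p = perimeter(xs, ys);
        return 4.0 * Math.PI * area(xs, ys) / (p * p);
    }

    /** Width divided by height of the axis-aligned bounding box. */
    static double aspectRatio(double[] xs, double[] ys) {
        double minX = xs[0], maxX = xs[0], minY = ys[0], maxY = ys[0];
        for (int i = 1; i < xs.length; i++) {
            minX = Math.min(minX, xs[i]);
            maxX = Math.max(maxX, xs[i]);
            minY = Math.min(minY, ys[i]);
            maxY = Math.max(maxY, ys[i]);
        }
        return (maxX - minX) / (maxY - minY);
    }
}
```

For a square the circularity is pi/4 (about 0.785) whatever its side length, so a query sketch of a square and a stored square of a different size yield the same feature value.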

Shape matching of three-dimensional objects is a more challenging task – particularly

where only a single 2-D view of the object in question is available. While no general

solution to this problem is possible, some useful inroads have been made into the problem

of identifying at least some instances of a given object from different viewpoints. One

approach has been to build up a set of plausible 3-D models from the available 2-D image,

and match them with other models in the database. Another is to generate a series of

alternative 2-D views of each database object, each of which is matched with the query

image. Related research issues in this area include defining 3-D shape similarity measures,

and providing a means for users to formulate 3-D shape queries.

Retrieval by other types of primitive feature

One of the oldest-established means of accessing pictorial data is retrieval by its

position within an image. Accessing data by spatial location is an essential aspect of

geographical information systems, and efficient methods to achieve this have been around

for many years. Similar techniques have been applied to image collections, allowing users

to search for images containing objects in defined spatial relationships with each other.

Improved algorithms for spatial retrieval are still being proposed. Spatial indexing is seldom

useful on its own, though it has proved effective in combination with other cues such as

color and shape.

5.1. HARDWARE SPECIFICATION

Processor : INTEL PENTIUM IV or above

CPU Speed : 2.79 GHz

Cache memory : 1 MB

RAM : 1 GB

Hard Disk : 40 GB

Drive : CD Reader and Writer

Monitor : 17” Color

5.2. ADDITIONAL HARDWARE COMPONENTS USED

5.3. PLATFORM USED

NETBEANS 7.1 IDE

NetBeans IDE is an integrated development environment (IDE) for writing, compiling, testing, and debugging software applications for the Java™ platform and other environments. NetBeans IDE includes a full-featured text editor, visual design tools, source code management support, database integration tools, and many other features.

5.4. PROGRAMMING LANGUAGES USED

JAVA

Java is a programming language originally developed by James Gosling at Sun Microsystems and released in 1995 as a core component of Sun Microsystems' Java platform. The language derives much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to bytecode (class files) that can run on any Java Virtual Machine (JVM) regardless of computer architecture. Java is a general-purpose, concurrent, class-based, object-oriented language that is specifically designed to have as few implementation dependencies as possible. It is intended to let application developers "write once, run anywhere".

The original and reference implementations of the Java compilers, virtual machines, and class libraries were developed by Sun from 1995. As of May 2007, in compliance with the specifications of the Java Community Process, Sun relicensed most of its Java technologies under the GNU General Public License. Others have also developed alternative implementations of these Sun technologies, such as the GNU Compiler for Java and GNU Classpath. Some features are:

Platform Independent

The concept of write-once-run-anywhere (known as platform independence) is one of the key features of the Java language and a major source of its power. No language is ideal in this respect, but Java comes closer than most to this

feature. Programs written on one platform can run on any other platform, provided that platform has a JVM.

Simple

Various features make Java a simple language. Programs are easy to write and debug because Java does not use pointers explicitly. It is much harder to write a Java program that can crash the system than it is in many other programming languages. Java's strong memory management, with automatic memory allocation and deallocation, helps avoid a whole class of memory bugs.

Object Oriented

To be an object-oriented language, a language must provide at least the following four characteristics.

Inheritance : The process of creating new classes that extend existing classes, reusing the existing code and behavior and adding features as needed.

Encapsulation : The mechanism of combining data and the methods that operate on it into a single unit, hiding the internal details and providing abstraction.

Polymorphism : As the name suggests (one name, multiple forms), polymorphism is the way of providing different functionality through methods that have the same name but different signatures.

Dynamic binding : Sometimes we do not know the specific types of objects while writing our code. With dynamic binding, the method to invoke is selected at runtime according to the object's specific type.
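The four characteristics can be seen together in a small example. The classes below are purely illustrative: `Descriptor`, `ColorDescriptor` and `EdgeDescriptor` are hypothetical names, and the vector lengths 64 and 80 are arbitrary values chosen for the sketch.

```java
/** Hypothetical feature-descriptor hierarchy; all names are illustrative only. */
abstract class Descriptor {
    private final String name; // encapsulation: state is private to the class

    Descriptor(String name) {
        this.name = name;
    }

    /** Each subclass supplies its own vector length. */
    abstract int length();

    /** Calls length() on the runtime type of this object (dynamic binding). */
    String describe() {
        return name + "[" + length() + "]";
    }

    /** Same method name, different signature (overloading). */
    String describe(String prefix) {
        return prefix + describe();
    }
}

/** Inheritance: reuses describe() from Descriptor and adds its own length. */
final class ColorDescriptor extends Descriptor {
    ColorDescriptor() {
        super("color");
    }

    @Override
    int length() {
        return 64;
    }
}

/** A second subclass with a different length, reached through the same base type. */
final class EdgeDescriptor extends Descriptor {
    EdgeDescriptor() {
        super("edge");
    }

    @Override
    int length() {
        return 80;
    }
}
```

Code that holds a `Descriptor` reference can call `describe()` without knowing which subclass it has; the JVM selects the matching `length()` at runtime.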

Languages such as Objective-C and C++ fulfil the above four characteristics, yet they are not fully object-oriented languages because they are structured as well as object-oriented languages. Java, by contrast, is much closer to a fully object-oriented language because the class is the outermost level of program structure: there are no stand-alone methods, constants or variables in Java. Almost everything in Java is an object; the primitive data types are the exception, and even they can be converted into objects by using the wrapper classes.

Some important features of Java are :-

Robust

Java has strong memory allocation and an automatic garbage collection mechanism. It provides powerful exception handling and type checking mechanisms compared with many other programming languages. The compiler checks the program for errors, and the runtime checks for run-time errors and helps protect the system from crashes. All of the above features make the Java language robust.

Distributed

Java's networking libraries support widely used protocols such as HTTP and FTP. Internet programmers can call functions built on these protocols to access files on any remote machine on the internet, rather than writing that code themselves on their local system.

Portable

The write-once-run-anywhere feature makes the Java language portable, provided that the target system has a JVM. Java also has standard data sizes irrespective of the operating system or the processor. These features make Java a portable language.

Dynamic

While executing the java program the user can get the required files dynamically from a

local drive or from a computer thousands of miles away from the user just by connecting

with the Internet.

Secure

Java does not use memory pointers explicitly. Java programs run inside an area known as the sandbox. The security manager determines the accessibility options of a class, such as reading and writing a file on the local disk. Java supports public-key cryptography, which allows Java applications to be transmitted over the internet in a secure form. The bytecode verifier checks the classes after loading.

Performance


Java can use native code where needed and supports lightweight processes called threads. In early versions, interpreting bytecode made execution slow, but later versions of the JVM use adaptive and just-in-time (JIT) compilation techniques that improve performance.

Multithreaded

In addition to being secure, robust, portable, and dynamic, Java is a multithreaded programming language. Multithreading means that a single program has several threads executing independently at the same time. The threads execute instructions according to the program code within one process. Multithreading works in a way similar to multiple processes running on one computer.

In a well-designed multithreaded program, no thread disturbs the execution of another. Threads are taken from the pool of available ready-to-run threads and run on the system's CPUs. This is how multithreading works in Java.
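A minimal sketch of independent threads drawn from a pool is shown below, using the standard `java.util.concurrent` executor API. The class name `ThreadPoolDemo` and the summing task are hypothetical and were chosen only to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class, for illustration only.
class ThreadPoolDemo {
    // The work done by each task: the sum 1 + 2 + ... + n.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Submits one task per limit to a fixed-size pool and collects the
    // results in the same order as the inputs.
    static List<Long> sumAll(int[] limits, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int limit : limits) {
                Callable<Long> task = () -> sumTo(limit);
                futures.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get()); // waits until that task has finished
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumAll(new int[] {10, 100, 1000}, 2));
    }
}
```

Each task touches only its own local variables, which is what lets the threads run without disturbing one another.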

Interpreted

Java is an interpreted language as well. The Java compiler first translates the source code into bytecode, and the interpreter in the JVM reads the bytecode and translates it on the fly into computations. Thus Java, as an interpreted language, depends on an interpreter program.

Platform independence makes Java stand out from other languages. The code that is written and distributed is platform independent.

Another advantage of Java as an interpreted language is its support for debugging: an error that occurs while the program runs can be traced to its source.

Architecture Neutral

The term "architecture neutral" may seem odd, but Java is an architecture-neutral language as well. The growing popularity of networks makes developers think in distributed terms. In a networked world it is essential that applications be able to migrate easily to


different computer systems, and beyond that to a wide variety of hardware and operating system architectures. The Java compiler achieves this by generating bytecode instructions that are easy to interpret on any machine and easy to translate into native machine code on the fly. The compiler generates an architecture-neutral object file format, so the compiled code can execute on many processors anywhere on the network, given the presence of the Java runtime system. Java was thus designed to support networked applications, and this feature has helped the language thrive.

5.5. SOFTWARE TOOLS USED

Operating System : Windows XP and above

Language : Java

Tool used : NetBeans 7.1 IDE

APPENDIX – B USERS MANUAL


The main objective of the work is to secure the image that is being transmitted. The image is wavelet transformed using Daubechies scaling coefficients and encrypted using the Haar algorithm. Although it is the simplest and crudest algorithm for image compression, it is considered more effective here than other algorithms, and the quality of the compressed image is maintained. The SPIHT algorithm is used to perform compression and encryption of the image. After the encryption phase, the encrypted image is sent through a socket to the decoder side, which is the recipient. On the decoder side, the image is reconstructed using the inverse wavelet transform and the Haar algorithm.
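For orientation, one level of the Haar wavelet transform on a single row of pixel values, together with its inverse, might look like the sketch below. It uses the unnormalized average and half-difference form and assumes an even-length input; it does not cover the Daubechies coefficients, SPIHT, encryption, or socket transfer mentioned above. The class name `HaarStep` is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only; not the project's actual code.
class HaarStep {
    // One level of the unnormalized 1-D Haar transform.
    // The first half of the output holds pairwise averages, the second half
    // holds pairwise half-differences. The input length must be even.
    static double[] forward(double[] signal) {
        int half = signal.length / 2;
        double[] out = new double[signal.length];
        for (int i = 0; i < half; i++) {
            double a = signal[2 * i];
            double b = signal[2 * i + 1];
            out[i] = (a + b) / 2.0;
            out[half + i] = (a - b) / 2.0;
        }
        return out;
    }

    // Inverts forward(): rebuilds each pair from its average and half-difference.
    static double[] inverse(double[] coeffs) {
        int half = coeffs.length / 2;
        double[] out = new double[coeffs.length];
        for (int i = 0; i < half; i++) {
            double avg = coeffs[i];
            double diff = coeffs[half + i];
            out[2 * i] = avg + diff;
            out[2 * i + 1] = avg - diff;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] row = {4, 2, 6, 8};
        double[] coeffs = forward(row);
        System.out.println(Arrays.toString(coeffs));
        System.out.println(Arrays.toString(inverse(coeffs)));
    }
}
```

For a 2-D image the same step would be applied to every row and then to every column.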

The CBIR project includes four modules:

Add Image module.

Indexing module.

Searching module.

Displaying the resultant images.

Add Image

In this module, the user can upload images. The file chooser allows the user to upload only images with jpg and png extensions. The uploaded image is set as an image icon on the software platform. The main function performed here is uploading the query image.
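A file chooser restricted to jpg and png files could be set up roughly as follows with the standard Swing classes `JFileChooser` and `FileNameExtensionFilter`. The class name `QueryImageChooser`, the method names, and the filter description text are assumptions; the project's own code may differ.

```java
import java.io.File;
import javax.swing.ImageIcon;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;

// Hypothetical helper, for illustration only.
class QueryImageChooser {
    // Filter that accepts only .jpg and .png files (matching ignores case).
    static FileNameExtensionFilter imageFilter() {
        return new FileNameExtensionFilter("JPG and PNG images", "jpg", "png");
    }

    // Opens the dialog and returns the chosen image as an icon, or null if
    // the user cancels. This method needs a graphical display.
    static ImageIcon chooseQueryImage() {
        JFileChooser chooser = new JFileChooser();
        chooser.setFileFilter(imageFilter());
        chooser.setAcceptAllFileFilterUsed(false); // hide the "All files" option
        if (chooser.showOpenDialog(null) != JFileChooser.APPROVE_OPTION) {
            return null;
        }
        File selected = chooser.getSelectedFile();
        return new ImageIcon(selected.getPath());
    }

    public static void main(String[] args) {
        // Only the filter is exercised here, so no window is opened.
        FileNameExtensionFilter filter = imageFilter();
        System.out.println(filter.accept(new File("query.png")));
        System.out.println(filter.accept(new File("notes.txt")));
    }
}
```

The returned `ImageIcon` can then be placed on a label in the interface to show the query image.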

Indexing Module

A number of indexing schemes use classification codes rather than keywords or subject

descriptors to describe image content, as these can give a greater degree of language

independence and show concept hierarchies more clearly. Indexing consists of numbering the images in the selected directory, extracting their features, and storing the features in an array list. A document builder is created according to the extracted features of the images.

The indexing phase includes:

Preprocessing: The image is first processed in order to extract the features, which describe

its contents. The processing involves filtering, normalization, segmentation, and object

identification. The output of this stage is a set of significant regions and objects.


Feature extraction: Features such as shape, texture, color, etc. are used to describe the content of the image. Image features can be classified into primitives. A feature is defined as a function of one or more measurements, each of which specifies some quantifiable property of an object, and is computed such that it quantifies some significant characteristic of the object. The features extracted are:

• General features: Application independent features such as color, texture, and shape. According to the abstraction level, they can be further divided into: Pixel-level features, Local features and Global features.

• Domain-specific features: Application dependent features such as human faces,

fingerprints, and conceptual features. These features are often a synthesis of low-level

features for a specific domain.

In content-based image retrieval, images are automatically indexed by generating a

feature vector (stored as an index in feature databases) describing the content of the image.

The similarity of the feature vectors of the query and directory images is measured to

retrieve the image. The directory containing the images is indexed before searching in order to make the comparison easier and faster. Low-level features can be extracted directly from the original images. CBIR uses these extracted low-level features.
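To make the notion of a feature vector concrete, the sketch below computes a simple normalized RGB color histogram from a `BufferedImage`. This is a stand-in low-level feature chosen for brevity; it is not the CEDD descriptor and not the project's actual extractor. The class name `ColorHistogram` and the choice of 4 bins per channel are assumptions.

```java
import java.awt.image.BufferedImage;

// Hypothetical feature extractor, for illustration only.
class ColorHistogram {
    static final int BINS = 4; // assumed number of bins per color channel

    // Returns a vector of BINS^3 entries that sum to 1. Each entry is the
    // fraction of pixels whose (r, g, b) values fall into that color bin.
    static double[] extract(BufferedImage image) {
        double[] histogram = new double[BINS * BINS * BINS];
        int width = image.getWidth();
        int height = image.getHeight();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Map each 0..255 channel value to a bin index 0..BINS-1.
                int index = (r * BINS / 256) * BINS * BINS
                          + (g * BINS / 256) * BINS
                          + (b * BINS / 256);
                histogram[index] += 1.0;
            }
        }
        double pixels = (double) width * height;
        for (int i = 0; i < histogram.length; i++) {
            histogram[i] /= pixels;
        }
        return histogram;
    }

    public static void main(String[] args) {
        BufferedImage red = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < 2; y++) {
            for (int x = 0; x < 2; x++) {
                red.setRGB(x, y, 0xFF0000);
            }
        }
        double[] features = extract(red);
        System.out.println("vector length: " + features.length);
    }
}
```

Indexing a directory then amounts to running such an extractor once per image and storing each vector alongside the image path.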

Searching Module

After indexing is performed, searching begins by uploading the image for searching. This

image is preprocessed and a feature vector for it is generated. Next, the user needs to select the algorithm for searching. The algorithms used in this project are:

CEDD

This algorithm deals with a new low level feature that is extracted from the images and

can be used for indexing and retrieval. This feature is called “Color and Edge Directivity


Descriptor” and incorporates color and texture information in a histogram. CEDD size is limited to 54 bytes per image, rendering this descriptor suitable for use in large image databases. One of the most important attributes of CEDD is the low computational power needed for its extraction, in comparison with the needs of most MPEG-7 descriptors.

Displaying the Resultant Images

Because drawings are the basis of the retrieval, a drawing surface is provided where they can be produced. A folder containing the collection of images to be searched is also needed, and it must be set before the search. For a large result set, a systematic arrangement of the search results makes them much easier to survey, so this is provided. The methods in our system cannot work without parameters, and therefore an opportunity is provided to set these as well.

The number of results to show in the user interface is an important aspect. Prima facie, the first n results can be displayed, as many as can conveniently be placed in the user interface. This number depends on the resolution of the monitor and on the number of results entered by the user. In our system the possible results are classified, and the resulting clusters are displayed, so the solution set is more ordered and transparent. By default the results are displayed by relevance, but false-positive results can occur, which worsen the retrieval results. If the results are reclassified according to some criterion, the number of false-positive results decreases, and the user's perception improves. Since color-based clustering is the best solution for us, we chose the k-means clustering method, which is well suited to this purpose.
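A bare-bones version of k-means over color values is sketched below. It assumes each result image is summarized by one color (for example its mean RGB), that the first k points serve as the initial centroids, and that k does not exceed the number of points. These choices, and the class name `ColorKMeans`, are illustrative assumptions rather than the project's actual settings.

```java
import java.util.Arrays;

// Hypothetical clustering helper, for illustration only.
class ColorKMeans {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns, for every color, the index of the cluster it ends up in.
    static int[] cluster(double[][] colors, int k, int maxIterations) {
        int n = colors.length;
        int dim = colors[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = colors[c].clone(); // assumed initialization
        }
        int[] labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each color to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(colors[i], centroids[c])
                            < squaredDistance(colors[i], centroids[best])) {
                        best = c;
                    }
                }
                if (best != labels[i]) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centroids are too
            }
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[labels[i]][d] += colors[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] colors = {
            {0, 0, 0}, {250, 250, 250}, {10, 10, 10}, {255, 255, 255}
        };
        System.out.println(Arrays.toString(cluster(colors, 2, 20)));
    }
}
```

Grouping the displayed results by the returned labels places dark and light images, in this toy input, in separate clusters.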

The search results are displayed in a tabular format according to their priority, each with its location details. The images are displayed in order of similarity to the query image, with the most similar image first. The number of search results can be set by the user.
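The ordering described above can be sketched as a sort of the indexed feature vectors by their distance to the query vector, keeping the first n entries. The L1 (city-block) distance is used here purely as a simple stand-in for whatever measure the chosen descriptor defines; the class name `SimilarityRanker` and the map-based index are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical ranking helper, for illustration only.
class SimilarityRanker {
    // L1 distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Returns the locations of the n images closest to the query,
    // most similar first.
    static List<String> topN(double[] query, Map<String, double[]> index, int n) {
        List<Map.Entry<String, double[]>> entries = new ArrayList<>(index.entrySet());
        entries.sort(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> distance(query, e.getValue())));
        List<String> locations = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            locations.add(entries.get(i).getKey());
        }
        return locations;
    }

    public static void main(String[] args) {
        Map<String, double[]> index = new LinkedHashMap<>();
        index.put("images/a.jpg", new double[] {1.0, 0.0});
        index.put("images/b.jpg", new double[] {0.0, 1.0});
        index.put("images/c.png", new double[] {0.9, 0.1});
        System.out.println(topN(new double[] {1.0, 0.0}, index, 2));
    }
}
```

The value of n corresponds to the number of search results chosen by the user.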
