projectreport ocrrecognition 140903052518 phpapp02


Upload: ramya-manohari

Post on 18-Feb-2018


Page 1: Projectreport Ocrrecognition 140903052518 Phpapp02

7/23/2019 Projectreport Ocrrecognition 140903052518 Phpapp02

http://slidepdf.com/reader/full/projectreport-ocrrecognition-140903052518-phpapp02 1/95

1. INTRODUCTION

In today's world, there is a growing demand for software systems that can recognize characters when information is scanned from paper documents, since a large number of newspapers and books on many subjects exist only in printed format. These days there is a huge demand for "storing the information available in these paper documents on a computer storage disk and then later reusing this information by searching". One simple way to store the information in these paper documents on a computer system is to first scan the documents and then store them as IMAGES. But to reuse this information, it is very difficult to read the individual contents and to search the contents of these documents line-by-line and word-by-word. The reason for this difficulty is that the font characteristics of the characters in paper documents differ from the fonts of the characters in the computer system. As a result, the computer is unable to recognize the characters while reading them. This concept of storing the contents of paper documents in computer storage and then reading and searching the content is called DOCUMENT PROCESSING. Sometimes this document processing must handle information in languages other than English. For this document processing we need a software system called a CHARACTER RECOGNITION SYSTEM. This process is also called DOCUMENT IMAGE ANALYSIS (DIA).

Thus our need is to develop a character recognition software system to perform Document Image Analysis, which transforms documents in paper format to electronic format. Various techniques exist for this process. Among all those techniques we have


chosen Optical Character Recognition as the main fundamental technique to recognize characters. The conversion of paper documents into electronic format is an ongoing task in many organizations, particularly in the Research and Development (R&D) area, in large business enterprises, in government institutions, and so on. From our problem statement we can introduce the necessity of Optical Character Recognition in mobile electronic devices such as cell phones and digital cameras, to acquire images and recognize them as a part of face recognition and validation.

To effectively use Optical Character Recognition for character recognition in order to perform Document Image Analysis (DIA), we are using the information in Grid format. This system is thus effective and useful in the design and construction of a Virtual Digital Library.

1.1 PURPOSE

The main purpose of the Optical Character Recognition (OCR) system based on a grid infrastructure is to perform Document Image Analysis, that is, the document processing of electronic document formats converted from paper formats, more effectively and efficiently. This improves the accuracy of recognizing the characters during document processing compared to various existing character recognition methods. Here the OCR technique derives the meaning of the characters and their font properties from their bit-mapped images.

The primary objective is to speed up the process of character recognition in document processing. As a result, the system can process a huge number of documents in less time.


Since our character recognition is based on a grid infrastructure, it aims to recognize multiple heterogeneous characters that belong to different universal languages with different font properties and alignments.

1.2 PROJECT SCOPE

The scope of our product, Optical Character Recognition on a grid infrastructure, is to provide an efficient and enhanced software tool for users to perform Document Image Analysis and document processing by reading and recognizing characters in research, academic, governmental and business organizations that have a large pool of documented, scanned images. Irrespective of the size of the documents and the type of characters in them, the product recognizes them, searches them and processes them faster according to the needs of the environment.

1.3 EXISTING SYSTEM

In today's world there is a growing demand among users to convert printed documents into electronic documents in order to maintain the security of their data. Hence the basic OCR system was invented to convert the data available on paper into computer-processable documents, so that the documents can be edited and reused. The existing system, that is, the predecessor of OCR on a grid infrastructure, is just OCR without grid functionality. In other words, the existing system deals with homogeneous character recognition, or character recognition of a single language.


1.4 DRAWBACK OF EXISTING SYSTEM

The drawback of the early OCR systems is that they can convert and recognize only documents in English or in one specific language. That is, the older OCR system is uni-lingual.

1.5 PROPOSED SYSTEM

Our proposed system is OCR on a grid infrastructure, a character recognition system that supports recognition of the characters of multiple languages. This feature is what we call grid infrastructure; it eliminates the problem of heterogeneous character recognition and supports multiple functionalities to be performed on the document. The multiple functionalities include editing and searching, whereas the existing system supports only editing of the document. In this context, Grid infrastructure means the infrastructure that supports a group of a specific set of languages. Thus OCR on a grid infrastructure is multi-lingual.

1.6 BENEFIT OF PROPOSED SYSTEM

The benefit of the proposed system, which overcomes the drawback of the existing system, is that it supports multiple functionalities such as editing and searching. It also adds benefit by providing heterogeneous character recognition.


[Architecture diagram labels: Document; Illuminator; Detector; Scanner; Document Analysis; Character Recognition; Contextual Processing; OCR Hardware or Software; Document image; Recognition Results; To application user]

1.7 ARCHITECTURE OF THE PROPOSED SYSTEM

The architecture of the optical character recognition system on a grid infrastructure consists of three main components. They are:-

Scanner

OCR Hardware or Software

Output Interface

 

Figure 1: OCR Architecture


1.8 INTENDED AUDIENCE AND READING SUGGESTIONS

In this section, we identify the audience who are interested in the product and are involved in the implementation of the product either directly or indirectly. According to our research, the OCR system is mainly useful in R&D at various scientific organizations, in governmental institutes and in large business organizations. We identify the following as the various audiences interested in implementing an OCR system:-

The scientists, research scholars and research fellows in telecommunication institutions are interested in using the OCR system for processing the word document that contains the base paper for their research.

The librarian requires the OCR system to manage the information contents of older books when building a virtual digital library.

Various sites that sell e-books have a huge requirement for this OCR system in order to scan all the books into electronic format and thus make money. The Amazon book world is largely using this concept to build their digital libraries.

Now we present the reading suggestions through which users or clients can better understand the various phases of the product. These suggestions may be more useful for beginners than for regular users such as research scholars, librarians and administrators of various web-sites. With these suggestions, the user need not waste time scrolling documents up and down, browsing through the web, visiting libraries in search of different books, and so on. The following are the reading suggestions that the user can follow in order to completely understand our product and to save time:-


It would help to start with Wikipedia.com. It lets you know the basic concept of every keyword you require. First learn from it what OCR is and how it works based on a Grid infrastructure.

Now you can proceed with the introduction of our product provided in our documentation. From these two steps you get an in-depth idea of the use of our product and the several processes involved in it.

What you need next is the implementation of the product. For this you can visit FreeOCR.com, where you can view how the sample OCR works and try it.

2. FEASIBILITY STUDY

A feasibility study is a high-level capsule version of the entire System Analysis and Design Process. The study begins by clarifying the problem definition. Feasibility is to determine if it's worth doing. Once an acceptable problem definition has been generated, the analyst develops a logical model of the system. A search for alternatives is analyzed carefully. There are 3 parts in the feasibility study.

2.1 TECHNICAL FEASIBILITY

Evaluating the technical feasibility is the trickiest part of a feasibility study. This is because, at this point in time, there is not much detailed design of the system, making it difficult to assess issues like performance and costs (on account of the kind of technology to be deployed). A number of issues have to be considered while doing a technical analysis. First, understand the different technologies involved in the proposed system: before commencing the project we have to be very clear about which technologies are required for the development of the new system. Second, find out whether the organization currently possesses the required technologies, that is, whether the required technology is available within the organization.

2.2 OPERATIONAL FEASIBILITY


A proposed project is beneficial only if it can be turned into an information system that will meet the organization's operating requirements. Simply stated, this test of feasibility asks if the system will work when it is developed and installed. Are there major barriers to implementation? Here are questions that will help test the operational feasibility of a project:

Is there sufficient support for the project from management and from users? If the current system is well liked and used to the extent that persons will not be able to see reasons for change, there may be resistance.

Are the current business methods acceptable to the user? If they are not, users may welcome a change that will bring about a more operational and useful system.

Have the users been involved in the planning and development of the project? Early involvement reduces the chances of resistance to the system in general and increases the likelihood of a successful project.

Since the proposed system was to help reduce the hardships encountered in the existing manual system, the new system was considered to be operationally feasible.

2.3 ECONOMIC FEASIBILITY

Economic feasibility attempts to weigh the costs of developing and implementing a new system against the benefits that would accrue from having the new system in place. This feasibility study gives the top management the economic justification for the new system. A simple economic analysis which gives the actual comparison of costs and benefits is much


more meaningful in this case. In addition, this proves to be a useful point of reference to compare actual costs as the project progresses. There could be various types of intangible benefits on account of automation. These could include increased customer satisfaction, improvement in product quality, better decision making, timeliness of information, expediting activities, improved accuracy of operations, better documentation and record keeping, faster retrieval of information, and better employee morale.

2.4 TRAINING

Training is a very important part of working with a neural network. There are two forms of training that can be employed with a neural network, namely:-

1. Unsupervised Training

2. Supervised Training

Supervised training provides the neural network with training sets and the anticipated output. Unsupervised training supplies the neural network with training sets, but no anticipated output is provided.

2.4.1 UNSUPERVISED TRAINING

Unsupervised training is a very common training technique for Kohonen neural networks. We will discuss how to construct a Kohonen neural network and the general process for training without supervision.


What is meant by training without supervision is that the neural network is provided with training sets, which are collections of defined input values, but is not provided with anticipated outputs.

Unsupervised training is usually used in a classification neural network. A classification neural network takes input patterns, which are presented to the input neurons. These input patterns are then processed, and one single neuron on the output layer fires. This firing neuron can be thought of as the classification of which group the input pattern belongs to. Handwriting recognition is a good application of a classification neural network.

The input patterns presented to the Kohonen neural network are the dot images of the characters that were hand written. We may then have 26 output neurons, which correspond to the 26 letters of the English alphabet. The Kohonen neural network should classify the input pattern into one of the 26 letters.

During the training process the Kohonen neural network in handwriting recognition is presented with 26 input patterns. The network is configured to also have 26 output patterns. As the Kohonen neural network is trained, the weights should be adjusted so that the input patterns are classified into the 26 output neurons. This technique results in a relatively effective method for character recognition.

Another common application for unsupervised training is data mining. In this case you have a large amount of data, but you do not often know exactly what you are looking for. You want the neural network to classify this data into several groups. You do not want to


dictate, ahead of time, to the neural network which input pattern should be classified to which group. As the neural network trains, the input patterns will fall into similar groups. This will allow you to see which input patterns were in common groups.

2.4.2 SUPERVISED TRAINING

The supervised training method is similar to the unsupervised training method in that training sets are provided. Just as with unsupervised training, these training sets specify input signals to the neural network.

The primary difference between supervised and unsupervised training is that in supervised training the expected outputs are provided. This allows the supervised training algorithm to adjust the weight matrix based on the difference between the anticipated output of the neural network and the actual output.
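As a minimal sketch of the idea in the paragraph above, the Python fragment below adjusts a weight matrix in proportion to the difference between the expected and the actual output. It uses the simple delta rule on a single layer of linear neurons, which is a much simpler technique than back-propagation. The function names, the learning rate and the epoch count are assumptions made for illustration and are not taken from the project.

```python
def predict(weights, inputs):
    """Actual output: one weighted sum per output neuron (no activation)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def train_delta(samples, n_inputs, n_outputs, rate=0.1, epochs=100):
    """Supervised training with the delta rule (illustrative values only).

    samples is a list of (inputs, expected_outputs) pairs.
    """
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for _ in range(epochs):
        for inputs, expected in samples:
            actual = predict(weights, inputs)
            for j in range(n_outputs):
                # Difference between anticipated and actual output.
                error = expected[j] - actual[j]
                for i in range(n_inputs):
                    weights[j][i] += rate * error * inputs[i]
    return weights
```

With the two training pairs `([1, 0], [1])` and `([0, 1], [0])`, the weight on the first input moves toward 1 while the weight on the second input stays at 0.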

There are several popular training algorithms that make use of supervised training. One of the most common is the back-propagation algorithm. It is also possible to use an algorithm such as simulated annealing or a genetic algorithm to implement supervised training.

2.5 INTRODUCING KOHONEN NEURAL NETWORK

The Kohonen neural network differs considerably from the feed-forward back propagation neural network. The Kohonen neural network differs both in how it is trained and how it


recalls a pattern. The Kohonen neural network does not use any sort of activation function. Further, the Kohonen neural network does not use any sort of bias weight.

Output from the Kohonen neural network does not consist of the output of several neurons. When a pattern is presented to a Kohonen network, one of the output neurons is selected as a "winner". This "winning" neuron is the output from the Kohonen network. Often these "winning" neurons represent groups in the data that is presented to the Kohonen network. For example, in an OCR program that uses 26 output neurons, the 26 output neurons map the input patterns into the 26 letters of the Latin alphabet.
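The winner selection described above can be sketched as follows. This is an illustrative fragment rather than the project's code: the names `winning_neuron` and `recognize` are hypothetical, and the assumption that output neuron 0 stands for 'A', neuron 1 for 'B', and so on is ours.

```python
import string


def winning_neuron(weights, pattern):
    """Return the index of the output neuron with the largest weighted sum.

    No activation function and no bias weight are applied, in line with
    the description of the Kohonen network above.
    """
    sums = [sum(w * x for w, x in zip(row, pattern)) for row in weights]
    return sums.index(max(sums))


def recognize(weights, pattern, alphabet=string.ascii_uppercase):
    """Map the winning output neuron to a letter (assumed order A, B, C, ...)."""
    return alphabet[winning_neuron(weights, pattern)]
```

A network for the 26 Latin letters would hold 26 weight rows, one per output neuron; the same functions work unchanged for any smaller alphabet.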

The most significant difference between the Kohonen neural network and the feed forward back propagation neural network is that the Kohonen network is trained in an unsupervised mode. This means that the Kohonen network is presented with data, but the correct output that corresponds to that data is not specified. Using the Kohonen network this data can be classified into groups. We will begin our review of the Kohonen network by examining the training process.
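Before going into the details, the unsupervised mode can be pictured with a rough sketch: for each training pattern the winning neuron is found and only its weights are moved toward that pattern, and no expected output is ever supplied. The update rule, the normalization to unit length, the learning rate, the epoch count and the random seed below are all assumptions chosen for illustration; the project's own training procedure may differ.

```python
import random


def normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    length = sum(x * x for x in vector) ** 0.5
    return [x / length for x in vector] if length else list(vector)


def winner(weights, pattern):
    """Index of the output neuron whose weights best match the pattern."""
    scores = [sum(w * x for w, x in zip(row, pattern)) for row in weights]
    return scores.index(max(scores))


def train(patterns, n_outputs, epochs=50, rate=0.3, seed=0):
    """Unsupervised training: only input patterns are given, no targets."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0])
    weights = [normalize([rng.random() for _ in range(n_inputs)])
               for _ in range(n_outputs)]
    for _ in range(epochs):
        for pattern in patterns:
            p = normalize(pattern)
            k = winner(weights, p)
            # Pull the winning neuron's weights toward the input pattern.
            weights[k] = normalize(
                [w + rate * (x - w) for w, x in zip(weights[k], p)])
    return weights
```

After training, calling `winner(weights, normalize(pattern))` reports which group a pattern falls into.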

It is also important to understand the limitations of the Kohonen neural network. Neural networks with only two layers can only be applied to linearly separable problems. This is the case with the Kohonen neural network. Kohonen neural networks are used because they are a relatively simple network to construct that can be trained very rapidly.

A "feed forward" neural network is similar to the types of neural networks that we have already examined. Just like many other neural network types, the feed forward neural network begins with an input layer. This input layer must be connected to a hidden layer. This


hidden layer can then be connected to another hidden layer or directly to the output layer. There can be any number of hidden layers so long as at least one hidden layer is provided. In common use most neural networks of this kind will have only one hidden layer. For the small networks considered in this report, it is rare to need more than two hidden layers. We will now examine, in detail, the structure of a "feed forward neural network".

The Structure of a Feed Forward Neural Network

A "feed forward" neural network differs from the neural networks previously examined. Figure 2.1 shows a typical feed forward neural network with a single hidden layer.

 

Figure 2: Feed Forward Neural Network
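The layered structure in the figure can be sketched in a few lines of Python: a pattern enters the input layer, passes through one hidden layer, and leaves through the output layer. The sigmoid activation and the bias terms are assumptions made for this sketch, since the text does not name them, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Assumed activation function for this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Output of one layer: activation of each neuron's weighted sum."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> single hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)
```

Each row of a weight matrix belongs to one neuron, so the number of rows in `hidden_w` sets the size of the hidden layer and the number of rows in `output_w` sets the number of output neurons.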

The Input Layer


The input layer of the neural network is the conduit through which the external environment presents a pattern to the neural network. Once a pattern is presented to the input layer of the neural network, the output layer will produce another pattern. In essence this is all the neural network does. The input layer should represent the condition for which we are training the neural network. Every input neuron should represent some independent variable that has an influence over the output of the neural network.

It is important to remember that the inputs to the neural network are floating point numbers. These values are expressed as the primitive Java data type "double". This is not to say that you can only process numeric data with the neural network. If you wish to process a form of data that is non-numeric, you must develop a process that normalizes this data to a numeric representation.
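One common way to turn non-numeric data into numbers is one-hot encoding, sketched below in Python. This is only one possible normalization and is our assumption, not the method used in the project; the function name and the category list are hypothetical. Python's `float` is normally the same 64-bit double-precision type as Java's `double`.

```python
def one_hot(value, categories):
    """Encode a non-numeric value as one floating point input per category.

    The input matching the value is 1.0 and all others are 0.0.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]
```

For example, with the categories `["jpeg", "gif", "png"]` the value `"gif"` becomes three input values, one per input neuron.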

The Output Layer

The output layer of the neural network is what actually presents a pattern to the external environment. Whatever pattern is presented by the output layer can be directly traced back to the input layer. The number of output neurons should be directly related to the type of work that the neural network is to perform.

To determine the number of neurons to use in your output layer you must consider the intended use of the neural network. If the neural network is to be used to classify items into groups, then it is often preferable to have one output neuron for each group that the item is to be assigned into. If the neural network is to perform noise reduction on a signal, then it is likely that the number of input neurons will match the number of output neurons. In this


sort of neural network you would want the patterns to leave the neural network in the same format as they entered.

For a specific example of how to choose the numbers of input and output neurons, consider a program that is used for optical character recognition, or OCR. To determine the number of neurons used for the OCR example we will first consider the input layer. The number of input neurons that we will use is the number of pixels that might represent any given character. Characters processed by this program are normalized to a universal size that is represented by a 5x7 grid. A 5x7 grid contains a total of 35 pixels. The optical character recognition program therefore has 35 input neurons.
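The normalization of a character to a 5x7 grid can be sketched as below. The text bitmap format (rows of equal length with '#' for an inked pixel), the 1.0/0.0 encoding, and the rule that a grid cell is "on" if any pixel it covers is inked are assumptions made for this sketch; the function name is hypothetical.

```python
def downsample(bitmap, grid_w=5, grid_h=7):
    """Reduce a character bitmap to grid_w * grid_h neural network inputs.

    bitmap is a non-empty list of equal-length strings; '#' marks ink.
    """
    height, width = len(bitmap), len(bitmap[0])
    inputs = []
    for gy in range(grid_h):
        y0 = gy * height // grid_h
        y1 = max((gy + 1) * height // grid_h, y0 + 1)
        for gx in range(grid_w):
            x0 = gx * width // grid_w
            x1 = max((gx + 1) * width // grid_w, x0 + 1)
            # A cell is "on" if any source pixel inside it is inked.
            inked = any(bitmap[y][x] == "#"
                        for y in range(y0, y1) for x in range(x0, x1))
            inputs.append(1.0 if inked else 0.0)
    return inputs
```

Whatever the size of the source bitmap, the result always has 5 x 7 = 35 values, one for each input neuron.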

The number of output neurons used by the OCR program will vary depending on how many characters the program has been trained for. The default training file that is provided with the optical character recognition program is trained to recognize 26 characters. As a result, using this file the neural network would have 26 output neurons. Presenting a pattern to the input neurons will fire the appropriate output neuron that corresponds to the letter that the input pattern represents.


3. SOFTWARE REQUIREMENT ANALYSIS

3.1 PROBLEM STATEMENT

The problem here is for software systems to recognize characters when information is scanned from paper documents, since we have a number of newspapers and books in printed format related to different subjects. Whenever we scan the documents through the scanner, the documents are stored as images such as jpeg, gif, etc., in the computer system. These images cannot be read or edited by the user. To reuse this information, it is very difficult to read the individual contents and to search the contents of these documents line-by-line and word-by-word. These days there is a huge demand for "storing the information available in these paper documents on a computer storage disk and then later editing or reusing this information by searching".

3.2 MODULES AND THEIR FUNCTIONALITIES

Our software system, Optical Character Recognition on a grid infrastructure, can be divided into five modules based on its functionality. The modules are as follows:-

Document Processing Module


System Training Module

Document Recognition Module

Document Editing Module

Document Searching Module

3.2.1 DOCUMENT PROCESSING MODULE

This module is accessed by the administrator, whose role in our application is that of a librarian. This module performs activities such as scanning documents, storing them as images, and recognizing characters in the images to transfer them into word format. During the recognition process, this module uses the OCR methodology supported by the grid infrastructure data structure. The module supports the following services:-

Scanning printed documents.

Storing the documents as snapshots or images.

Processing those image-based documents.

Converting these image-based documents into e-documents (also called structured documents).

Recognizing the characters in documents.

Generating the grid infrastructure data structure.

3.2.2 SYSTEM TRAINING MODULE

This module can be accessed by both the administrator and the end-user. Before converting the printed documents into editable and searchable documents, the first and mandatory step is providing training to the system. Here training means that the font followed in the


scanned document should be identified by the user. Then the user types all the characters required for recognition from the scanned document into an image file. This image file should be provided as an input during the training process. The user then clicks the train button provided in the recognition module, and the training is completed. Thus the system gets familiar with the new font. This module supports:-

Training the system with the pre-defined fonts.

Training the system with new fonts that are not present in the system and that cannot be identified by the system.

3.2.3 DOCUMENT RECOGNITION MODULE

This module can be accessed by both the administrator and the end-user. Once the printed documents are converted into structured documents, any user can recognize the characters present in the document. That means the user can recognize the characters of any language he chooses, which makes OCR more flexible. This flexibility is due to the adoption of grid infrastructure. This is the module where the main functionality of OCR is tested.

Under this module, there are two types of recognition. They are handwritten recognition and scanned document recognition.

In handwritten recognition, the system is trained on the user's handwriting in any language only once, the first time. From then onwards, the system recognizes the characters or words written by the user. Thus handwritten document recognition recognizes human handwriting.

In scanned document recognition, the system is first trained with the font characters in the document in the training module itself. Now in the recognition module, the system takes the


scanned document image as an input file, first crops the image and then extracts/recognizes the characters from the document. Thus scanned document recognition recognizes the characters from the scanned document image and makes the document editable and searchable. Hence the document recognition module on the whole supports the following services:-

Converts the document into a specific format

Recognizes the characters

Heterogeneous character recognition
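The cropping step mentioned above, which comes before the characters are extracted, can be sketched as trimming the empty margins around the inked area. The text bitmap format ('#' for ink, '.' for background) and the function name `crop` are assumptions made for this sketch and are not taken from the project.

```python
def crop(bitmap):
    """Trim blank rows and columns around the inked area of a bitmap.

    bitmap is a list of equal-length strings; '#' marks ink.
    Returns an empty list if the bitmap contains no ink at all.
    """
    rows = [y for y, row in enumerate(bitmap) if "#" in row]
    if not rows:
        return []
    cols = [x for x in range(len(bitmap[0]))
            if any(row[x] == "#" for row in bitmap)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in bitmap[top:bottom + 1]]
```

The cropped bitmap can then be scaled to the fixed grid size expected by the input layer of the network.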

3.2.4 DOCUMENT EDITING MODULE

This module can be accessed by both the administrator and the end-user during document editing to implement the character recognition process. Once the scanned documents are stored, they reside in computer memory. This data resides in the form of an image that is just viewable in an image viewer. Hence, the document is first converted into a form in which it is editable. The desired form of the document may be MS-Word, Text, etc., as specified by the user. The objective of this module is to let the user perform:-

Addition of specific content to the documents

Deletion of certain content from documents

Any other modification of documents.

3.2.5 DOCUMENT SEARCHING MODULE

This module can be accessed by both the administrator and the end-user to search for a required document on which the character recognition process has been carried out. The user requests the system to search for a particular document. The system then finds the documents based on OCR methodology and returns the result of the search to the user.
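
As a rough illustration of the search request, the following Java sketch looks up a query word in a set of already recognized documents. It assumes the recognized text of each document is held in memory as a string keyed by document name; the class DocumentSearchSketch and the method findDocuments are hypothetical names, not part of the project's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: searches recognized (plain-text) documents for a word. */
class DocumentSearchSketch {

    /** Returns the names of all documents whose recognized text contains the query, ignoring case. */
    static List<String> findDocuments(Map<String, String> recognizedText, String query) {
        List<String> matches = new ArrayList<>();
        String needle = query.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : recognizedText.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample data; a real system would load the text produced by recognition.
        Map<String, String> library = new LinkedHashMap<>();
        library.put("physics-notes", "Newton's laws of motion");
        library.put("history-notes", "The industrial revolution");
        System.out.println(findDocuments(library, "motion"));
    }
}
```

Searching is only possible because recognition has turned the page image into text; the raw scanned image could not be matched this way.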

4. SOFTWARE DESIGN

4.1 DATA FLOW DIAGRAM

The DFD is also called a bubble chart. A data-flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. DFDs can also be used for the visualization of data processing. The flow of data in our system can be described in the form of a data-flow diagram as follows:

1. Firstly, if the user is the administrator, he can initiate the following actions:

Document processing

Document search

Document editing.

All the above actions fall under 2 cases. They are described as follows:

a) If the printed document is a new document that has not yet been read into the system, then the document processing phase reads the scanned document as an image only and produces, as a result, the document image stored in computer memory.


Now the document processing phase has the document at hand and can read it at any point of time. Later the document processing phase proceeds with recognizing the document using OCR methodology and the grid infrastructure. Thus it produces the documents with the recognized characters as final output, which can later be searched and edited by the end-user or administrator.

b) If the printed document is already scanned in and is held in system memory, then the document processing phase proceeds with document recognition using OCR methodology and grid infrastructure, and finally produces the document with recognized characters as output.

2. If the user of the OCR system is the end-user, then he can perform the following actions:

Document searching

Document editing

1. Document Searching: The documents which are recognized can be searched by the user whenever required by requesting them from the system database.

2. Document Editing: The recognized documents can be edited by adding specific content to the document, deleting specific content from the document and modifying the document.
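
The split of actions between the two kinds of user can be sketched as a small permission table in Java. This is only an illustration of the data flow described above; the names RolePermissionsSketch, Role, Action and may are hypothetical and do not come from the project's source.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the role-to-action mapping described by the data flow. */
class RolePermissionsSketch {

    enum Action { PROCESS, SEARCH, EDIT }

    enum Role {
        // The administrator may initiate processing, search and editing.
        ADMINISTRATOR(EnumSet.of(Action.PROCESS, Action.SEARCH, Action.EDIT)),
        // The end-user may only search and edit recognized documents.
        END_USER(EnumSet.of(Action.SEARCH, Action.EDIT));

        private final Set<Action> allowed;

        Role(Set<Action> allowed) {
            this.allowed = allowed;
        }

        boolean may(Action action) {
            return allowed.contains(action);
        }
    }

    public static void main(String[] args) {
        System.out.println(Role.END_USER.may(Action.PROCESS));
        System.out.println(Role.ADMINISTRATOR.may(Action.PROCESS));
    }
}
```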

[Diagram not reproduced. Recoverable labels: User, Scan images, Store images, Read documents, Document as image, Document Processing, Recognize Document, Use OCR, Use Grid, Edit Document (Modify, Delete), Document Search.]

Figure 3: Data Flow Diagram

4.2 UML DIAGRAMS

UML combines the best techniques from data modeling (entity relationship diagrams), business modeling (work flows), object modeling, and component modeling. It can be used with all processes, throughout the software development life cycle, and across different implementation technologies. UML has 14 types of diagrams divided into two categories. Seven diagram types represent structural information, and the other seven represent general types of behavior, including four that represent different aspects of interactions. The diagrams we provide to describe the design and implementation of our OCR system are listed below:

Use case diagram

Class diagram

Sequence diagram

Collaboration diagram

Activity diagram

Component diagram

Deployment diagram

4.2.1 USE-CASE DIAGRAMS

Our software system can be used in a library environment to create a Digital Library, where paper documents are converted into electronic form for access by the users. For this purpose the printed documents must be recognized before they are converted into electronic form. The resulting electronic documents are accessed by users such as faculty and students for reading and editing. Accordingly, the following actors are involved in our OCR system:

For a virtual digital library, the Administrator can be the Librarian and the End-users can be Students and/or Faculty.

The following use-case diagrams together form the complete, overall use-case diagram:

1. Use-case diagram for document processing

2. Use-case diagram for neural network training

3. Use-case diagram for document recognition

4. Use-case diagram for document editing

5. Use-case diagram for document searching


For each of the use-case diagrams below we explain the functionality of that particular use case. Each description covers:

Use-case name

Details about the use-case

Actors using this use-case

The flow of events carried out by the use-case

The conditions that occur in this use-case

[Diagram not reproduced. Actor: Administrator. Use cases: Scans documents, Reads images, Stores the images.]

Figure 4: Use-Case Diagram for Document Processing

Use Case Name

Document processing

Description

The administrator is the only person who participates in document processing. He scans the documents, the scanned documents are read as images, and finally the images are stored in system memory.

Actors

Primary Actor: Administrator

Secondary Actor: User

Flow of Events

1. The Administrator scans the document which he wants to edit.

2. The scanned documents are read as images.

3. Finally, the images that are read are stored in system memory for future reference.

[Diagram not reproduced. Actor: Administrator or end-user. Use cases: Enters specific characters, Stores them as image file, Trains the system.]

Figure 5: Use-Case Diagram for Neural Network Training

Use Case Name

Neural Network Training

Description

The Administrator or End-user enters the specific characters required for training. The user stores them as an image file and trains the system.

Actors

Primary Actor: Administrator or End-user

Secondary Actor: User

Flow of Events

1. The user enters the specific characters in order to train the system.

2. After entry, the characters are stored as an image file.

3. Finally, the system is trained with the stored characters.

Pre-Condition

The font in the scanned document should be identified.

[Diagram not reproduced. Actor: Administrator or End-user. Use cases: Open document in editor, Select Edit action, Performs editing, Stores edited document.]

Figure 6: Use-Case Diagram for Document Editing

Use Case Name

Document editing

Description

Both the Administrator and the End-user can perform document editing. The user opens the document in the editor and selects the edit action, i.e., edit, modify, delete etc. After the edit action is selected, the editing operation is performed and finally the edited document is stored.

Actors

Primary Actor: Administrator or End-user

Secondary Actor: User

Flow of Events

1. The Administrator or End-user opens the document which he wants to edit.

2. He selects the edit action. The action consists of editing the document, modifying the document, deleting the document etc.

3. After the edit action is selected, the editing operation is performed.

4. Finally, the edited document is stored in system memory.

Pre-Condition

The input taken for editing should be an image of the document that has been converted into a Word or text file. That is, the input file must be either a .doc file or a .txt file.

Post-Condition

After editing, the document should be saved only in the target format defined by the user. That will be the output of the editor. As per our design, the final document after editing must be saved as a .doc file or a .txt file only.
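
The pre- and post-condition above both reduce to a check on the file extension. A minimal Java sketch of such a check follows; the class EditorFormatCheck and the method isSupportedFormat are hypothetical names introduced only for illustration.

```java
import java.util.Locale;

/** Hypothetical sketch: checks that a file name has one of the formats the editor accepts. */
class EditorFormatCheck {

    /** Returns true for .doc and .txt files, ignoring the case of the extension. */
    static boolean isSupportedFormat(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".doc") || lower.endsWith(".txt");
    }

    public static void main(String[] args) {
        System.out.println(isSupportedFormat("report.doc"));
        System.out.println(isSupportedFormat("scan.png"));
    }
}
```

The same check can be applied both before a file is opened for editing and before the edited result is saved.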

[Diagram not reproduced. Actor: Administrator or end-user. Use cases: Trains system, Recognize characters.]

Figure 7: Use-Case Diagram for Document Recognition

Use Case Name

Document Recognition

Description

The Administrator or End-user trains the system on the given symbols or alphabets. The characters are then recognized by the trained system.

Actors

Primary Actor: Administrator or End-user

Secondary Actor: User

Flow of Events

1. The user trains the system to recognize the characters.

2. After the system is trained, the characters are recognized.

Pre-Condition

Before trying to recognize the characters, the system should first be trained with the font characteristics and the font size.

[Diagram not reproduced. Actor: Administrator or end-user. Use cases: Opens document in Editor, Enters word for search, Searches the word.]

Figure 8: Use-Case Diagram for Document Searching

Use Case Name

Document Searching

Description

The Administrator or End-user opens the document in the editor, enters the word he is looking for in that document, and then searches for the word.

Actors

Primary Actor: Administrator or End-user

Secondary Actor: User

Flow of Events

1. The user opens the document in which he wants to search for a word.

2. After opening the document, he enters the word to search for.

3. Finally, the word is searched for in that document.

Pre and Post Conditions

No pre-condition and post-condition

Overall Use-Case Diagram

[Diagram not reproduced. Actors: administrator, end-user, end-user1, end-user2. Use cases: Document processing, scan documents, store documents, Document recognition, Trains the system, Document editing, Document modification, Document deletion. Two «includes» relationships are shown.]

Figure 9

4.2.2 CLASS DIAGRAMS

The class diagram is the main building block in object-oriented modeling. The classes in a class diagram represent both the main objects and/or interactions in the application and the objects to be programmed.

The class diagram of our OCR system consists of 9 classes. They are

1. MainScreen

2. Editor

3. HelpFrame

4. Document

5. HEntry

6. Entry

7. TrainingSet

8. KohonenNetwork

9. PrintedFrame.

Among all these classes, MainScreen is the main class that represents all the major functions carried out by our OCR system. The MainScreen class has an association with five classes, viz., Editor, HelpFrame, Document, TrainingSet and PrintedFrame. The TrainingSet class in turn has an association with the HEntry and the KohonenNetwork classes. The PrintedFrame has an association with the Entry and KohonenNetwork classes.
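
The associations just listed can be pictured as field references between classes. The Java skeleton below is only a sketch of that structure: the class names follow the text, but the field names, the empty bodies and the choice to create every associated object eagerly are assumptions made for illustration.

```java
// Leaf classes; their attributes and operations are omitted in this sketch.
class Editor {}
class HelpFrame {}
class Document {}
class HEntry {}
class Entry {}
class KohonenNetwork {}

/** TrainingSet is associated with HEntry and KohonenNetwork. */
class TrainingSet {
    final HEntry hEntry = new HEntry();
    final KohonenNetwork network = new KohonenNetwork();
}

/** PrintedFrame is associated with Entry and KohonenNetwork. */
class PrintedFrame {
    final Entry entry = new Entry();
    final KohonenNetwork network = new KohonenNetwork();
}

/** MainScreen is associated with the five classes named in the text. */
class MainScreen {
    final Editor editor = new Editor();
    final HelpFrame helpFrame = new HelpFrame();
    final Document document = new Document();
    final TrainingSet trainingSet = new TrainingSet();
    final PrintedFrame printedFrame = new PrintedFrame();

    public static void main(String[] args) {
        MainScreen screen = new MainScreen();
        System.out.println(screen.trainingSet.network != null);
    }
}
```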

[Diagram not reproduced. Recoverable class contents, with associations drawn at multiplicity 1 to 1..*:]

Document
  Attributes: docid : integer, docname : String, docsize : integer, doctype : String
  Operations: getDocumentDetails(), scanDocument(), convertToImage(), storeImage()

Editor
  Operations: cut(), copy(), paste(), new(), open(), find()

HelpFrame

HEntry
  Operations: hLineClear(), vLineClear(), findBounds()

TrainingSet
  Attributes: inputCount : int, outputcount : int, trainingSetCount : int
  Operations: setInputCount(), setOutputCount(), setTrainingSetCount(), setClassify()

MainScreen
  Operations: editor(), helpFrame(), printedFrame(), handWrittenFrame()

Entry
  Attributes: recog : int, downSampleLeft : int, downSampleRight : int, downSampleTop : int, downSampleBottom : int
  Operations: hLineClear(), hLineClearWithin(), vLineClear(), vLineClearWithin()

PrintedFrame
  Operations: openaction(), trainaction(), topenaction(), recogniseAllaction()

KohonenNetwork
  Attributes: LearnMethod = 1 : int, LearnRate = 0.3 : double, QuitError : double
  Operations: copyWeights(), clearWeights(), winner(), normalizeInput()

Figure 10: Class Diagram
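
The class diagram names two KohonenNetwork operations, normalizeInput() and winner(), without describing them. The Java sketch below shows one common formulation of winner selection in a Kohonen-style network: the input is scaled to unit length and the neuron whose weight vector has the largest dot product with it wins. This is an assumption about how such operations typically work, not the project's actual implementation; the class name KohonenWinnerSketch and the method signatures are hypothetical.

```java
/** Hypothetical sketch of winner selection in a Kohonen-style network. */
class KohonenWinnerSketch {

    /** Scales the vector to unit length; a zero vector is returned unchanged. */
    static double[] normalizeInput(double[] input) {
        double sumSquares = 0.0;
        for (double v : input) {
            sumSquares += v * v;
        }
        double length = Math.sqrt(sumSquares);
        double[] result = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            result[i] = length == 0.0 ? input[i] : input[i] / length;
        }
        return result;
    }

    /** Returns the index of the neuron whose weights give the largest dot product with the normalized input. */
    static int winner(double[][] weights, double[] input) {
        double[] x = normalizeInput(input);
        int best = 0;
        double bestActivation = Double.NEGATIVE_INFINITY;
        for (int neuron = 0; neuron < weights.length; neuron++) {
            double activation = 0.0;
            for (int i = 0; i < x.length; i++) {
                activation += weights[neuron][i] * x[i];
            }
            if (activation > bestActivation) {
                bestActivation = activation;
                best = neuron;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two neurons, each tuned to one of two assumed input directions.
        double[][] weights = {{1.0, 0.0}, {0.0, 1.0}};
        System.out.println(winner(weights, new double[] {0.2, 0.9}));
    }
}
```

In a character recognizer of this kind, each output neuron would stand for one trained character, so the winning index identifies the recognized character.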

4.2.3 SEQUENCE DIAGRAMS

Sequence diagrams are sometimes called event-trace diagrams, event scenarios, and timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner.

In a sequence diagram, the class objects used to describe the interaction between classes vary from one function to another. Five sequence diagrams are given below, presenting the sequence of actions performed by each of the five modules. The key class object involved in all of these module functions is the MainScreen class, which controls the interaction among the various class objects.

Sequence Diagram for Document Processing

1. Objects

Administrator - “a”

MainScreen - “m”

Document - “d”

SystemMemory - “s”

2. Links

1. Administrator object to MainScreen object.

2. MainScreen object to Document object.

3. Document object to SystemMemory object.

4. SystemMemory object to Administrator object.

3. Messages

1. Process documents

2. Scan documents

3. Scans

4. Stores documents

5. Stores

6. Returns the processed documents

[Diagram not reproduced. Lifelines a:Administrator, m:MainScreen, d:Document and s:SystemMemory exchange messages 1 to 6 listed above.]

Figure 11: Sequence Diagram for Processing

Sequence Diagram for System Training

1. Objects

Administrator - “a”

System - “s”

TrainingSet - “t”

2. Links

1. Administrator object to System object

2. System object to TrainingSet object

3. TrainingSet object to System object

4. System object to Administrator object

3. Messages

1. Specifies the font characters

2. Stores it as an image

3. Trains the system with new font

4. System recognizes new font and returns for user

[Diagram not reproduced. Lifelines a:Administrator, s:System and t:TrainingSet exchange messages 1 to 4 listed above.]

Figure 12: Sequence Diagram for Training

Sequence Diagram for Document Recognition

1. Objects

Administrator - “a”

MainScreen - “m”

SystemMemory - “s”

TrainingSet - “t”

2. Links

1. Administrator object to MainScreen object

2. MainScreen object to SystemMemory object

3. SystemMemory object to MainScreen object

4. MainScreen object to TrainingSet object

5. TrainingSet object to MainScreen object

6. MainScreen object to Administrator object

3. Messages

1. Recognize documents

2. Store processed document

3. Read file image

4. Recognize using OCR

5. Send processed document

6. Recognize the characters

[Diagram not reproduced. Lifelines a:Administrator, m:MainScreen, s:SystemMemory and t:TrainingSet exchange messages 1 to 6 listed above.]

Figure 13: Sequence Diagram for Recognition

Sequence Diagram for Document Editing

1. Objects

Administrator - “a”

MainScreen - “m”

Document - “d”

SystemMemory - “s”

2. Links

1. Administrator object to MainScreen object.

2. MainScreen object to Document object.

3. MainScreen object to Document object.

4. MainScreen object to Document object.

5. Document object to SystemMemory object.

6. SystemMemory object to Administrator object.

3. Messages

1. Edit document

2. Adding content

3. Adds

4. Deleting content

5. Deletes

6. Modifying content

7. Modifies

8. Stores the edited documents

9. Administrator accesses the edited documents

[Diagram not reproduced. Lifelines a:Administrator, m:MainScreen, d:Document and s:SystemMemory exchange messages 1 to 9 listed above.]

Figure 14: Sequence Diagram for Editing
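
Once a document has been recognized into text, the add, delete and modify messages above map naturally onto string edits. The following Java sketch shows the idea using the standard StringBuilder methods insert, delete and replace; the class EditOperationsSketch and its method names are hypothetical, and a real editor would of course work on a richer document model.

```java
/** Hypothetical sketch of the three edit actions applied to recognized text. */
class EditOperationsSketch {

    /** Adds content at the given character offset. */
    static String add(String text, int offset, String content) {
        return new StringBuilder(text).insert(offset, content).toString();
    }

    /** Deletes the characters from start (inclusive) to end (exclusive). */
    static String delete(String text, int start, int end) {
        return new StringBuilder(text).delete(start, end).toString();
    }

    /** Replaces the characters from start (inclusive) to end (exclusive). */
    static String modify(String text, int start, int end, String replacement) {
        return new StringBuilder(text).replace(start, end, replacement).toString();
    }

    public static void main(String[] args) {
        String recognized = "OCR text";
        String added = add(recognized, 4, "sample ");
        System.out.println(added);
        System.out.println(delete(added, 4, 11));
        System.out.println(modify(recognized, 0, 3, "Recognized"));
    }
}
```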

Sequence Diagram for Document Searching

1. Objects

Administrator - “a”

MainScreen - “m”

Document - “d”

2. Links

1. Administrator object to MainScreen object

2. MainScreen object to Document object

3. Document object to Administrator object

3. Messages

1. Specifies the word

2. Searches the word

3. Searches

4. Returns the location of the word

[Diagram not reproduced. Lifelines a:Administrator, m:MainScreen and d:Document exchange messages 1 to 4 listed above.]

Figure 15: Sequence Diagram for Searching


4.2.5 ACTIVITY DIAGRAMS

The purpose of an activity diagram is to provide a view of flows and of what is going on inside a use case or among several classes. An activity diagram can also be used to represent a class's method implementation. A token represents an operation. An activity is shown as a rounded box containing the name of the operation. An outgoing solid arrow attached to the end of an activity symbol indicates a transition triggered by the completion of that activity.

Activity Diagram for Document Processing

[Diagram not reproduced. Flow: Request document processing, then Process document; if [scanner ready], Scan documents and then Store documents; if [scanner not ready], Retry for scanning.]

Figure 16: Activity Diagram for Processing
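
The "scanner ready / scanner not ready" branch in this activity can be sketched as a bounded retry loop. The Java example below is a minimal illustration; the scanner is represented by a BooleanSupplier that reports readiness, and the names ScanRetrySketch and waitForScanner, as well as the idea of a maximum number of attempts, are assumptions not stated in the report.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the "retry for scanning" branch of the activity diagram. */
class ScanRetrySketch {

    /**
     * Polls the scanner up to maxAttempts times.
     * Returns the 1-based attempt on which it was ready, or -1 if it never became ready.
     */
    static int waitForScanner(BooleanSupplier scannerReady, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (scannerReady.getAsBoolean()) {
                return attempt;   // scanner ready: proceed to scan and store
            }
            // scanner not ready: fall through and retry
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Simulated scanner that becomes ready on the third poll.
        int attempt = waitForScanner(() -> ++polls[0] >= 3, 5);
        System.out.println("Scanner ready on attempt " + attempt);
    }
}
```

Bounding the number of retries keeps the processing request from waiting forever when no scanner is attached.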

[Diagram not reproduced. Recoverable activities: Request document, Initiate search.]

Figure 17: Activity Diagram for Document Retrieval

Activity Diagram for Document Storage

[Diagram not reproduced. Flow: Edit documents; then, depending on the user's choice, Delete document content [user chooses delete], Add document content [user chooses add] or Modify document [user chooses modify]; then Store documents.]

Figure 18: Activity Diagram for Document Storage


4.2.6 COMPONENT DIAGRAM

The crucial component in our component diagram, which plays a major role in implementing the OCR system, is the GUI component. All other components, that is, Document processing and recognition, Document editing and Document searching, depend on it. They are as follows:

The GUI component is used to design the GUI screens for interacting with the end-user and administrator.

The functionality of the other components is carried out from the GUI component. This functionality includes Document processing and recognition, Document editing and Document searching.

[Diagram not reproduced. Components: GUI (GUI screens); Document Processing and Recognition (scanning, storing and recognising characters); Editing (adding, deleting, modifying); Searching (supports user search function).]

Figure 19: Component Diagram

4.2.7 DEPLOYMENT DIAGRAM

A deployment diagram serves to model the physical deployment of artifacts on deployment targets. Deployment diagrams show "the allocation of Artifacts to Nodes according to the Deployments defined between them."

In the deployment diagram of our OCR system, the server role is played by the admin, called the Librarian. There can be N clients accessing the digital library content at a time. The clients may be students, faculty, or both.

The actions performed by the Administrator are document processing, searching and editing, whereas the actions performed by the end-user are only document searching and editing.

 

[Diagram not reproduced. Nodes: <<Server>> (Document processing, editing and searching); <<Client1>>, <<Client2>> and <<Client3>> (each: Document searching, editing).]

Figure 20: Deployment Diagram


5. CODING/CODE TEMPLATES

Sample Code

CODE SNIPPETS FOR TRAINING

public class TrainingSet
{
  protected int inputCount;
  protected int outputCount;
  protected double[][] input;     // input[set][index]
  protected double[][] output;    // output[set][index]
  protected double[] classify;    // classify[set]
  protected int trainingSetCount;

  TrainingSet(int inputCount, int outputCount)
  {
    this.inputCount = inputCount;
    this.outputCount = outputCount;
    trainingSetCount = 0;
  }

  public int getInputCount()
  {
    return inputCount;
  }

  public int getOutputCount()
  {
    return outputCount;
  }

  // Sets the number of training sets and (re)allocates the arrays.
  public void setTrainingSetCount(int trainingSetCount)
  {
    this.trainingSetCount = trainingSetCount;
    input = new double[trainingSetCount][inputCount];
    output = new double[trainingSetCount][outputCount];
    classify = new double[trainingSetCount];
  }

  public int getTrainingSetCount()
  {
    return trainingSetCount;
  }

  // Range checks shared by the accessors below.
  private void checkSet(int set)
  {
    if ( (set<0) || (set>=trainingSetCount) )
      throw new RuntimeException("Training set out of range:" + set);
  }

  private void checkInputIndex(int index)
  {
    if ( (index<0) || (index>=inputCount) )
      throw new RuntimeException("Training input index out of range:" + index);
  }

  // The extracted listing compared "set" with outputCount here; the index is checked instead.
  private void checkOutputIndex(int index)
  {
    if ( (index<0) || (index>=outputCount) )
      throw new RuntimeException("Training output index out of range:" + index);
  }

  void setInput(int set, int index, double value)
  {
    checkSet(set);
    checkInputIndex(index);
    input[set][index] = value;
  }

  void setOutput(int set, int index, double value)
  {
    checkSet(set);
    checkOutputIndex(index);
    output[set][index] = value;
  }

  void setClassify(int set, double value)
  {
    checkSet(set);
    classify[set] = value;
  }

  double getInput(int set, int index)
  {
    checkSet(set);
    checkInputIndex(index);
    return input[set][index];
  }

  double getOutput(int set, int index)
  {
    checkSet(set);
    checkOutputIndex(index);
    return output[set][index];
  }

  double getClassify(int set)
  {
    checkSet(set);
    return classify[set];
  }

  // Assigns class c to every training set. The loop bound is "<", not "<=",
  // so the last iteration does not run past the end of the array.
  void CalculateClass(int c)
  {
    for ( int i=0; i<trainingSetCount; i++ ) {
      classify[i] = c + 0.1;
    }
  }

  double[] getOutputSet(int set)
  {
    checkSet(set);
    return output[set];
  }

  double[] getInputSet(int set)
  {
    checkSet(set);
    return input[set];
  }
}


6. TESTING

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests. Each test type addresses a specific testing requirement.

6.1 TYPES OF TESTS

Unit Testing

Unit testing involves the design of test cases that validate that the internal program logic is functioning properly and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application. It is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
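As a minimal sketch of a unit test at component level, the code below checks one unit in isolation: a range-checked accessor modelled on the `getInputSet` method of the training-set listing. The classes `TrainingSetSketch` and `TrainingSetUnitTest`, the constructor, and the sizes used are assumptions made for illustration, not the project's actual test code.

```java
// Hypothetical stand-in for the training-set class; only the range-checked
// accessor is modelled here.
final class TrainingSetSketch {
    private final double[][] input;
    private final int trainingSetCount;

    TrainingSetSketch(int trainingSetCount, int inputCount) {
        this.trainingSetCount = trainingSetCount;
        this.input = new double[trainingSetCount][inputCount];
    }

    double[] getInputSet(int set) {
        if (set < 0 || set >= trainingSetCount) {
            throw new RuntimeException("Training set out of range:" + set);
        }
        return input[set];
    }
}

final class TrainingSetUnitTest {
    // Returns true when the accessor rejects the index with an exception.
    static boolean rejects(TrainingSetSketch sets, int index) {
        try {
            sets.getInputSet(index);
            return false;
        } catch (RuntimeException expected) {
            return true;
        }
    }

    // Exercises both decision outcomes: valid indices and both invalid sides.
    static boolean runAll() {
        TrainingSetSketch sets = new TrainingSetSketch(3, 35);
        boolean validAccepted = !rejects(sets, 0) && !rejects(sets, 2);
        boolean belowRejected = rejects(sets, -1);
        boolean aboveRejected = rejects(sets, 3);
        return validAccepted && belowRejected && aboveRejected;
    }
}
```

Calling `TrainingSetUnitTest.runAll()` returns `true` only when the valid indices are accepted and both out-of-range indices are rejected, which matches the "clearly defined inputs and expected results" requirement described above.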


Integration Testing

Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.

System Testing

System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration-oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.

Functional Testing

Functional tests provide a systematic demonstration that functions tested are available as specified by the business and technical requirements, system documentation, and user manuals.

Functional testing is centered on the following items:

Valid Input : identified classes of valid input must be accepted.

Invalid Input : identified classes of invalid input must be rejected.

Functions : identified functions must be exercised.

Output : identified classes of application outputs must be exercised.

Systems/Procedures : interfacing systems or procedures must be invoked.


Organization and preparation of functional tests is focused on requirements, key functions, or special test cases. In addition, systematic coverage pertaining to identifying business process flows, data fields, predefined processes, and successive processes must be considered for testing. Before functional testing is complete, additional tests are identified and the effective value of current tests is determined.

There are two basic approaches to functional testing:

a. Black box or functional testing.

b. White box testing or structural testing.

a) Black box testing

This method is used when the specified function that a product has been designed to perform is known. The concept of a black box is used to represent a system whose inside workings are not available for inspection. In a black box, the test item is treated as “Black”, since its logic is unknown; all that is known is what goes in and what comes out, that is, the input and output.

In black box testing we try various inputs and examine the resulting outputs. Black box testing can also be used for scenario-based tests. In this test we verify whether the system takes valid input and produces the resultant output for the user. It is imaginary-box testing that hides the internal workings. In our project the valid input is an image, and a well-structured image should be received as the resultant output.
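A black box test can be written as a table of inputs and expected outputs, with no reference to the code under test. The sketch below assumes a hypothetical `ImageInputCheck.isAcceptedImage` method and an assumed list of accepted file extensions; neither is taken from the project.

```java
final class ImageInputCheck {
    // Assumed set of accepted image extensions (not taken from the project).
    private static final String[] ACCEPTED = {".png", ".jpg", ".jpeg", ".gif", ".bmp"};

    // Returns true when the file name ends with one of the accepted extensions.
    static boolean isAcceptedImage(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(java.util.Locale.ROOT);
        for (String ext : ACCEPTED) {
            // Require at least one character before the extension.
            if (lower.endsWith(ext) && lower.length() > ext.length()) {
                return true;
            }
        }
        return false;
    }
}

final class BlackBoxTable {
    // Each row pairs an input with the output the specification expects.
    static final String[] INPUTS = {"page1.png", "SCAN.JPG", "notes.txt", "", ".png"};
    static final boolean[] EXPECTED = {true, true, false, false, false};

    // Counts the rows whose actual output differs from the expected output.
    static int countFailures() {
        int failures = 0;
        for (int i = 0; i < INPUTS.length; i++) {
            if (ImageInputCheck.isAcceptedImage(INPUTS[i]) != EXPECTED[i]) {
                failures++;
            }
        }
        return failures;
    }
}
```

The table covers both classes named under functional testing: valid input that must be accepted and invalid input that must be rejected. The test only compares outputs with expectations and never inspects how `isAcceptedImage` is implemented.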

b) White box testing

White box testing is concerned with testing the implementation of the program. The intent of structural testing is not to exercise all the inputs or outputs but to exercise the different programming and data structures used in the program. Thus structural testing aims to achieve test cases that will force the desired coverage of different structures. Two types of path testing are:

1. Statement testing

2. Branch testing

Statement Testing

The main idea of statement testing coverage is to test every statement in the object's method by executing it at least once. However, realistically, it is impossible to test a program on every single input, so you can never be sure that a program will not fail on some input.

Branch Testing

The main idea behind branch testing coverage is to perform enough tests to ensure that every branch alternative has been executed at least once under some test. As in statement testing coverage, it is infeasible to fully test any program of considerable size.
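The difference between the two coverage criteria can be seen on a method whose `if` has no `else`. The following is an illustrative sketch; the class `CoverageSketch` and its methods are invented for this example.

```java
final class CoverageSketch {
    // Clamps a negative value to zero; the if statement has no else branch.
    static int clampToZero(int value) {
        int result = value;
        if (value < 0) {
            result = 0;
        }
        return result;
    }

    // A single negative input executes every statement (statement coverage),
    // but the "condition is false" branch is never taken.
    static boolean statementCoverageSuite() {
        return clampToZero(-5) == 0;
    }

    // Branch coverage additionally needs an input for which the condition is false.
    static boolean branchCoverageSuite() {
        return clampToZero(-5) == 0 && clampToZero(7) == 7;
    }
}
```

Here one test case is enough for statement coverage, while branch coverage requires a second test case that skips the body of the `if`.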

6.2 UNIT TESTING

Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.

Test strategy and approach

Field testing will be performed manually and functional tests will be written in detail.

Test objectives

• All field entries must work properly.

• Pages must be activated from the identified link.

• The entry screen, messages and responses must not be delayed.

Features to be tested

• Verify that the entries are of the correct format.

• No duplicate entries should be allowed.

• All links should take the user to the correct page.

6.3 INTEGRATION TESTING

Software integration testing is the incremental integration testing of two or more integrated software components on a single platform to produce failures caused by interface defects.

The task of the integration test is to check that components or software applications (e.g. components in a software system or, one step up, software applications at the company level) interact without error.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.

6.4 ACCEPTANCE TESTING

User Acceptance Testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements.

Test Results: All the test cases mentioned above passed successfully. No defects encountered.


OUTPUT SCREENS

The following shows the series of output screens and how the actual process of implementing OCR takes place:

The first page, the home page of our optical character recognition system, looks as shown in the figure below. It provides an interface through which the user can access any module present in this software from this page itself.

Figure 18: Main screen

There are two types of recognition in the document recognition module: handwritten letter recognition and scanned document recognition. The implementation of handwritten document recognition proceeds as follows:


Process Explaining Hand Written Letter Recognition

Firstly, when we click the handwritten recognition button on the home page, the following screen appears, presenting the user with all the operations that can be performed in this module:

Hand Written Screen 1

From the above screen we can write letters in the workspace named “Draw Letters Here” using the mouse pointer. To recognize these letters we have to train the system first. Otherwise, it gives an error message stating that the system has to be trained first. This process is explained with the following screens:

Firstly, suppose that we have drawn the letter 'A' in the workspace provided.


Hand Written Screen 2

Now suppose that you have clicked the “Recognize” button without training, in order to recognize the character you have written and show the recognized character in the grid. An error message is then displayed as shown below:


Hand Written Screen 3

Now if we click the “Begin Training” button before proceeding with the recognition, a status message reporting success is shown, as below:

Hand Written Screen 4


Since the training has been completed, the letter 'A' can now be recognized by clicking the “Recognize” button. The letter 'A' then appears in the grid as output, as shown below:

Hand Written Screen 5

Once we have provided training to the system in a session, the system does not need any further training for any kind of letter in any language. That is, once training has been provided for at least one character, the system will from then on recognize any character written in the workspace without needing to be trained again.

For example, first we wrote the letter 'A', provided training for it and recognized the letter 'A'. Now we have written the letter 'S'. Without any further training we can directly recognize the letter 'S' in the grid by clicking the “Recognize” button. Thus we do not need to train the system further.
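The session behaviour described above (an error before training, no retraining afterwards) can be sketched as a simple state guard. The class `RecognitionSession`, its method names, and the message texts are hypothetical, and both training and recognition are stubbed out; this only illustrates the control flow, not the project's implementation.

```java
final class RecognitionSession {
    private boolean trained = false;

    // Placeholder for the real training step; it only records that training ran.
    void beginTraining() {
        trained = true;
    }

    boolean isTrained() {
        return trained;
    }

    // Returns the status text a UI could show. Recognition itself is a stub
    // that echoes the label of the drawn pattern.
    String recognize(String drawnPattern) {
        if (!trained) {
            return "Error: train the system before recognizing";
        }
        return "Recognized: " + drawnPattern;
    }
}
```

A session that calls `recognize("A")` before `beginTraining()` gets the error text; after one call to `beginTraining()`, later calls such as `recognize("S")` go through without further training.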


Hand Written Screen 6

Hand Written Screen 7

Since we have trained the system once with one character of the English language, we can now also recognize characters of languages other than English without the need for training. Suppose we have written a Telugu character as shown below:

Hand Written Screen 8

Now we can directly recognize the above Telugu character without training the system. Just click the “Recognize” button once after drawing the letter in the workspace.


Hand Written Screen 9

Next, other than training the system through drawn letters, we can also train the system by providing characters through the keyboard and storing them as patterns. Later we train the system on those patterns.

Firstly, we provide the input through the keyboard as follows:

Hand Written Screen 10


If we click OK, those letters are saved in the stored patterns workspace. Later we can click the “Begin Training” button so that the system is trained on those stored patterns. Otherwise, it gives an error message stating that the system needs training.

Hand Written Screen 11

Now suppose that we write the word 'sr' and click the “Recognize” button before providing training on the above stored pattern 'A'. An error message is then displayed stating that the system needs to be trained on the stored patterns, as shown below:


Hand Written Screen 12

Now click the “Begin Training” button before you attempt to recognize the drawn word 'sr'. It produces an output screen as shown below, indicating that the training has been completed:

Hand Written Screen 13

Now if we click the “Recognize” button, the drawn word 'sr' is recognized and is shown as output in the grid format by firing the last neuron in stored patterns.


Hand Written Screen 14

Since we have provided training on the stored patterns once, from now onwards we can just draw characters or words of any language and recognize them directly by clicking the “Recognize” button, without training the system again. An example is shown for a Telugu word.


Hand Written Screen 15

Process Explaining Scanned Document Recognition

Firstly, when we click the “Scanned Document Recognition” button, the main page of this recognition module is displayed as follows:

Scanned Screen 1


The data present in the first text box is the default image file set by the user. The user can replace the default input image file by clicking Open and then selecting an image file. The procedure is as shown below:

Scanned Recognition Screen 2

Scanned Recognition Screen 3


There are two main tabs under scanned document recognition: training and recognition. First we should train the system under the training module. Only then can we recognize the characters from the input image using the recognition module. The training tab under scanned document recognition looks like this:

Scanned Recognition Screen 4

The above figure shows the default input image for training. We can change the training input image for different fonts by opening different input image files and then training on them so that the system adapts to the new fonts.


Scanned Recognition Screen 5

Opening an image file changes the default input image for training to a new image, as shown below.

Scanned Recognition Screen 6

Now the user can select the bounds up to which the system must be trained just by using click-and-drag actions of the mouse. The selected data is then highlighted as follows:

Scanned Recognition Screen 7

After selecting the data, just click the “Train” button. This lets the system train itself with the help of the Kohonen network, and it finally displays a dialog box stating that the training has been completed successfully.
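As a rough sketch of what one Kohonen training step involves, the code below assumes the textbook winner-take-all rule: pick the neuron whose weight vector is closest to the input, then move that neuron's weights toward the input. The class `KohonenSketch`, its method names, and the learning rate are illustrative and are not the project's code; real implementations differ, for example by normalizing inputs or by comparing dot products instead of distances.

```java
final class KohonenSketch {
    // Returns the index of the neuron whose weights are closest to the input
    // (smallest squared Euclidean distance).
    static int winner(double[][] weights, double[] input) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double distance = 0.0;
            for (int i = 0; i < input.length; i++) {
                double diff = input[i] - weights[n][i];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = n;
            }
        }
        return best;
    }

    // Moves only the winning neuron's weights toward the input by the given
    // learning rate and returns the winner's index.
    static int trainStep(double[][] weights, double[] input, double rate) {
        int w = winner(weights, input);
        for (int i = 0; i < input.length; i++) {
            weights[w][i] += rate * (input[i] - weights[w][i]);
        }
        return w;
    }
}
```

Repeating `trainStep` over all selected character samples lets each neuron settle on one pattern, so that recognition later reduces to a call to `winner` on a new sample.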


Scanned Recognition Screen 8

Once the training of the system is completed, we move on to the recognition phase, where we open a new scanned image file, as required, to be converted into an editable document. Now we select the part of the image from which the data has to be extracted. It then looks like:

Scanned Recognition Screen 9


Next click the “Crop” button so that it finds the bounds of the text selected by the user by drawing a red boundary line around the selected text, as shown below:

Scanned Recognition Screen 10
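One simple way to find the bounds of the selected text is to scan the selection for ink pixels and track the extreme rows and columns. The sketch below assumes the selection has already been converted to a `boolean[][]` grid in which `true` marks an ink pixel; the class `CropSketch` and this representation are assumptions for illustration, not the project's implementation.

```java
final class CropSketch {
    // Returns {top, left, bottom, right} (inclusive) of all ink pixels,
    // or null when the selection contains no ink.
    static int[] findBounds(boolean[][] ink) {
        int top = Integer.MAX_VALUE;
        int left = Integer.MAX_VALUE;
        int bottom = -1;
        int right = -1;
        for (int row = 0; row < ink.length; row++) {
            for (int col = 0; col < ink[row].length; col++) {
                if (ink[row][col]) {
                    top = Math.min(top, row);
                    left = Math.min(left, col);
                    bottom = Math.max(bottom, row);
                    right = Math.max(right, col);
                }
            }
        }
        return bottom < 0 ? null : new int[] {top, left, bottom, right};
    }
}
```

The four returned values are enough to draw a boundary rectangle around the text and to pass only that region on to recognition.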

Finally click the “Recognize” button so that it extracts/recognizes the characters from the image and presents them to the user. But this data is still not editable. Hence, when we click the “EDIT” button provided at the bottom-center, the document becomes both editable and searchable. This complete process is explained in the upcoming two screens, as shown below:


Scanned Recognition Screen 11

Scanned Recognition Screen 12


Now, from the data available in the above screen shot, we can make any sort of changes to the document using cut, copy, paste, etc., and finally save the document in two formats (Word, text) as per our design.

The search function can be carried out here by clicking the “find” image button at the bottom-left corner. It then asks the user to enter the search term, as shown below:

Scanned Recognition Screen 13

Now, in the dialog box of the above screen shot, if you click OK there are two cases that can occur as per our design. They are:

Case-1: If the search term entered by the user resides in the document, a dialog box is displayed asking the user whether he wants to continue the search or not.

If the user clicks Yes, the cursor moves to the search term.

If the user clicks No, the search exits.


Case-2: If the user enters a search term that does not reside in the document, a dialog box is displayed directly saying that the searching is finished. It means that the search term is not present in the document.

Thus the user can tell whether the search term is present in the document or not just after entering the search term.
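The two search cases can be sketched with `String.indexOf`. The class `SearchSketch`, its method names, and the dialog texts are hypothetical, and the sketch assumes a case-sensitive match, which the report does not specify.

```java
final class SearchSketch {
    // Returns the index of the next occurrence at or after 'from', or -1 if none.
    static int findNext(String document, String term, int from) {
        if (term.isEmpty()) {
            return -1; // treat an empty search term as "not found"
        }
        return document.indexOf(term, from);
    }

    // Mirrors the two cases: a continue prompt when the term is found,
    // a finished notice otherwise.
    static String dialogFor(String document, String term) {
        return findNext(document, term, 0) >= 0
                ? "Continue the search?"
                : "Searching is finished";
    }
}
```

In Case-1 the index returned by `findNext` is where the cursor would be moved if the user chooses to continue; in Case-2 the return value is -1 and only the finished notice is shown.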

If we are searching for a term that is already present in the document, the series of output screens will be as follows:

Scanned Recognition Screen 14


Scanned Recognition Screen 15


If we are searching for a term that does not reside in the document, the series of output screens will be as follows:

Scanned Recognition Screen 16


Scanned Recognition Screen 17

If we are using the editor, we can perform the actions displayed in the screens below:

Scanned Recognition Screen 18


7. PLATFORM/TOOLS USED

SOFTWARE REQUIREMENTS SPECIFICATION

• Operating System : Windows-W

• Programming Language : Core Java

• User Interface : Swing

HARDWARE REQUIREMENTS SPECIFICATION

• Processor : Pentium IV processor or higher

• RAM : Minimum of 512 MB RAM

• Memory : 500 MB or higher


8. CONCLUSION AND FUTURE SCOPE

What does the future hold for OCR? Given enough entrepreneurial designers and sufficient research and development dollars, OCR can become a powerful tool for future data entry applications. However, the limited availability of funds in a capital-short environment could restrict the growth of this technology. But, given the proper impetus and encouragement, a lot of benefits can be provided by the OCR system. They are:

• The automated entry of data by OCR is one of the most attractive, labor-reducing technologies.

• The recognition of new font characters by the system is very easy and quick.

• We can edit the information of the documents more conveniently and we can reuse the edited information as and when required. The extension to software other than editing and searching is a topic for future work.

• The Grid infrastructure used in the implementation of the Optical Character Recognition system can be efficiently used to speed up the translation of image-based documents into structured documents that are currently easy to discover, search and process.

FUTURE ENHANCEMENTS

The Optical Character Recognition software can be enhanced in the future in different ways, such as:

• Training and recognition speeds can be increased further, and the software can be made more user-friendly.

• Many applications exist where it would be desirable to read handwritten entries. Reading handwriting is a very difficult task considering the diversities that exist in ordinary penmanship. However, progress is being made.


9. REFERENCES

Under this references section, we have listed the references from which we collected our problem and several others that helped us design the solution to our problem. These references include books, papers published through standards bodies, and several website links with URLs:

• For a complete reference and understanding of neural networks, refer to Jeff Heaton's chapter 1 from www.jeffheaton.com

• For a complete reference and understanding of OCR, refer to Jeff Heaton's chapter F from www.jeffheaton.com

• The IEEE standard reference paper from which we collected our problem statement is authored by Dana Petcu, Silviu Panica, Viorel Negru and Andrei Eckstein of the Computer Science Department, West University of Timisoara, Romania.

• The reference paper is also authored by Doina Banciu from the National Institute for Research and Development in Informatics, Romania.

• You can refer to the IEEE standard paper written by D. Andrews, R. Brown, C. Caldwell, et al., “A Parallel Architecture for Performing Real Time Multi-Line Optical Character Recognition”

• You can refer to the IEEE standard paper written by H. Goto, “OCRGrid : A Platform for Distributed and Cooperative OCR Systems”


10. APPENDIX

Appendix A: Glossary

TERMS

All the terms and abbreviations in the project are specified clearly. For further development of the project, evolved definitions will be specified.

ACRONYMS

IEEE: Institute of Electrical and Electronics Engineers

DFD: Data Flow Diagram

UML: Unified Modeling Language

J2EE: Java 2 Enterprise Edition

GUI: Graphical User Interface

OCR: Optical Character Recognition

GOCR: Grid OCR

Appendix B: Analysis Model

This includes all the pertinent analysis models, such as data flow diagrams, class diagrams, use case diagrams, interaction diagrams and state-chart diagrams.
