
TWO STAGE CHARACTER SEGMENTATION FOR PRINTED TELUGU TEXT

Under the guidance of

M.Sirisha (Asst.prof)

S.Padmavathi(07H71A0431) K.Gafoor raja(08H75A0403)

MD.Jasmin(07H71A0423) J.Suresh(07H71A0459)

T.Sekhar(07H71A0450)

Introduction:

Optical character recognition (OCR) deals with the recognition of optically processed characters.

Character recognition provides a solution for processing large volumes of data automatically in a large variety of scientific and business applications.

Not much work has been reported on the development of Optical Character Recognition (OCR) systems for Telugu text. Therefore, it is an area of current research.

A compound character may contain one or more connected symbols.

Compound characters are written by associating modifiers with consonants, resulting in a huge number of possible combinations, running into hundreds of thousands.

Therefore, systems developed for documents of other scripts, like Roman, cannot be used directly for the Telugu language.

Block Diagram:

Blocks shown in the diagram: User Input, Text document, Pre-processing, Line Segmentation, Word Segmentation, Character Segmentation

Segmentation:

Uses the classical approach in which the scanned image is dissected into individual building blocks to be recognized as characters.

It is one of the decision stages in an OCR system: incorrectly segmented characters will not be recognized properly, so the recognition rate is reduced.

The two stages involved in segmentation are:

1) Only the suffixes are segmented from the word using connected component processing.

2) The remaining characters of the word can then be segmented easily using the traditional vertical projection profile.

• The major strength of the proposed two-stage method is that it works faster than the classical single-stage method, which segments characters using connected component analysis only.

Segmentation Methodology:

The method starts by segmenting the lines of the scanned document using the Horizontal Projection Profile.

The words are then segmented using the Vertical Projection Profile.

If subscript characters are present in the word, they are first extracted using the Connected Component method.

The remaining main characters (or all characters, if the word has no subscripts) are then segmented using the Vertical Projection Profile.
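The connected component step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary image stored as a list of rows with 1 for ON pixels, uses 8-connectivity, and treats a component as a subscript when it lies entirely at or below a hypothetical `threshold_row` (row indices grow downward). The function names and the threshold rule are assumptions made for the example.

```python
from collections import deque


def connected_components(image):
    """Return the 8-connected components of ON pixels as lists of (row, col)."""
    height = len(image)
    width = len(image[0]) if image else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ON pixel.
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, height)):
                    for nx in range(max(x - 1, 0), min(x + 2, width)):
                        if image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def extract_subscripts(word, threshold_row):
    """Stage 1 (assumed rule): pull out components lying at or below threshold_row.

    Returns (main_image, subscripts): a copy of the word with subscript pixels
    cleared, and the list of subscript components.
    """
    main = [row[:] for row in word]
    subscripts = []
    for pixels in connected_components(word):
        if min(y for y, _ in pixels) >= threshold_row:
            subscripts.append(pixels)
            for y, x in pixels:
                main[y][x] = 0
    return main, subscripts


# Toy word: two main characters (rows 0-1) and one subscript stroke (row 3).
word = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
main, subscripts = extract_subscripts(word, threshold_row=3)
print(len(subscripts))  # 1
print(main[3])          # [0, 0, 0, 0, 0]
```

The image returned as `main` no longer contains the subscript, so it can be passed on to the projection-profile stage.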

Types of segmentation required:

(1) Line Segmentation:

White space between the text lines is used to segment the lines.

To separate the text lines, the horizontal projection profile of the text document image is computed.

The horizontal projection profile is the histogram of the number of ON pixels along every row of the image. Rows where this count is zero correspond to the white space between lines.

Line segmentation
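As an illustration of the line segmentation step, here is a minimal sketch that assumes the page is a binary image stored as a list of rows, with 1 for ON pixels. The function names are hypothetical. Rows with a zero count in the profile are treated as the white space separating lines.

```python
def horizontal_projection_profile(image):
    """Count the ON pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in image]


def find_runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(image):
    """Cut the page into text-line images at the zero rows of the profile."""
    hpp = horizontal_projection_profile(image)
    return [image[top:bottom] for top, bottom in find_runs(hpp)]


# Toy page: a two-row text line, a blank row, then a one-row text line.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(horizontal_projection_profile(page))  # [0, 3, 2, 0, 3, 0]
print(len(segment_lines(page)))             # 2
```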

(2) Word Segmentation:

Spacing between the words is used for word segmentation, since the spacing between words is greater than the spacing between characters.

• The spacing between the words is found by taking the vertical projection profile (VPP) of an input text line.

• The vertical projection profile is the sum of ON pixels along every column of the image.

Word Segmentation:
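A minimal sketch of word segmentation along these lines, assuming a text line stored as a list of rows with 1 for ON pixels. The `min_gap` parameter is a hypothetical threshold introduced for this example: a run of empty columns at least that wide is taken as a word gap, and narrower runs are assumed to be gaps between characters.

```python
def vertical_projection_profile(image):
    """Sum the ON pixels (value 1) along every column of a binary image."""
    return [sum(column) for column in zip(*image)]


def segment_words(line, min_gap):
    """Return (start, end) column ranges, end exclusive, one per word.

    min_gap is an assumed tuning parameter: the smallest run of empty
    columns that counts as a gap between words.
    """
    vpp = vertical_projection_profile(line)
    words, start, last_ink = [], None, None
    for x, value in enumerate(vpp):
        if value == 0:
            continue
        if start is None:
            start = x
        elif x - last_ink - 1 >= min_gap:
            # The empty run since the last inked column is wide enough.
            words.append((start, last_ink + 1))
            start = x
        last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words


# Toy line: a one-column gap inside the first word, a three-column gap between words.
line = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
]
print(vertical_projection_profile(line))  # [2, 1, 0, 2, 0, 0, 0, 1, 2]
print(segment_words(line, min_gap=3))     # [(0, 4), (7, 9)]
```

In practice the threshold would have to be chosen from the gap widths observed in the line; the value 3 here only suits the toy data.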

(3) Character Segmentation:

Spacing between the characters can be used for segmentation.

VPP is also used for character segmentation. However, the Vertical Projection Profile of a word sometimes has no zero-valued valleys, because of the presence of subscript characters.

1) A word without subscripts:

2) A word with subscripts:

Fig 1. Figure showing a word with subscripts and the threshold level.

Fig 2. Figure showing the word whose subscripts are removed.
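The effect of subscripts on the VPP can be illustrated with a toy example. This sketch assumes binary word images stored as lists of rows with 1 for ON pixels; the function names and the hand-made images are hypothetical, and the subscript is removed by hand here rather than by connected component processing.

```python
def vertical_projection_profile(image):
    """Sum the ON pixels (value 1) along every column of a binary image."""
    return [sum(column) for column in zip(*image)]


def segment_characters(word):
    """Split a word at zero-valued valleys of its VPP.

    Returns (start, end) column ranges, end exclusive, one per character.
    """
    vpp = vertical_projection_profile(word)
    characters, start = [], None
    for x, value in enumerate(vpp + [0]):  # trailing 0 closes the last run
        if value > 0 and start is None:
            start = x
        elif value == 0 and start is not None:
            characters.append((start, x))
            start = None
    return characters


# Two main characters (rows 0-1) with a subscript stroke (row 3) spanning the gap.
with_subscript = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
]
# The same word after the subscript has been removed.
without_subscript = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(vertical_projection_profile(with_subscript))  # [2, 3, 1, 3, 2]
print(segment_characters(with_subscript))           # [(0, 5)]
print(segment_characters(without_subscript))        # [(0, 2), (3, 5)]
```

With the subscript in place the profile never drops to zero, so the whole word comes back as one segment. Once the subscript is removed, the zero valley at column 2 separates the two characters.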

RESULTS

Input Image:

Fig. 1: Input Image for Line Segmentation

Line Segmentation:

Fig. 2: First Line After Line Segmentation

Fig. 3: Second Line After Line Segmentation

Fig. 4: Third Line After Line Segmentation

Fig. 5: Input Image For Word Segmentation

Word Segmentation:

Fig. 6: First Word After Word Segmentation

Fig. 7: Second Word After Word Segmentation

Fig. 8: Third Word After Word Segmentation

Fig. 9: Fourth Word After Word Segmentation

Fig. 10: Fifth Word After Word Segmentation

Character segmentation:

Fig 1: Character 1

Fig 2: Character 2

Fig 3: Character 3

Fig 4: Character 4

Fig 5: Character 5

Fig 6: Character 6

Document matching system:

The given document is matched against the pure document stored in the database. If both are the same, the system returns an exact match; otherwise, it returns a duplicate.

• Document speaking system

• Document Database System

• Full-text Search

• Processing Documents with Signatures, Company Stamps

• Re-creation of Document Logical Structure and Formatting

• Retention of Fonts and Font Styles

