optical character recognition for bangla handwritten text

Post on 01-Nov-2014

OPTICAL CHARACTER RECOGNITION FOR BANGLA HANDWRITTEN TEXT

INTRODUCTION

Optical character recognition (OCR) is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text.

It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website.

OCR makes it possible to edit the text, search for a word or phrase, and store it more compactly.

Today OCRs are easily available for the English language. OCRs for printed Bengali can also be found, but OCRs for handwritten Bengali are very rare, and those that are available do not achieve good recognition accuracy.

We aim to create an OCR that achieves considerable recognition accuracy for handwritten Bengali.

PROBLEM

We are creating an OCR for handwritten Bengali text. The main problem arises from the fact that the text is handwritten, so the set of possible samples is effectively unbounded and different samples have different characteristics. The handwriting samples are collected from different persons, hence it is very unlikely that they will follow a similar pattern.

OUR APPROACH

We have followed a bottom-up method in our approach, i.e. we start with a specific sample and then work towards the general solution. We take a particular sample, apply our methodology to it and examine the results. Then we repeat the computation on a second sample set and, depending on how our methodology performs on this set, we keep improving our process until it converges towards a general solution.

At present we are at the first step of the method: SEGMENTATION.

The methods we have used are:

Thinning and Run Length Reduction

Projection Along Column Scan Lines

THINNING AND RUN LENGTH REDUCTION

Thinning basically is reducing the density of the characters ……….

But we faced some difficulties with this approach:

The method became too dependent on the individual handwriting, which is not desirable.

Segmenting the ‘matras’ from the characters left gaps in the characters themselves, which were not easy to fill in.

The segmentation obtained was not optimal.
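The slides do not describe how run lengths were used, so the following is only a minimal sketch of the general idea, assuming a binary image stored as a list of rows (1 = ink, 0 = background). The function names `horizontal_runs` and `longest_run_per_row` are hypothetical. It extracts the runs of consecutive ink pixels in each scan line; a row containing one very long run is a candidate for the ‘matra’.

```python
def horizontal_runs(row):
    """Return (start_column, length) for each run of ink pixels in one row."""
    runs = []
    start = None
    for col, pixel in enumerate(row):
        if pixel and start is None:
            start = col                      # a new run begins here
        elif not pixel and start is not None:
            runs.append((start, col - start))  # the run ended at col - 1
            start = None
    if start is not None:                    # run touching the right edge
        runs.append((start, len(row) - start))
    return runs


def longest_run_per_row(image):
    """Return the length of the longest ink run in every row (0 if none)."""
    return [
        max((length for _, length in horizontal_runs(row)), default=0)
        for row in image
    ]


# Toy glyph (assumed data): a full-width headline on top of two strokes.
glyph = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

runs_in_row_2 = horizontal_runs(glyph[2])
longest = longest_run_per_row(glyph)
print(runs_in_row_2, longest)
```

In this toy glyph only row 0 has a run spanning the full width, which is what distinguishes a headline from the shorter strokes below it.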

PROJECTION

In this method we compute the intensity projection of each character (the ‘matra’ of the character is included along with the character). Every straight line produces a peak in the projection, and this peak lets us identify the presence of the ‘matra’.
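As a rough illustration of the projection idea, the sketch below counts the ink pixels in each row of a binary image and treats the strongest row as the ‘matra’. The slides do not state the projection direction or any threshold, so the per-row count, the function names and the `min_fraction` parameter are all assumptions made for this example.

```python
def row_projection(image):
    """Return the number of ink pixels (value 1) in each row."""
    return [sum(row) for row in image]


def find_matra_row(image, min_fraction=0.8):
    """Return the row index of the strongest peak, or None if it is too weak.

    min_fraction is a hypothetical threshold: the peak must cover at least
    this fraction of the image width to count as a 'matra'.
    """
    profile = row_projection(image)
    peak_row = max(range(len(profile)), key=profile.__getitem__)
    width = len(image[0])
    if profile[peak_row] >= min_fraction * width:
        return peak_row
    return None


# Toy glyph (assumed data): a full-width headline in row 1 and a stem below.
glyph = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

profile = row_projection(glyph)
matra_row = find_matra_row(glyph)
print(profile, matra_row)
```

The headline row dominates the profile because all of its pixels fall on the same scan line.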

This method has its own disadvantage, which can be summed up as follows:

As we are working with handwritten Bengali text, there is no guarantee that the characters will contain straight lines. If someone writes in a slanted, italic-like hand, the lines will be slanted, and the process will not identify them as the straight lines it should detect. Thus this method fails in such cases.
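The failure case can be reproduced with a small sketch, assuming the same kind of per-row projection on a binary image. The helper `draw_line` and its `slope` parameter are hypothetical and exist only to generate test strokes.

```python
def row_projection(image):
    """Return the number of ink pixels (value 1) in each row."""
    return [sum(row) for row in image]


def draw_line(height, width, start_row, slope):
    """Draw one stroke across the full width.

    slope is the number of rows the stroke drops per column
    (0.0 gives a perfectly horizontal line).
    """
    image = [[0] * width for _ in range(height)]
    for col in range(width):
        row = start_row + int(col * slope)
        image[row][col] = 1
    return image


straight = draw_line(height=6, width=8, start_row=1, slope=0.0)
slanted = draw_line(height=6, width=8, start_row=1, slope=0.5)

straight_peak = max(row_projection(straight))
slanted_peak = max(row_projection(slanted))
print(straight_peak, slanted_peak)
```

Both strokes contain the same number of ink pixels, but the slanted one spreads them over several rows, so no single row stands out as a peak.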

WORK IN THE UPCOMING DAYS

Because of the drawbacks of the methods discussed above, we are now considering a new method. First of all, the characters are to be standardized. This will give us a standard sample set, which will probably overcome the disadvantages mentioned previously.
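The slides do not say what form the standardization will take. Purely as an assumption, one common option is size normalization: crop each character to its bounding box and rescale it to a fixed grid. The sketch below does this with nearest-neighbour sampling on a binary image; the names `bounding_box` and `normalize` and the default grid size are hypothetical.

```python
def bounding_box(image):
    """Return (top, bottom, left, right) of the ink pixels, or None if blank."""
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], rows[-1], cols[0], cols[-1]


def normalize(image, size=4):
    """Crop to the bounding box and rescale to size x size (nearest neighbour)."""
    box = bounding_box(image)
    if box is None:
        return [[0] * size for _ in range(size)]
    top, bottom, left, right = box
    height = bottom - top + 1
    width = right - left + 1
    return [
        [
            image[top + (r * height) // size][left + (c * width) // size]
            for c in range(size)
        ]
        for r in range(size)
    ]


# Toy sample (assumed data): a small 2x2 shape placed off-centre in a 5x5 image.
sample = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

normalized = normalize(sample, size=4)
for row in normalized:
    print(row)
```

After this step every sample has the same dimensions and position, whatever its original size or placement on the page. Correcting slant would need a further step that this sketch does not attempt.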

THANK YOU !!!!!
