bijay dahal {2008/bct/509} kabindra shrestha {2008/bct/516} raj kumar shrestha {2008/bct/527}

Post on 31-Mar-2015


OPTICAL CHARACTER RECOGNITION TOOL

Bijay Dahal {2008/BCT/509}

Kabindra Shrestha {2008/BCT/516}

Raj Kumar Shrestha {2008/BCT/527}

OBJECTIVES

To convert alphanumeric characters in an image into plain text.

To gain a general understanding of image processing.

TOOLS/TECHNOLOGY USED

S.N  Tools                       Description
1    JDK 6                       Development Kit for Java Programming
2    NetBeans 7.0                IDE for Java Application Development
3    Microsoft Windows & Linux   OS platforms for the application
4    Tortoise SVN                Version Control Software for Project Mgmt.
5    Sourceforge                 Project Management and Configuration
6    Microsoft Office            Documentation

OVERVIEW

Takes an image as input.

Converts it into plain text.

Recognizes alphanumeric characters only.

Recognized text can be edited and saved.

Screenshot: the loaded image and the converted, editable text.

SYSTEM ARCHITECTURE

Processing flow, in order:

Get Image (diagram labels: Bold, Thin)

Binarization

Thinning

Line Segment

Character Segment

Feature Extraction

Matrix Matching

Save Text
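The slides do not say how the matching step compares a character against known ones. One simple possibility, sketched here purely as an assumption, is a nearest-template search: each known character is stored as a feature vector, and the input is assigned to the template with the smallest squared Euclidean distance. The class name MatrixMatchSketch and the method nearest are hypothetical.

```java
final class MatrixMatchSketch {
    /**
     * Returns the index of the template closest to the given feature vector,
     * or -1 if there are no templates. Distance is squared Euclidean
     * (an assumption; the slides do not name a distance measure).
     */
    static int nearest(double[] features, double[][] templates) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < templates.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < features.length; j++) {
                double diff = features[j] - templates[i][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }
}
```

The returned index can be looked up in a parallel array of characters to get the recognized letter or digit. Every input character is compared against every template, so the cost grows with the size of the template set.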

METHODOLOGY/ALGORITHMS

Otsu Binarization Algorithm

Hilditch Skeletonization Algorithm (Thinning)
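A minimal Java sketch of the Otsu binarization step named above, offered as an illustration and not as the project's actual code. It builds a 256-bin histogram of gray levels and picks the threshold that maximizes the between-class variance. The class name OtsuSketch, both method names, and the convention that dark pixels are ink are assumptions.

```java
final class OtsuSketch {
    /**
     * Returns the threshold t (0..255) that maximizes the between-class
     * variance when pixels with value <= t form one class and the rest
     * form the other. Input values must be gray levels in 0..255.
     */
    static int otsuThreshold(int[] gray) {
        int[] hist = new int[256];
        for (int g : gray) {
            hist[g]++;
        }
        long total = gray.length;
        double sumAll = 0.0;
        for (int i = 0; i < 256; i++) {
            sumAll += (double) i * hist[i];
        }
        double sumLow = 0.0;   // sum of gray levels in the lower class
        long countLow = 0;     // number of pixels in the lower class
        double bestVar = -1.0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            countLow += hist[t];
            if (countLow == 0) continue;      // lower class still empty
            long countHigh = total - countLow;
            if (countHigh == 0) break;        // upper class empty from here on
            sumLow += (double) t * hist[t];
            double meanLow = sumLow / countLow;
            double meanHigh = (sumAll - sumLow) / countHigh;
            double diff = meanLow - meanHigh;
            double between = (double) countLow * countHigh * diff * diff;
            if (between > bestVar) {
                bestVar = between;
                best = t;
            }
        }
        return best;
    }

    /** Marks a pixel as ink (true) when it is at or below the threshold (assumes dark text on a light background). */
    static boolean[] binarize(int[] gray, int threshold) {
        boolean[] ink = new boolean[gray.length];
        for (int i = 0; i < gray.length; i++) {
            ink[i] = gray[i] <= threshold;
        }
        return ink;
    }
}
```

The gray levels would typically come from the loaded image after converting each pixel to a single intensity value; that conversion is left out of the sketch.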

ALGORITHMS (CONTD…)

Generic Segmentation
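The slides give no detail on the segmentation method. One common approach to the line segmentation stage, shown here only as an assumed illustration, is a horizontal projection: rows that contain no ink separate one text line from the next. The class name LineSegmentSketch and the method segmentLines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LineSegmentSketch {
    /**
     * Splits a binarized image into text lines.
     * ink[row][col] is true for text pixels. Each returned pair is
     * {firstRow, lastRow} (inclusive) of one run of rows containing ink.
     */
    static List<int[]> segmentLines(boolean[][] ink) {
        List<int[]> lines = new ArrayList<int[]>();
        int start = -1; // first row of the current line, or -1 between lines
        for (int r = 0; r < ink.length; r++) {
            boolean hasInk = false;
            for (boolean px : ink[r]) {
                if (px) {
                    hasInk = true;
                    break;
                }
            }
            if (hasInk && start < 0) {
                start = r;                          // a new line begins
            } else if (!hasInk && start >= 0) {
                lines.add(new int[] {start, r - 1}); // the line just ended
                start = -1;
            }
        }
        if (start >= 0) {
            lines.add(new int[] {start, ink.length - 1}); // line touching the bottom edge
        }
        return lines;
    }
}
```

Character segmentation can follow the same idea within each line, scanning columns instead of rows.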

ALGORITHMS (CONTD…)

Feature Extraction (Zoning)

Based on Zones
• 5 horizontal and 5 vertical zones => 25 features

Based on Upper and Lower Profiles
• 10 vertical zones => 20 features

Based on Left and Right Profiles
• 10 horizontal zones => 20 features

Total Number of Features
• 25 + 20 + 20 = 65
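The feature counts above can be sketched in Java as follows. The slides give only the zone counts, so the feature definitions here are assumptions: each of the 5 x 5 zones contributes its ink density, and each profile feature is the average distance from an image edge to the first ink pixel within a strip, normalized to 0..1. The class name ZoningSketch and its methods are hypothetical.

```java
final class ZoningSketch {
    static final int GRID = 5;    // 5 x 5 zones -> 25 features
    static final int STRIPS = 10; // 10 strips per profile pair -> 20 features

    /** ink[row][col] is true for text pixels; assumes a rectangular image of at least 10 x 10. */
    static double[] extract(boolean[][] ink) {
        int h = ink.length, w = ink[0].length;
        double[] f = new double[GRID * GRID + 4 * STRIPS]; // 25 + 20 + 20 = 65
        int k = 0;
        // Zone features: fraction of ink pixels in each grid cell.
        for (int zr = 0; zr < GRID; zr++) {
            for (int zc = 0; zc < GRID; zc++) {
                int r0 = zr * h / GRID, r1 = (zr + 1) * h / GRID;
                int c0 = zc * w / GRID, c1 = (zc + 1) * w / GRID;
                int count = 0;
                for (int r = r0; r < r1; r++)
                    for (int c = c0; c < c1; c++)
                        if (ink[r][c]) count++;
                int area = (r1 - r0) * (c1 - c0);
                f[k++] = area == 0 ? 0.0 : (double) count / area;
            }
        }
        k = addProfiles(ink, true, f, k);  // upper and lower profiles, 10 vertical zones
        addProfiles(ink, false, f, k);     // left and right profiles, 10 horizontal zones
        return f;
    }

    /** Appends 2 features per strip: mean gap from the near edge and from the far edge. */
    static int addProfiles(boolean[][] ink, boolean vertical, double[] f, int k) {
        int h = ink.length, w = ink[0].length;
        int span = vertical ? w : h; // columns (or rows) divided into strips
        int len = vertical ? h : w;  // distance walked inside one column (or row)
        for (int s = 0; s < STRIPS; s++) {
            int a = s * span / STRIPS, b = (s + 1) * span / STRIPS;
            double near = 0.0, far = 0.0;
            for (int i = a; i < b; i++) {
                near += gap(ink, i, vertical, true);
                far += gap(ink, i, vertical, false);
            }
            int n = Math.max(1, b - a);
            f[k++] = near / (n * len);
            f[k++] = far / (n * len);
        }
        return k;
    }

    /** Background pixels passed before the first ink pixel in one column (vertical) or row. */
    static int gap(boolean[][] ink, int index, boolean vertical, boolean fromStart) {
        int len = vertical ? ink.length : ink[0].length;
        for (int i = 0; i < len; i++) {
            int p = fromStart ? i : len - 1 - i;
            boolean on = vertical ? ink[p][index] : ink[index][p];
            if (on) return i;
        }
        return len; // no ink found in this column or row
    }
}
```

Normalizing every feature to the 0..1 range keeps the zone and profile features on a comparable scale when vectors are compared in the matching step.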

SCHEDULE

ID  Task Name                          Start       Finish      Duration
1   System Analysis                    6/20/2011   7/3/2011    14d
2   System Design                      7/4/2011    7/17/2011   14d
3   Coding                             7/18/2011   9/17/2011   62d
4   Testing                            9/18/2011   9/27/2011   10d
5   Debugging                          9/28/2011   10/11/2011  14d
6   Efficiency & Performance Testing   10/12/2011  10/15/2011  4d
7   Documentation                      6/20/2011   11/16/2011  150d

OFF DAYS:

Exam Time: (25 Days)

Dashain Holidays: (15 Days)

Tihar Holidays: (3 Days)

CHALLENGES/PROBLEMS FACED

• Choosing the correct algorithm.
• The algorithms were hard to implement.
• Once implemented, the output was not accurate.
• Accuracy of matrix matching.

CONCLUSION

Text in an image is converted to a text file.

The simplest algorithms were used; accuracy is about 40%-60%.

LIMITATION

Can’t recognize text in a noisy image.

Can’t detect inclined text in an image.

Matrix matching is slow.

Bad thinning and noise make some text unrecognizable.

FUTURE ENHANCEMENT

Scanner image input.

Recognize PDF and other image formats.

Nepali / Devanagari font support.

Different fonts.

Output in PDF or Word file format.

Skew correction and noise reduction.

Handwriting.

Neural network.

REFERENCES

Sierra, K., & Bates, B. (2010). Head First Java. O'Reilly.

Improving Optical Character Recognition http://www.csc.villanova.edu/~mdamian/csc3990/csrs2008/07-csrs2008-AJPalkovic.PDF

Evaluation of OCR Algorithms for Images: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.9539&rep=rep1&type=PDF

Otsu Thresholding - The Lab Book Pages http://www.labbookpages.co.uk/software/imgProc/otsuThreshold.html

Image Segmentation http://people.cs.uchicago.edu/~pff/segment/

Hilditch Algorithm http://cis.k.hosei.ac.jp/~wakahara/Hilditch.c

Skeletonization http://cgm.cs.mcgill.ca/~godfried/teaching/projects97/azar/skeleton.html

Java OCR | Ron Cemer's Blog http://www.roncemer.com/software-development/java-ocr

THANK YOU …
