# Online Arabic Handwriting Recognition

Post on 24-Feb-2016


DESCRIPTION

Online Arabic Handwriting Recognition. By George Kour. Supervised by Dr. Raid Saabne. Machine Learning (Optional). Main model (PAC). Pattern Recognition (Optional). Supervised learning vs. unsupervised learning. Classification techniques. Binary classification vs. multiclass classification.

TRANSCRIPT

Just In Time Arabic Handwriting Recognition/Segmentation

Online Arabic Handwriting Recognition
By George Kour
Supervised by Dr. Raid Saabne

Machine Learning (Optional)
Main model (PAC)

Pattern Recognition (Optional)
Supervised learning vs. unsupervised learning. Classification techniques. Binary classification vs. multiclass classification. Naive Bayes, Neural Network, Tree, Clustering, Supervised techniques, SVM, K-means.

Background
Feature. Metrics. Dimensionality Reduction. Classification.

The Arabic Letters
Arabic is the mother tongue of more than 350 million people. Other languages that use the Arabic letters include Persian ... How many manuscripts are written in Arabic? Arabic is a cursive language. It is composed of word parts. Show samples of Arabic script.

Support Vector Machines
Given training sample data of the form: $(x_i, y_i),\ i = 1, \dots, N$, with $y_i \in \{+1, -1\}$

Find the maximum-margin hyperplane that divides the samples of the two classes. The hyperplane formula: $w^T x + b = 0$

If the samples are linearly separable, there may be infinitely many hyperplanes separating the samples of the two classes. Which is the best?

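[Figure: samples in the $(x_1, x_2)$ plane; one marker denotes +1, the other denotes -1.]

Support Vector Machines

[Figure: samples in the $(x_1, x_2)$ plane labeled +1 and -1, showing the margin, the separating hyperplane $w^T x + b = 0$, the margin boundaries $w^T x + b = 1$ and $w^T x + b = -1$, the points $x^+$ and $x^-$, the normal $n$, and the support vectors.]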
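The usual textbook answer to "which hyperplane is best" can be written out as follows. This is a sketch of the standard hard-margin formulation, not a formula recovered from the slides; it assumes $x^+$ and $x^-$ lie on the two margin boundaries shown in the figure.

```latex
% x^+ satisfies w^T x^+ + b = 1 and x^- satisfies w^T x^- + b = -1,
% so projecting their difference onto the unit normal w / ||w|| gives the margin width:
\[
  \text{margin} \;=\; \frac{w^T (x^+ - x^-)}{\lVert w \rVert} \;=\; \frac{2}{\lVert w \rVert}
\]
% Maximizing the margin is therefore equivalent to:
\[
  \min_{w,\, b} \; \frac{1}{2} \lVert w \rVert^2
  \quad \text{subject to} \quad
  y_i \left( w^T x_i + b \right) \ge 1 \quad \text{for all } i
\]
```

The samples for which the constraint holds with equality are the support vectors.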

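Non-Linear SVM
Datasets that are linearly separable with noise work out great:

[Figures: samples plotted along the $x$ axis, and the same kind of data plotted against $x$ and $x^2$.]

But what are we going to do if the dataset is just too hard? How about mapping data to a higher-dimensional space:

Nonlinear SVMs: The Kernel Trick
With this mapping $\phi$, our discriminant function is now: $g(x) = w^T \phi(x) + b = \sum_{i \in SV} \alpha_i y_i\, \phi(x_i)^T \phi(x) + b$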

There is no need to know this mapping explicitly, because we only use the dot product of feature vectors in both training and testing. A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space and that satisfies Mercer's condition: $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$

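Nonlinear SVMs: The Kernel Trick

2-dimensional vectors $x = [x_1 \;\; x_2]$; let $K(x_i, x_j) = (1 + x_i^T x_j)^2$. Need to show that $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$:

$$
\begin{aligned}
K(x_i, x_j) &= (1 + x_i^T x_j)^2 \\
&= 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2} \\
&= [1 \;\; x_{i1}^2 \;\; \sqrt{2}\, x_{i1} x_{i2} \;\; x_{i2}^2 \;\; \sqrt{2}\, x_{i1} \;\; \sqrt{2}\, x_{i2}]^T \, [1 \;\; x_{j1}^2 \;\; \sqrt{2}\, x_{j1} x_{j2} \;\; x_{j2}^2 \;\; \sqrt{2}\, x_{j1} \;\; \sqrt{2}\, x_{j2}] \\
&= \phi(x_i)^T \phi(x_j),
\end{aligned}
$$

where $\phi(x) = [1 \;\; x_1^2 \;\; \sqrt{2}\, x_1 x_2 \;\; x_2^2 \;\; \sqrt{2}\, x_1 \;\; \sqrt{2}\, x_2]$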
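The identity in this derivation can be checked numerically. The sketch below is only an illustration: the function names (`poly_kernel`, `phi`, `max_kernel_gap`) are made up for this example, and it compares the kernel value with the explicit dot product in the 6-dimensional feature space on random 2-dimensional vectors.

```python
import math
import random


def poly_kernel(x, y):
    """K(x, y) = (1 + x^T y)^2 for 2-dimensional vectors."""
    return (1.0 + x[0] * y[0] + x[1] * y[1]) ** 2


def phi(x):
    """Explicit 6-dimensional feature map from the derivation."""
    r2 = math.sqrt(2.0)
    return [1.0, x[0] ** 2, r2 * x[0] * x[1], x[1] ** 2, r2 * x[0], r2 * x[1]]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_kernel_gap(trials=1000, seed=0):
    """Largest |K(x, y) - phi(x) . phi(y)| seen over random 2-D pairs."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        x = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
        y = [rng.uniform(-3, 3), rng.uniform(-3, 3)]
        gap = max(gap, abs(poly_kernel(x, y) - dot(phi(x), phi(y))))
    return gap
```

For example, with `x = [1, 2]` and `y = [3, 4]` both sides evaluate to 144, and `max_kernel_gap()` stays at floating-point rounding level. The kernel needs one 2-dimensional dot product, while the explicit map needs a 6-dimensional one, which is the practical point of the trick.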

An example (this slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt).

Nonlinear SVMs: The Kernel Trick
Examples of commonly used kernel functions:

Linear kernel: $K(x_i, x_j) = x_i^T x_j$

Polynomial kernel: $K(x_i, x_j) = (1 + x_i^T x_j)^p$

Gaussian (Radial-Basis Function (RBF)) kernel: $K(x_i, x_j) = \exp\left(-\dfrac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$

Sigmoid: $K(x_i, x_j) = \tanh(\beta_0\, x_i^T x_j + \beta_1)$

In general, functions that satisfy Mercer's condition can be kernel functions.

Sequence Metric - DTW
Measuring differences between sequences. The idea. Implementation. Examples. Fast and restricted DTW. Does not comply with the triangle inequality. Complexity analysis.

Sequence Metric - EMD
The same analysis as DTW. The embedding.

Feature
Sequence. Shape Context. MAD.

Samples Collection and Storing
Online user input system. Each user draws all the letters in all possible positions (Ini, Mid, Fin, Iso). Letter sequences are saved as .m files in the file system.

File System Structure

    Letters Samples
        A
            Iso
                Sample1 (.m file)
                Sample2 (.m file)
            Fin
                Sample1 (.m file)
                Sample2 (.m file)
        B
            Ini
                Sample1 (.m file)
                Sample2 (.m file)
            Mid
            Fin
            Iso

Samples Collection and Storing (Cont.)
From the ADAB database. ADAB contains online handwriting sequences of Tunisian city names. We built a system that segments the words in ADAB to output letter samples.
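The DTW sequence metric listed in the outline above, including its restricted variant, can be sketched as follows. This is a minimal textbook version, not the implementation behind the slides: it assumes sequences of 2-D pen points and a Euclidean point cost, and the `window` parameter (a Sakoe-Chiba band half-width) is a name chosen for this example.

```python
import math


def dtw_distance(a, b, window=None):
    """Dynamic time warping cost between two sequences of (x, y) points.

    window: optional band half-width restricting |i - j|; None means
    unrestricted DTW. The band is widened to at least |len(a) - len(b)|
    so that an alignment always exists.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    if window is not None:
        window = max(window, abs(n - m))
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if window is None else max(1, i - window)
        hi = m if window is None else min(m, i + window)
        for j in range(lo, hi + 1):
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # Extend the cheapest of: step in a, step in b, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

The unrestricted version fills an n-by-m table, so it takes O(nm) time; the band reduces the work to roughly O(n * window). The triangle inequality can fail: with `x = [(0, 0)]`, `y = [(1, 0), (2, 0)]` and `z = [(2, 0), (2, 0), (2, 0)]`, the distances are 6 for (x, z), 3 for (x, y) and 1 for (y, z), and 6 > 3 + 1.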

Word Parts Generation
A word part is an Arabic sub-word that is written in a single stroke. We built a system that generates sequences of all possible Arabic word parts. The word parts are generated using Online Arabic Recognition

Online Segmentation
Choosing candidate points during the writing process and then selecting the right combination of demarcation points using dynamic programming. How to select the candidate points: SVM. There could be several segmentation options. Then select the candidate letters for each segmentation, and then holistically select the word part.

Important properties: Min Over Segmentation. No Under Segmentation. (*) Complex Letters.

Improvements: How can simplification be used to better select the segmentation points?

Online Segmentation Introduction
Definitions: Candidate point. Critical point. Segmentation point.
Learning Technique. Features: Slope. Forward direction.
Classification technique. Find points that are classified

Online Segmentation
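The step of selecting a combination of demarcation points with dynamic programming could look roughly like the sketch below. The slides do not give the actual cost function or recurrence, so everything here is an assumption: `best_segmentation`, `segment_score` and `max_span` are hypothetical names, and the score stands in for whatever letter classifier rates the stroke between two candidate points.

```python
def best_segmentation(num_candidates, segment_score, max_span=None):
    """Pick the candidate points that maximize the total segment score.

    Candidates are numbered 0..num_candidates-1 in writing order; 0 is the
    stroke start and the last one is the stroke end, and both are always kept.
    segment_score(i, j) is a hypothetical callback that scores the piece of
    stroke between candidates i and j as a single letter (higher is better).
    max_span optionally limits how many candidates one letter may skip over.
    """
    if num_candidates < 2:
        raise ValueError("need at least a start and an end candidate")
    neg_inf = float("-inf")
    best = [neg_inf] * num_candidates   # best[j]: best total score ending at j
    prev = [-1] * num_candidates        # prev[j]: previous chosen candidate
    best[0] = 0.0
    for j in range(1, num_candidates):
        first = 0 if max_span is None else max(0, j - max_span)
        for i in range(first, j):
            if best[i] == neg_inf:
                continue
            total = best[i] + segment_score(i, j)
            if total > best[j]:
                best[j] = total
                prev[j] = i
    # Walk back from the end to recover the chosen segmentation points.
    points = []
    j = num_candidates - 1
    while j != -1:
        points.append(j)
        j = prev[j]
    points.reverse()
    return best[-1], points
```

With k candidates this evaluates at most k(k-1)/2 segments, instead of enumerating every subset of candidate points. Keeping the several best paths rather than one would give the "several segmentation options" mentioned above; that extension is not shown here.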

Letter Samples Processing
Normalization. Line simplification using recursive Douglas-Peucker polyline simplification. Resampling.
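The simplification and resampling steps can be sketched as follows. This is a generic version under stated assumptions, not the code behind the slides: strokes are lists of (x, y) tuples, the tolerance `epsilon` and the target `count` are illustrative parameters, the distance is measured to the chord segment rather than to the infinite line, and the normalization step is left out.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def douglas_peucker(points, epsilon):
    """Recursive Douglas-Peucker simplification of a polyline."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]          # everything in between is close enough
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right          # drop the duplicated split point


def resample(points, count):
    """Resample a polyline to `count` points equally spaced along its length."""
    if count < 2 or len(points) < 2:
        raise ValueError("need at least two input points and count >= 2")
    seg = [math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:])]
    total = sum(seg)
    if total == 0.0:
        return [points[0]] * count
    step = total / (count - 1)
    out = [points[0]]
    k, walked = 0, 0.0                # current segment, arc length at its start
    for n in range(1, count - 1):
        target = n * step
        while k < len(seg) - 1 and walked + seg[k] < target:
            walked += seg[k]
            k += 1
        t = min(1.0, (target - walked) / seg[k]) if seg[k] > 0 else 0.0
        p, q = points[k], points[k + 1]
        out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    out.append(points[-1])
    return out
```

A typical use would be `resample(douglas_peucker(stroke, 0.1), 32)`, which removes pen jitter first and then gives every sample the same number of points; both numbers are placeholders, not values from the slides.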

Feature Extraction
Embedding
Dimensionality Reduction
