cpms long iwlc 06

A Computational Phonetic Model for Indian Language Scripts

Anil Kumar Singh
Language Technologies Research Centre
IIIT, Hyderabad, India
a[email protected]

Abstract

In spite of South Asia being one of the richest areas in terms of linguistic diversity, South Asian languages have a lot in common. For example, most of the major Indian languages use scripts which are derived from the ancient Brahmi script, have more or less the same arrangement of alphabet, are highly phonetic in nature and are very well organised. We have used this fact to build a computational phonetic model of Brahmi origin scripts. The phonetic model mainly consists of a model of phonology (including some orthographic features) based on a common alphabet of these scripts, numerical values assigned to these features, a stepped distance function (SDF), and an algorithm for aligning strings of feature vectors. The SDF is used to calculate the phonetic and orthographic similarity of two letters. The model can be used for applications like spell checking, predicting spelling/dialectal variation, text normalization, finding rhyming words, and identifying cognate words across languages. Some initial experiments have been done on this and the results seem encouraging.

1 Introduction

Most of the major Indian languages (Indo-Aryan and Dravidian) use scripts which have evolved from the ancient Brahmi script [24, 25]. There is a lot of similarity among these scripts (even though letter shapes differ). The letters have a close correspondence with the sounds. The arrangement of letters in the alphabet is similar and based upon phonetic features. If you list the letters on paper, you can draw rectangles consisting of letters representing phonemes with specific phonetic features (voiced-unvoiced, etc.). This well-organised phonetic nature makes it possible to build a computational phonetic model of these scripts.
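The "rectangles" described above can be illustrated with the 25 Devanagari stop and nasal consonants, which in traditional alphabet order form a five-by-five grid. The following is a minimal sketch, not the feature set of the model presented in this paper: the feature names (`place`, `voicing`, `manner`), the helper functions, and the use of Unicode code points to enumerate the letters are all assumptions made for illustration, following the standard textbook description of the grid.

```python
# Illustrative sketch only: a simplified textbook grid, not the paper's feature model.
# The 25 Devanagari stops and nasals in alphabet order. Unicode keeps this order,
# except that U+0929 (NNNA) sits between NA and PA and is skipped here.
VARGA_LETTERS = [chr(cp) for cp in range(0x0915, 0x0929)] + \
                [chr(cp) for cp in range(0x092A, 0x092F)]

# Rows of the grid: place of articulation (hypothetical feature names).
PLACES = ["velar", "palatal", "retroflex", "dental", "labial"]

# Columns of the grid: voicing and manner.
COLUMNS = [
    ("unvoiced", "unaspirated"),
    ("unvoiced", "aspirated"),
    ("voiced", "unaspirated"),
    ("voiced", "aspirated"),
    ("voiced", "nasal"),
]


def build_feature_table(letters=VARGA_LETTERS):
    """Map each letter to features derived purely from its position in the alphabet."""
    table = {}
    for index, letter in enumerate(letters):
        row, col = divmod(index, len(COLUMNS))
        voicing, manner = COLUMNS[col]
        table[letter] = {"place": PLACES[row], "voicing": voicing, "manner": manner}
    return table


def rectangle(table, **features):
    """Return the letters that share all the given feature values, in alphabet order."""
    return [
        letter
        for letter, feats in table.items()
        if all(feats[name] == value for name, value in features.items())
    ]


if __name__ == "__main__":
    table = build_feature_table()
    # One column of the grid: all voiced aspirated stops.
    print(rectangle(table, voicing="voiced", manner="aspirated"))
    # One row of the grid: all velar consonants.
    print(rectangle(table, place="velar"))
```

Because the features fall out of the position alone, selecting on one feature value always picks a full row or column of the grid, which is the regularity the text refers to.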
We present here such a model, which was needed urgently, especially because Indian languages have spelling variations even for common words. The scripts that we have covered are: Devanagari (Hindi, Marathi, Nepali, Sanskrit), Bengali (Bengali and Assamese), Gurmukhi (Punjabi), Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam.

Brahmi origin scripts (also called Indic scripts) have been classified variously. Some of the terms used to classify these scripts are: syllabary, alphasyllabary and abugida. Out of these, abugida is perhaps the best term as it takes into account the property of these scripts which allows syllables to be formed systematically by combining consonants with vowel signs or maatraas, rather than having unique symbols for syllables
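The abugida property mentioned above, where a syllable is formed by attaching a vowel sign (maatraa) to a consonant, can be seen directly in how Devanagari text is encoded in Unicode. The sketch below is only an illustration: the `syllable` helper and the romanised vowel keys are hypothetical names, and only a handful of vowel signs are listed.

```python
# Illustrative sketch: composing Devanagari syllables from a consonant plus a vowel sign.
import unicodedata

KA = "\u0915"  # DEVANAGARI LETTER KA

# A few dependent vowel signs (maatraas); the romanised keys are hypothetical labels.
VOWEL_SIGNS = {
    "aa": "\u093E",
    "i": "\u093F",
    "ii": "\u0940",
    "u": "\u0941",
    "e": "\u0947",
    "o": "\u094B",
}


def syllable(consonant, vowel=None):
    """Return consonant + vowel sign; with no vowel given, the bare consonant
    stands for the consonant with its inherent vowel 'a'."""
    if vowel is None:
        return consonant
    return consonant + VOWEL_SIGNS[vowel]


if __name__ == "__main__":
    for vowel in [None, "aa", "i", "ii", "u", "e", "o"]:
        text = syllable(KA, vowel)
        names = [unicodedata.name(ch) for ch in text]
        print(text, names)
```

The same small set of vowel signs combines with every consonant, so no separate symbol per syllable is needed.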

