    Proposal for a Malayalam Script Root Zone Label Generation Ruleset (LGR)

    LGR Version: 4.0
    Date: 2020-05-07
    Document version: 2.4
    Authors: Neo-Brahmi Generation Panel [NBGP]

    1. General Information

    The purpose of this document is to give an overview of the proposed Malayalam LGR in the XML format and the rationale behind the design decisions taken. It includes a discussion of relevant features of the script, the communities or languages using it, the process and methodology used, the repertoire of code points included, variant code point(s), whole label evaluation rules and information on the contributors.

    The formal specification of the LGR can be found in the accompanying XML document: proposal-malayalam-lgr-07may20-en.xml. Labels for testing can be found in the accompanying text document: malayalam-test-labels-07may20-en.txt.

    This LGR proposal was originally published on April 22, 2019. It has been updated to correct an inconsistency involving the support for conjunct “nta” and to address new cross-script variants for LGR-4.

    2. Script for Which the LGR Is Proposed

    ISO 15924 Code: Mlym
    ISO 15924 Key N°: 347
    ISO 15924 English Name: Malayalam
    Latin transliteration of native script name: malayāḷaṁ
    Native name of the script: മലയാളം
    Maximal Starting Repertoire (MSR) version: MSR-4

    3. Background on Script and Principal Languages Using It

    Malayalam is a Dravidian language with about 38 million speakers, spoken mainly in the southwest of India, particularly in Kerala, the Lakshadweep Islands and neighbouring states, and also in Bahrain, Fiji, Israel, Malaysia, Qatar, Singapore, UAE and the UK.

    Malayalam was first written with the Vatteluttu alphabet (വട്ടെഴുത്ത്, Vaṭṭeḻuttŭ), which means 'round writing' and developed from the Brahmi script. The oldest known written text in Malayalam, the Vazhappalli (or Vazhappally) inscription, is in the Vatteluttu alphabet and dates from about 830 AD.

    A version of the Grantha alphabet originally used in the Chola kingdom was brought to the southwest of India in the 8th or 9th century and was adapted to write the Malayalam and Tulu languages. By the early 13th century it is thought that a systematized Malayalam alphabet had emerged. Some changes were made to the alphabet over the following centuries, and by the middle of the 19th century the Malayalam alphabet had attained its current form.

    As a result of the difficulties of printing Malayalam, a simplified or reformed version of the script was introduced during the 1970s and 1980s. The main change involved writing consonants and diacritics separately rather than as complex characters. These changes are not applied consistently, so the modern script is often a mixture of traditional and simplified letters.

    The script has the following notable features:

    ● Malayalam script is written left to right in horizontal lines using a syllabic alphabet in which all consonants have an inherent vowel. Diacritics, which can appear above, below, before or after a consonant, are used to change the inherent vowel.

    ● When they appear at the beginning of a syllable, vowels are written as independent letters.

    ● Chillaksharam is another feature of Malayalam. A chillu is a pure consonant written without a virama (the sign that kills the inherent vowel of a consonant).

    ● When certain consonants occur together, special conjunct symbols are used which combine the essential parts of each letter.

    3.1 The Evolution of Malayalam Script

    Malayalam was first written in the Vatteluttu alphabet, an ancient script of Tamil. However, the modern Malayalam script evolved from the Grantha alphabet, which was originally used to write Sanskrit. Both Vatteluttu and Grantha evolved from the Brahmi script, but independently.

    3.2 Vatteluttu alphabet

    Vatteluttu (Malayalam: വട്ടെഴുത്ത്, Vaṭṭeḻuttŭ, “round writing”) is a script that had evolved from Tamil-Brahmi and was once used extensively in the southern part of present-day Tamil Nadu and in Kerala.

    Malayalam was first written in Vatteluttu. The Vazhappally inscription issued by Rajashekhara Varman is the earliest example, dating from about 830 CE. In the Tamil country, the modern Tamil script had supplanted Vatteluttu by the 15th century, but in the Malabar region, Vatteluttu remained in general use up to the 17th or 18th century. A variant form of this script, Kolezhuthu, was used until about the 19th century, mainly in the Kochi area and in the Malabar area. Another variant form, Malayanma, was used in the south of Thiruvananthapuram.

    3.3 Grantha, Tigalari and Malayalam scripts

    According to Arthur Coke Burnell, one form of the Grantha alphabet, originally used in the Chola dynasty, was imported into the southwest coast of India in the 8th or 9th century and was then modified over time in this secluded area, where communication with the east coast was very limited. It later evolved into the Tigalari-Malayalam script used by the Malayali, Havyaka Brahmins and Tulu Brahmin people, but was originally only applied to write Sanskrit. This script split into two scripts: Tigalari and Malayalam. While the Malayalam script was extended and modified to write the vernacular Malayalam language, Tigalari was used for Sanskrit only.

    In Malabar, this writing system was termed Arya-eluttu (ആര്യ എഴുത്ത്, Ārya eḻuttŭ), meaning “Arya writing”. (Sanskrit is an Indo-Aryan language while Malayalam is a Dravidian language.)

    Vatteluttu was in general use, but was not suitable for literature in which many Sanskrit words were used. Like Tamil-Brahmi, it was originally used to write Tamil, and as such did not have letters for the voiced or aspirated consonants used in Sanskrit but not in Tamil. For this reason, Vatteluttu and the Grantha alphabet were sometimes mixed, as in the Manipravalam literature (a literary style used in medieval liturgical texts in South India). One of the oldest examples of this, Vaishikatantram (വൈശികതന്ത്രം, Vaiśikatantram), dates back to the 12th century and uses the earliest form of the Malayalam script, which seems to have been systematized to some extent by the first half of the 13th century.

    Thunchaththu Ezhuthachan, a poet from around the 17th century, used Arya-eluttu to write his Malayalam poems based on Classical Sanskrit literature. For a few letters missing in Arya-eluttu (ḷa, ḻa, ṟa), he used Vatteluttu. His works became unprecedentedly popular, to the point that the Malayali people eventually started to call him the father of the Malayalam language, and this also popularized Arya-eluttu as a script to write Malayalam. However, Grantha did not distinguish between e and ē, or between o and ō, as it was only used to write the Sanskrit language. The Malayalam script as it is today was modified in the middle of the 19th century, when Hermann Gundert invented the new vowel signs to distinguish them.

    By the 19th century, old scripts like Kolezhuthu had been supplanted by Arya-eluttu, that is, the current Malayalam script. Nowadays, it is widely used in the press of the Malayali population in Kerala.

    Malayalam and Tigalari are sister scripts descended from the Grantha alphabet. Both share similar glyphic and orthographic characteristics.

    3.4 Orthography reform

    In 1971, the Government of Kerala reformed the orthography of Malayalam by passing a government order to the education department. The objective was to simplify the use of the print and typewriting technology of that time by reducing the number of glyphs required. In 1967, the government had appointed a committee headed by Sooranad Kunjan Pillai, the editor of the Malayalam Lexicon project. It reduced the number of glyphs required for Malayalam printing from around 1000 to around 250. The committee's recommendations were further modified by another committee in 1969 [105].

    None of the major newspapers implemented the reform completely; each newspaper adopted its own subset of the proposal. The reformed script came into effect on 15 April 1971 (the Kerala New Year), by a government order released on 23 March 1971.

    3.5 Languages using the Malayalam script

    The script is also used to write several other languages such as Paniya, Betta Kurumba, and Ravula (all at EGIDS 5). The Malayalam language itself was historically written in several different scripts.

    NBGP considered languages with EGIDS scale 1 to 4 for inclusion. Malayalam is one of the two languages written in Malayalam script (viz. Malayalam and Sanskrit) meeting this criterion. Malayalam is placed among the 22 scheduled languages of India. Sanskrit, although it falls under EGIDS 4, is not considered in the Malayalam script LGR because the Malayalam script is rarely used to write Sanskrit.

    3.6 ZWJ/ZWNJ

    Apart from the existing Unicode character code points in Malayalam [110], Zero Width Joiner (ZWJ, U+200D) and Zero Width Non-Joiner (ZWNJ, U+200C) are widely used to control how ligatures are formed. Being invisible characters, they are often removed during normalization, particularly before a string comparison or collation. ICANN's Maximal Starting Repertoire (MSR) for IDN LGRs does not include ZWJ and ZWNJ. [101]

    Impact of excluding them from the domain name system: although IDNA2008 allows the use of ZWJ and ZWNJ in domain names, they are not allowed in root zone labels, due to their exclusion from the MSR.

    Hence it is not possible to register Malayalam gTLDs with words that contain ZWJ/ZWNJ.

    There are three cases:

    ● A missing ZWNJ is considered a spelling mistake. Example: Tamil Nadu (tamiɭnadu) is written as തമിഴ്‌നാട് [0D24 0D2E 0D3F 0D34 0D4D 200C 0D28 0D3E 0D1F 0D4D] (correct); [0D24 0D2E 0D3F 0D34 0D4D 0D28 0D3E 0D1F 0D4D] is incorrect. However, there are no identified cases where a missing ZWNJ forms another valid word with a different meaning.

    ● A missing ZWJ means that the word is a different word with a different meaning. This is very rare. The pair vaNyavanika (meaning: large curtain) and വന്യവനിക vanyaVanika (meaning: wild garden) is often cited as an example, but many people argue that this is not a valid case. [102][103]

    ● A missing ZWJ never means a spelling mistake, but just a writing style. There are many examples of this; നന്മ (meaning: goodness) is an obvious one.
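The practical consequence of the three cases can be illustrated with a short sketch. It builds the two spellings of "Tamil Nadu" from the code point lists quoted above and reports any joiners. The helper names `from_hex` and `joiners_in` are hypothetical and not part of the LGR or of any tool it references.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def from_hex(code_points: str) -> str:
    """Build a string from space-separated hexadecimal code points."""
    return "".join(chr(int(cp, 16)) for cp in code_points.split())


def joiners_in(label: str) -> list[tuple[int, str]]:
    """Return (index, name) for every ZWJ or ZWNJ found in the label."""
    names = {ZWNJ: "ZWNJ", ZWJ: "ZWJ"}
    return [(i, names[ch]) for i, ch in enumerate(label) if ch in names]


# Code point sequences quoted in the first bullet above.
with_zwnj = from_hex("0D24 0D2E 0D3F 0D34 0D4D 200C 0D28 0D3E 0D1F 0D4D")
without_zwnj = from_hex("0D24 0D2E 0D3F 0D34 0D4D 0D28 0D3E 0D1F 0D4D")

print(joiners_in(with_zwnj))     # the correctly spelled form carries a ZWNJ
print(joiners_in(without_zwnj))  # the only form a root zone label could use
```

A label that reports any joiner cannot be a root zone label, because neither character is in the MSR.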

    Historically, ZWJ was used to render chillu in certain fonts, but Unicode later included chillu characters as standalone code points, and MSR-4 also includes these standalone chillu characters.

    Before Unicode 5.1, chillu letters were encoded as a sequence using joiners. The older encoding is still prevalent in data such as corpora and may even be in current use.

    However, this legacy representation of chillu using virama and ZWJ is ruled out because the root does not allow joiners, so there is no issue with the duplicate encoding of chillu. Hence, although the atomic encoding of chillu letters is not universally used, the Root Zone allows only the atomic encoding.
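As an illustration of the two encodings, the sketch below rewrites the legacy sequence (base consonant, virama, ZWJ) as the atomic chillu code point. The consonant-to-chillu pairs follow the chillu discussion in Section 3.7. The function name and the decision to leave every other character untouched are assumptions made for this example, not a procedure defined by the LGR.

```python
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Base consonant -> atomic chillu, per the pairs listed in Section 3.7.
LEGACY_BASE_TO_ATOMIC = {
    "\u0D23": "\u0D7A",  # NNA -> CHILLU NN
    "\u0D28": "\u0D7B",  # NA  -> CHILLU N
    "\u0D30": "\u0D7C",  # RA  -> CHILLU RR
    "\u0D32": "\u0D7D",  # LA  -> CHILLU L
    "\u0D33": "\u0D7E",  # LLA -> CHILLU LL
    "\u0D15": "\u0D7F",  # KA  -> CHILLU K
}


def to_atomic_chillu(text: str) -> str:
    """Replace each legacy <consonant, virama, ZWJ> run with its atomic chillu."""
    out = []
    i = 0
    while i < len(text):
        atomic = LEGACY_BASE_TO_ATOMIC.get(text[i])
        if atomic is not None and text[i + 1:i + 3] == VIRAMA + ZWJ:
            out.append(atomic)
            i += 3  # skip consonant, virama and ZWJ
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


legacy = "\u0D05\u0D35\u0D28\u0D4D\u200D"  # A, VA, NA, VIRAMA, ZWJ
print([f"U+{ord(c):04X}" for c in to_atomic_chillu(legacy)])
```

Only the output form, which contains no joiner, could appear in a root zone label.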

    Figure 1: Atomic Encoding Malayalam Chillus [107]

    ZWNJ is used to prevent the formation of conjunct ligatures, and it is required to avoid spelling mistakes and unnecessary conjuncts. For example, in a two-word label, a first word ending in virama can form a conjunct with a second word starting with a consonant, which causes a spelling mistake.

    3.7 The Structure of Malayalam Script

    The Malayalam Aksharam or grapheme cluster is based on the Malayalam phonological system, with the following basic phonological template.

    Phonology

    Vowels: Malayalam has five short and five long vowels. Vowels occur in all positions in a word, except for o, which is not permitted word-finally. It also has two diphthongs, ai and au.

    Figure 2: Malayalam Vowel Phonology [109]

    Consonants: Besides a Dravidian consonantal inventory, Malayalam has aspirated stops and supplementary sibilants borrowed from Indo-Aryan. [f] occurs mostly in European borrowings. Voiceless unaspirated stops, nasals and the laterals [l], [ɭ] can be geminated. The distinction between single and geminated consonants is phonemic. Only six consonants, [m], [n], [ɳ], [r], [l], and [ɭ], can occur word-finally.

    Figure 3: Malayalam Consonant Phonology [109]

    Sandhi: internal and external sandhi are commonplace. They result in vowel and consonant deletion, assimilation of consonants and fusion.

    Stress: it always falls on the first syllable of a word.

    Script and Orthography

    Malayalam is written in an abugida script derived ultimately from Brāhmī in which every consonant carries an inherent a. The alphabetic order is based on phonological principles: it begins with the simple vowels and diphthongs, followed by 25 stops and nasals arranged in five groups according to their place of articulation. It continues with semivowels (liquids and glides) and fricatives, and ends with two retroflex liquids which don't exist in Sanskrit and, thus, were not represented in Brāhmī.

    Geminated consonants and other consonant clusters are written side by side or one above the other. Each Malayalam sign below is followed by its standard transliteration in the Latin alphabet and its equivalent in the International Phonetic Alphabet.

    The following sections provide details of the Malayalam sounds and how these are written in Malayalam.

    Monophthongs

    Vowel | Short: independent | Short: vowel sign | Short: example | Long: independent | Long: vowel sign | Long: example
    a | അ a /a/ | (none) | പ pa /pa/ | ആ ā /aː/ | ◌ാ | പാ pā /paː/
    i | ഇ i /i/ | ◌ി | പി pi /pi/ | ഈ ī /iː/ | ◌ീ | പീ pī /piː/
    u | ഉ u /u/ | ◌ു | പു pu /pu/ | ഊ ū /uː/ | ◌ൂ | പൂ pū /puː/
    r̥ | ഋ r̥ /rɨ/ | ◌ൃ | പൃ pr̥ /prɨ/ | | |
    e | എ e /e/ | ◌െ | പെ pe /pe/ | ഏ ē /eː/ | ◌േ | പേ pē /peː/
    o | ഒ o /o/ | ◌ൊ | പൊ po /po/ | ഓ ō /oː/ | ◌ോ | പോ pō /poː/

    Diphthongs

    Vowel | Independent | Vowel sign | Example
    ai | ഐ ai /ai̯/ | ◌ൈ | പൈ pai /pai̯/
    au | ഔ au /au̯/ | ◌ൌ (archaic) | പൌ pau /pau̯/
    au | | ◌ൗ (modern) | പൗ pau /pau̯/

    Anusvaram

    aṁ | അം aṁ /am/ | ◌ം ṁ /m/ | പം paṁ /pam/

    Visargam

    aḥ | അഃ aḥ /ah/ | ◌ഃ ḥ /h/ | പഃ paḥ /pah/

    Consonants

    Place | Voiceless unaspirated | Voiceless aspirated | Voiced unaspirated | Voiced aspirated | Nasal
    Velar | ക ka /ka/ KA | ഖ kha /kʰa/ KHA | ഗ ga /ɡa/ GA | ഘ gha /ɡʱa/ GHA | ങ ṅa /ŋa/ NGA
    Palatal or Postalveolar | ച ca /t͡ʃa/ CA cha | ഛ cha /t͡ʃʰa/ CHA chha | ജ ja /ɟa/ JA jha | ഝ jha /ɟʱa/ JHA jhha | ഞ ña /ɲa/ NYA nha (nja)
    Retroflex | ട ṭa /ʈa/ TTA ta (hard ta) | ഠ ṭha /ʈʰa/ TTHA tta (hard tha) | ഡ ḍa /ɖa/ DDA da (hard da) | ഢ ḍha /ɖʱa/ DDHA dda (hard dha) | ണ ṇa /ɳa/ NNA hard na
    Dental | ത ta /t̪a/ TA tha (soft ta) | ഥ tha /t̪ʰa/ THA ttha (soft tha) | ദ da /d̪a/ DA dha (soft da) | ധ dha /d̪ʱa/ DHA ddha (soft dha) | ന na /n̪a, na/ NA soft na
    Labial | പ pa /pa/ PA | ഫ pha /pʰa/ PHA | ബ ba /ba/ BA | ഭ bha /bʱa/ BHA | മ ma /ma/ MA

    Other consonants

    യ ya /ja/ YA | ര ra /ɾa/ RA | ല la /la/ LA | വ va /ʋa/ VA

    Notes:
    ● Dental nasal or alveolar nasal, depending on the word.
    ● Alveolar tap.
    ● The tip of the tongue almost touches the teeth ([l̪]), further forward than the English l.

    ശ śa /ʃa/ SHA soft sha (sha) | ഷ ṣa /ʂa/ SSA sha (hard sha) | സ sa /sa/ SA | ഹ ha /ɦa/ HA

    Notes:
    ● Voiceless apico-palatal approximant.
    ● Dental sibilant fricative.

    ള ḷa /ɭa/ LLA hard la | ഴ ḻa /ɻa/ LLLA /ṛ/ɽ/ zha (retroflexed ra) | റ ṟa, ṯa /ra, ta/ RRA (hard ra)

    Notes:
    ● Apico-palatal: voiced apico-palatal approximant [ʐ̠̺˕]. This consonant is usually described as /ɻ/, but can also be approximated by /ɹ/.
    ● Alveolar trill (apical).

    [f] is found mostly in Urdu and English loanwords and doesn't have a specific sign; it is represented with ph, which also serves for [pʰ].

    Vowels

    Vowels are written in this form when they are used independently.

    അ U+0D05 A

    ആ U+0D06 AA

    ഇ U+0D07 I

    ഈ U+0D08 II

    ഉ U+0D09 U

    ഊ U+0D0A UU

    ഋ U+0D0B R

    എ U+0D0E E

    ഏ U+0D0F EE

    ഐ U+0D10 AI

    ഒ U+0D12 O

    ഓ U+0D13 OO

    ഔ U+0D14 AU

    Table 1: Malayalam Vowels

    Vowel diacritics

    Vowels can also be written as diacritics, referred to as Matras, when they follow consonants. Their forms are given below, illustrated with the letter ക (U+0D15) MALAYALAM LETTER KA.

    ക U+0D15 KA

    കാ U+0D15 U+0D3E KAA

    കി U+0D15 U+0D3F KI

    കീ U+0D15 U+0D40 KII

    കു U+0D15 U+0D41 KU

    കൂ U+0D15 U+0D42 KUU

    കൃ U+0D15 U+0D43 KR

    കെ U+0D15 U+0D46 KE

    കേ U+0D15 U+0D47 KEE

    കൈ U+0D15 U+0D48 KAI

    കൊ U+0D15 U+0D4A KO

    കോ U+0D15 U+0D4B KOO

    കൗ U+0D15 U+0D57 KAU

    Table 2: Malayalam Vowel Diacritics

    Consonants

    Malayalam has the following consonants, generally arranged by manner and place of articulation.

    ക U+0D15 KA

    ഖ U+0D16 KHA

    ഗ U+0D17 GA

    ഘ U+0D18 GHA

    ങ U+0D19 NGA

    ച U+0D1A CA

    ഛ U+0D1B CHA

    ജ U+0D1C JA

    ഝ U+0D1D JHA

    ഞ U+0D1E NYA

    ട U+0D1F TTA

    ഠ U+0D20 TTHA

    ഡ U+0D21 DDA

    ഢ U+0D22 DDHA

    ണ U+0D23 NNA

    ത U+0D24 TA

    ഥ U+0D25 THA

    ദ U+0D26 DA

    ധ U+0D27 DHA

    ന U+0D28 NA

    പ U+0D2A PA

    ഫ U+0D2B PHA

    ബ U+0D2C BA

    ഭ U+0D2D BHA

    മ U+0D2E MA

    യ U+0D2F YA

    ര U+0D30 RA

    റ U+0D31 RRA

    ല U+0D32 LA

    ള U+0D33 LLA

    ഴ U+0D34 LLLA

    വ U+0D35 VA

    ശ U+0D36 SHA

    ഷ U+0D37 SSA

    സ U+0D38 SA

    ഹ U+0D39 HA

    Table 3: Malayalam Consonants

    Anusvaram and Visargam

    Anusvaram: An anusvaram (അനുസ്വാരം anusvāram), or anusvara, originally denoted nasalization, where the preceding vowel was changed into a nasalized vowel, and hence it is traditionally treated as a kind of vowel sign. In Malayalam, however, the anusvara, represented as ◌ം (0D02), simply represents a consonant /m/ after a vowel, though this /m/ may be assimilated to another nasal consonant. It is a special consonant letter, different from a "normal" consonant letter in that it is never followed by an inherent vowel or another vowel. In general, an anusvara at the end of a word in an Indian language is transliterated as ṁ in ISO 15919, but a Malayalam anusvara at the end of a word is transliterated as m without a dot.

    Visargam: A visargam (വിസർഗം, visargam), or visarga, represents a consonant /h/ after a vowel, and is transliterated as ḥ. Like the anusvara, it is a special symbol, and is never followed by an inherent vowel or another vowel. In Malayalam, ◌ഃ (0D03) is the visarga symbol.

    Chillu letters (Chillaksharam) and Samvruthokarams

    In Indo-European languages like Sanskrit, a large number of words end in consonants, whereas in Dravidian languages like Malayalam the majority of words end in vowels. The chillaksharams of Malayalam are exceptions to this general feature. Chillaksharams are pure consonants, without any vowel sound. [111]

    Chillaksharam is an original feature of Malayalam, used only with 6 consonants at present. The consonants are ന (na), ണ (ṇa), ര (ra), ല (la), ള (ḷa) and ക (ka), and their corresponding chillus are ൻ (ṉ), ൺ (ṇ), ർ (r), ൽ (l), ൾ (ḷ) and ൿ (ḳ). They are used in certain contexts, occurring at the end of the word without the implicit vowel. The chillu 0D7F, even though rare, is still in use, predominantly in religious literature and in proper nouns such as names and place names. Hence it is included in the LGR to treat chillu characters consistently.

    ൺ U+0D7A NN

    ൻ U+0D7B N

    ർ U+0D7C RR

    ൽ U+0D7D L

    ൾ U+0D7E LL

    ൿ U+0D7F K

    Table 4: Malayalam Chillu letters

    Samvruthokaram is a soft-ending virama (chandrakkala). Any consonant can take this form: consonant + ◌ു (0D41) + ◌് (0D4D) creates the samvruthokaram form of that consonant. In southern Kerala, the U matra ◌ു (0D41) and the chandrakkala (virama) ◌് (0D4D) together form the grapheme for samvruthokaram. In northern Kerala, however, the chandrakkala (visible virama) standing alone is used. In that case, a chandrakkala alone at the end of a word is treated as samvruthokaram.

    A chandrakkala within a word (followed by other characters of the word) denotes a conjunct letter formed by the characters preceding and following the chandrakkala. A Traditional Orthography font is used below, since this section discusses display forms such as samvruthokaram, which does not exist in Modern Orthography.

    Examples of Samvruthokaram:

    ഏതു് (ethu, meaning "which"), code points: U+0D0F U+0D24 U+0D41 U+0D4D

    അതു് (athu, meaning "that"), code points: U+0D05 U+0D24 U+0D41 U+0D4D
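The two written conventions for samvruthokaram described above can be sketched as follows. The function `samvruthokaram` and its `southern` flag are hypothetical names introduced only for this illustration; the code points are those given in the examples.

```python
U_MATRA = "\u0D41"  # MALAYALAM VOWEL SIGN U
VIRAMA = "\u0D4D"   # MALAYALAM SIGN VIRAMA (chandrakkala)


def samvruthokaram(stem: str, southern: bool = True) -> str:
    """Append the samvruthokaram ending to a stem that ends in a consonant.

    southern=True  -> U matra + chandrakkala (southern Kerala convention)
    southern=False -> chandrakkala alone (northern Kerala convention)
    """
    return stem + (U_MATRA if southern else "") + VIRAMA


def hex_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


ethu = samvruthokaram("\u0D0F\u0D24")  # EE + TA
athu = samvruthokaram("\u0D05\u0D24")  # A + TA
print(hex_points(ethu))  # U+0D0F U+0D24 U+0D41 U+0D4D
print(hex_points(athu))  # U+0D05 U+0D24 U+0D41 U+0D4D
```

Both outputs match the code point sequences listed in the examples above.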

    For words that end in a chillu, samvruthokaram is used to make the pronunciation clearer. Either the samvruthokaram is added directly to the word-ending chillaksharam, or the word-ending chillaksharam is geminated and the samvruthokaram is added to it.

    The following are the main phonological transformations of chillaksharam. [113]

    1. The word-ending consonant written as chillaksharam is geminated and a samvruthokaram is attached.

    2. A samvruthokaram is attached to the word-ending consonant written as chillaksharam.

    3. The chillaksharam undergoes the same phonological changes (progressive/regressive assimilation, gemination, etc.) as other consonants do when syllables are combined.

    4. In sandhi, when a vowel follows a chillaksharam, they join in the same way as when vowels follow other consonants.

    Even though samvruthokaram may be seen as derived from the vowels അ (a) or ഉ (u), it in fact has an independent identity as a vowel. This feature is seen only in Malayalam. [111]

    A selection of conjunct consonants

    A consonant can be combined with another consonant or conjunct using virama. Conjuncts with more than four consonants are rare. The conjunct […] is formed by five consonants.

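        | kka | ṅka | ṅṅa | cca | ñca | ñña | ṭṭa | ṇṭa | ṇṇa | tta | nta | nna
    NLF | ക്‌ക | ങ്‌ക | ങ്‌ങ | ച്‌ച | ഞ്‌ച | ഞ്‌ഞ | ട്‌ട | ണ്‌ട | ണ്‌ണ | ത്‌ത | ന്‌ത | ന്‌ന
    LF  | ക്ക | ങ്ക | ങ്ങ | ച്ച | ഞ്ച | ഞ്ഞ | ട്ട | ണ്ട | ണ്ണ | ത്ത | ന്ത | ന്ന

    Table 5: Malayalam Conjunct Consonants

    NLF: Non-ligated form, which has a visible virama (chandrakkala)
    LF: Ligated form, in which consonants are conjoined fully or partially (as rendered by fonts)

    Conjuncts with diacritics using യ (U+0D2F), ര (U+0D30), ല (U+0D32), വ (U+0D35)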

    Conjunct consonants formed with യ (0D2F), ര (0D30), ല (0D32) and വ (0D35) are rendered with diacritic marks/signs in the glyph. Examples of these in combination with ക (0D15) and പ (0D2A) are given below. Other consonants can be combined in similar fashion.

    Consonant + യ | Consonant + ര | Consonant + ല | Consonant + വ
    ക്യ (0D15 0D4D 0D2F) | ക്ര (0D15 0D4D 0D30) | ക്ല (0D15 0D4D 0D32) | ക്വ (0D15 0D4D 0D35)
    പ്യ (0D2A 0D4D 0D2F) | പ്ര (0D2A 0D4D 0D30) | പ്ല (0D2A 0D4D 0D32) | പ്വ (0D2A 0D4D 0D35)

    Table 6: Malayalam Conjuncts with diacritics using യ (U+0D2F), ര (U+0D30), ല (U+0D32), വ (U+0D35)
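In encoded text, every conjunct in Tables 5 and 6 is simply the consonants joined by virama; whether the result appears ligated is left to the font. A minimal sketch of that construction follows. The helper `conjunct` is a hypothetical name used only for this example.

```python
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA

KA, PA = "\u0D15", "\u0D2A"
YA, RA, LA, VA = "\u0D2F", "\u0D30", "\u0D32", "\u0D35"


def conjunct(*consonants: str) -> str:
    """Join two or more consonants with virama to encode a conjunct."""
    return VIRAMA.join(consonants)


def hex_points(text: str) -> str:
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Reproduce the code point sequences listed in Table 6.
for base in (KA, PA):
    for second in (YA, RA, LA, VA):
        cluster = conjunct(base, second)
        print(cluster, hex_points(cluster))
```

The same helper covers the geminates of Table 5, for example `conjunct(KA, KA)` for kka.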

    4. Overall Development Process and Methodology

    The Neo-Brahmi Generation Panel (NBGP) has been formed from members having experience in linguistics and computational linguistics. Under the Neo-Brahmi Generation Panel, there are nine scripts belonging to separate Unicode blocks. Each of these scripts is assigned a separate LGR; however, the Neo-Brahmi GP ensures that the fundamental philosophy behind building each LGR is in sync with all other Brahmi-derived scripts.

    The Malayalam script LGR proposal was published for public comment to allow those who had not participated in the NBGP to make their views known. The NBGP analyzed all comments received to finalize the proposal. The analysis of public comments can be accessed online at [114].

    This LGR proposal was originally published on April 22, 2019. It has been updated and published for the second round of public comment on 26 March 2020 to correct an inconsistency involving the support for conjunct “nta” and to address new cross-script variants for LGR-4.

    4.1 Guiding Principles

    The NBGP adopts the following broad principles for the selection of code points in the code point repertoire across the board for all the scripts within its ambit.

    4.1.1 Inclusion principles:

    4.1.1.1 Modern usage:
    Every character proposed should be in the everyday usage of a particular linguistic community. Characters which have been encoded in Unicode for transcription purposes only or for archival purposes will not be considered for inclusion in the code point repertoire.

    4.1.1.2 Unambiguous use:
    Every character proposed should be understood unambiguously by the linguistic community with regard to its usage in the language.

    4.1.2 Exclusion principles:
    The main exclusion principle is that of External Limits on Scope. These comprise protocols or standards which are prerequisites to the Label Generation Rulesets. All further principles are in fact subsumed under these limitations but have been spelt out separately for the sake of clarity.

    4.1.2.1 External Limits on Scope:
    The code point repertoire for the root zone being a very special case, at the top of the protocol hierarchies, the range of characters available for selection as part of the Root Zone code point repertoire is already constrained by various protocol layers beneath it. The following three main protocols/standards act as successive filters:

    i. The Unicode Chart:
    Out of all the characters that are needed by the given script, if the character in question is not encoded in Unicode, it cannot be incorporated in the code point repertoire. Such cases are quite rare, given the elaborate and exhaustive character inclusion efforts made by the Unicode Consortium.

    ii. IDNA Protocol:
    Unicode being the character encoding standard for providing the maximum possible representation of a given script/language, it has encoded as far as possible all the characters needed by the script. However, the domain name being a specialized case, it is governed by an additional protocol known as IDNA (Internationalized Domain Names in Applications). The IDNA protocol excludes some characters of the Unicode repertoire from being part of domain names.

    iii. Maximal Starting Repertoire:
    The Root Zone LGR being a repertoire of the characters which are going to be used for the creation of root zone TLDs, which in turn are an even more specialized case of domain names, the Root LGR procedure introduces additional exclusions on the IDNA-allowed set of characters.
    Example: MALAYALAM SIGN AVAGRAHA "ഽ" (U+0D3D), even though allowed by the IDNA protocol, is not permitted in the Root Zone Repertoire as per the [MSR].

    To sum up, the restrictions start off by admitting only such characters as are part of the code block of the given script/language. This is further narrowed down by the IDNA2008 protocol, and finally an additional filter in the form of the Maximal Starting Repertoire restricts the character set associated with the given language even more.
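The successive-filter idea can be sketched as a chain of set-membership checks. The three sets below are tiny illustrative subsets invented for this example, not the real Unicode, IDNA2008 or MSR tables. They contain only code points whose status is stated in this document: KA and VIRAMA are in the repertoire, while AVAGRAHA and ZWJ are allowed by IDNA2008 but excluded from the MSR.

```python
from typing import Optional

# Illustrative subsets only (assumption): real tables are far larger.
ENCODED_IN_UNICODE = frozenset({0x0D15, 0x0D3D, 0x0D4D, 0x200D})
ALLOWED_BY_IDNA2008 = frozenset({0x0D15, 0x0D3D, 0x0D4D, 0x200D})
IN_MSR = frozenset({0x0D15, 0x0D4D})

# Filters are applied in the order described above.
FILTERS = (
    ("Unicode", ENCODED_IN_UNICODE),
    ("IDNA2008", ALLOWED_BY_IDNA2008),
    ("MSR", IN_MSR),
)


def first_rejecting_filter(code_point: int) -> Optional[str]:
    """Return the name of the first filter that rejects the code point.

    Returns None when the code point passes all three filters and is
    therefore a candidate for the Root Zone repertoire.
    """
    for name, allowed in FILTERS:
        if code_point not in allowed:
            return name
    return None


for cp in (0x0D15, 0x0D3D, 0x200D):
    print(f"U+{cp:04X}", first_rejecting_filter(cp))
```

Passing all three filters only makes a code point eligible; the inclusion and exclusion principles of this section still decide whether it enters the repertoire.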

    4.1.2.2 No Rare and Obsolete Characters:
    There are characters which have been added to Unicode to accommodate rare forms, such as MALAYALAM LETTER VOCALIC L "ഌ" (U+0D0C), an obsolete vowel used to write Sanskrit words that is not considered part of modern Malayalam orthography. No such characters are included. This is in consonance with the Conservatism principle as laid down in the Root Zone LGR procedure.

    5. Repertoire

    Based on the LGR Procedure for the Root Zone and the MSR, NBGP conducted the code point analysis of the Malayalam script. The analysis is presented in this section, including the list of code points recommended for inclusion in and exclusion from the repertoire.

    5.1 Malayalam section of Maximal Starting Repertoire [MSR] Version 4

    Figure 4: Malayalam Code Page from [MSR]

    Color convention (1):
    ● All characters that are included in the [MSR]: yellow background
    ● PVALID in IDNA2008 but excluded from the [MSR]: pinkish background
    ● Not PVALID in IDNA2008: white background

    (1) This document needs to be printed in color for this to be read correctly.

    5.2 Unicode Code Points Inclusion

    The following code points are included in the repertoire.

    Sr. No. | Unicode Code Point | Glyph | Character Name | Category | Refs.

    1 0D02 ◌ം MALAYALAM SIGN ANUSVARA Anusvaram [106]

    2 0D03 ◌ഃ MALAYALAM SIGN VISARGA Visargam [106]

    3 0D05 അ MALAYALAM LETTER A Vowel [106]

    4 0D06 ആ MALAYALAM LETTER AA Vowel [106]

    5 0D07 ഇ MALAYALAM LETTER I Vowel [106]

    6 0D08 ഈ MALAYALAM LETTER II Vowel [106]

    7 0D09 ഉ MALAYALAM LETTER U Vowel [106]

    8 0D0A ഊ MALAYALAM LETTER UU Vowel [106]

    9 0D0B ഋ MALAYALAM LETTER VOCALIC R Vowel [106]

    10 0D0E എ MALAYALAM LETTER E Vowel [106]

    11 0D0F ഏ MALAYALAM LETTER EE Vowel [106]

    12 0D10 ഐ MALAYALAM LETTER AI Vowel [106]

    13 0D12 ഒ MALAYALAM LETTER O Vowel [106]

    14 0D13 ഓ MALAYALAM LETTER OO Vowel [106]

    15 0D14 ഔ MALAYALAM LETTER AU Vowel [106]

    16 0D15 ക MALAYALAM LETTER KA Consonant [106]

    17 0D16 ഖ MALAYALAM LETTER KHA Consonant [106]

    18 0D17 ഗ MALAYALAM LETTER GA Consonant [106]

    19 0D18 ഘ MALAYALAM LETTER GHA Consonant [106]

    20 0D19 ങ MALAYALAM LETTER NGA Consonant [106]

    21 0D1A ച MALAYALAM LETTER CA Consonant [106]

    22 0D1B ഛ MALAYALAM LETTER CHA Consonant [106]

    23 0D1C ജ MALAYALAM LETTER JA Consonant [106]

    24 0D1D ഝ MALAYALAM LETTER JHA Consonant [106]

    25 0D1E ഞ MALAYALAM LETTER NYA Consonant [106]

    26 0D1F ട MALAYALAM LETTER TTA Consonant [106]

    27 0D20 ഠ MALAYALAM LETTER TTHA Consonant [106]

    28 0D21 ഡ MALAYALAM LETTER DDA Consonant [106]

    29 0D22 ഢ MALAYALAM LETTER DDHA Consonant [106]

    30 0D23 ണ MALAYALAM LETTER NNA Consonant [106]

    31 0D24 ത MALAYALAM LETTER TA Consonant [106]

    32 0D25 ഥ MALAYALAM LETTER THA Consonant [106]

    33 0D26 ദ MALAYALAM LETTER DA Consonant [106]

    34 0D27 ധ MALAYALAM LETTER DHA Consonant [106]

    35 0D28 ന MALAYALAM LETTER NA Consonant [106]

    36 0D2A പ MALAYALAM LETTER PA Consonant [106]

    37 0D2B ഫ MALAYALAM LETTER PHA Consonant [106]

    38 0D2C ബ MALAYALAM LETTER BA Consonant [106]

    39 0D2D ഭ MALAYALAM LETTER BHA Consonant [106]

    40 0D2E മ MALAYALAM LETTER MA Consonant [106]

    41 0D2F യ MALAYALAM LETTER YA Consonant [106]

    42 0D30 ര MALAYALAM LETTER RA Consonant [106]

    43 0D31 റ MALAYALAM LETTER RRA Consonant [106]

    44 0D32 ല MALAYALAM LETTER LA Consonant [106]

    45 0D33 ള MALAYALAM LETTER LLA Consonant [106]

    46 0D34 ഴ MALAYALAM LETTER LLLA Consonant [106]

    47 0D35 വ MALAYALAM LETTER VA Consonant [106]

    48 0D36 ശ MALAYALAM LETTER SHA Consonant [106]

    49 0D37 ഷ MALAYALAM LETTER SSA Consonant [106]

    50 0D38 സ MALAYALAM LETTER SA Consonant [106]

    51 0D39 ഹ MALAYALAM LETTER HA Consonant [106]

    52 0D3E ◌ാ MALAYALAM VOWEL SIGN AA Matra [106]

    53 0D3F ◌ി MALAYALAM VOWEL SIGN I Matra [106]

    54 0D40 ◌ീ MALAYALAM VOWEL SIGN II Matra [106]

    55 0D41 ◌ു MALAYALAM VOWEL SIGN U Matra [106]

    56 0D42 ◌ൂ MALAYALAM VOWEL SIGN UU Matra [106]

    57 0D43 ◌ൃ MALAYALAM VOWEL SIGN VOCALIC R Matra [106]

    58 0D46 ◌െ MALAYALAM VOWEL SIGN E Matra [106]

    59 0D47 ◌േ MALAYALAM VOWEL SIGN EE Matra [106]

    60 0D48 ◌ൈ MALAYALAM VOWEL SIGN AI Matra [106]

    61 0D4A ◌ൊ MALAYALAM VOWEL SIGN O Matra [106]

    62 0D4B ◌ോ MALAYALAM VOWEL SIGN OO Matra [106]

    63 0D4D ◌് MALAYALAM SIGN VIRAMA Chandrakkala / Virama [106]

    64 0D57 ◌ൗ MALAYALAM AU LENGTH MARK Matra [106]

    65 0D7A ൺ MALAYALAM LETTER CHILLU NN Chillu Letters [106]

    66 0D7B ൻ MALAYALAM LETTER CHILLU N Chillu Letters [106]

    67 0D7C ർ MALAYALAM LETTER CHILLU RR Chillu Letters [106]

    68 0D7D ൽ MALAYALAM LETTER CHILLU L Chillu Letters [106]

    69 0D7E ൾ MALAYALAM LETTER CHILLU LL Chillu Letters [106]

    70 0D7F ൿ MALAYALAM LETTER CHILLU K Chillu Letters [106]

    Table 7: Malayalam Code Point Repertoire
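The 70 code points of Table 7 fall into a small number of contiguous ranges, which makes a repertoire membership check easy to sketch. The range list below was transcribed from the table, and the function name is hypothetical. Membership is only a necessary condition: the variant and whole label evaluation rules of the LGR are not modelled here.

```python
# Inclusive code point ranges transcribed from Table 7.
REPERTOIRE_RANGES = [
    (0x0D02, 0x0D03),  # anusvara, visarga
    (0x0D05, 0x0D0B),  # vowels A .. VOCALIC R
    (0x0D0E, 0x0D10),  # vowels E, EE, AI
    (0x0D12, 0x0D28),  # vowels O, OO, AU and consonants KA .. NA
    (0x0D2A, 0x0D39),  # consonants PA .. HA
    (0x0D3E, 0x0D43),  # vowel signs AA .. VOCALIC R
    (0x0D46, 0x0D48),  # vowel signs E, EE, AI
    (0x0D4A, 0x0D4B),  # vowel signs O, OO
    (0x0D4D, 0x0D4D),  # virama
    (0x0D57, 0x0D57),  # AU length mark
    (0x0D7A, 0x0D7F),  # chillu letters
]

REPERTOIRE = {cp for lo, hi in REPERTOIRE_RANGES for cp in range(lo, hi + 1)}


def outside_repertoire(label: str) -> list[str]:
    """List the code points of a label that are not in Table 7."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) not in REPERTOIRE]


native_name = "\u0D2E\u0D32\u0D2F\u0D3E\u0D33\u0D02"  # the script's native name
print(len(REPERTOIRE))                         # number of code points covered
print(outside_repertoire(native_name))         # empty list
print(outside_repertoire("\u0D15\u0D3D"))      # avagraha is not in the table
```

The size of the set gives a quick consistency check against the 70 rows of the table.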

  • 20

    5.3 CodePointSequencesThefollowingsequenceshavebeendefinedforthepurposeofvariantdefinitionsandWLErules

    (seesection6.1andsection7).

    1 U+0D28 U+0D4D (U+0D31) ന ◌് റ [ന്റ]

    MALAYALAM LETTER NA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA

    2 U+0D31 U+0D31 റ റ [ററ]

    MALAYALAM LETTER RRA, MALAYALAM LETTER RRA

    3 U+0D31 U+0D4D U+0D31 റ ◌് റ [റ്റ]

    MALAYALAM LETTER RRA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA

    4 U+0D31 U+0D31 U+0D4D U+0D31 റ റ ◌് റ [ററ്റ]

    MALAYALAM LETTER RRA, MALAYALAM LETTER RRA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA

    5 U+0D31 U+0D4D U+0D31 U+0D31 റ ◌് റ റ [റ്ററ]

    MALAYALAM LETTER RRA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA, MALAYALAM LETTER RRA

    6 U+0D33 U+0D33 ള ള [ളള]

    MALAYALAM LETTER LLA, MALAYALAM LETTER LLA

    7 U+0D33 U+0D4D U+0D33 ള ◌് ള [ള്ള]

    MALAYALAM LETTER LLA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER LLA

    8 U+0D33 U+0D33 U+0D4D U+0D33 ള ള ◌് ള [ളള്ള]

    MALAYALAM LETTER LLA, MALAYALAM LETTER LLA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER LLA

    9 U+0D33 U+0D4D U+0D33 U+0D33 ള ◌് ള ള [ള്ളള]

    MALAYALAM LETTER LLA, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER LLA, MALAYALAM LETTER LLA

    10 U+0D7B (U+0D31) ൻ റ [ൻറ]

    MALAYALAM LETTER CHILLU N, MALAYALAM LETTER RRA

    11 U+0D7B U+0D4D (U+0D31) ൻ ◌് റ [ൻ്റ]

    MALAYALAM LETTER CHILLU N, MALAYALAM SIGN VIRAMA, MALAYALAM LETTER RRA

    Table 7a: Malayalam Code Point Sequences


    The code point (U+0D31) where shown in parentheses is replaced by an equivalent context rule when(followed-by-0D31) in the actual implementation of these sequences in the XML data file. See Section 6.1.
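The effect of replacing the trailing U+0D31 with a context rule can be illustrated with a minimal Python sketch. It treats the three prefixes of sequences 1, 10 and 11 in Table 7a as interchangeable only when U+0D31 immediately follows. The function name `set1_variant_labels` and the enumeration strategy are assumptions made for this illustration; the normative definition is the one in the XML file, and WLE rules and variant dispositions are not modelled here.

```python
from itertools import product

# Prefixes of sequences 1, 11 and 10 with the trailing U+0D31 dropped.
PREFIXES = [(0x0D28, 0x0D4D), (0x0D7B, 0x0D4D), (0x0D7B,)]
RRA = 0x0D31


def set1_variant_labels(label):
    """Return all labels obtained by swapping set 1 prefixes (hypothetical helper)."""
    cps = tuple(ord(ch) for ch in label)
    slots = []  # one list of alternatives per unit of the label
    i = 0
    while i < len(cps):
        for prefix in PREFIXES:
            n = len(prefix)
            # Context rule: the prefix only counts when followed by U+0D31.
            if cps[i:i + n] == prefix and i + n < len(cps) and cps[i + n] == RRA:
                slots.append(PREFIXES)
                i += n
                break
        else:
            slots.append([(cps[i],)])
            i += 1
    return {
        "".join(chr(cp) for unit in combo for cp in unit)
        for combo in product(*slots)
    }
```

Because the match requires U+0D31 to follow, a prefix such as U+0D28 U+0D4D at the end of a label, or before any other consonant, produces no variants in this sketch.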

    5.4 Unicode Code Point Exclusion

    The following code points are excluded because they are archaic or obsolete in current Malayalam orthography.

    Sr. No. Unicode Code Point Glyph Character Name Category Reason

    1 0D0C ഌ MALAYALAM LETTER VOCALIC L Vowel

    Reason: ഌ (0D0C) is an obsolete vowel used to write Sanskrit words. The letter is very rare and is not considered part of modern Malayalam orthography.

    2 0D44 ◌ൄ MALAYALAM VOWEL SIGN VOCALIC RR Matra

    Reason: ◌ൄ (0D44) is the matra sign of the obsolete vowel VOCALIC RR ൠ (0D60), which is not among the approved code points in MSR-4. It is no longer used in Malayalam orthography.

    3 0D29 ഩ MALAYALAM LETTER NNNA Consonant

    Reason: ഩ (0D29) corresponds to Tamil ṉa ன. It is used rarely, in scholarly texts, to represent the alveolar nasal as opposed to the dental nasal [108]. In ordinary texts both are represented by na ന (0D28).

    Table 8: Malayalam Excluded Code Points


    6. Variants

    This section discusses the variant code points found in Malayalam within the script and with other related scripts.

    6.1 In-script variants

    This section lists sequences that should be considered variants of one another.

    Set # Characters Code Points Glyph

    1. a) ന + ◌് + റ 0D28 + 0D4D + 0D31 ന്റ (rendering varies by font)

    b) ൻ + ◌് + റ 0D7B + 0D4D + 0D31 ൻ്റ

    c) ൻ + റ 0D7B + 0D31 ൻറ

    2. a) ള + ◌് + ള 0D33 + 0D4D + 0D33 ള്ള

    b) ള + ള 0D33 + 0D33 ളള

    3. a) റ + റ 0D31 + 0D31 ററ

    b) റ + ◌് + റ 0D31 + 0D4D + 0D31 റ്റ (rendering varies by font)

    Table 9: In-script Variant Analysis

    Set 1: These are various ways to write the conjunct “nta” in Malayalam.

    1a) Here nta is encoded as the combination 0D28 + 0D4D + 0D31. Most Malayalam Unicode fonts render it as a stacked conjunct, while a few Microsoft fonts render it side by side.

    1b) is how some Microsoft fonts have encoded nta: 0D7B + 0D4D + 0D31. The rendering of this sequence also differs between those fonts and other fonts. However, as per Unicode (Standard Version 11.0.0, §12.9, page 506, Table 12-38), it is the prescribed sequence for the form {chillu-n base, rra below-base}.

    Although 1c) has also been used historically to write nta, and this sequential style of writing is still in use, that combination can also be used to write nra in words like ഹെൻറി (Henry) or എൻറിക്ക (Enrica) [112]. Hence the sequence of 1c) is allowed.

    Variant set 1 contains the three sequences with disposition “blocked”. All three variant sequences in set 1 end in 0D31. To avoid overlap with the various variant sequences beginning with 0D31, the actual implementation of these variants drops the 0D31 from the end of these sequences and adds a context rule when="followed-by-0D31" instead. This implementation is equivalent as far as the variants generated for the sequences in set 1 are concerned but, unlike the naïve implementation, it is well-behaved in cases where any of the sequences here is followed by 0D31 or 0D4D 0D31.

    The context rule for the set 1 variant mapping:

    V1: A variant is defined when followed by 0D31.

    Set 2: The consonant ള (0D33) rarely follows another ള in Malayalam, except in the case of some place names. The double conjunct of ള (0D33) formed by code points 0D33 + 0D4D + 0D33 is rendered as the glyph ള്ള, which looks visually very similar to a ള following another ള. This can result in spoofed labels. For example, in Malayalam we write “vellam” (meaning: water) as “വെള്ളം” - 0D35 0D46 0D33 0D4D 0D33 0D02; a spoofed label can write it as “വെളളം” - 0D35 0D46 0D33 0D33 0D02. This should be blocked.

    However, this pattern gives rise to some complications because it effectively makes the Halant (0D4D) a variant of a "null position", in this case whenever it occurs between two instances of 0D33 ള LLA. Variant definitions of that nature can lead to unexpected results because a label

    0D33 0D4D 0D33 0D4D 0D33 can be analyzed two ways:

    {0D33 0D4D 0D33} {0D4D} {0D33} and {0D33} {0D4D} {0D33 0D4D 0D33}
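The ambiguity can be reproduced with a short Python sketch that enumerates every way of splitting a label into defined units. The unit list below (the two single code points plus the sequence 0D33 0D4D 0D33) and the helper name `segmentations` are assumptions chosen for this illustration only; the real LGR contains more sequences and context rules.

```python
LLA, HALANT = 0x0D33, 0x0D4D
SEQUENCE = (LLA, HALANT, LLA)

# Units available for matching: two singletons and one defined sequence.
UNITS = [(LLA,), (HALANT,), SEQUENCE]


def segmentations(cps, units=UNITS):
    """Return every split of the tuple `cps` into units (hypothetical helper)."""
    if not cps:
        return [[]]
    results = []
    for unit in units:
        n = len(unit)
        if cps[:n] == unit:
            for rest in segmentations(cps[n:], units):
                results.append([unit] + rest)
    return results


label = (LLA, HALANT, LLA, HALANT, LLA)
all_splits = segmentations(label)
# Keep only the analyses that actually use the three-code-point sequence.
with_sequence = [s for s in all_splits if SEQUENCE in s]
for split in with_sequence:
    print(" ".join("{" + " ".join(f"{cp:04X}" for cp in u) + "}" for u in split))
```

Besides the trivial all-singleton split, exactly two analyses use the sequence, and each of them maps to a different variant label when 0D33 0D4D 0D33 is exchanged for 0D33 0D33.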

    NBGP takes into account the data provided by the IP on the occurrence of these sequences in labels where a consonant ള (0D33) follows another ള: the IP found that the frequency is small.

    However, the community feedback shows an increase in usage due to words borrowed from foreign languages. The detailed analysis and supporting data can be found in Appendix C.

    Therefore, NBGP has decided to define a rule (Rule 7 in Section 7). The sequences U+0D33 U+0D33 (ളള) / U+0D33 U+0D4D U+0D33 (ള്ള) and U+0D33 U+0D33 U+0D4D U+0D33 (ളള്ള) / U+0D33 U+0D4D U+0D33 U+0D33 (ള്ളള) have been defined as variant pairs. However, these sequences and variants are further constrained by context rules on both sequences and variants. To make the "null" variant well-behaved, none of the sequences, nor U+0D33 (ള), may be followed by a further U+0D33. That limits all occurrences of U+0D33 to singletons or explicitly enumerated sequences. At the same time, the variant mappings are not defined if a sequence follows U+0D33 U+0D4D or is followed by U+0D4D U+0D33, in other words, if it is part of a longer sequence of 0D33 (ള) joined by Halant.

    If a reordrant matra follows a sequence, it would graphically intervene, thus making the sequences no longer variants. Reordrant matras are U+0D46 (െ), U+0D47 (േ), U+0D48 (ൈ), U+0D4A (ൊ), U+0D4B (ോ), and the sequence U+0D4D (്) U+0D30 (ര). Therefore, the variants are also not defined if a sequence is followed by a reordrant matra. These two context rules are combined into a single context on the variant mapping:


    V2: A variant preceded by 0D33 + Halant, or followed by 0D33 or R or Halant + 0D33, is not defined.

    The sequence U+0D4D (◌്) U+0D30 (ര) is not required in the normative part of the proposal, as it doesn't create any confusing label. Restricting it would only amount to a spelling rule.

    Set 3: The case of റ (0D31) is similar to that of ള (0D33). A font that does not stack റ + ◌് + റ can render it in horizontal format. So a word like മീററ can be spoofed by applying virama to the last two റ. It is rare to see a font that does not stack, but instead of depending on that weak assumption, sequences and variants have been defined in an entirely analogous manner to U+0D33, with a variant context:

    V3: A variant preceded by 0D31 + Halant, or followed by 0D31 or R or Halant + 0D31, is not defined. (This is also mentioned in the Appendix part of the document as community feedback.)
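A minimal Python sketch of the V2 context, under stated assumptions: the set `REORDRANT` contains only the five vowel signs listed above (the Halant + Ra sequence is left out, following the remark that it is not needed normatively), and the function name and signature are hypothetical. V3 would be the same check with 0D31 in place of 0D33.

```python
LLA, HALANT = 0x0D33, 0x0D4D
# Assumption: R holds only the reordrant vowel signs named in the text.
REORDRANT = {0x0D46, 0x0D47, 0x0D48, 0x0D4A, 0x0D4B}


def v2_variant_defined(cps, start, length):
    """Is the variant mapping defined for the sequence cps[start:start+length]?"""
    before = tuple(cps[max(0, start - 2):start])
    after = tuple(cps[start + length:start + length + 2])
    if before == (LLA, HALANT):       # preceded by 0D33 + Halant
        return False
    if after[:1] == (LLA,):           # followed by 0D33
        return False
    if after and after[0] in REORDRANT:   # followed by R
        return False
    if after == (HALANT, LLA):        # followed by Halant + 0D33
        return False
    return True
```

For the spoofed spelling of “vellam” (0D35 0D46 0D33 0D33 0D02) the mapping for the 0D33 0D33 sequence is defined, whereas for 0D33 0D33 0D46 it is not, because the reordrant matra would visually separate the two letters.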

    6.2 Cross-Script Variants

    The Malayalam characters in the tables below are considered variant code points with some characters in Oriya and Tamil, as they could be considered visually the same by users. See Appendix A for additional code points from other scripts which are visually similar but not considered variant code points, for the reasons listed.

    6.2.1 Cross-script variants for Tamil and Malayalam

    Variant Set Tamil (CP, Glyph) Malayalam (CP, Glyph)

    1. 0B9C ஜ 0D1C ജ

    2. 0BB5 வ 0D16 ഖ

    3. 0BAE ம 0D25 ഥ

    4. 0BBF ◌ி 0D3F ◌ി

    5. 0BC6 ெ◌ 0D46 െ◌

    6. 0BC7 ே◌ 0D47 േ◌

    Table 10: Tamil-Malayalam Cross-Script Variants


    6.2.2 Cross-script variants for Oriya and Malayalam

    Case of Malayalam and Odia (Oriya) TTHA Consonant:

    This is the case of "Consonant Ttha", which happened to retain the same shape despite being part of different scripts, i.e., Malayalam and Odia. These characters are:

    ഠ - MALAYALAM LETTER TTHA (U+0D20)

    ଠ - ORIYA LETTER TTHA (U+0B20)

    Both characters look exactly alike and belong to the "Consonant" category. As they are consonants, each of them, even in the simplest form, i.e. the character by itself, is a valid label. As per the NBGP cross-script variant inclusion policy (Appendix D), this is a valid case for inclusion. Also, even if they are single characters, when the same character combines, theoretically they can form an infinite² number of cross-script variant labels between the scripts involved. Here are samples of some of those labels:

    Malayalam Oriya

    ഠഠഠ (U+0D20 U+0D20 U+0D20) ଠଠଠ (U+0B20 U+0B20 U+0B20)

    ഠഠഠഠ (U+0D20 U+0D20 U+0D20 U+0D20) ଠଠଠଠ (U+0B20 U+0B20 U+0B20 U+0B20)

    ഠഠഠഠഠ (U+0D20 U+0D20 U+0D20 U+0D20 U+0D20) ଠଠଠଠଠ (U+0B20 U+0B20 U+0B20 U+0B20 U+0B20)

    Since having such labels is a realistic possibility and the corresponding labels look almost exactly alike, NBGP has proposed them (together with similar combining marks) as blocked variants.

    Variant Set Oriya (CP, Glyph) Malayalam (CP, Glyph)

    1. 0B20 ଠ 0D20 ഠ

    Table 11: Oriya-Malayalam Cross-Script Variants

    ² Though theoretically infinite, this number would be limited to the number of such labels whose equivalent punycode string would not exceed 63 characters including the ACE prefix "xn--".


    6.2.3 Cross-script variants for Myanmar and Malayalam

    Variant Set Myanmar (CP, Glyph) Malayalam (CP, Glyph)

    1. 1002 ဂ 0D31 റ

    2. 101D ဝ 0D20 ഠ

    3.³ 1077 ၷ 0D31 റ

    These mappings have not been implemented, see Section 6.2.5.

    6.2.4 Cross-script variants for Georgian and Malayalam

    Variant Set Georgian (CP, Glyph) Malayalam (CP, Glyph)

    1. 10D8 ი 0D31 റ

    This mapping has not been implemented, see Section 6.2.5.

    6.2.5 Cross-script variants for sequences containing 0D31

    These variant mappings for 0D31 (റ) affect any overlapped cross-script variant sequence containing 0D31. Note that they would pose particular problems for those sequences ending in 0D31 which have been defined here by substituting the trailing 0D31 with a context rule. Fully accounting for these issues would add considerable complexity to an already complex definition of variants. Fortunately, there is a simpler alternative.

    Because, of all Malayalam code points, only 0D31 maps to these cross-script code points (U+1002, U+1077 and U+10D8), only labels consisting entirely of 0D31 could become cross-script variant labels. Further, due to the context rules defined for 0D31, the only two labels that would be possible without at least one other code point are “റ” and “ററ”. As a result, it is proposed to disallow these two labels with a WLE rule (see Section 7).

    ³ This is due to the transitivity for a Myanmar in-script variant between 1002 and 1077.


    7. Whole Label Evaluation (WLE) Rules

    This section provides the WLE rules that are required by all the languages mentioned in Section 4 when written in the Malayalam script. The rules have been drafted in such a way that they can be easily translated into the LGR specifications.

    Below are the symbols used in the WLE rules, one for each "Indic Syllabic Category" as mentioned in the table provided for the code point repertoire in Section 5.

    7.1.1 Variables or definitions

    V → Vowel
    M → Matra (Vowel Sign)
    C → Consonant
    L → Chillu
    H → Chandrakkala / Halant / Virama (◌് U+0D4D)
    B → Anusvaram (◌ം U+0D02)
    X → Visargam (◌ഃ U+0D03)
    R → Reordrant Matra

    "R" is used in variant contexts; see Section 6.1 for details.

    7.1.2 Rules for Forming Aksharam

    Rule 1: H must be preceded by C or the M ◌ു (0D41)

    Rule 2: M must be preceded by C

    Rule 3: B must be preceded by C, V or M

    Rule 4: X must be preceded by C, V or M

    Rule 5: L cannot be preceded by B, X or H

    Rule 6: Label does not begin with L

    Rule 7: The character ള (0D33) cannot immediately follow ള (0D33), except as part of a defined sequence

    Rule 8: The character റ (0D31) cannot immediately follow റ (0D31), except as part of a defined sequence

    Rule 9: Sequence ൻ് (0D7B 0D4D) must be preceded by C, L, M or V and must be followed by റ (0D31)

    As an exception to Rule 1, Rule 9 allows the sequence (H follows L) when the sequence follows C, L, M or V. In addition, the change of ൻ്റ to ൻ് requires a context rule “followed-by-0D31” for the code point sequence. Therefore, the combined rule “when(follows-C-L-M-or-V-and-followed-by-0D31)” is applied to the sequence.

    Rule 10: Labels cannot be composed solely of റ (0D31) RRA

    Because of Rule 8 and the defined sequence “ററ”, there are only two labels affected by Rule 10: the labels “റ” and “ററ”. As described in Section 6.2.5, these are disallowed to avoid complications from cross-script mappings.
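The simpler rules can be sketched as a Python checker. This is an illustration under explicit assumptions, not the normative rule set: the category sets below are reconstructed from the repertoire and exclusion tables (the vowel range in particular is an assumption), the function name `violated_rules` is hypothetical, and Rules 7 to 9 are not modelled because they depend on the sequence definitions and context rules of Section 6.1. As a consequence, the Rule 9 exception is not applied and the nta sequence 0D7B 0D4D 0D31 would be reported under Rule 1.

```python
# Assumed category sets; the accompanying XML file is authoritative.
V = set(range(0x0D05, 0x0D15)) - {0x0D0C, 0x0D0D, 0x0D11}   # vowels
C = set(range(0x0D15, 0x0D3A)) - {0x0D29}                   # consonants
M = {0x0D3E, 0x0D3F, 0x0D40, 0x0D41, 0x0D42, 0x0D43,
     0x0D46, 0x0D47, 0x0D48, 0x0D4A, 0x0D4B, 0x0D57}        # matras
L = set(range(0x0D7A, 0x0D80))                              # chillus
H, B, X = 0x0D4D, 0x0D02, 0x0D03
RRA = 0x0D31


def violated_rules(label):
    """Return the sorted numbers of Rules 1-6 and 10 that the label violates."""
    cps = [ord(ch) for ch in label]
    bad = set()
    if cps and cps[0] in L:
        bad.add(6)                       # label begins with a chillu
    if cps and all(cp == RRA for cp in cps):
        bad.add(10)                      # label consists solely of RRA
    for i, cp in enumerate(cps):
        prev = cps[i - 1] if i > 0 else None
        if cp == H and not (prev in C or prev == 0x0D41):
            bad.add(1)
        if cp in M and prev not in C:
            bad.add(2)
        if cp in (B, X) and not (prev in C or prev in V or prev in M):
            bad.add(3 if cp == B else 4)
        if cp in L and prev in (B, X, H):
            bad.add(5)
    return sorted(bad)
```

With these assumptions the label വെള്ളം passes all modelled rules, while the single-letter label റ is rejected by Rule 10 and the combination ഇ + ◌ൗ from Appendix A is rejected by Rule 2.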

    8. Contributors

    Neo-Brahmi Generation Panel (NBGP)

    Veena Solomon
    Prasad Pattarumadom Kesava Kurup
    Santhosh Thottingal
    Anivar Aravind
    Jijo Pappachan

    9. References

    [MSR] Integration Panel, "Maximal Starting Repertoire - MSR-4 Overview and Rationale", 7 February 2019 https://www.icann.org/en/system/files/files/msr-4-overview-25jan19-en.pdf (Accessed on 18th February, 2019)

    [EGIDS] Expanded Graded Intergenerational Disruption Scale, https://www.ethnologue.com/about/language-status (Accessed on 5th July, 2018)

    [101] Unicode® Standard Annex #31, Mark Davis, “Unicode Identifier And Pattern Syntax”: 2.3 Layout and Format Control Characters http://unicode.org/reports/tr31/#Layout_and_Format_Control_Characters (Accessed on 5th July, 2018)

    [102] “Report on Malayalam Unicode Issues” (2012), prepared by Santhosh Thottingal (also part of NBGP) and submitted to Unicode via the Wikimedia Foundation. It discusses both chillu and nta issues: http://thottingal.in/documents/ReportonMalayalamUnicodeIssues.pdf (Accessed on 5th July, 2018)

    [103] ഓളം Dictionary, https://olam.in/ (Accessed on 5th July, 2018)

    [104] Roozbeh Pournader and Cibu Johny, “Old and New Chillus in Malayalam and implications for Sinhala” http://www.unicode.org/L2/L2013/13036-chillus-uptake.pdf (Accessed on 5th July, 2018)

    [105] Wikipedia, “Malayalam script” https://en.wikipedia.org/wiki/Malayalam_script (Accessed on 5th July, 2018)

    [106] Omniglot, “Malayalam (മലയാളം)” https://www.omniglot.com/writing/malayalam.htm (Accessed on 5th July, 2018)

    [107] The Unicode Standard, Version 10.0, Chapter 12 “South and Central Asia I: Official Scripts of India”, https://www.unicode.org/versions/Unicode10.0.0/ch12.pdf#page=65 (Accessed on 5th July, 2018)

    [108] Everson, Michael (2007). "Proposal to add two characters for Malayalam to the BMP of the UCS" (PDF). ISO/IEC JTC1/SC2/WG2 N3494. Retrieved 2009-09-09: http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3494.pdf (Accessed on 5th July, 2018)

    [109] Alejandro Gutman and Beatriz Avanzati, “Malayalam, The Language Gulper” http://www.languagesgulper.com/eng/Malayalam.html (Accessed on 5th July, 2018)

    [110] Malayalam Range: 0D00–0D7F, The Unicode Standard, Version 11.0 https://unicode.org/charts/PDF/U0D00.pdf (Accessed on 5th July, 2018)

    [111] R. Chitrajakumar, N. Gangadharan, Rachana Akshara Vedi, “Samvruthokaram and Chandrakkala” https://www.unicode.org/L2/L2005/05213-samvruktokaram.pdf (Accessed on 2nd August, 2018)

    [112] Santhosh Thottingal, “ന്റ - ഭാഷ, യുണിക്കോഡ്, ചിത്രീകരണം” https://blog.smc.org.in/nta-rendering-rules/ (Accessed on 2nd August, 2018)

    [113] R. Chitrajakumar, N. Gangadharan, Rachana Akshara Vedi, “Chillaksharam of Malayalam Language” https://unicode.org/L2/L2005/05214-chillu.pdf (Accessed on 27th August, 2018)

    [114] Public comment feedback for Malayalam, Tamil Script LGR Proposals, https://docs.google.com/document/d/1Am1qJXSYPpuUifcfUWT01uwCV-LCAe3XgBsnJvM5tHs/edit#heading=h.1k12tx1767k9 (Accessed on 18th February, 2019)


    10. Appendix A: Excluded In-Script Variants

    As the following formations are not valid as per the Aksharam formation rules, these cases are not proposed as variants.

    1. ഈ 0D08 ഈ

    ഇ + ◌ൗ 0D07 + 0D57 ഇ◌ൗ

    2. ഊ 0D0A ഊ

    ഉ + ◌ൗ 0D09 + 0D57 ഉ◌ൗ

    3. ഔ 0D14 ഔ

    ഒ + ◌ൗ 0D12 + 0D57 ഒ◌ൗ

    4. ഓ 0D13 ഓ

    ഒ + ◌ാ 0D12 + 0D3E ഒ◌ാ

    5. ഐ 0D10 ഐ

    എ + െ◌ 0D0E + 0D46 എെ◌

    Table A-1: Excluded In-Script Variants Due to Invalid Combination

    In Table A-2, Column 1: These vowel signs have glyph pieces which stand on both sides of the consonant; they follow the consonant in logical order, and should be handled as a unit for most processing. Column 2: Although Unicode defines this canonical decomposition, the Standard recommends not to use the sequence ([107], p. 501). Therefore, it is not advisable to use them in IDN labels; they are blocked here by the akshara formation rule.

    Code Point 1 + Glyph 1 Code Point 2 + Glyph 2

    െ◌ാ (0D4A) െ◌ (0D46) + ◌ാ (0D3E)

    േ◌ാ(0D4B) േ◌ (0D47) + ◌ാ (0D3E)

    ◌ൗ (0D57) െ◌ (0D46) + ◌ൗ (0D57)

    Table A-2: Split Vowel Case


    11. Appendix B: Confusable Code Points

    The code points below are visually confusing only in smaller fonts and can be excluded from consideration as variant code points.

    Tamil Malayalam

    ஸ (0BB8) സ (0D38)

    Table B-1: Tamil-Malayalam Confusable Code Points

    Oriya Malayalam

    ଂ (0B02) ◌ം (0D02)

    ଃ (0B03) ◌ഃ (0D03)

    Table B-2: Oriya-Malayalam Confusable Code Points

    At the Sri Lanka face-to-face meeting, it was decided to exclude the code points below from the variant list, as these do not look alike due to round/square structural differences.

    Kannada Malayalam

    ಲ (0CB2) ല (0D32)

    Table B-3: Kannada-Malayalam Confusable Code Points

    Telugu Malayalam

    ల (0C32) ല (0D32)

    Table B-4: Telugu-Malayalam Confusable Code Points

    As per the comment received from the Myanmar GP, and on close examination, the following code points are considered confusable with Malayalam.

    Myanmar Malayalam

    က (1000) ന (0D28)

    ယ (101A) ധ (0D27)

    ကာ (1000 + 102C) ന്ന (0D28 + 0D4D + 0D28)

    Table B-5: Myanmar-Malayalam Confusable Code Points

    Code points in Tables B-6, B-7 and B-8 would qualify as cross-script code point variants, but there are not enough of them to form variant labels; therefore these cases can be excluded. (If only combining marks are variants for a given script, no label can be formed without using at least one non-variant code point.) In the case of Sinhala, the relevant base character is distinct.

    Kannada Malayalam

    ◌ಂ (0C82) ◌ം (0D02)

    ◌ಃ (0C83) ◌ഃ (0D03)

    Table B-6: Kannada-Malayalam Too Few Identical Code Points

    Telugu Malayalam

    ం (0C02) ◌ം (0D02)

    ః (0C03) ◌ഃ (0D03)

    Table B-7: Telugu-Malayalam Too Few Identical Code Points

    Sinhala Malayalam

    ◌ං (0D82) ◌ം (0D02)

    ◌ඃ (0D83) ◌ഃ (0D03)

    Table B-8: Sinhala-Malayalam Too Few Identical Code Points

    NBGP also considers that 0D1F (ട) MALAYALAM LETTER TTA is similar to 0073 (s) LATIN SMALL LETTER S and 0455 (ѕ) CYRILLIC SMALL LETTER DZE. However, Latin script and Cyrillic script are not derived from the Brahmi script. This case is out of scope of NBGP cross script variant analysis.

    12. Appendix C: Case of ള (0D33) + ള (0D33)

    This appendix contains copies of all input related to the case of ള (0D33) + ള (0D33). For the adopted solution see Section 6.1.

    The consonant ള (0D33) rarely follows another ള in Malayalam, except in the case of some place names. The double conjunct of ള (0D33) formed by code points 0D33 + 0D4D + 0D33 is rendered as the glyph ള്ള, which looks visually very similar to a ള following another ള. This can result in spoofed labels. For example, in Malayalam we write “vellam” (meaning: water) as “വെള്ളം” - 0D35 0D46 0D33 0D4D 0D33 0D02; a spoofed label can write it as “വെളളം” - 0D35 0D46 0D33 0D33 0D02.

    Combination Code points Glyph

    ള + ◌് + ള 0D33 + 0D4D + 0D33 ള്ള

    ള + ള 0D33 + 0D33 ളള

    Table C-1: Case of ള (0D33) + ള (0D33)

    This has been restricted by WLE Rule 7. It allows the combination “ള്ളള” (0D33 0D4D 0D33 0D33), which is present in words like “ഉള്ളള” (meaning: inner dimension, viz. volume), and blocks the combination “ളള്ള” (0D33 0D33 0D4D 0D33), which is rarely found in usage. The existence of “ളള” (0D33 0D33) in a considerable percentage on the web can be attributed to misspelling due to extreme visual similarity.

    ===================================================================

    Proposed recommendation from the Integration Panel

    ===================================================================

    Proposed recommendation for Malayalam

    DATE: 2018-06-12

    Overview

    The IP recently discovered a technical issue with the proposed variants for Malayalam.

    Issue Statement

    The Malayalam LGR defines the following variant:

    0D33 0D33 / 0D33 0D4D 0D33 (i.e.: ളള / ള്ള)

    This pattern gives rise to some complications because it effectively makes the Halant (0D4D) a variant of a "null position", in this case whenever it occurs between two instances of 0D33 ള LLA. Variant definitions of that nature can lead to unexpected results because a label 0D33 0D4D 0D33 0D4D 0D33 can be analyzed two ways:

    {0D33 0D4D 0D33} {0D4D} {0D33} and

    {0D33} {0D4D} {0D33 0D4D 0D33}

    As a result, variant definitions of this nature, although seemingly well-defined on the code point level, can lead to unexpected variant relations among labels.

    Therefore, such kinds of variant sequence definitions cannot be used without some further restriction. Below, the IP suggests two possible approaches and requests that the GP consider them in light of its knowledge of how the script is used.

    Background:

    Looking at the Malayalam sample file, the IP notes:

    0D33 0D33 (ളള) exists once (1) in a sample of 60K labels (it is part of the longer pattern 0D33 0D4D 0D33 0D33, or ള്ളള)

    0D33 0D33 0D33 (ളളള) exists zero (0) times

    0D33 0D4D 0D33 (ള്ള) exists 523 times, or 0.9% of the total; of these:

    ● 1/10, or 52, are followed by an 0D4D (Halant): 0D33 0D4D 0D33 0D4D (ള്ള്)

    ● none (0) is of the pattern 0D33 0D4D 0D33 0D4D 0D33 (or longer)

    From this one can conclude:

    ● ള്ള is quite frequent and can be spoofed by ളള (which doesn't occur normally, or at least not frequently)

    ● ള്ള് also occurs with some frequency and could be spoofed by ളള് (the latter again not seen in the sample)

    ● ള്ളള does occur, if rarely, and can be spoofed by ളള്ള or ളളള, but not by ള്ള്ള (where the code points are: 0D33 0D4D 0D33 0D33, 0D33 0D33 0D4D 0D33 and 0D33 0D4D 0D33 0D4D 0D33)

    Under the definition in the proposed LGR, ള്ളള and ളള്ള are not actually variant labels of each other, while ള്ള്ള is a variant of ള്ളള even though it shouldn't be. (The reason why the last label shouldn't be a variant label is that the second halant would be rendered visibly, making it distinct.)

    Longer patterns are either rare or do not occur in the standard sample; they seem quite likely to be nonsensical (at least some of them). Therefore, the cases seen so far would appear to be the total set of cases where there is a practical need for some variants or other restriction.

    Options

    The IP identified two suggested options to resolve the issue.

    Option One

    Restricting the variant so it cannot occur following an 0D33 ള or Halant.

    If the variant can be limited to the beginning of a cluster, that is, if a requirement is added that it only applies when not following an 0D33 or 0D4D, then we can still take care of the most frequent and second most frequent cases, and these cases produce variant labels that are related in expected ways: longer strings of alternating 0D33 and 0D4D pose no problems, as any alternate grouping of code points into sequences does not lead to any additional variants. Only the leading {0D33 0D33} or {0D33 0D4D 0D33} would cause variants. In particular, ള്ള്ള (with a visible Halant) would not become a variant of ള്ളള, etc. However, cases like ള്ളള / ളള്ള / ളളള would still not fully work as intended, as the first and second labels would not be variants of each other, and only the first would be a variant of the last.

    Option Two

    Restricting valid labels to exclude ളള


    Restricting labels from containing two 0D33 ള that are not joined by a Halant would robustly prevent any spoofing. However, it would also disallow a small number of potentially meaningful labels. (About 0.0015% of the words in the test file are affected, or 1 in 60K.) No variant definition would be needed.

    Recommendation

    The IP requests the NBGP to study these options and to consider them in determining a proposed approach to fixing the issue with the kind of variant mapping mentioned at the head of the document.

    We realize that these represent a trade-off. For the Root Zone, we feel comfortable that restriction of the allowed labels to avoid some problem cases is definitely appropriate, even if the process contains a String Review phase that would allow the manual weeding out of specific bad cases.

    However, we feel that an option that leaves some, if rare, opportunities for spoofing may well be inappropriate for the second and other levels as well: for those levels, human oversight of the process is going to be even less available.

    The IP suggests that the GP also weigh the extent to which decisions for the Root Zone affect other zones (by example).

    ===================================================================

    Feedback from community

    ===================================================================

    നീളള്ള മുടി, neelalla mudi, is how people say നീളമുള്ള മുടി, neelamulla mudi [meaning: long hair, lit. hair with length], locally in the Valluvanad area of North Kerala. Similarly, നല്ല താളള്ള പാട്ട്, nalla thaalalla paattu, is the same as നല്ല താളമുള്ള പാട്ട്, nalla thaalamulla paattu [meaning: (a) song with good rhythm].

    വെള്ളള്ള കിണർ, vellalla kinaru, is വെള്ളമുള്ള കിണർ, vellamulla kinaru [meaning: (a) well with water]. This label is not blocked because ള്ളള is allowed.

    I don't think these need to be considered, as the ളള്ള part in these labels is a spoken contraction of ഉള്ള, ulla [meaning: having, with].

    In other parts of Kerala, the spoken dialect changes the contraction to "ളൊ" or "ളോ", which are allowed as per the rule.

    Then there are some place names like മാളള്ള. On doing a Google search, I got only a single result [google.co.in].


    Feedback from the community:

    I won't recommend adding such rules based on the existence of current (and popular) vocabulary of 2018. Malayalam has an active practice of borrowing words from other languages (mainly from English nowadays) rather than inventing native words. Because of this, anything that is a valid conjunct can come into the language. Here is an example: you may know, I am a typeface designer too. When some of our initial fonts did not have the OpenType rules to handle സ്ബ, സ്ബു, it was because nobody could find a word that can have such a combination. Later, around 2010, Facebook became a thing. People started writing it in Malayalam. Our fonts could not handle the rendering gracefully, and then we added the required ligatures and rules and released a new version. While I was working on another typeface, another conjunct, +മ, was not supported on the thinking that there is no Malayalam word with മ. But later a friend came and complained that he wants to have an error-free rendering for അമീർ. So that is about the 'reasoning of rare occurrence in Malayalam'. Btw, there are people and places with the name മാളള്ള (Malalla) - try a Google search. We people from the Valluvanad area often have this: നല്ല നീളള്ള മുടി, നല്ല താളള്ള പാട്ട്, വെള്ളള്ള കിണർ...

    A Google search for വെള്ളള്ള shows me that it is a place name in Idukki.

    About the visual similarity: again, as type designers, we consciously make them visually different while designing. ള്ള appears very joined, with the tails fused together, while ളള appears with enough spacing between the letters and no fusing of tails.

    Also, ററ is a similar case where people write two Ra together to get /tta/. Almost all fonts nowadays stack them if it is for /tta/, but that is not guaranteed. So similar arguments can be made for that as well.

    Misspellings like മീററർ, ലാററൈററർ etc. come to my mind.

    In all these cases, exclusion rules would be the least preferred choice.

    രണ്ട് ള അടുപ്പിച്ചു വരുമ്പോൾ അത് ള്ള യുടെ വേരിയന്റായി

    കണക്കാക്കാമെന്നായിരുന്നു പറഞ്ഞിരുന്നത്. തിരിച്ചും.

    പക്ഷേ രണ്ട് ളകൾക്ക് ശേഷം ഒരു െ ചിഹ്നം വന്നാൽ അത് ളളെ എന്നാവും. അത്

    ള്ളയുമായി ഒരു തരത്തിലും സാദൃശ്യമില്ലാത്തതുമാണ്. ളളെ എന്ന സീക്വൻസിനെ

    ള്ളെ എന്നെ സ്വീക്വൻസിന്റെ വേരിയന്റായി കണക്കാക്കുന്നതായിരുന്നു


    നേരെത്തെയുള്ള പ്രൊപ്പോസൽ. അത് അനാവശ്യമായ നിയന്ത്രണമാണെന്നാണ്

    കാണുന്നത്. അതിനാണ് പുതിയ ഒരു തിരുത്തൽ.

    പ്രധാനമായും ള്ള , രണ്ട് ളയുടെ വാരിയന്റാവണമെങ്കിൽ അതിനു ശേഷം െ ചിഹ്നം

    പാടില്ല, എന്ന ഒരു constraint കൂടി വെച്ച് ളളെ എന്ന സീക്വൻസ്

    പ്രശ്നമൊന്നുമില്ലാതെ ലേബലിൽ അനുവദിക്കാനാണ് പുതിയറൂളുകൾ

    വഴിയൊരുക്കുന്നത്. പ്രശ്നമൊന്നും കാണുന്നില്ല.

    െ യ്ക്കു പുറമേ, േ, ോ, ൊ, എന്നിവയ്ക്കും ഇതേ സ്വഭാവമുണ്ട് - reordering.

    ളയുടെ അതേ നിയമങ്ങൾ റ്റ യുടെ കേസിലും വരും.

    ള + ള്ര എന്ന ഒരു സീക്വൻസ് പക്ഷേ ഈ ഡോഖ്യുമെന്റിൽ പരമാർശിച്ചിട്ടില്ല.

    റീഓർഡറിങ്ങ് വരുന്ന ഒരു കേസാണത് - സ്വരചിഹ്നമല്ലാതെ. ള്ര = ള + ് + ര

    ള്ള്ര ളള്ര എന്ന ഒരു വാരിയന്റ് ഡെഫനിഷൻ എഫക്ടീവ് ആയി വരുന്നുണ്ട്

    ഇപ്പോൾ - പുതിയ പ്രൊപ്പോസലിലും. കാരണം R എന്ന സെറ്റിൽ റീ ഓർഡർ

    ചെയ്യുന്ന സ്വരചിഹ്നങ്ങൾ മാത്രമേ ഉള്ളൂ.

    ള്ള്ര ളള്ര visually similar അല്ലാത്തതുകൊണ്ട്

    സ്വരചിഹ്നങ്ങളെപ്പോലെത്തന്നെ അനാവശ്യമായ constraint ആവുന്നുണ്ട്. അതേ

    സമയം വളരെ വളരെ അപൂർവമാണ് ഈ സീക്വൻസ് എന്നത് വാസ്തവവുമാണ്.

    ട്രാൻസിലിറ്ററേഷനിൽ ചിലപ്പോൾ വന്നേക്കാം.

    അതുകൂടി R എന്ന സെറ്റിൽ ചേർക്കുന്നോ? അതായത് "Halant-followed-By-Ra" ?

    Translation:

    It was said that when two 0D33 (ള) come in sequence (ളള), they may be considered a variant of 0D33 Halant 0D33 (ള്ള), and vice versa. But the problem with this is that if a Matra comes after two 0D33s, it reorders in rendering as 0D33 Matra 0D33 (for example, ളളെ), which is not visually similar to ള്ള. According to the previous proposal, the sequence ളളെ (0D33 0D33 Matra) was considered a variant of ള്ളെ (0D33 Halant 0D33 Matra). It is an unnecessary restriction, and hence this correction.


    First of all, in order to make 0D33 Halant 0D33 (ള്ള) a variant of two 0D33 in sequence (ളള),

    there shouldn't be any vowel sign (Matra) after 0D33 0D33. This constraint allows ളളെ in the

    label without any issues whatsoever.

    Same thing is applicable to other matras as well such as േ, ോ, ൊ.

    The same rule is applicable for റ (0D31) and റ്റ (0D31 Halant 0D31).

    Another similar case not mentioned in the document is the sequence ള + ള്ര = ളള്ര

    Reordering is applicable to this one as well even though it is not a Matra sign.

    ള്ര = ള + ് + ര (0D33 0D4D 0D30)

    ളള്ര is 0D33 0D33 0D4D 0D30

    As a result, a variant definition between ള്ള്ര and ളള്ര is effectively in place even in the new proposal, because the R set contains only the reordering vowel signs (Matras). But ള്ള്ര and ളള്ര aren't visually similar, so this is an unnecessary constraint, just as in the case of the vowel signs. On the other hand, this sequence is very rare, though it may occasionally turn up in transliteration. Should this be added to the R set as well, that is, Halant followed by Ra (0D4D 0D30)?


    13. Appendix D: NBGP Cross-script Variant Inclusion Policy

    If, in any two given scripts, all the potential cross-script variants consist of dependent characters ONLY (e.g. Vowel Signs, Anusvara, Visarga, Chandrabindu etc.), then that entire set can be ignored and no cross-script variants need be proposed between those two scripts.

    If, in any two given scripts, there is AT LEAST ONE non-dependent (e.g. Consonant, Vowel etc.) cross-script variant character/sequence present, all the potential cross-script variants are to be considered and proposed between the two scripts.

    This cross-script analysis has been restricted to the scripts that have descended from Brahmi, as most of them share similar usage patterns. By and large, all of these scripts have a common set of characters that existed in the Brahmi script and bear the same identities. However, as the scripts branched out from Brahmi, depending on various factors, the shapes of the characters changed. This change in shape was not uniform across all the characters and the scripts. Some character shapes did change significantly, whereas some of them still retained similarity. The cross-script similarity analysis also aims to identify such cases where the same character retained almost the same shape despite being part of different scripts. These sets of characters are variants of each other in the true sense, rather than merely by coincidental visual similarity.

    Since having such labels is a realistic possibility and the corresponding labels look almost exactly alike, NBGP has proposed them as blocked variants.

    NBGP acknowledges the concern that this shape is quite generic and may have parallels in other scripts not under its ambit. However, as NBGP does not have any exposure to the actual usage of those characters in those particular scripts, NBGP desisted from including them in the analysis. As NBGP has already considered all the related scripts under the cross-script variant analysis, the similarity of the characters belonging to NBGP scripts with other scripts not under the NBGP ambit may be of a merely coincidental visual nature.

    Additionally, this concern is not limited to these two characters but applies to all the characters in all the scripts under the scope of the Root LGR procedure. Carrying out this analysis can practically be done only with the Generation Panels that exist while the NBGP is active. This still leaves out those scripts which may not have a Generation Panel established yet. Hence, carrying out this exercise in its entirety is quite impracticable. This conundrum can be resolved if all such cases are handled by the "String Similarity Assessment Panel" of ICANN.