
Upload: sijo-thomas

Post on 19-Jan-2016


DESCRIPTION

Malayalam POS tags used in TDIL English-Malayalam statistical machine translation.


CopyrightTDIL

Unified Parts of Speech (POS) Standard in Indian Languages

- Draft Standard – Version 1.0

Department of Information Technology, Ministry of Communications & Information Technology

Govt. of India


CONTENTS

1 INTRODUCTION

2 SCOPE

3 TERMINOLOGY

3.1 POS Tag

3.2 XML Schema

3.3 Metadata

4 WHAT IS A POS TAG

5 REQUIREMENTS OF A POS TAG

5.1 Need of XML Schema in designing common POS format

6 POS TAG SET FOR INDIAN LANGUAGES

7 XML INTERNATIONALIZATION BEST PRACTICES

7.1 What is Internationalization Tag Set (ITS)

8 XML SCHEMA

9 METADATA ON POS

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

14 ALGORITHM FOR SELECTION OF NODES

15 REFERENCE BASED IMPLEMENTATION

16 REFERENCE

ANNEXURES

A Language Code Table


1 INTRODUCTION

Part-of-speech (POS) tagging, which assigns categories such as noun, pronoun, verb and demonstrative to words, is one of the key building blocks for developing Natural Language Processing applications. This POS schema is based on W3C XML Internationalization best practices, ISO 639-3 language codes for language identification, ISO 12620:1999 as the metadata definition, and a one-to-one mapping table for all the labels used in the POS Schema.

This document sets out the structural part of the XML Schema definition language and explains how to build an XML POS Schema for tagging. It gives an introduction to the nature of XML Schemas and to the XML POS Schema abstract data model, along with the other terminology used throughout this document. It also specifies the precise semantics of each component of the abstract model and the representation of each component in XML. The document contains a block diagram that shows the flow of creating an XML schema for POS tagging. It also includes an algorithm that incorporates metadata as per ISO 12620:1999.
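As an illustration of the language identification mentioned above, the following minimal sketch lists the ISO 639-3 codes for the languages that have tag tables in this document. The dictionary name `ISO_639_3` and the helper `language_code` are hypothetical names chosen for this example; the code table in Annexure A of the standard remains the authoritative list and is not reproduced here.

```python
# ISO 639-3 codes for the languages covered by the tag tables in this document.
# Note: "kok" is the ISO 639-3 macrolanguage code for Konkani.
ISO_639_3 = {
    "hindi": "hin",
    "punjabi": "pan",
    "telugu": "tel",
    "kannada": "kan",
    "malayalam": "mal",
    "tamil": "tam",
    "bangla": "ben",
    "marathi": "mar",
    "gujarati": "guj",
    "konkani": "kok",
    "maithili": "mai",
    "urdu": "urd",
}


def language_code(name):
    """Return the ISO 639-3 code for a language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in ISO_639_3:
        raise ValueError(f"no ISO 639-3 code recorded for {name!r}")
    return ISO_639_3[key]


print(language_code("Malayalam"))  # mal
```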

2 SCOPE

A common, unified XML-based POS Schema for Indian languages, based on W3C Internationalization best practices, has been formulated. The schema has been developed to take into account the NLP requirements of Web-based services in Indian languages. This standard specifies the XML POS Schema for tagging. This portion of the XML Schema Language discusses the labels that can be used in an XML POS Schema.

3 TERMINOLOGY

3.1 POS Tag

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word.

3.2 XML Schema

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema.

3.3 Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted.


4 WHAT IS A POS TAG

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.

The input to a tagging algorithm is a string of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is a single best POS tag for each word.
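The input/output contract described above can be sketched with a deliberately simple baseline: each known word receives the tag it carried most often in a small hand-tagged sample, and unknown words fall back to RD__UNK. This is not the tagging algorithm of the standard, which does not prescribe one. The function names, the toy training data (word/tag pairs taken from the Hindi examples in section 6, not real sentences) and the fallback policy are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(words, model, tagset, fallback="RD__UNK"):
    """Return one (word, tag) pair per input word, using only tags from tagset."""
    if fallback not in tagset:
        raise ValueError("fallback tag must belong to the tag set")
    result = []
    for word in words:
        best = model.get(word, fallback)
        if best not in tagset:  # never emit a tag outside the specified tag set
            best = fallback
        result.append((word, best))
    return result


# Toy data: romanized Hindi examples and annotation conventions from section 6.
TAGSET = {"N__NN", "N__NNP", "V__VM", "V__VAUX", "PSP", "RD__UNK"}
TRAINING = [
    [("Mohan", "N__NNP"), ("kitaaba", "N__NN"), ("ko", "PSP")],
    [("ravi", "N__NNP"), ("gayaa", "V__VM"), ("hai", "V__VAUX")],
]

model = train_baseline(TRAINING)
print(tag(["Mohan", "gayaa", "kalama"], model, TAGSET))
# [('Mohan', 'N__NNP'), ('gayaa', 'V__VM'), ('kalama', 'RD__UNK')]
```

A practical tagger would use context (for example an HMM or CRF) to choose between competing tags; the sketch only fixes the shape of the input and output.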

5 REQUIREMENT OF A POS TAG

The POS tagger can be used as a pre-processor. Text indexing and retrieval use POS information. A POS tagger is used for building tagged corpora and Machine Translation systems. Speech processing uses POS tags to decide the pronunciation. A POS tagger is also needed to identify the tag for words that could not be analysed by the morphological analyser. If the morphological analyser gives multiple tags for a word, the tagger can be used to resolve the ambiguity.

5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT

XML is needed for creating the POS tag set in order to standardize the POS tag framework for all Indian languages. The main benefits of XML in using a POS tag set for Indian languages (ILs) are:

• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking applications.
• XML documents can be stored without a database administrator, because they contain metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple remote servers.

It is easier to convert data between different data types.
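The point about adding extra information without breaking applications can be illustrated with a small sketch using Python's standard `xml.etree.ElementTree` module. The element names `sentence` and `word` and the attributes `lang`, `pos` and `source` are hypothetical and are not taken from the draft POS schema; they only show that a reader which looks at the `pos` attribute keeps working after another attribute is added.

```python
import xml.etree.ElementTree as ET


def build_sentence(lang, tagged_words):
    """Build a <sentence> element holding one <word pos="..."> child per token."""
    sentence = ET.Element("sentence", {"lang": lang})
    for word, pos in tagged_words:
        element = ET.SubElement(sentence, "word", {"pos": pos})
        element.text = word
    return sentence


def read_tags(sentence):
    """Read back (word, tag) pairs, ignoring any attribute other than pos."""
    return [(w.text, w.get("pos")) for w in sentence.iter("word")]


doc = build_sentence("hin", [("Mohan", "N__NNP"), ("gayaa", "V__VM")])
doc[0].set("source", "manual")  # extra information added later

xml_text = ET.tostring(doc, encoding="unicode")
round_trip = read_tags(ET.fromstring(xml_text))
print(round_trip)  # [('Mohan', 'N__NNP'), ('gayaa', 'V__VM')]
```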


6 POS Tag set for Indian Languages

POS Categories and Labels

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Verbal | NNV | N__NNV | The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
2.6 | Indefinite | PRI | PR__PRI |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
3.4 | Indefinite | DMI | DM__DMI |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal | VN | V__VN | paTittam, naTattam, naTanam
4.2 | Auxiliary | VAUX | V__VAUX |
4.2.1 | Finite | VF | V__VAUX__VF |
4.2.2 | Non-finite | VNF | V__VAUX__VNF |
4.2.3 | Infinitive | VINF | V__VAUX__VINF |
4.2.4 | Gerund | VNG | V__VAUX__VNG |
4.2.5 | Participle noun | VNP | V__VAUX__VNP |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |
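One way to make the category and label table above machine-checkable is to hold the hierarchy as a nested dictionary and derive every annotation string from it. The sketch below follows the Annotation Convention column of the table; the names `TAG_TREE`, `annotations` and `is_valid` are hypothetical, and the structure is an illustration rather than the XML schema defined later in the standard.

```python
# Label hierarchy from the "POS Categories and Labels" table.
# Each key is a label; its value holds the labels of its subtypes.
TAG_TREE = {
    "N": {"NN": {}, "NNP": {}, "NNV": {}, "NST": {}},
    "PR": {"PRP": {}, "PRF": {}, "PRL": {}, "PRC": {}, "PRQ": {}, "PRI": {}},
    "DM": {"DMD": {}, "DMR": {}, "DMQ": {}, "DMI": {}},
    "V": {
        "VM": {"VF": {}, "VNF": {}, "VINF": {}, "VNG": {}},
        "VN": {},
        "VAUX": {"VF": {}, "VNF": {}, "VINF": {}, "VNG": {}, "VNP": {}},
    },
    "JJ": {},
    "RB": {},
    "PSP": {},
    "CC": {"CCD": {}, "CCS": {"UT": {}}},
    "RP": {"RPD": {}, "CL": {}, "INJ": {}, "INTF": {}, "NEG": {}},
    "QT": {"QTF": {}, "QTC": {}, "QTO": {}},
    "RD": {"RDF": {}, "SYM": {}, "PUNC": {}, "UNK": {}, "ECH": {}},
}


def annotations(tree, prefix=()):
    """Yield every annotation string, joining hierarchy levels with '__'."""
    for label, children in tree.items():
        path = prefix + (label,)
        yield "__".join(path)
        yield from annotations(children, path)


VALID = set(annotations(TAG_TREE))


def is_valid(annotation):
    """Check whether an annotation string belongs to the tag set."""
    return annotation in VALID


print(sorted(t for t in VALID if t.startswith("CC")))
# ['CC', 'CC__CCD', 'CC__CCS', 'CC__CCS__UT']
```

Language-specific tables drop or add rows (for example, Classifier is marked "Not required" for some languages), so a per-language tree would be a pruned copy of this one.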

POS for Hindi

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | ladakaa, raajaa, kitaaba |
1.1 | Common | NN | N__NN | kitaaba, kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | Uupara, niice, aage, piiche |
2 | Pronoun | PR | PR | Yaha, vaha, jo |
2.1 | Personal | PRP | PR__PRP | Vaha, main, tuma, ve |
2.2 | Reflexive | PRF | PR__PRF | Apanaa, swayam, khuda |
2.3 | Relative | PRL | PR__PRL | Jo, jis, jab, jahaaM |
2.4 | Reciprocal | PRC | PR__PRC | Paraspara, aapasa |
2.5 | Wh-word | PRQ | PR__PRQ | Kauna, kab, kahaaM |
2.6 | Indefinite | PRI | PR__PRI | Koii, kis |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | Vaha, yaha |
3.2 | Relative | DMR | DM__DMR | jo, jis |
3.3 | Wh-word | DMQ | DM__DMQ | kis, kaun |
3.4 | Indefinite | DMI | DM__DMI | KoI, kis |
4 | Verb | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa |
4.1 | Main | VM | V__VM | giraa, gayaa, sonaa, haMstaa |
4.2 | Auxiliary | VAUX | V__VAUX | hai, rahaa, huaa |
5 | Adjective | JJ | JJ | sundara, acchaa, baRaa |
6 | Adverb | RB | RB | jaldii, teza |
7 | Postposition | PSP | PSP | ne, ko, se, mein |
8 | Conjunction | CC | CC | aur, agar, tathaa, kyonki |
8.1 | Co-ordinator | CCD | CC__CCD | aur, balki, parantu |
8.2 | Subordinator | CCS | CC__CCS | Agar, kyonki, to, ki |
9 | Particles | RP | RP | to, bhii, hii |
9.1 | Default | RPD | RP__RPD | to, bhii, hii |
9.3 | Interjection | INJ | RP__INJ | are, he, o |
9.4 | Intensifier | INTF | RP__INTF | bahuta, behada |
9.5 | Negation | NEG | RP__NEG | nahiin, mata, binaa |
10 | Quantifiers | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa |
10.1 | General | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | Cardinals | QTC | QT__QTC | eka, do, tiina |
10.3 | Ordinals | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
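Because the annotation convention joins hierarchy levels with a double underscore, the higher level tags can be derived mechanically from the lowest level tag. The following minimal sketch shows one way to do this; the function names `expand_tag` and `annotate` and the record layout are assumptions for illustration, not part of the standard.

```python
def expand_tag(annotation):
    """Return every level of the hierarchy, from top level to the given tag."""
    parts = annotation.split("__")
    if not all(parts):
        raise ValueError(f"malformed annotation: {annotation!r}")
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


def annotate(word, lowest_tag):
    """Store the selected lowest level tag together with its derived higher levels."""
    return {"word": word, "tag": lowest_tag, "levels": expand_tag(lowest_tag)}


print(expand_tag("CC__CCS__UT"))
# ['CC', 'CC__CCS', 'CC__CCS__UT']
print(annotate("kitaaba", "N__NN"))
# {'word': 'kitaaba', 'tag': 'N__NN', 'levels': ['N', 'N__NN']}
```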

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI


ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz


ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka


ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal noun | NNV | N__NNV | Verbal noun
4.3 | Auxiliary | VAUX | V__VAUX |
4.3.1 | Non-finite | VNF | V__VM__VNF |
4.3.2 | Infinitive | VINF | V__VM__VNF |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Tamil

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | paiyan, raajaa, puttakam |
1.1 | Common | NN | N__NN | puttakam, kaNNaaTi, paTam |
1.2 | Proper | NNP | N__NNP | moohan, ravi, maalati |
1.3 | Nloc | NST | N__NST | meel, kiiz, mun, pin |
2 | Pronoun | PR | PR | itu, atu, avan |
2.1 | Personal | PRP | PR__PRP | naan, nii, avaL, avarkaL |
2.2 | Reflexive | PRF | PR__PRF | taan |
2.3 | Relative | PRL | PR__PRL | yaar, etu, eppootu, enkee |
2.4 | Reciprocal | PRC | PR__PRC | oruvarukoruvar, avanavan, parasparam |
2.5 | Wh-word | PRQ | PR__PRQ | yaarum, yaaraavatu, yaaroo, etuvum |
3 | Demonstrative | DM | DM | a-, i-, e- |
3.1 | Deictic | DMD | DM__DMD | anta, inta, enta |
3.2 | Relative | DMR | DM__DMR | enta |
3.3 | Wh-word | DMQ | DM__DMQ | enta, yaar, eetaavatu, yaaraavatu |
4 | Verb | V | V | vizu, poo, tuunku, aaku |
4.1 | Main | VM | V__VM | vizu, poo, tuunku, ciri |
4.1.1 | Finite | VF | V__VM__VF | vizuntaan, pooneen, cirittaaL |
4.1.2 | Non-finite | VNF | V__VM__VNF | vizunta, poonaal |
4.1.3 | Infinitive | VINF | V__VM__VINF | viza, pooka, cirikka |
4.1.4 | Gerund | VNG | V__VM__VNG | vizutal, cirittal, tuunkutal |
4.2 | Verbal | VN | V__VN | paTippu, naTai, naTattai, ceykai |
4.3 | Auxiliary | VAUX | V__VAUX | aakum, veeNTum, muTiyum |
5 | Adjective | JJ | | iniya, periya, azakaana |
6 | Adverb | RB | | veekamaaka, viraivaaka |
7 | Postposition | PSP | | paRRi, kuRittu, viTa |
8 | Conjunction | CC | CC | maRRum, eenenRaal, aanaal |
8.1 | Co-ordinator | CCD | CC__CCD | -um (raamanum), maRRum, aanaal, allatu | -um is a co-ordinator which can be added to noun and verb
8.2 | Subordinator | CCS | CC__CCS | enRu, ena, enpatu, enRaal |
8.2.1 | Quotative | UT | CC__CCS__UT | enRu, ena |
9 | Particles | RP | RP | maTTUm, kuuTa |
9.1 | Default | RPD | RP__RPD | maTTUm, kuuTa |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | ayyoo, teey, aamaam |
9.4 | Intensifier | INTF | RP__INTF | ati, veku, mika |
9.5 | Negation | NEG | RP__NEG | illai |
10 | Quantifiers | QT | QT | koncam, niRaiya, oru, mutal |
10.1 | General | QTF | QT__QTF | koncam, niRaiya |
10.2 | Cardinals | QTC | QT__QTC | onRu, iraNTu |
10.3 | Ordinals | QTO | QT__QTO | mutal, iraNTaam |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) ruu | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | vaNTi kiNTi, paal kiil |


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി


രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.


POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

4.1.4 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text


112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA (‘pen’, ‘spectacles’) |
1.2 | Proper | NNP | N__NNP | mohan, ravI (‘Mohan’, ‘Ravi’) |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM (‘up’, ‘down’, ‘in front’) |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te (‘me’, ‘you’, ‘he/she’) |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam (‘herself/himself’) |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM (‘who’, ‘where’) |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar (‘mutually’, ‘each other’) |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM (‘who’, ‘when’, ‘where’) |
2.6 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A (‘this’) |
3.2 | Relative | DMR | DM__DMR | je, jeNe (‘which/who’, ‘whom’) |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem (‘who’, ‘what’, ‘why’) |
3.4 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu (‘will eat’, ‘ate’) |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’) |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke (‘and’, ‘or’) |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’) |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO (‘but’, emph., topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM (‘very’, ‘much’) |
9.4 | Negation | NEG | RP__NEG | nahi, na (‘no’) |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM (‘little’, ‘much’) |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN (‘one’, ‘two’, ‘three’) |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI (‘first’ (neu.), ‘second’ (fem.)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’) |

POS for Konkani

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word

-بديسی لفظ(bidesii lafz(

RDF RD__RDF A word written in script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( )

& $

Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
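The rule can be sketched in Python, assuming the hierarchical labels in the tables (for example V__VM__VF) use a double underscore as the level separator. The function names `ancestor_tags` and `store_annotation` are hypothetical and are not part of the standard.

```python
def ancestor_tags(label, sep="__"):
    """Return every level of a hierarchical label, highest level first.

    "V__VM__VF" -> ["V", "V__VM", "V__VM__VF"]
    """
    parts = label.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


def store_annotation(word, label, sep="__"):
    """Build an annotation record from the lowest-level label only.

    The annotator supplies just the lowest-level label; the higher
    levels are derived from it rather than entered by hand.
    """
    return {
        "word": word,
        "tag": label.split(sep)[-1],          # lowest-level tag, e.g. "VF"
        "levels": ancestor_tags(label, sep),  # all levels, top first
    }


record = store_annotation("giraa", "V__VM__VF")
```

Because the higher levels are computed, an annotation can never carry a lowest-level tag that disagrees with its stored parents.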


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
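As an illustration of the first, second and fifth items, the following Python sketch reads xml:lang, its:dir and xml:id from a small document using only the standard library. The sample document and the `describe` helper are hypothetical; the two namespace URIs are those of the XML and ITS 1.0 specifications.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the "xml:" prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical sample: an English root with one Urdu, right-to-left label.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <p xml:id="p1">Noun</p>
  <p xml:id="p2" xml:lang="ur" its:dir="rtl">اسم</p>
</doc>"""


def describe(root):
    """List (tag, xml:id, xml:lang, its:dir) exactly as written on each element."""
    rows = []
    for el in root.iter():
        rows.append((
            el.tag,
            el.get("{%s}id" % XML_NS),
            el.get("{%s}lang" % XML_NS),
            el.get("{%s}dir" % ITS_NS),
        ))
    return rows


ROWS = describe(ET.fromstring(SAMPLE))
```

The sketch reports only attributes written directly on each element. It does not resolve inheritance, so the first paragraph shows no language even though it inherits "en" from the root.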

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>

<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">

<globalInformation></globalInformation>

<languageSection>

<language>en</language>

<identifier> </identifier>

<version>100</version>

<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->

<origin>ISO 12620:1999

<author></author>

<domain></domain>

</origin>

<creation> <creationDate>1999-01-01</creationDate>

</creation>

<descriptionSection>

<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>

</definitionClass>

</descriptionSection>

</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching 'off' the branches of the tree structure that belong to other languages. This is represented schematically under the next heading, i.e. the POS schema block diagram.
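The idea of switching branches 'off' can be sketched in Python as a filter over one node's label attributes. This is an assumption-laden illustration, not the schema transformation itself: the node is modelled as a plain dictionary, `select_language` is a hypothetical helper, and Hindi is kept alongside English because the selection algorithm in this document displays English and Hindi nodes together with the chosen language.

```python
# Attributes shared by every language view of a node.
COMMON_KEYS = {"cat", "tag"}

# One node of the common schema, modelled as a dict (illustrative values).
NOUN_NODE = {
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "mal-cat": "നാമം",
    "kas-cat": "ناوت",
    "kok-cat": "नाम",
    "tag": "N",
}


def select_language(node, lang_code, always=("hin",)):
    """Keep the common keys plus labels for lang_code and the always-shown codes.

    Every other "<code>-cat" label is hidden, i.e. left out of the result.
    """
    wanted = {"%s-cat" % code for code in (lang_code,) + tuple(always)}
    return {k: v for k, v in node.items() if k in COMMON_KEYS or k in wanted}


malayalam_view = select_language(NOUN_NODE, "mal")
hindi_view = select_language(NOUN_NODE, "hin")
```

Filtering a copy rather than editing the common node keeps the full multilingual tree intact, so any other language view can still be produced from it.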

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
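A mapping table of this kind can be held as a lookup keyed by tag and language code. The Python sketch below is a minimal illustration under stated assumptions: `LABELS`, `label_for` and `missing_labels` are hypothetical names, only two rows are shown, and the language codes follow the ones used in the draft schema (mal, kok). Cells marked NA in the tables are treated as "no label defined".

```python
# Two illustrative rows: tag -> {language code: label}. "NA" marks a cell
# for which the language defines no label.
LABELS = {
    "N": {"mal": "നാമം", "kok": "नाम"},
    "NNV": {"mal": "NA"},
}


def label_for(tag, lang):
    """Return the label for (tag, lang), or None if absent or marked NA."""
    label = LABELS.get(tag, {}).get(lang)
    return None if label in (None, "NA") else label


def missing_labels(lang):
    """List the tags that have no usable label in the given language."""
    return sorted(tag for tag in LABELS if label_for(tag, lang) is None)


gaps_in_malayalam = missing_labels("mal")
```

Running a check like `missing_labels` for each language is one way to confirm that the mapping is one-to-one wherever a label is expected, and to list the tags that still need a label.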


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ……

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ……

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ……

    End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

...


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

...

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

...

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

...

End if
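The selection logic of this algorithm can be sketched in Python. This is only an illustration and not part of the standard: the dictionary layout, the `select_nodes` helper and the placeholder label strings are assumptions made for the example. Only the attribute naming pattern (`hin-cat`, `brx-cat`, `kok-cat`) and the tags are taken from the examples in this section.

```python
# Illustrative sketch of the node-selection step: keep the English and Hindi
# labels plus the label of the requested language, and hide everything else.
# The data layout and the function name are assumptions, not part of the standard.

ALWAYS_SHOWN = ("cat", "hin-cat", "tag")  # English label, Hindi label, POS tag


def select_nodes(nodes, lang_code):
    """Return copies of the schema nodes restricted to English, Hindi and lang_code."""
    wanted = set(ALWAYS_SHOWN) | {f"{lang_code}-cat"}
    return [{k: v for k, v in node.items() if k in wanted} for node in nodes]


# Placeholder strings stand in for the native-script labels.
schema = [
    {"cat": "noun", "hin-cat": "<hin noun>", "brx-cat": "<brx noun>",
     "kok-cat": "<kok noun>", "tag": "N"},
    {"cat": "pronoun", "hin-cat": "<hin pronoun>", "brx-cat": "<brx pronoun>",
     "kok-cat": "<kok pronoun>", "tag": "PR"},
]

# Konkani: the Bodo labels are dropped, the Konkani labels are kept.
for node in select_nodes(schema, "kok"):
    print(node)
```

Calling `select_nodes(schema, "hin")` would return only the English and Hindi labels, since Hindi is always displayed.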


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
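The labels attached to the words above are hierarchical: each underscore-separated part names one level of the type hierarchy (for example `N_NN` or `V_VM_VNF`). A minimal Python sketch of reading such a label follows. The helper names are hypothetical, and accepting both single and double underscores is an assumption made because both spellings occur in this document.

```python
import re


def split_label(label):
    """Split a hierarchical label such as 'V_VM_VNF' or 'V__VM__VNF' into its levels."""
    return [part for part in re.split(r"_+", label) if part]


def top_level(label):
    """Return the top-level category of a label, e.g. 'N' for 'N_NN'."""
    return split_label(label)[0]


for label in ("N_NN", "V_VM_VNF", "CC__CCS__UT", "JJ"):
    print(label, "->", split_label(label), "top level:", top_level(label))
```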


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No. | Language Name | Language Tag according to ISO 639-3

1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
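For programmatic use, the language tags can be kept in a simple lookup table. The sketch below is illustrative only: it assumes the standard ISO 639-3 identifiers for the 22 languages named in this annexure, and the names `LANGUAGE_TAGS` and `attribute_name` are hypothetical. The `<code>-cat` attribute pattern follows the examples in the node-selection algorithm.

```python
# Language name -> ISO 639-3 identifier for the languages listed in the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def attribute_name(language):
    """Build the per-language label attribute name, e.g. 'mal-cat' for Malayalam."""
    return f"{LANGUAGE_TAGS[language]}-cat"


print(attribute_name("Malayalam"))
print(attribute_name("Konkani"))
```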


CONTRIBUTORS

1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G, University of Hyderabad
7. Dr. Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


1 INTRODUCTION

Parts of Speech (POS) tagging, which assigns categories such as noun, pronoun, verb and demonstrative to words, is one of the key building blocks for developing Natural Language Processing applications. This POS schema is based on W3C XML Internationalization best practices, ISO 639-3 language codes for language identification, ISO 12620:1999 as the metadata definition, and a one-to-one mapping table for all the labels used in the POS Schema.

This document sets out the structural part of the XML Schema definition language and explains how to build an XML POS Schema for tagging. It introduces the nature of XML Schemas and the abstract data model of the XML POS Schema, along with other terminology used throughout the document, and it specifies the precise semantics of each component of the abstract model and the representation of each component in XML. The document contains a block diagram showing the flow of creating the XML schema for POS tagging. It also includes an algorithm that contains metadata as per ISO 12620:1999.

2 SCOPE

A common, unified, XML-based POS Schema for Indian languages, based on W3C Internationalization best practices, has been formulated. The schema has been developed to take into account the NLP requirements of Web-based services in Indian languages. This standard specifies the XML POS Schema for tagging. This portion of the XML Schema Language discusses the labels that can be used in an XML POS Schema.

3 TERMINOLOGY

3.1 POS Tag: A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word.

3.2 XML Schema: XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema.

3.3 Metadata: Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted.


4 WHAT IS A POS TAG

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.

The input to a tagging algorithm is the string of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is the single best POS tag for each word.
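The input and output contract described above can be illustrated with a very small lookup-based tagger. This is a baseline sketch under stated assumptions, not the tagging method prescribed by this standard: the function name, the toy lexicon and the fallback to the `RD__UNK` (Unknown) tag are all choices made for the example.

```python
UNKNOWN_TAG = "RD__UNK"  # the 'Unknown' residual label from the tag set


def tag_words(words, lexicon, tag_set):
    """Assign one tag to every word: the lexicon entry if present, else UNKNOWN_TAG."""
    tagged = []
    for word in words:
        tag = lexicon.get(word.lower(), UNKNOWN_TAG)
        if tag not in tag_set:
            raise ValueError(f"{tag!r} is not in the tag set")
        tagged.append((word, tag))
    return tagged


# Toy tag set and lexicon; the entries are illustrative only.
tag_set = {"N__NN", "PR__PRP", "V__VAUX", "RD__UNK"}
lexicon = {"vaha": "PR__PRP", "kitaaba": "N__NN", "hai": "V__VAUX"}

print(tag_words(["vaha", "kitaaba", "hai", "xyz"], lexicon, tag_set))
```

A real tagger would choose among several candidate tags using context. The sketch only shows the shape of the problem: words and a finite tag set go in, and one tag per word comes out.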

5 REQUIREMENTS OF A POS TAG

The POS tagger can be used as a pre-processor. Text indexing and retrieval use POS information. A POS tagger is used for building tagged corpora and Machine Translation systems. Speech processing uses POS tags to decide pronunciation. A POS tagger is also needed to identify the tag for words that could not be analysed by the morphological analyser. If the morphological analyser gives multiple tags for a word, the tagger can be used to resolve the ambiguity.

5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT

XML is used for creating the POS tag set in order to standardize the POS tag framework for all Indian languages. The main benefits of using XML for the POS tag set for Indian languages (ILs) are:

• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking applications.
• XML documents can be stored without a database administrator because they contain metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple remote servers.
• It is easier to convert data between different data types.
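The point about tree structure can be made concrete with a short Python sketch that walks a small XML fragment element by element using the standard library. The element and attribute names in the fragment (`pos`, `category`, `subcategory`, `cat`, `tag`) are hypothetical and chosen only for illustration. They are not the schema defined by this standard.

```python
import xml.etree.ElementTree as ET

# A small, well-formed fragment; the names are illustrative, not the draft schema.
DOC = """<pos>
  <category tag="N" cat="noun">
    <subcategory tag="NN" cat="common"/>
    <subcategory tag="NNP" cat="proper"/>
  </category>
</pos>"""

root = ET.fromstring(DOC)

# Walk the tree: each category element carries its subcategories as children.
for category in root.iter("category"):
    children = [sub.get("tag") for sub in category.findall("subcategory")]
    print(category.get("tag"), category.get("cat"), children)
```

Because attribute values are ordinary Unicode strings, labels in any Indian script could be stored in additional attributes in the same way.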


6 POS Tag set for Indian Languages

POS Categories and Labels

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Remarks

1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Verbal | NNV | N__NNV | The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
2.6 | Indefinite | PRI | PR__PRI |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
3.4 | Indefinite | DMI | DM__DMI |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal | VN | V__VN | paTittam, naTattam, naTanam
4.2 | Auxiliary | VAUX | V__VAUX |
4.2.1 | Finite | VF | V__VAUX__VF |
4.2.2 | Non-finite | VNF | V__VAUX__VNF |
4.2.3 | Infinitive | VINF | V__VAUX__VINF |
4.2.4 | Gerund | VNG | V__VAUX__VNG |
4.2.5 | Participle noun | VNP | V__VAUX__VNP |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |

POS for Hindi

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 | Noun | N | N | ladakaa, raajaa, kitaaba |
1.1 | Common | NN | N__NN | kitaaba, kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | Uupara, niice, aage, piiche |
2 | Pronoun | PR | PR | Yaha, vaha, jo |
2.1 | Personal | PRP | PR__PRP | Vaha, main, tuma, ve |
2.2 | Reflexive | PRF | PR__PRF | Apanaa, swayam, khuda |
2.3 | Relative | PRL | PR__PRL | Jo, jis, jab, jahaaM |
2.4 | Reciprocal | PRC | PR__PRC | Paraspara, aapasa |
2.5 | Wh-word | PRQ | PR__PRQ | Kauna, kab, kahaaM |
2.6 | Indefinite | PRI | PR__PRI | Koii, kis |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | Vaha, yaha |
3.2 | Relative | DMR | DM__DMR | jo, jis |
3.3 | Wh-word | DMQ | DM__DMQ | kis, kaun |
3.4 | Indefinite | DMI | DM__DMI | KoI, kis |
4 | Verb | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa |
4.1 | Main | VM | V__VM | giraa, gayaa, sonaa, haMstaa |
4.2 | Auxiliary | VAUX | V__VAUX | hai, rahaa, huaa |
5 | Adjective | JJ | JJ | sundara, acchaa, baRaa |
6 | Adverb | RB | RB | jaldii, teza |
7 | Postposition | PSP | PSP | ne, ko, se, mein |
8 | Conjunction | CC | CC | aur, agar, tathaa, kyonki |
8.1 | Co-ordinator | CCD | CC__CCD | aur, balki, parantu |
8.2 | Subordinator | CCS | CC__CCS | Agar, kyonki, to, ki |
9 | Particles | RP | RP | to, bhii, hii |
9.1 | Default | RPD | RP__RPD | to, bhii, hii |
9.3 | Interjection | INJ | RP__INJ | are, he, o |
9.4 | Intensifier | INTF | RP__INTF | bahuta, behada |
9.5 | Negation | NEG | RP__NEG | nahiin, mata, binaa |
10 | Quantifiers | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa |
10.1 | General | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | Cardinals | QTC | QT__QTC | eka, do, tiina |
10.3 | Ordinals | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $, &, (, ) | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
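One way to store the higher-level tags automatically is to keep a parent table for the hierarchy and walk upwards from the selected label. The Python sketch below is an illustration under stated assumptions: the `PARENT` table covers only a subset of the tag set, and the function name is hypothetical. Labels such as `VF` and `VNF`, which occur under more than one parent in the general tag set, would need extra context and are left out.

```python
# Parent of each label in the type hierarchy (subset shown for illustration).
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR",
    "VM": "V", "VAUX": "V",
    "CCD": "CC", "CCS": "CC",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "PUNC": "RD", "UNK": "RD",
}


def full_annotation(label):
    """Build the annotation string (e.g. 'N__NN') from the lowest-level label."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return "__".join(reversed(path))


for label in ("NN", "VAUX", "QTC", "JJ"):
    print(label, "->", full_annotation(label))
```

A top-level label with no subtypes, such as `JJ`, is returned unchanged.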

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI


ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz


ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka


ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)

Sl No Category Label Annotation

Convention

Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN

12 Proper NNP N__NNP

13 Nloc NST N__NST

2 Pronoun PR PR

21 Personal PRP PR__PRP

22 Reflexive PRF PR__PRF


23 Relative PRL PR__PRL

24 Reciprocal PRC PR__PRC

25 Wh-word PRQ PR__PRQ

3 Demonstrative DM DM

31 Deictic DMD DM__DMD

32 Relative DMR DM__DMR

33 Wh-word DMQ DM__DMQ

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF

412 Non-finite VNF V__VM__VNF

413 Infinitive VINF V__VM__VINF

414 Gerund VNG V__VM__VNG

42 Verbal Noun Verbal noun NNV N_NNV Verbal Noun

43 Auxiliary VAUX V__VAUX

431 Non-finite VNF V_VM_VNF

432 Infinite VINF V_VM_VNF

5 Adjective JJ

6 Adverb RB Only manner

adverbs

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD

82 Subordinator CCS CC__CCS

821 Quotative UT CC__CCS__UT

9 Particles RP RP

91 Default RPD RP__RPD

92 Classifier CL RP__CL

93 Interjection INJ RP__INJ

94 Intensifier INTF RP__INTF


95 Negation NEG RP__NEG

10 Quantifiers QT QT

101 General QTF QT__QTF

102 Cardinals QTC QT__QTC

103 Ordinals QTO QT__QTO

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Tamil

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR ituatuavan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V_VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ

24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ

31 Deictic DMD DM__DMD atu itu അത ഇത

32 Relative DMR DM__DMR eetu ഏത

33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന

4 Verb V V pO kazhi Annuciri ോ കഴി ആണി(Copula) ചിരി

41 Main VM V__VM pO kazhi cirriAnnu(copula) ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം

92 Classifier CL RP__CL peer േ൪

93 Interjection INJ RP__INJ ayyoo അേയാ

94 Intensifier INTF RP__INTF pala valare ല വളെര

95 Negation NEG RP__NEG illa alla ഇല അല

10 Quantifiers QT QT kuracchu niraccu oru dharalam കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര

113 Punctuation PUNC RD__PUNC

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

POS for Bangla

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN kalama cashmaa

12 Proper NNP N__NNP Mohan ravi

rashmi

14 Nloc NST N__NST upare

niche

bhitara

2 Pronoun PR PR

21 Personal PRP PR__PRP se tumi

AmAra

22 Reflexive PRF PR__PRF nijera

23 Relative PRL PR__PRL ye yakhana

yena yAra

24 Reciprocal PRC PR__PRC paraspara

25 Wh-word PRQ PR__PRQ ke kakhana


kena kAra

26 Indefinite PRI PR__PRI keu

3 Demonstrative DM DM Vaha jo

yaha

31 Deictic DMD DM__DMD sei oi o se

32 Relative DMR DM__DMR ye yei

33 Wh-word DMQ DM__DMQ kono

34 Indefinite DMI DM__DMI keu

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF karachhilAma yAba khAYa

412 Non-finite VNF V__VM__VNF kare kheYe karale khete

413 Infinitive VINF V__VM__VINF karate khete yete

414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA

42 Auxiliary VAUX V__VAUX chhila

habe chAi

5 Adjective JJ sundara

bhAla lAla

6 Adverb RB tADAtADi

Aste

haThAt

7 Postposition PSP theke

abadhI

madhye

diYe

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD Ara eban

athabA

kimbA

82 Subordinator CCS CC__CCS ye kintu noile tAhale

821 Quotative UT CC__CCS__UT ---- Not required

9 Particles RP RP

91 Default RPD RP__RPD to ye

92 Classifier CL RP__CL jana khAnA

93 Interjection INJ RP__INJ Are ei

hAya

94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika

95 Negation NEG RP__NEG nA naYa

chhADA

10 Quantifiers QT QT

101 General QTF QT__QTF kichhu

alpa aneka

102 Cardinals QTC QT__QTC eka dui

tina

103 Ordinals QTO QT__QTO prathama

paYalA

dvitIYa

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH jala Tala

khAbAra

dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-myself) आपण(aapana-ourselves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Paraspara-reciprocally) एतमत(ekmek-mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

26

CopyrightTDIL

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--

27

CopyrightTDIL

4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

28

CopyrightTDIL

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

29

CopyrightTDIL

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA ('pen', 'spectacles') |
1.2 | Proper | NNP | N__NNP | mohan, ravI ('Mohan', 'Ravi') |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM ('up', 'down', 'in front') |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te ('me', 'you', 'he/she') |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam ('herself/himself') |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM ('who', 'where') |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar ('mutually', 'each other') |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM ('who', 'when', 'where') |
2.6 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A ('this') |
3.2 | Relative | DMR | DM__DMR | je, jeNe ('which/who', 'whom') |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem ('who', 'what', 'why') |
3.4 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu ('will eat', 'ate') |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM ('is', 'was', 'did') |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke ('and', 'or') |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke ('so', 'like that', 'because') |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO ('but', emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM ('very', 'much') |
9.4 | Negation | NEG | RP__NEG | nahi, na ('no') |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM ('little', 'much') |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN ('one', 'two', 'three') |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI ('first' (neu), 'second' (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | () |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi ('work and the like', 'water and the like') |

POS for Konkani

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पसत रख आबो माड |
1.2 | Proper | NNP | N__NNP | रामायण बायबल तराण गय ततणी तपला |
1.3 | Nloc | NST | N__NST | भायर भीर वयर सतयल |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हाव ो तयो मच आमच ाच |
2.2 | Reflexive | PRF | PR__PRF | आपण सवा |
2.3 | Relative | PRL | PR__PRL | जा जो |
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा |
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो |
2.6 | Indefinite | | | तोणय त य खयचय |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ो हो |
3.2 | Relative | DMR | DM__DMR | जो |
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल |
3.4 | Indefinite | | | तोणाचय तसलय |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | यवप |
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो |
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच |
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार |
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी |
4.2 | Auxiliary | VAUX | V__VAUX | NA |
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस |
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी |
5 | Adjective | JJ | | सोबी सदर |
6 | Adverb | RB | | फालया सवतास अश |
7 | Postposition | PSP | | खाीर पास बगर तडन लागी |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा |
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन |
8.2.1 | Quotative | UT | CC__CCS__UT | अश त |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | बी आद इतयाद |
9.2 | Classifier | CL | RP__CL | (पाच) जाण |
9.3 | Interjection | INJ | RP__INJ | आर चप |
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर |
9.5 | Negation | NEG | RP__NEG | ना नयह |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | थोड चड ताय खब |
10.2 | Cardinals | QTC | QT__QTC | एत दोन |
10.3 | Ordinals | QTO | QT__QTO | पयल दसर |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | & $ |
11.3 | Punctuation | PUNC | RD__PUNC | - |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण |


POS for Maithili

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
9.2 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख), मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (ism: اسم) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (nakeraa: نکره) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (m‘aarefa: معرفہ) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (haasil-e-masdar: حاصل مصدر) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (zarf: ظرف) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (zamiir: ضمير) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (zamiir-e-shakhsii: ضمير شخصی) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 | Reflexive (zamiir-e-m‘aakoosii: ضمير معکوسی) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (zamiir-e-mausoolaa: ضمير موصولہ) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (zamiir-e-raaje‘: ضمير راجع) | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (zamiir-e-istafhaamiyaa: ضمير استفہاميہ) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (zamiir-e-ishaaraa: ضمير اشاره) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (ishaare: اشارے) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (zamiir-e-ishaaraa mausoolaa: ضمير اشاره موصولہ) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (zamiir-e-ishaaraa istafhaamiyaa: ضمير اشاره استفہاميہ) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (f‘el: فعل) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (mahdood: محدود) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (ghair mahdood: غيرمحدود) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (masdar: مصدر) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (haasil-e-masdar: حاصل مصدر) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (f‘el-e-imdaadi: فعل امدادی) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (sifat: صفت) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (mut‘alliq-e-f‘el: متعلق فعل) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (jaar-e-moakkhar: جارموخر) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (atf‘: عطف) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (harf-e-vasl: حرف وصل) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (taab‘e kunindaa: تابع کننده) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (iqtabaasii: اقتباسی) | UT | CC__CCS__UT | | Not required
9 | Particles (haaliyaa: حاليہ) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (Default: ڈيفالٹ) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (darja band: درجہ بند) | CL | RP__CL | | Not required
9.3 | Interjection (fajaa’iyaa: فجائيہ) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (harf-e-taakiid: حرف تاکيد) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (harf-e-nahii: حرف نہی) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (kamiiyat numaa: کميت نما) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (aam‘: عام) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (a‘adaad-e-mutlaq: اعداد مطلق) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (tartiibii a‘adaad: ترتيبی اعداد) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (baaqi-maandaa: باقی مانده) | RD | RD | |
11.1 | Foreign word (bidesii lafz: بديسی لفظ) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol (‘alaamat: علامت) | SYM | RD__SYM | $ & ( ) | Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound) etc.
11.3 | Punctuation (auqaaf: اوقاف) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (naa-m‘aaloom: نامعلوم) | UNK | RD__UNK | |
11.5 | Echowords (goonjdar lafz: گونج دار الفاظ) | ECH | RD__ECH | (دل-) ول ((dil-) vil), (پيار-) ويار ((pyaar-) vyaar), (چائے-) وائے ((caa‘e-) vaa‘e) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize, for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated either as part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
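As a rough illustration of the attributes listed above, the following Python sketch attaches `xml:lang`, `xml:id`, `its:dir` and `its:translate` to a single element using the standard library. The element name `token` and the helper `make_token` are assumptions made for this example and are not part of the draft schema; the ITS namespace URI is the one defined by the W3C ITS 1.0 Recommendation.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined "xml" prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # W3C ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_token(word_id, text, lang, direction="ltr", translate="yes"):
    """Build a hypothetical <token> element carrying ITS-style attributes."""
    tok = ET.Element("token")
    tok.set(f"{{{XML_NS}}}id", word_id)        # unique identifier
    tok.set(f"{{{XML_NS}}}lang", lang)         # natural-language label
    tok.set(f"{{{ITS_NS}}}dir", direction)     # text direction
    tok.set(f"{{{ITS_NS}}}translate", translate)
    tok.text = text
    return tok


# An Urdu word is written right to left, so its:dir is set to "rtl".
urdu = make_token("w1", "کتاب", "ur", direction="rtl")
print(ET.tostring(urdu, encoding="unicode"))
```

Setting the direction and language per element, rather than once for the whole file, is what allows a single corpus file to mix Perso-Arabic and Devanagari text.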

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. Schemas provide a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag supplies just a name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
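To show how such a metadata record might be consumed, here is a minimal Python sketch that reads the language, registration status and source from a record of this shape. The wrapper element `metadata`, the `SAMPLE` string and the function `summarize` are assumptions made for the example; only the inner element names (`languageSection`, `language`, `registrationStatus`, `descriptionSection`, `definitionClass`, `source`) are taken from the listing above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record; <metadata> is an assumed wrapper element.
SAMPLE = """<metadata>
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <descriptionSection>
      <definitionClass>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</metadata>"""


def summarize(xml_text):
    """Extract a few metadata fields from the record as a plain dict."""
    root = ET.fromstring(xml_text)
    section = root.find("languageSection")
    return {
        "language": section.findtext("language"),
        "status": section.findtext("registrationStatus"),
        "source": section.findtext("descriptionSection/definitionClass/source"),
    }


print(summarize(SAMPLE))
```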

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching 'off' a particular branch of the tree structure. This is represented schematically under the next heading, i.e. the POS schema block diagram.
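The "Display (Desired Nodes) / Hide (remaining nodes)" steps can be sketched in Python as below. This is an illustrative sketch rather than the standard's algorithm: the names `SCHEMA_NODES`, `SWITCHED_OFF` and `select_nodes` are assumptions, the node list is only a subset of the tag set, and the switched-off branches are the ones marked "Not required" or "Not in Marathi" in the Bengali, Marathi and Urdu tables. Languages are keyed by ISO 639-3 codes.

```python
# Subset of the common schema's nodes, written in the annotation convention.
SCHEMA_NODES = [
    "N", "N__NN", "N__NNP", "N__NNV", "N__NST",
    "PSP",
    "CC", "CC__CCD", "CC__CCS", "CC__CCS__UT",
    "RP", "RP__RPD", "RP__CL",
]

# Branches switched 'off' per language (ISO 639-3 codes), per the tables.
SWITCHED_OFF = {
    "mar": {"N__NNV", "PSP", "RP__CL"},
    "urd": {"CC__CCS__UT", "RP__CL"},
    "ben": {"CC__CCS__UT"},
}


def select_nodes(language):
    """Split the common schema into (displayed, hidden) nodes for a language."""
    off = SWITCHED_OFF.get(language, set())

    def hidden(node):
        # A node is hidden if it, or any of its ancestors, is switched off.
        parts = node.split("__")
        return any("__".join(parts[:i]) in off for i in range(1, len(parts) + 1))

    shown = [n for n in SCHEMA_NODES if not hidden(n)]
    hid = [n for n in SCHEMA_NODES if hidden(n)]
    return shown, hid


shown, hid = select_nodes("urd")
print("display:", shown)
print("hide:", hid)
```

Checking ancestors as well as the node itself means that switching off a branch also hides everything beneath it, which matches the idea of turning off a branch of the tree.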

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
kas-cat=" ناوت " asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा"
mal-cat="സാമാന നാമം" kas-cat=" عام " asm-cat="জািতবাচক"
kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा"
mal-cat="സംജാ നാമം" kas-cat=" خاص " asm-cat="বযিিবাচক"
kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
kas-cat=" کراوتٲوۍ " asm-cat="িয়াবাচক" kok-cat="कयामळत नाम"
guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा"
mal-cat="ആധാരിക നാമം" kas-cat=" ناوتہ جايہ ہاو " asm-cat="ানবাচক"
kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
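A one-to-one label mapping of this kind can be represented as a simple lookup table. The Python sketch below is an assumption-laden illustration: the names `LABELS`, `label_for` and `is_one_to_one` are invented for the example, and only a handful of Urdu labels from the mapping table are included, keyed by the ISO 639-3 code `urd`.

```python
# English label -> {ISO 639-3 language code: label in that language}.
# Only a few Urdu entries from the mapping table are shown here.
LABELS = {
    "Noun": {"urd": "اسم"},
    "Pronoun": {"urd": "ضمير"},
    "Verb": {"urd": "فعل"},
    "Adjective": {"urd": "صفت"},
}


def label_for(english, lang):
    """Return the label for a language, falling back to the English label."""
    return LABELS.get(english, {}).get(lang, english)


def is_one_to_one(lang):
    """Check that no two English labels share the same label in `lang`."""
    labels = [names[lang] for names in LABELS.values() if lang in names]
    return len(labels) == len(set(labels))


print(label_for("Verb", "urd"))
print(is_one_to_one("urd"))
```

Falling back to the English label keeps the schema usable for a language whose entry has not been filled in yet, while `is_one_to_one` flags any table in which two categories would collapse onto the same label.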

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No. | English | Hindi | Assamese | Bodo | Kashmiri | Kashmiri (Hindi) | Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

S.No. | English | Hindi | Telugu | Malayalam | Tamil | Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    …
  End if

  If language is Bodo then
    Call (POS Schema)
    Display (English Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    …
  End if

  If language is Konkani then
    Call (POS Schema)
    Display (English Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    …
  End if

Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    …
  End if

Else If script is Perso-Arabic then
  If language is Kashmiri then
    Call (POS Schema)
    Display (English Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat=" ناوت " tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat=" عام " tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat=" خاص " tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat=" پرناوت " tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat=" شخصيٲتی " tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat=" ماکوسی " tag="PRF">
    …
  End if

Else If script is Bangla then
  If language is Assamese then
    Call (POS Schema)
    Display (English Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    …
  End if

Else If script is Gujarati then
  If language is Gujarati then
    Call (POS Schema)
    Display (English Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    …
  End if
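
The selection steps above can be sketched in Python. This is only an illustrative sketch, not part of the standard: the names `SCRIPT_LANGUAGES`, `NOUN_NODE` and `select_attributes` are hypothetical, the script-to-language pairs are the ones named in the algorithm, and the Hindi label used for "noun" is the ordinary Hindi word, supplied here as an assumption. The function keeps the English category, the tag, the Hindi label and the label of the selected language, and reports everything else as hidden.

```python
# Hypothetical sketch of the node-selection idea; all names are illustrative.

# Script -> ISO 639-3 codes of the languages the algorithm lists under it.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}

# Attributes that are always displayed: English category, tag, Hindi label.
ALWAYS_SHOWN = {"cat", "tag", "hin-cat"}

# One schema node with a few per-language labels (assumed sample data).
NOUN_NODE = {
    "cat": "noun",
    "tag": "N",
    "hin-cat": "संज्ञा",
    "kok-cat": "नाम",
    "mal-cat": "നാമം",
}


def select_attributes(node, script, language):
    """Split a node's attributes into (shown, hidden) for one language."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    keep = ALWAYS_SHOWN | {f"{language}-cat"}
    shown = {k: v for k, v in node.items() if k in keep}
    hidden = {k: v for k, v in node.items() if k not in keep}
    return shown, hidden


shown, hidden = select_attributes(NOUN_NODE, "Malayalam", "mal")
print("shown:", sorted(shown))
print("hidden:", sorted(hidden))
```

For Hindi itself the selected label coincides with the always-shown Hindi label, which matches the "Display (English and Hindi Nodes)" step.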

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
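
The language tags also serve as the prefix of the per-language label attributes used in the schema fragments (for example `hin-cat`, `mal-cat`). A minimal Python sketch of that lookup follows; the dictionary holds only a subset of the annexure, and the names `LANGUAGE_TAGS` and `label_attribute` are hypothetical.

```python
# Illustrative subset of the annexure: language name -> ISO 639-3 tag.
LANGUAGE_TAGS = {
    "Hindi": "hin",
    "Assamese": "asm",
    "Bodo": "brx",
    "Gujarati": "guj",
    "Kashmiri": "kas",
    "Konkani": "kok",
    "Malayalam": "mal",
}


def label_attribute(language_name):
    """Return the schema attribute name holding this language's label."""
    try:
        tag = LANGUAGE_TAGS[language_name]
    except KeyError:
        raise KeyError(f"no language tag recorded for {language_name!r}") from None
    return f"{tag}-cat"


print(label_attribute("Malayalam"))
print(label_attribute("Bodo"))
```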

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

2 SCOPE

A common, unified XML-based POS Schema for Indian languages, based on W3C Internationalization best practices, has been formulated. The schema has been developed to take into account the NLP requirements for Web-based services in Indian languages. This standard specifies the XML POS Schema for tagging. This portion of the XML Schema Language discusses labels that can be used in an XML POS Schema.

3 TERMINOLOGY

3.1 POS Tag: A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word.

3.2 XML Schema: XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema.

3.3 Metadata: Metadata describes how, when, and by whom a particular set of data was collected, and how the data is formatted.

4 WHAT IS A POS TAG

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.

The input to a tagging algorithm is a string of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is a single best POS tag for each word.
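
The input/output contract described above can be illustrated with a deliberately simple Python sketch. It is a plain dictionary lookup, not a real tagging algorithm: it does no disambiguation, and the lexicon entries, the tag subset and the function name `tag_sentence` are assumptions chosen for illustration (the transliterated Hindi words and the tags follow the Hindi examples and annotation conventions given in the tag set tables of this document). Words missing from the lexicon receive the Unknown tag `RD__UNK`.

```python
# Toy lookup tagger: words + tag set in, one tag per word out.
# Lexicon and tag subset are illustrative assumptions, not a real resource.

TAGSET = {"N__NN", "JJ", "V__VAUX", "CC__CCD", "RD__UNK"}

LEXICON = {
    "kitaaba": "N__NN",   # common noun
    "acchaa": "JJ",       # adjective
    "hai": "V__VAUX",     # auxiliary verb
    "aur": "CC__CCD",     # co-ordinator
}


def tag_sentence(words, lexicon, tagset, unknown="RD__UNK"):
    """Return a list of (word, tag) pairs, exactly one tag per input word."""
    tagged = []
    for word in words:
        tag = lexicon.get(word.lower(), unknown)
        if tag not in tagset:
            raise ValueError(f"tag {tag!r} is not in the specified tag set")
        tagged.append((word, tag))
    return tagged


result = tag_sentence(["kitaaba", "acchaa", "hai", "xyz"], LEXICON, TAGSET)
for word, tag in result:
    print(word, tag)
```

A practical tagger would replace the lookup with a statistical or rule-based model that picks the single best tag for each word in context.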

5 REQUIREMENT OF A POS TAG

The POS tagger can be used as a pre-processor. Text indexing and retrieval use POS information. A POS tagger is used for building tagged corpora and Machine Translation Systems. Speech processing uses POS tags to decide the pronunciation. A POS tagger is also needed to identify the tag for words that could not be analysed by the morphological analyser. If the morphological analyser gives multiple tags for a word, the tagger can be used to resolve the ambiguity.

5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT

XML is needed for creating the POS tag set in order to standardize the POS tag framework for all Indian languages. The main benefits of using XML for the POS tag set for Indian languages (ILs) are:

• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking applications.
• XML documents can be stored without a database administrator because they contain metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple remote servers.
• It is easier to convert data between different data types.
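
Two of these points (Unicode support, and adding information without breaking applications) can be shown with a short Python sketch using the standard library's `xml.etree.ElementTree`. The element names `category` and `subcategory` are hypothetical and are not taken from the draft schema; the attribute names follow the `<language tag>-cat` pattern used in this document, and the Konkani and Malayalam labels for "noun" are the ones listed in the mapping tables.

```python
import xml.etree.ElementTree as ET

# Build one category node; element names are illustrative assumptions.
entry = ET.Element("category", {"cat": "noun", "tag": "N", "kok-cat": "नाम"})
ET.SubElement(entry, "subcategory", {"subcat": "common", "tag": "NN"})

# Adding another language later is just one more attribute on the element.
entry.set("mal-cat", "നാമം")

# Serialise as UTF-8 bytes and parse the result back.
xml_bytes = ET.tostring(entry, encoding="utf-8")
parsed = ET.fromstring(xml_bytes)

# A consumer that only reads "tag" is unaffected by the extra attribute.
print(parsed.get("tag"))
print(parsed.get("mal-cat"))
print(parsed.find("subcategory").get("tag"))
```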

6 POS Tag set for Indian Languages

POS Categories and Labels

Sl No | Top level | Subtype (level 1) | Subtype (level 2) | Label | Annotation Convention | Remarks
1 | Noun | | | N | N |
1.1 | | Common | | NN | N__NN |
1.2 | | Proper | | NNP | N__NNP |
1.3 | | Verbal | | NNV | N__NNV | (The verbal noun sub type is only for languages such as Tamil and Malayalam)
1.4 | | Nloc | | NST | N__NST |
2 | Pronoun | | | PR | PR |
2.1 | | Personal | | PRP | PR__PRP |
2.2 | | Reflexive | | PRF | PR__PRF |
2.3 | | Relative | | PRL | PR__PRL |
2.4 | | Reciprocal | | PRC | PR__PRC |
2.5 | | Wh-word | | PRQ | PR__PRQ |
2.6 | | Indefinite | | PRI | PR__PRI |
3 | Demonstrative | | | DM | DM |
3.1 | | Deictic | | DMD | DM__DMD |
3.2 | | Relative | | DMR | DM__DMR |
3.3 | | Wh-word | | DMQ | DM__DMQ |
3.4 | | Indefinite | | DMI | DM__DMI |
4 | Verb | | | V | V |
4.1 | | Main | | VM | V__VM |
4.1.1 | | | Finite | VF | V__VM__VF |
4.1.2 | | | Non-finite | VNF | V__VM__VNF |
4.1.3 | | | Infinitive | VINF | V__VM__VINF |
4.1.4 | | | Gerund | VNG | V__VM__VNG |
4.2 | | Verbal | | VN | V__VN | paTittam, naTattam, naTanam
4.2 | | Auxiliary | | VAUX | V__VAUX |
4.2.1 | | | Finite | VAUX | V__VAUX__VF |
4.2.2 | | | Non-finite | VNF | V__VAUX__VNF |
4.2.3 | | | Infinitive | VINF | V__VAUX__VINF |
4.2.4 | | | Gerund | VNG | V__VAUX__VNG |
4.2.5 | | | Participle noun | VNP | V_VAUX_VNP |
5 | Adjective | | | JJ | |
6 | Adverb | | | RB | | Only manner adverbs
7 | Postposition | | | PSP | |
8 | Conjunction | | | CC | CC |
8.1 | | Co-ordinator | | CCD | CC__CCD |
8.2 | | Subordinator | | CCS | CC__CCS |
8.2.1 | | | Quotative | UT | CC__CCS__UT |
9 | Particles | | | RP | RP |
9.1 | | Default | | RPD | RP__RPD |
9.2 | | Classifier | | CL | RP__CL |
9.3 | | Interjection | | INJ | RP__INJ |
9.4 | | Intensifier | | INTF | RP__INTF |
9.5 | | Negation | | NEG | RP__NEG |
10 | Quantifiers | | | QT | QT |
10.1 | | General | | QTF | QT__QTF |
10.2 | | Cardinals | | QTC | QT__QTC |
10.3 | | Ordinals | | QTO | QT__QTO |
11 | Residuals | | | RD | RD |
11.1 | | Foreign word | | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | | Symbol | | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | | Punctuation | | PUNC | RD__PUNC | Only for punctuations
11.4 | | Unknown | | UNK | RD__UNK |
11.5 | | Echowords | | ECH | RD__ECH |

POS for Hindi

Sl No | Top level | Subtype (level 1) | Subtype (level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | | | N | N | ladakaa, raajaa, kitaaba |
1.1 | | Common | | NN | N__NN | kitaaba, kalama, cashmaa |
1.2 | | Proper | | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | | Nloc | | NST | N__NST | Uupara, niice, aage, piiche |
2 | Pronoun | | | PR | PR | Yaha, vaha, jo |
2.1 | | Personal | | PRP | PR__PRP | Vaha, main, tuma, ve |
2.2 | | Reflexive | | PRF | PR__PRF | Apanaa, swayam, khuda |
2.3 | | Relative | | PRL | PR__PRL | Jo, jis, jab, jahaaM |
2.4 | | Reciprocal | | PRC | PR__PRC | Paraspara, aapasa |
2.5 | | Wh-word | | PRQ | PR__PRQ | Kauna, kab, kahaaM |
 | | Indefinite | | PRI | PR__PRI | Koii, kis |
3 | Demonstrative | | | DM | DM | Vaha, jo, yaha |
3.1 | | Deictic | | DMD | DM__DMD | Vaha, yaha |
3.2 | | Relative | | DMR | DM__DMR | jo, jis |
3.3 | | Wh-word | | DMQ | DM__DMQ | kis, kaun |
 | | Indefinite | | DMI | DM__DMI | KoI, kis |
4 | Verb | | | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa |
4.1 | | Main | | VM | V__VM | giraa, gayaa, sonaa, haMstaa |
4.2 | | Auxiliary | | VAUX | V__VAUX | hai, rahaa, huaa |
5 | Adjective | | | JJ | JJ | sundara, acchaa, baRaa |
6 | Adverb | | | RB | RB | jaldii, teza |
7 | Postposition | | | PSP | PSP | ne, ko, se, mein |
8 | Conjunction | | | CC | CC | aur, agar, tathaa, kyonki |
8.1 | | Co-ordinator | | CCD | CC__CCD | aur, balki, parantu |
8.2 | | Subordinator | | CCS | CC__CCS | Agar, kyonki, to, ki |
9 | Particles | | | RP | RP | to, bhii, hii |
9.1 | | Default | | RPD | RP__RPD | to, bhii, hii |
9.3 | | Interjection | | INJ | RP__INJ | are, he, o |
9.4 | | Intensifier | | INTF | RP__INTF | bahuta, behada |
9.5 | | Negation | | NEG | RP__NEG | nahiin, mata, binaa |
10 | Quantifiers | | | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa |
10.1 | | General | | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | | Cardinals | | QTC | QT__QTC | eka, do, tiina |
10.3 | | Ordinals | | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | | | RD | RD | |
11.1 | | Foreign word | | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | | Symbol | | SYM | RD__SYM | $, &, (, ) | For symbols such as $, &, etc.
11.3 | | Punctuation | | PUNC | RD__PUNC | | Only for punctuations
11.4 | | Unknown | | UNK | RD__UNK | |
11.5 | | Echowords | | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
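
One way to read this rule is that the annotator picks only the lowest-level label and the tool derives the full annotation string from the hierarchy. The Python sketch below illustrates that reading under stated assumptions: the `PARENT` map covers only a few labels from the tag set tables, the function name `full_annotation` is hypothetical, and labels that occur under more than one parent (for example VF under both Main and Auxiliary) are left out because they would need the parent to be supplied as well.

```python
# Child label -> parent label, for an illustrative subset of the tag set.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR",
    "VM": "V", "VAUX": "V",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "PUNC": "RD", "UNK": "RD",
}


def full_annotation(label, sep="__"):
    """Walk up the hierarchy and join the levels, top level first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


for lowest in ("NN", "UT", "JJ"):
    print(lowest, "->", full_annotation(lowest))
```

Labels without a parent, such as JJ, are returned unchanged, which matches the single-level entries in the tables.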

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI

ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz

ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka

ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
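The rule above can be sketched in a few lines of Python. This is only an illustration, assuming that the levels of an annotation convention are always joined by a double underscore (as in V__VM__VNF); the function name expand_tag is hypothetical and not part of the standard.

```python
from typing import List


def expand_tag(annotation: str) -> List[str]:
    """Return the tag and all of its ancestors, top level first.

    Assumes hierarchy levels are joined by a double underscore,
    e.g. "V__VM__VNF" (Verb > Main > Non-finite).
    """
    parts = annotation.split("__")
    if not all(parts):
        # An empty component means the string is not a valid annotation.
        raise ValueError(f"malformed annotation: {annotation!r}")
    # Rebuild every prefix of the hierarchy path.
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


# The annotator picks only the lowest-level tag; the rest is derived.
print(expand_tag("V__VM__VNF"))
```

An annotation tool built this way only needs to store the annotator's lowest-level choice; the higher-level tags can be recomputed at any time.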

Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)

Sl No Category Label Annotation Convention Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN

12 Proper NNP N__NNP

13 Nloc NST N__NST

2 Pronoun PR PR

21 Personal PRP PR__PRP

22 Reflexive PRF PR__PRF


23 Relative PRL PR__PRL

24 Reciprocal PRC PR__PRC

25 Wh-word PRQ PR__PRQ

3 Demonstrative DM DM

31 Deictic DMD DM__DMD

32 Relative DMR DM__DMR

33 Wh-word DMQ DM__DMQ

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF

412 Non-finite VNF V__VM__VNF

413 Infinitive VINF V__VM__VINF

414 Gerund VNG V__VM__VNG

42 Verbal Noun NNV N__NNV Verbal Noun

43 Auxiliary VAUX V__VAUX

431 Non-finite VNF V__VM__VNF

432 Infinitive VINF V__VM__VINF

5 Adjective JJ

6 Adverb RB Only manner adverbs

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD

82 Subordinator CCS CC__CCS

821 Quotative UT CC__CCS__UT

9 Particles RP RP

91 Default RPD RP__RPD

92 Classifier CL RP__CL

93 Interjection INJ RP__INJ

94 Intensifier INTF RP__INTF


95 Negation NEG RP__NEG

10 Quantifiers QT QT

101 General QTF QT__QTF

102 Cardinals QTC QT__QTC

103 Ordinals QTO QT__QTO

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
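Because the higher-level tags are derived from the annotation convention, each row of a tagset table has to be well formed. The following Python sketch checks two properties that the tables appear to follow: levels are joined by a double underscore, and the last component equals the label. Both rules are inferred from the tables and are assumptions, and check_row is a hypothetical helper, not part of the standard.

```python
import re
from typing import List

# Upper-case components joined by exactly two underscores, e.g. "CC__CCS__UT".
ANNOTATION = re.compile(r"[A-Z]+(?:__[A-Z]+)*")


def check_row(label: str, annotation: str) -> List[str]:
    """Return a list of problems found in one (label, annotation) row."""
    problems = []
    if not ANNOTATION.fullmatch(annotation):
        problems.append("levels must be joined by a double underscore")
    # Split on any run of underscores so the label check still works
    # when the separator itself is wrong.
    if re.split(r"_+", annotation)[-1] != label:
        problems.append("last component must equal the label")
    return problems


for label, annotation in [("VF", "V__VM__VF"), ("NNV", "N_NNV")]:
    print(label, annotation, check_row(label, annotation))
```

Running such a check over a tagset table before it is turned into a schema catches single-underscore separators and rows whose convention names a different label.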

POS for Tamil

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR itu atu avan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V__VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avan aval atu itu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ

24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ

31 Deictic DMD DM__DMD atu itu അത ഇത

32 Relative DMR DM__DMR eetu ഏത

33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന

4 Verb V V pO kazhi Annu (copula) ciri ോ കഴി ആണി (copula) ചിരി

41 Main VM V__VM pO kazhi ciri Annu (copula) ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kute mAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം

92 Classifier CL RP__CL peer േ൪

93 Interjection INJ RP__INJ ayyoo അേയാ

94 Intensifier INTF RP__INTF pala valare ല വളെര

95 Negation NEG RP__NEG illa alla ഇല അല

10 Quantifiers QT QT kuracchu niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnu rantu ഒന രണ

103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര

113 Punctuation PUNC RD__PUNC

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

POS for Bangla

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN kalama cashmaa

12 Proper NNP N__NNP Mohan ravi

rashmi

14 Nloc NST N__NST upare

niche

bhitara

2 Pronoun PR PR

21 Personal PRP PR__PRP se tumi

AmAra

22 Reflexive PRF PR__PRF nijera

23 Relative PRL PR__PRL ye yakhana

yena yAra

24 Reciprocal PRC PR__PRC paraspara

25 Wh-word PRQ PR__PRQ ke kakhana kena kAra

26 Indefinite PRI PR__PRI keu

3 Demonstrative DM DM Vaha jo

yaha

31 Deictic DMD DM__DMD sei oi o se

32 Relative DMR DM__DMR ye yei

33 Wh-word DMQ DM__DMQ kono

34 Indefinite DMI DM__DMI keu

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF karachhilAma yAba khAYa

412 Non-finite VNF V__VM__VNF kare kheYe karale khete

413 Infinitive VINF V__VM__VINF karate khete yete

414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA

42 Auxiliary VAUX V__VAUX chhila

habe chAi

5 Adjective JJ sundara

bhAla lAla

6 Adverb RB tADAtADi

Aste

haThAt

7 Postposition PSP theke

abadhI

madhye

diYe

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD Ara eban

athabA

kimbA

82 Subordinator CCS CC__CCS ye kintu noile tAhale

821 Quotative UT CC__CCS__UT ---- Not required

9 Particles RP RP

91 Default RPD RP__RPD to ye

92 Classifier CL RP__CL jana khAnA

93 Interjection INJ RP__INJ Are ei

hAya

94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika

95 Negation NEG RP__NEG nA naYa

chhADA

10 Quantifiers QT QT

101 General QTF QT__QTF kichhu

alpa aneka

102 Cardinals QTC QT__QTC eka dui

tina

103 Ordinals QTO QT__QTO prathama

paYalA

dvitIYa

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH jala Tala

khAbAra

dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN kalam, chashmA 'pen', 'spectacles'

12 Proper NNP N__NNP mohan, ravI 'Mohan', 'Ravi'

13 Nloc NST N__NST upar, nIche, ahIM 'up', 'down', 'in front'

2 Pronoun PR PR

21 Personal PRP PR__PRP huM, tuM, te 'me', 'you', 'he/she'

22 Reflexive PRF PR__PRF pote, jAte, svayam 'herself/himself'

23 Relative PRL PR__PRL je, te, jyAM 'who', 'where'

24 Reciprocal PRC PR__PRC aras-paras, paraspar 'mutually', 'each other'

25 Wh-word PRQ PR__PRQ koN, kyAre, kyAM 'who', 'when', 'where'

26 Indefinite koI, kaIMK, kashuM 'someone', 'something'

3 Demonstrative DM DM

31 Deictic DMD DM__DMD A 'this'

32 Relative DMR DM__DMR je, jeNe 'which/who', 'whom'

33 Wh-word DMQ DM__DMQ koN, shuM, kem 'who', 'what', 'why'

34 Indefinite koI, kaIMK, kashuM 'someone', 'something'

4 Verb V V

41 Main VM V__VM khAshe, khAdhu 'will eat', 'ate'

42 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM 'is', 'was', 'did'

5 Adjective JJ

6 Adverb RB

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD ane, ke 'and', 'or'

82 Subordinator CCS CC__CCS tethI, evuM, kAraNke 'so', 'like that', 'because'

9 Particles RP RP

91 Default RPD RP__RPD paNa, ja, tO 'but', emph, topic

92 Interjection INJ RP__INJ hE, arrrE, O

93 Intensifier INTF RP__INTF bahu, ghaNuM 'very', 'much'

94 Negation NEG RP__NEG nahi, na 'no'

10 Quantifiers QT QT

101 General QTF QT__QTF thoduM, ghaNuM 'little', 'much'

102 Cardinals QTC QT__QTC eka, be, traN 'one', 'two', 'three'

103 Ordinals QTO QT__QTO paheluM, bIjI 'first' (neu), 'second' (fem)

11 Residuals RD RD

111 Foreign word RDF RD__RDF tv, perasitemol

112 Symbol SYM RD__SYM $ &

113 Punctuation PUNC RD__PUNC ()

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH kAm-bAm, pANi-bANi 'work and the like', 'water and the like'

POS for Konkani

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category/subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize, for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
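As an illustration of the first two attributes, the Python sketch below builds a single element carrying xml:lang and its:dir with the standard library. The element name sentence and the helper make_sentence are hypothetical; the namespace URIs are those of XML and of ITS 1.0. The language value "ur" follows the usual xml:lang practice of short language tags, which is an assumption here, since this standard otherwise refers to ISO 639-3 codes.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # bound to the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_sentence(text: str, lang: str, direction: str) -> ET.Element:
    """Build a hypothetical <sentence> element labelled with xml:lang and its:dir."""
    elem = ET.Element("sentence")
    elem.set(f"{{{XML_NS}}}lang", lang)      # natural language of the content
    elem.set(f"{{{ITS_NS}}}dir", direction)  # text direction, e.g. "ltr" or "rtl"
    elem.text = text
    return elem


# An Urdu word (kitaab, 'book') written right to left.
urdu = make_sentence("کتاب", "ur", "rtl")
print(ET.tostring(urdu, encoding="unicode"))
```

A consumer can read the same attributes back with elem.get() and decide, for example, how to render or segment the content.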

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only the element name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
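The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps of the flow above can be sketched as follows. This is a minimal illustration, not the schema transformation itself: ALL_NODES is a small hypothetical subset of the common tree, the language keys are assumed to be ISO 639-3 codes, and the hidden branches are only those that the Tamil, Bangla and Marathi tables mark as "Not required" or "Not in Marathi".

```python
from typing import Dict, List, Set, Tuple

# A small, illustrative subset of the common tree (annotation conventions).
ALL_NODES: List[str] = [
    "N__NN", "N__NNP", "N__NNV", "N__NST", "PSP", "CC__CCS__UT", "RP__CL",
]

# Branches switched 'off' per language, taken from the remarks in the tables.
NOT_REQUIRED: Dict[str, Set[str]] = {
    "tam": {"RP__CL"},                   # Tamil: classifier not required
    "ben": {"CC__CCS__UT"},              # Bangla: quotative not required
    "mar": {"N__NNV", "PSP", "RP__CL"},  # Marathi
}


def select_nodes(language: str) -> Tuple[List[str], List[str]]:
    """Split the common tree into (displayed, hidden) nodes for one language."""
    off = NOT_REQUIRED.get(language, set())
    shown = [node for node in ALL_NODES if node not in off]
    hidden = [node for node in ALL_NODES if node in off]
    return shown, hidden


shown, hidden = select_nodes("mar")
print("display:", shown)
print("hide:", hidden)
```

Keeping one common node list and a per-language "off" set mirrors the idea of a single schema from which language-specific schemas are derived, rather than maintaining 22 separate tagsets.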

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken Hindi Bodo Malayalam Kashmiri Assamese Konkani Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

<xs:element name=cat POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name=type subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name=type subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name=type subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name=cat POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name=type subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name=type subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name=subtype subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name=subtype subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name=subtype subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name=subtype subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name=cat POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name=cat POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name=cat POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name=cat POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name=type subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name=type subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name=subtype subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name=cat POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name=type subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name=type subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name=type subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name=type subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name=type subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name=cat POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name=type subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name=type subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name=type subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name=cat POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name=type subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name=type subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name=type subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name=type subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name=type subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To support language-specific labels in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

Columns: S.No, English, Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

Columns: S.No, English, Hindi, Assamese, Bodo, Kashmiri, Kashmiri (Hindi), Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

Columns: S.No, English, Hindi, Telugu, Malayalam, Tamil, Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………

    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………

    End if


Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………………

    End if

End if
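The selection step above can be sketched in Python. This is only an illustration under stated assumptions: the function name `select_node_labels`, the dictionary layout of a node, and the placeholder label values are hypothetical and do not come from the standard. Only the script-to-language grouping and the rule "show English, Hindi and the chosen language, hide the rest" are taken from the algorithm.

```python
# Script-to-language grouping as listed in the algorithm (ISO 639-3 codes).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Attributes that are not language specific: the English label and the tag.
COMMON_KEYS = {"cat", "subcat", "tag"}


def select_node_labels(node, script, lang):
    """Return the labels of one schema node that should be displayed.

    `node` is a hypothetical dict view of a schema node, for example
    {"cat": "Pronoun", "hin-cat": "...", "mal-cat": "...", "tag": "PR"}.
    English and Hindi labels are always kept; every other language label
    except the one for `lang` is hidden.
    """
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    visible = COMMON_KEYS | {"hin-cat", f"{lang}-cat"}
    return {key: value for key, value in node.items() if key in visible}


if __name__ == "__main__":
    # Placeholder label values, not real translations.
    pronoun = {
        "cat": "Pronoun",
        "hin-cat": "HIN-LABEL",
        "brx-cat": "BRX-LABEL",
        "mal-cat": "MAL-LABEL",
        "tag": "PR",
    }
    print(select_node_labels(pronoun, "Malayalam", "mal"))
```

Keeping the grouping in a single table means that supporting another script or language only requires a new entry, not another branch of nested conditions.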


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
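The tags in the samples above are hierarchical: a top-level category is followed by its subtypes, joined by underscores (for example `N_NN`, `V_VM_VF`, or `N__NN` in the Oriya sample). The following is a minimal sketch of splitting such a tag into its levels. The helper names `split_tag` and `top_level` are hypothetical, and treating one or more underscores as a single separator is an assumption made here to cover both spellings seen in the samples.

```python
import re


def split_tag(tag):
    """Split a hierarchical POS tag into its levels.

    Both 'V_VM_VF' and 'N__NN' styles are accepted: any run of
    underscores is treated as one separator (an assumption).
    """
    return [level for level in re.split(r"_+", tag) if level]


def top_level(tag):
    """Return the top-level category of a tag, e.g. 'V' for 'V_VM_VF'."""
    return split_tag(tag)[0]


if __name__ == "__main__":
    for tag in ("N_NN", "V_VM_VF", "N__NN", "PSP"):
        print(tag, "->", split_tag(tag))
```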


16 REFERENCE

1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag (ISO 639-3)

1  Assamese  asm
2  Bangla  ben
3  Bodo  brx
4  Dogri  doi
5  Gujarati  guj
6  Hindi  hin
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd
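The language tags can be kept in a small lookup table and used to build the per-language attribute names that appear in the schema, such as `mal-cat`. This is a sketch only: the names `LANGUAGE_TAGS`, `NAMES_BY_TAG` and `attribute_name` are hypothetical, and each language is paired with its standard ISO 639-3 code.

```python
# Language name -> ISO 639-3 code for the 22 languages in the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Reverse lookup: code -> language name.
NAMES_BY_TAG = {tag: name for name, tag in LANGUAGE_TAGS.items()}


def attribute_name(language):
    """Build the label attribute name used in the schema, e.g. 'mal-cat'."""
    return f"{LANGUAGE_TAGS[language]}-cat"


if __name__ == "__main__":
    print(attribute_name("Malayalam"))
    print(NAMES_BY_TAG["brx"])
```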


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

4 WHAT IS A POS TAG

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech to each word. Parts of speech include nouns, verbs, adverbs, adjectives, pronouns, conjunctions and their sub-categories.

The input to a tagging algorithm is a string of words of a natural language sentence and a specified tag set (a finite list of part-of-speech tags). The output is a single best POS tag for each word.
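The input/output contract described above can be illustrated with a deliberately simple lookup tagger. This is a minimal sketch, not the tagging method of this standard: the lexicon is a hypothetical toy built from a few romanized Hindi examples, and falling back to `RD__UNK` for unlisted words is an assumption made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy lexicon (word -> annotation convention string).
LEXICON: Dict[str, str] = {
    "kitaaba": "N__NN",
    "mohan": "N__NNP",
    "vaha": "PR__PRP",
    "acchaa": "JJ",
    "giraa": "V__VM",
    "hai": "V__VAUX",
    "aur": "CC__CCD",
}

# The specified tag set: a finite list of tags the tagger may output.
TAG_SET: FrozenSet[str] = frozenset(LEXICON.values()) | {"RD__UNK"}


def tag_sentence(sentence: str,
                 tag_set: FrozenSet[str] = TAG_SET) -> List[Tuple[str, str]]:
    """Assign exactly one tag from tag_set to every word of the sentence."""
    tagged = []
    for word in sentence.split():
        tag = LEXICON.get(word.lower(), "RD__UNK")
        if tag not in tag_set:
            # Assumption: anything outside the tag set is treated as unknown.
            tag = "RD__UNK"
        tagged.append((word, tag))
    return tagged


if __name__ == "__main__":
    print(tag_sentence("Mohan giraa"))
```

A real tagger would choose among several candidate tags using context; the sketch only shows that one tag comes out for each word that goes in.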

5 REQUIREMENT OF A POS TAG

The POS tagger can be used as a pre-processor. Text indexing and retrieval use POS information. A POS tagger is used for building tagged corpora and Machine Translation systems. Speech processing uses POS tags to decide pronunciation. A POS tagger is also needed to identify the tag for words that could not be analysed by the morphological analyser. If the morphological analyser gives multiple tags for a word, the tagger can be used to resolve the ambiguity.

5.1 NEED OF XML SCHEMA IN DESIGNING COMMON POS FORMAT

XML is needed for creating the POS tag set in order to standardize the POS tag framework for all Indian languages. The main benefits of XML in using a POS tag set for Indian languages (ILs) are:

• It supports multilingual documents and Unicode.
• XML allows developers to add extra information to a format without breaking applications.
• XML documents can be stored without a database administrator because they contain metadata in the form of tags and attributes.
• The tree structure of XML documents allows documents to be compared and aggregated efficiently, element by element.
• XML documents can consist of nested elements that are distributed over multiple remote servers.

It is easier to convert data between different data types.

6 POS TAG SET FOR INDIAN LANGUAGES

POS Categories and Labels

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Verbal | NNV | N__NNV | The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
2.6 | Indefinite | PRI | PR__PRI |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
3.4 | Indefinite | DMI | DM__DMI |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal | VN | V__VN | paTittam, naTattam, naTanam
4.2 | Auxiliary | VAUX | V__VAUX |
4.2.1 | Finite | VF | V__VAUX__VF |
4.2.2 | Non-finite | VNF | V__VAUX__VNF |
4.2.3 | Infinitive | VINF | V__VAUX__VINF |
4.2.4 | Gerund | VNG | V__VAUX__VNG |
4.2.5 | Participle noun | VNP | V__VAUX__VNP |
5 | Adjective | JJ | JJ |
6 | Adverb | RB | RB | Only manner adverbs
7 | Postposition | PSP | PSP |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuation
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |

POS for Hindi

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | ladakaa, raajaa, kitaaba |
1.1 | Common | NN | N__NN | kitaaba, kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | Uupara, niice, aage, piiche |
2 | Pronoun | PR | PR | Yaha, vaha, jo |
2.1 | Personal | PRP | PR__PRP | Vaha, main, tuma, ve |
2.2 | Reflexive | PRF | PR__PRF | Apanaa, swayam, khuda |
2.3 | Relative | PRL | PR__PRL | Jo, jis, jab, jahaaM |
2.4 | Reciprocal | PRC | PR__PRC | Paraspara, aapasa |
2.5 | Wh-word | PRQ | PR__PRQ | Kauna, kab, kahaaM |
2.6 | Indefinite | PRI | PR__PRI | Koii, kis |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | Vaha, yaha |
3.2 | Relative | DMR | DM__DMR | jo, jis |
3.3 | Wh-word | DMQ | DM__DMQ | kis, kaun |
3.4 | Indefinite | DMI | DM__DMI | KoI, kis |
4 | Verb | V | V | giraa, gayaa, sonaa, haMstaa, hai, rahaa |
4.1 | Main | VM | V__VM | giraa, gayaa, sonaa, haMstaa |
4.2 | Auxiliary | VAUX | V__VAUX | hai, rahaa, huaa |
5 | Adjective | JJ | JJ | sundara, acchaa, baRaa |
6 | Adverb | RB | RB | jaldii, teza |
7 | Postposition | PSP | PSP | ne, ko, se, mein |
8 | Conjunction | CC | CC | aur, agar, tathaa, kyonki |
8.1 | Co-ordinator | CCD | CC__CCD | aur, balki, parantu |
8.2 | Subordinator | CCS | CC__CCS | Agar, kyonki, to, ki |
9 | Particles | RP | RP | to, bhii, hii |
9.1 | Default | RPD | RP__RPD | to, bhii, hii |
9.3 | Interjection | INJ | RP__INJ | are, he, o |
9.4 | Intensifier | INTF | RP__INTF | bahuta, behada |
9.5 | Negation | NEG | RP__NEG | nahiin, mata, binaa |
10 | Quantifiers | QT | QT | thoRaa, bahuta, kucha, eka, pahalaa |
10.1 | General | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | Cardinals | QTC | QT__QTC | eka, do, tiina |
10.3 | Ordinals | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuation
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
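Because the annotation convention joins the labels of each level with a double underscore (for example V__VM__VF), the higher-level tags can be recovered from the lowest-level tag by splitting on that separator. The sketch below illustrates this; the function name `expand_tag` is hypothetical, and the standard itself does not prescribe how the higher levels are stored.

```python
from typing import List

SEPARATOR = "__"  # level separator used in the annotation convention column


def expand_tag(annotation: str) -> List[str]:
    """Return the tag at every level, from top level down to the given tag."""
    labels = annotation.split(SEPARATOR)
    if not all(labels):
        # Catches the empty string and stray separators such as "V____VM".
        raise ValueError(f"malformed annotation: {annotation!r}")
    return [SEPARATOR.join(labels[:depth]) for depth in range(1, len(labels) + 1)]


if __name__ == "__main__":
    # An annotator picks only the lowest-level tag; the rest is derived.
    print(expand_tag("V__VM__VF"))
    print(expand_tag("JJ"))
```

Deriving the ancestors from the string avoids keeping a separate parent table, at the cost of relying on every annotation string using the double underscore consistently.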

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI

ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR__PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 Indefinite DMI DM__DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz

ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka

ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal noun | NNV | N__NNV | Verbal noun
4.3 | Auxiliary | VAUX | V__VAUX |
4.3.1 | Non-finite | VNF | V__VAUX__VNF |
4.3.2 | Infinitive | VINF | V__VAUX__VINF |
5 | Adjective | JJ | JJ |
6 | Adverb | RB | RB | Only manner adverbs
7 | Postposition | PSP | PSP |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuation
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Tamil

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | paiyan, raajaa, puttakam |
1.1 | Common | NN | N__NN | puttakam, kaNNaaTi, paTam |
1.2 | Proper | NNP | N__NNP | moohan, ravi, maalati |
1.3 | Nloc | NST | N__NST | meel, kiiz, mun, pin |
2 | Pronoun | PR | PR | itu, atu, avan |
2.1 | Personal | PRP | PR__PRP | naan, nii, avaL, avarkaL |
2.2 | Reflexive | PRF | PR__PRF | taan |
2.3 | Relative | PRL | PR__PRL | yaar, etu, eppootu, enkee |
2.4 | Reciprocal | PRC | PR__PRC | oruvarukoruvar, avanavan, parasparam |
2.5 | Wh-word | PRQ | PR__PRQ | yaarum, yaaraavatu, yaaroo, etuvum |
3 | Demonstrative | DM | DM | a-, i-, e- |
3.1 | Deictic | DMD | DM__DMD | anta, inta, enta |
3.2 | Relative | DMR | DM__DMR | enta |
3.3 | Wh-word | DMQ | DM__DMQ | enta, yaar, eetaavatu, yaaraavatu |
4 | Verb | V | V | vizu, poo, tuunku, aaku |
4.1 | Main | VM | V__VM | vizu, poo, tuunku, ciri |
4.1.1 | Finite | VF | V__VM__VF | vizuntaan, pooneen, cirittaaL |
4.1.2 | Non-finite | VNF | V__VM__VNF | vizunta, poonaal |
4.1.3 | Infinitive | VINF | V__VM__VINF | viza, pooka, cirikka |
4.1.4 | Gerund | VNG | V__VM__VNG | vizutal, cirittal, tuunkutal |
4.2 | Verbal | VN | V__VN | paTippu, naTai, naTattai, ceykai |
4.3 | Auxiliary | VAUX | V__VAUX | aakum, veeNTum, muTiyum |
5 | Adjective | JJ | JJ | iniya, periya, azakaana |
6 | Adverb | RB | RB | veekamaaka, viraivaaka |
7 | Postposition | PSP | PSP | paRRi, kuRittu, viTa |
8 | Conjunction | CC | CC | maRRum, eenenRaal, aanaal |
8.1 | Co-ordinator | CCD | CC__CCD | -um (raamanum), maRRum, aanaal, allatu | -um is a co-ordinator which can be added to nouns and verbs
8.2 | Subordinator | CCS | CC__CCS | enRu, ena, enpatu, enRaal |
8.2.1 | Quotative | UT | CC__CCS__UT | enRu, ena |
9 | Particles | RP | RP | maTTUm, kuuTa |
9.1 | Default | RPD | RP__RPD | maTTUm, kuuTa |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | ayyoo, teey, aamaam |
9.4 | Intensifier | INTF | RP__INTF | ati, veku, mika |
9.5 | Negation | NEG | RP__NEG | illai |
10 | Quantifiers | QT | QT | koncam, niRaiya, oru, mutal |
10.1 | General | QTF | QT__QTF | koncam, niRaiya |
10.2 | Cardinals | QTC | QT__QTC | onRu, iraNTu |
10.3 | Ordinals | QTO | QT__QTO | mutal, iraNTaam |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) ruu | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuation
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | vaNTi kiNTi, paal kiil |

POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി

രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി

കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ

ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം

102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | JJ | sundara, bhAla, lAla |
6 | Adverb | RB | RB | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | PSP | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuation
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)

जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--

4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA (‘pen’, ‘spectacles’) |
1.2 | Proper | NNP | N__NNP | mohan, ravI (‘Mohan’, ‘Ravi’) |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM (‘up’, ‘down’, ‘in front’) |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te (‘me’, ‘you’, ‘he/she’) |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam (‘herself/himself’) |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM (‘who’, ‘where’) |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar (‘mutually’, ‘each other’) |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM (‘who’, ‘when’, ‘where’) |
2.6 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A (‘this’) |
3.2 | Relative | DMR | DM__DMR | je, jeNe (‘which/who’, ‘whom’) |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem (‘who’, ‘what’, ‘why’) |
3.4 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu (‘will eat’, ‘ate’) |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’) |
5 | Adjective | JJ | JJ | |
6 | Adverb | RB | RB | |
7 | Postposition | PSP | PSP | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke (‘and’, ‘or’) |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’) |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO (‘but’, emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM (‘very’, ‘much’) |
9.4 | Negation | NEG | RP__NEG | nahi, na (‘no’) |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM (‘little’, ‘much’) |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN (‘one’, ‘two’, ‘three’) |
10.3 | Ordinals | QTO | QT__QTO | paheluM (‘first’, neu), bIjI (‘second’, fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm (‘work and the like’), pANi-bANi (‘water and the like’) |

POS for Konkani

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar words like koi kisi kuch do not come under Wh-word they are used for indefinite person For them another category (subtype) ietankiir (indefinitive) is used Under this category

following words are also placed chand

blsquoaaz fulaan sab bahut Can we have a category

subtype like indefinitive demonstrative (DMI)

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML which is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize, both for schema users and for processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xmllang- defined for the root element of your document and for any element where a change of language may occur) Defining mark-up to specify text direction (itsdir - defined for the root element of your document and for any element that has text content) Indicating which elements and attributes should be translated (itstranslateRule- elements to indicate which elements have non-translatable content) Providing information related to text segmentation (itswithinTextRule- elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text) Defining mark-up for unique identifiers (xmlid- elements with translatable content can be associated with a unique identifier) Defining mark-up for notes to localizers (itslocNote- allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text using) [For more details httpwwww3orgTRxml-i18n-bp]
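As an illustration of how xml:lang and its:dir behave, the sketch below reads a small sample document with Python's standard `xml.etree.ElementTree` and reports the language and text direction in force for each element. The sample document, its element names (`doc`, `label`, `code`) and the helper `walk` are hypothetical; only the two namespace URIs are fixed by the W3C specifications.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical sample document, not part of the POS schema itself.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"
     xml:lang="en" its:dir="ltr">
  <label xml:id="n1">noun</label>
  <label xml:id="n2" xml:lang="ur" its:dir="rtl">اسم</label>
  <code its:translate="no">N_NN</code>
</doc>"""


def walk(elem, lang=None, direction=None):
    """Yield (tag, language, direction); values are inherited from ancestors."""
    lang = elem.get(f"{{{XML_NS}}}lang", lang)
    direction = elem.get(f"{{{ITS_NS}}}dir", direction)
    yield elem.tag, lang, direction
    for child in elem:
        yield from walk(child, lang, direction)


rows = list(walk(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```

The root declares the defaults once; only the Urdu label overrides them, which is why the attributes are defined on the root element and on any element where the language or direction changes.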

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). “Sort of” because a tag gives only its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English need to have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into a POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. POS schema block diagram.
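The script and language selection steps of the flow above can be sketched as a lookup. This is a minimal illustration, not the standard's implementation: the script and language groupings are the ones used in the node-selection algorithm later in this document, the three-letter codes are the prefixes of the schema's xx-cat attributes, and the names `SCRIPTS` and `select_language` are hypothetical.

```python
# Script -> {language name: language code}; a hypothetical table covering only
# the seven languages used in the draft schema, not all 12 scripts / 22 languages.
SCRIPTS = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_language(script: str, language: str) -> str:
    """Return the language code for a valid script/language pair, else raise ValueError."""
    languages = SCRIPTS.get(script)
    if languages is None:
        raise ValueError(f"unknown script: {script}")
    if language not in languages:
        raise ValueError(f"{language} is not listed under script {script}")
    return languages[language]


print(select_language("Devanagari", "Bodo"))
```

Choosing the script first narrows the set of admissible languages, which mirrors the order of the "Select Script" and "Select Language" boxes.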

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">


<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">


----------------------------------Demonstrative Block------------------------------

<xs:element name="cat POS" cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat POS" cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">


<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat POS" cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat POS" cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat POS" cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat POS" cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">


<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat POS" cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">


<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat POS" cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat POS" cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">


<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element>

</xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

Columns: S.No., English, Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

Columns: S.No., English, Hindi, Assamese, Bodo, Kashmiri, Kashmiri (Hindi), Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

Columns: S.No., English, Hindi, Telugu, Malayalam, Tamil, Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">


        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">


        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………


    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.


ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if
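
The branches above all follow one rule: show the English, Hindi and selected-language labels of each node and hide the rest. A minimal Python sketch of that rule follows. The function name `select_nodes`, the dictionary form of a node and the placeholder label values are assumptions made for illustration; they are not part of the draft schema.

```python
# Hypothetical in-memory form of one schema node: attribute name -> value.
# Attribute names follow the examples above ("cat", "hin-cat", "guj-cat", "tag").
ALWAYS_SHOWN = ("cat", "subcat", "hin-cat", "tag")


def select_nodes(node: dict, lang_code: str) -> dict:
    """Keep the English, Hindi and requested-language labels; hide the others."""
    wanted = set(ALWAYS_SHOWN) | {f"{lang_code}-cat"}
    return {key: value for key, value in node.items() if key in wanted}


# Placeholder label values stand in for the real native-script labels.
noun = {
    "cat": "noun",
    "hin-cat": "HINDI-LABEL",
    "asm-cat": "ASSAMESE-LABEL",
    "guj-cat": "GUJARATI-LABEL",
    "kas-cat": "KASHMIRI-LABEL",
    "tag": "N",
}

print(select_nodes(noun, "guj"))
```

Because the language is a parameter, one function covers every script and language branch instead of one hand-written branch per language.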


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
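
In the sample sentences above each tag is attached directly to its word. A small Python sketch for separating the two is given below. It assumes that the word is written in a non-Latin script, that the tag consists of upper-case labels joined by one or more underscores, and that `N__NN` and `N_NN` denote the same tag. The function name `split_token` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: a token is a (possibly empty) non-Latin word followed by an
# upper-case tag such as N_NN or V_VM_VF; a bare tag like RD_PUNC has no word.
TOKEN = re.compile(r"(?P<word>[^A-Za-z_]*)(?P<tag>[A-Z]+(?:_+[A-Z]+)*)")


def split_token(token: str) -> Optional[Tuple[str, str]]:
    """Return (word, tag), or None when the token carries no tag."""
    match = TOKEN.fullmatch(token)
    if match is None:
        return None
    # Collapse repeated underscores so N__NN and N_NN compare equal.
    tag = re.sub(r"_+", "_", match.group("tag"))
    return match.group("word"), tag


print(split_token("घरN_NN"))
print(split_token("RD_PUNC"))
```

Tags written with a hyphen (for example `N-NNP`) are not handled by this sketch and would need to be normalised first.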


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)

1      Assamese       asm
2      Bangla         ben
3      Bodo           brx
4      Dogri          doi
5      Gujarati       guj
6      Hindi          hin
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
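
The language tags serve as prefixes for the per-language label attributes used in the schema examples (`hin-cat`, `asm-cat`, `guj-cat`). The following Python sketch holds the 22 languages of this annexure with their standard ISO 639-3 codes in a lookup table; the names `ISO_639_3` and `label_attribute` are illustrative assumptions.

```python
# ISO 639-3 codes for the 22 languages listed in this annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language: str) -> str:
    """Return the label attribute name for a language, e.g. 'guj-cat'."""
    return f"{ISO_639_3[language]}-cat"


print(label_attribute("Gujarati"))
```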


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


6 POS Tag set for Indian Languages

POS Categories and Labels

Category levels are shown by indentation: Top level, Subtype (level 1), Subtype (level 2).

Sl No   Category              Label   Annotation Convention   Remarks

1       Noun                  N       N
1.1       Common              NN      N__NN
1.2       Proper              NNP     N__NNP
1.3       Verbal              NNV     N__NNV                  The verbal noun subtype is only for languages such as Tamil and Malayalam
1.4       Nloc                NST     N__NST
2       Pronoun               PR      PR
2.1       Personal            PRP     PR__PRP
2.2       Reflexive           PRF     PR__PRF
2.3       Relative            PRL     PR__PRL
2.4       Reciprocal          PRC     PR__PRC
2.5       Wh-word             PRQ     PR__PRQ
2.6       Indefinite          PRI     PR__PRI
3       Demonstrative         DM      DM
3.1       Deictic             DMD     DM__DMD
3.2       Relative            DMR     DM__DMR
3.3       Wh-word             DMQ     DM__DMQ
3.4       Indefinite          DMI     DM__DMI
4       Verb                  V       V
4.1       Main                VM      V__VM
4.1.1       Finite            VF      V__VM__VF
4.1.2       Non-finite        VNF     V__VM__VNF
4.1.3       Infinitive        VINF    V__VM__VINF
4.1.4       Gerund            VNG     V__VM__VNG
4.2       Verbal              VN      V__VN                   paTittam, naTattam, naTanam
4.2       Auxiliary           VAUX    V__VAUX
4.2.1       Finite            VF      V__VAUX__VF
4.2.2       Non-finite        VNF     V__VAUX__VNF
4.2.3       Infinitive        VINF    V__VAUX__VINF
4.2.4       Gerund            VNG     V__VAUX__VNG
4.2.5       Participle noun   VNP     V__VAUX__VNP
5       Adjective             JJ
6       Adverb                RB                              Only manner adverbs
7       Postposition          PSP
8       Conjunction           CC      CC
8.1       Co-ordinator        CCD     CC__CCD
8.2       Subordinator        CCS     CC__CCS
8.2.1       Quotative         UT      CC__CCS__UT
9       Particles             RP      RP
9.1       Default             RPD     RP__RPD
9.2       Classifier          CL      RP__CL
9.3       Interjection        INJ     RP__INJ
9.4       Intensifier         INTF    RP__INTF
9.5       Negation            NEG     RP__NEG
10      Quantifiers           QT      QT
10.1      General             QTF     QT__QTF
10.2      Cardinals           QTC     QT__QTC
10.3      Ordinals            QTO     QT__QTO
11      Residuals             RD      RD
11.1      Foreign word        RDF     RD__RDF                 A word written in a script other than the script of the original text
11.2      Symbol              SYM     RD__SYM                 For symbols such as $, & etc.
11.3      Punctuation         PUNC    RD__PUNC                Only for punctuation
11.4      Unknown             UNK     RD__UNK
11.5      Echowords           ECH     RD__ECH
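
The annotation convention is the chain of labels from the top level down to the tag itself, joined by a double underscore. The Python sketch below derives those strings from a nested hierarchy. Only a few categories from the table are included, and the nested-dictionary layout and the name `annotation_conventions` are assumptions made for illustration.

```python
# Subset of the tag hierarchy: label -> child labels (empty dict = leaf).
HIERARCHY = {
    "N": {"NN": {}, "NNP": {}, "NNV": {}, "NST": {}},
    "V": {"VM": {"VF": {}, "VNF": {}, "VINF": {}, "VNG": {}}, "VAUX": {}},
    "CC": {"CCD": {}, "CCS": {"UT": {}}},
}


def annotation_conventions(tree, prefix=()):
    """Yield the annotation string of every label, parents before children."""
    for label, children in tree.items():
        path = prefix + (label,)
        yield "__".join(path)
        yield from annotation_conventions(children, path)


for convention in annotation_conventions(HIERARCHY):
    print(convention)
```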

POS for Hindi

Category levels are shown by indentation: Top level, Subtype (level 1), Subtype (level 2).

Sl No   Category            Label   Annotation Convention   Examples                                      Remarks

1       Noun                N       N                       ladakaa, raajaa, kitaaba
1.1       Common            NN      N__NN                   kitaaba, kalama, cashmaa
1.2       Proper            NNP     N__NNP                  Mohan, ravi, rashmi
1.4       Nloc              NST     N__NST                  Uupara, niice, aage, piiche
2       Pronoun             PR      PR                      Yaha, vaha, jo
2.1       Personal          PRP     PR__PRP                 Vaha, main, tuma, ve
2.2       Reflexive         PRF     PR__PRF                 Apanaa, swayam, khuda
2.3       Relative          PRL     PR__PRL                 Jo, jis, jab, jahaaM
2.4       Reciprocal        PRC     PR__PRC                 Paraspara, aapasa
2.5       Wh-word           PRQ     PR__PRQ                 Kauna, kab, kahaaM
2.6       Indefinite        PRI     PR__PRI                 Koii, kis
3       Demonstrative       DM      DM                      Vaha, jo, yaha
3.1       Deictic           DMD     DM__DMD                 Vaha, yaha
3.2       Relative          DMR     DM__DMR                 jo, jis
3.3       Wh-word           DMQ     DM__DMQ                 kis, kaun
3.4       Indefinite        DMI     DM__DMI                 KoI, kis
4       Verb                V       V                       giraa, gayaa, sonaa, haMstaa, hai, rahaa
4.1       Main              VM      V__VM                   giraa, gayaa, sonaa, haMstaa
4.2       Auxiliary         VAUX    V__VAUX                 hai, rahaa, huaa
5       Adjective           JJ      JJ                      sundara, acchaa, baRaa
6       Adverb              RB      RB                      jaldii, teza
7       Postposition        PSP     PSP                     ne, ko, se, mein
8       Conjunction         CC      CC                      aur, agar, tathaa, kyonki
8.1       Co-ordinator      CCD     CC__CCD                 aur, balki, parantu
8.2       Subordinator      CCS     CC__CCS                 Agar, kyonki, to, ki
9       Particles           RP      RP                      to, bhii, hii
9.1       Default           RPD     RP__RPD                 to, bhii, hii
9.3       Interjection      INJ     RP__INJ                 are, he, o
9.4       Intensifier       INTF    RP__INTF                bahuta, behada
9.5       Negation          NEG     RP__NEG                 nahiin, mata, binaa
10      Quantifiers         QT      QT                      thoRaa, bahuta, kucha, eka, pahalaa
10.1      General           QTF     QT__QTF                 thoRaa, bahuta, kucha
10.2      Cardinals         QTC     QT__QTC                 eka, do, tiina
10.3      Ordinals          QTO     QT__QTO                 pahalaa, duusaraa
11      Residuals           RD      RD
11.1      Foreign word      RDF     RD__RDF                                                               A word written in a script other than the script of the original text
11.2      Symbol            SYM     RD__SYM                 $, &, (, )                                    For symbols such as $, & etc.
11.3      Punctuation       PUNC    RD__PUNC                                                              Only for punctuation
11.4      Unknown           UNK     RD__UNK
11.5      Echowords         ECH     RD__ECH                 (Paanii-) vaanii, (khaanaa-) vaanaa

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
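
Storing the higher level tags automatically is straightforward when the annotation convention string is available, because it already spells out the path from the top level. The Python sketch below shows one way to do it; the function name `expand_hierarchy` and the default double-underscore separator are assumptions for illustration.

```python
from typing import List


def expand_hierarchy(annotation: str, sep: str = "__") -> List[str]:
    """Return the annotation of every level, from the top level down."""
    labels = annotation.split(sep)
    return [sep.join(labels[: depth + 1]) for depth in range(len(labels))]


# An annotator picks the lowest level tag; the parents follow from it.
print(expand_hierarchy("V__VM__VF"))
print(expand_hierarchy("JJ"))
```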

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI


ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz


ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka


ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)

Category levels are shown by indentation: Top level, Subtype (level 1), Subtype (level 2).

Sl No   Category            Label   Annotation Convention   Remarks

1       Noun                N       N
1.1       Common            NN      N__NN
1.2       Proper            NNP     N__NNP
1.3       Nloc              NST     N__NST
2       Pronoun             PR      PR
2.1       Personal          PRP     PR__PRP
2.2       Reflexive         PRF     PR__PRF
2.3       Relative          PRL     PR__PRL
2.4       Reciprocal        PRC     PR__PRC
2.5       Wh-word           PRQ     PR__PRQ
3       Demonstrative       DM      DM
3.1       Deictic           DMD     DM__DMD
3.2       Relative          DMR     DM__DMR
3.3       Wh-word           DMQ     DM__DMQ
4       Verb                V       V
4.1       Main              VM      V__VM
4.1.1       Finite          VF      V__VM__VF
4.1.2       Non-finite      VNF     V__VM__VNF
4.1.3       Infinitive      VINF    V__VM__VINF
4.1.4       Gerund          VNG     V__VM__VNG
4.2       Verbal noun       NNV     N__NNV                  Verbal noun
4.3       Auxiliary         VAUX    V__VAUX
4.3.1       Non-finite      VNF     V__VAUX__VNF
4.3.2       Infinitive      VINF    V__VAUX__VINF
5       Adjective           JJ
6       Adverb              RB                              Only manner adverbs
7       Postposition        PSP
8       Conjunction         CC      CC
8.1       Co-ordinator      CCD     CC__CCD
8.2       Subordinator      CCS     CC__CCS
8.2.1       Quotative       UT      CC__CCS__UT
9       Particles           RP      RP
9.1       Default           RPD     RP__RPD
9.2       Classifier        CL      RP__CL
9.3       Interjection      INJ     RP__INJ
9.4       Intensifier       INTF    RP__INTF
9.5       Negation          NEG     RP__NEG
10      Quantifiers         QT      QT
10.1      General           QTF     QT__QTF
10.2      Cardinals         QTC     QT__QTC
10.3      Ordinals          QTO     QT__QTO
11      Residuals           RD      RD
11.1      Foreign word      RDF     RD__RDF                 A word written in a script other than the script of the original text
11.2      Symbol            SYM     RD__SYM                 For symbols such as $, & etc.
11.3      Punctuation       PUNC    RD__PUNC                Only for punctuation
11.4      Unknown           UNK     RD__UNK
11.5      Echowords         ECH     RD__ECH

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Tamil

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR ituatuavan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V_VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ amp ( ) ruu

For symbols such as $ amp etc

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി


രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Category levels are shown by indentation: Top level, Subtype (level 1), Subtype (level 2).

Sl No   Category            Label   Annotation Convention   Examples                                Remarks

1       Noun                N       N
1.1       Common            NN      N__NN                   kalama, cashmaa
1.2       Proper            NNP     N__NNP                  Mohan, ravi, rashmi
1.4       Nloc              NST     N__NST                  upare, niche, bhitara
2       Pronoun             PR      PR
2.1       Personal          PRP     PR__PRP                 se, tumi, AmAra
2.2       Reflexive         PRF     PR__PRF                 nijera
2.3       Relative          PRL     PR__PRL                 ye, yakhana, yena, yAra
2.4       Reciprocal        PRC     PR__PRC                 paraspara
2.5       Wh-word           PRQ     PR__PRQ                 ke, kakhana, kena, kAra
2.6       Indefinite        PRI     PR__PRI                 keu
3       Demonstrative       DM      DM                      Vaha, jo, yaha
3.1       Deictic           DMD     DM__DMD                 sei, oi, o, se
3.2       Relative          DMR     DM__DMR                 ye, yei
3.3       Wh-word           DMQ     DM__DMQ                 kono
3.4       Indefinite        DMI     DM__DMI                 keu
4       Verb                V       V
4.1       Main              VM      V__VM
4.1.1       Finite          VF      V__VM__VF               karachhilAma, yAba, khAYa
4.1.2       Non-finite      VNF     V__VM__VNF              kare, kheYe, karale, khete
4.1.3       Infinitive      VINF    V__VM__VINF             karate, khete, yete
4.1.4       Gerund          VNG     V__VM__VNG              yAoYa, AsA, khelA, karA
4.2       Auxiliary         VAUX    V__VAUX                 chhila, habe, chAi
5       Adjective           JJ                              sundara, bhAla, lAla
6       Adverb              RB                              tADAtADi, Aste, haThAt
7       Postposition        PSP                             theke, abadhI, madhye, diYe
8       Conjunction         CC      CC
8.1       Co-ordinator      CCD     CC__CCD                 Ara, eban, athabA, kimbA
8.2       Subordinator      CCS     CC__CCS                 ye, kintu, noile, tAhale
8.2.1       Quotative       UT      CC__CCS__UT             ----                                    Not required
9       Particles           RP      RP
9.1       Default           RPD     RP__RPD                 to, ye
9.2       Classifier        CL      RP__CL                  jana, khAnA
9.3       Interjection      INJ     RP__INJ                 Are, ei, hAya
9.4       Intensifier       INTF    RP__INTF                bhiShaNa, khuba, sA~NghAtika
9.5       Negation          NEG     RP__NEG                 nA, naYa, chhADA
10      Quantifiers         QT      QT
10.1      General           QTF     QT__QTF                 kichhu, alpa, aneka
10.2      Cardinals         QTC     QT__QTC                 eka, dui, tina
10.3      Ordinals          QTO     QT__QTO                 prathama, paYalA, dvitIYa
11      Residuals           RD      RD
11.1      Foreign word      RDF     RD__RDF                                                         A word written in a script other than the script of the original text
11.2      Symbol            SYM     RD__SYM                 $, &, (, )                              For symbols such as $, & etc.
11.3      Punctuation       PUNC    RD__PUNC                                                        Only for punctuation
11.4      Unknown           UNK     RD__UNK
11.5      Echowords         ECH     RD__ECH                 jala Tala, khAbAra dAbAra

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.


POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--


4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text


112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Category levels are shown by indentation: Top level, Subtype (level 1), Subtype (level 2).

Sl No   Category            Label   Annotation Convention   Examples

1       Noun                N       N
1.1       Common            NN      N__NN                   kalam, chashmA ('pen', 'spectacles')
1.2       Proper            NNP     N__NNP                  mohan, ravI ('Mohan', 'Ravi')
1.3       Nloc              NST     N__NST                  upar, nIche, ahIM ('up', 'down', 'in front')
2       Pronoun             PR      PR
2.1       Personal          PRP     PR__PRP                 huM, tuM, te ('me', 'you', 'he/she')
2.2       Reflexive         PRF     PR__PRF                 pote, jAte, svayam ('herself/himself')
2.3       Relative          PRL     PR__PRL                 je, te, jyAM ('who', 'where')
2.4       Reciprocal        PRC     PR__PRC                 aras-paras, paraspar ('mutually', 'each other')
2.5       Wh-word           PRQ     PR__PRQ                 koN, kyAre, kyAM ('who', 'when', 'where')
2.6       Indefinite                                        koI, kaIMK, kashuM ('someone', 'something')
3       Demonstrative       DM      DM
3.1       Deictic           DMD     DM__DMD                 A ('this')
3.2       Relative          DMR     DM__DMR                 je, jeNe ('which/who', 'whom')
3.3       Wh-word           DMQ     DM__DMQ                 koN, shuM, kem ('who', 'what', 'why')
3.4       Indefinite                                        koI, kaIMK, kashuM ('someone', 'something')
4       Verb                V       V
4.1       Main              VM      V__VM                   khAshe, khAdhu ('will eat', 'ate')
4.2       Auxiliary         VAUX    V__VAUX                 chhe, hatuM, karyuM ('is', 'was', 'did')
5       Adjective           JJ
6       Adverb              RB
7       Postposition        PSP
8       Conjunction         CC      CC
8.1       Co-ordinator      CCD     CC__CCD                 ane, ke ('and', 'or')
8.2       Subordinator      CCS     CC__CCS                 tethI, evuM, kAraNke ('so', 'like that', 'because')
9       Particles           RP      RP
9.1       Default           RPD     RP__RPD                 paNa, ja, tO ('but', emph., topic)
9.2       Interjection      INJ     RP__INJ                 hE, arrrE, O
9.3       Intensifier       INTF    RP__INTF                bahu, ghaNuM ('very', 'much')
9.4       Negation          NEG     RP__NEG                 nahi, na ('no')
10      Quantifiers         QT      QT
10.1      General           QTF     QT__QTF                 thoduM, ghaNuM ('little', 'much')
10.2      Cardinals         QTC     QT__QTC                 eka, be, traN ('one', 'two', 'three')
10.3      Ordinals          QTO     QT__QTO                 paheluM, bIjI ('first' (neu.), 'second' (fem.))
11      Residuals           RD      RD
11.1      Foreign word      RDF     RD__RDF                 tv, perasitemol
11.2      Symbol            SYM     RD__SYM                 $, &
11.3      Punctuation       PUNC    RD__PUNC                ()
11.4      Unknown           UNK     RD__UNK
11.5      Echowords         ECH     RD__ECH                 kAm-bAm, pANi-bANi ('work and the like', 'water and the like')

POS for Konkani

Category columns: Top level, Subtype (level 1), Subtype (level 2).

Sl No   Category   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Category columns: Top level, Subtype (level 1), Subtype (level 2).

Sl No   Category   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuation

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
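The rule above can be illustrated with a minimal Python sketch. It assumes that the annotation convention strings from the tables (for example V__VM__VF) encode the hierarchy with a double underscore between levels; the function names and the small CONVENTIONS table are hypothetical and are not part of this standard.

```python
# Illustrative sketch only: derive the higher-level tags from the lowest-level one.
# Assumption: "__" separates hierarchy levels in an annotation convention string.

# A few annotation conventions copied from the tables, keyed by lowest-level label.
CONVENTIONS = {
    "NN": "N__NN",
    "PRP": "PR__PRP",
    "VF": "V__VM__VF",
    "UT": "CC__CCS__UT",
}


def expand_tag(convention: str) -> list[str]:
    """Split a convention such as 'V__VM__VF' into ['V', 'VM', 'VF']."""
    levels = convention.split("__")
    if not all(levels):
        raise ValueError(f"malformed annotation convention: {convention!r}")
    return levels


def annotate(word: str, lowest_label: str) -> dict:
    """Record the annotator's lowest-level tag and store its ancestors automatically."""
    levels = expand_tag(CONVENTIONS[lowest_label])
    return {"word": word, "tag": lowest_label, "ancestors": levels[:-1]}
```

With this sketch, the annotator supplies only VF, and the ancestors V and VM are filled in from the convention string.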

POS for Urdu

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level.


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word (بديسی لفظ, bidesii lafz) RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( )

& $

Such symbols are not used in Urdu. They are written as ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule elements indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule elements indicate which elements should be treated either as part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id, so that elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
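As an illustration of the first two attributes, the following Python sketch writes a small tagged fragment that carries xml:lang and its:dir. The element names corpus and word and the helper tagged_word are hypothetical and are not defined by this standard. The ITS namespace URI is the one used by the ITS 1.0 Recommendation, and the language values are BCP 47 tags as xml:lang expects, which is an assumption relative to the language code table in Annexure A.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace behind the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def tagged_word(parent, text, tag, lang, direction="ltr"):
    """Append a hypothetical <word> element carrying language and direction mark-up."""
    word = ET.SubElement(parent, "word", {
        f"{{{XML_NS}}}lang": lang,        # serialized as xml:lang
        f"{{{ITS_NS}}}dir": direction,    # serialized as its:dir
        "tag": tag,                       # POS annotation convention
    })
    word.text = text
    return word


# The root declares a default language; each word overrides it where the language changes.
corpus = ET.Element("corpus", {f"{{{XML_NS}}}lang": "en"})
tagged_word(corpus, "کتاب", "N__NN", "ur", "rtl")   # Urdu common noun from the table above
tagged_word(corpus, "book", "N__NN", "en")

xml_text = ET.tostring(corpus, encoding="unicode")
print(xml_text)
```

Declaring the language and direction on each word keeps mixed-script corpora unambiguous for downstream processors.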

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to some extent, what the data means. Only to some extent, because a tag gives just a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
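The idea of switching branches off can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the well-formed fragment COMMON, its element names and its placeholder labels are hypothetical stand-ins for one block of the common schema, and "hiding" is modelled by deleting the per-language label attributes of every language that was not selected. Following the algorithm in section 14, English and Hindi labels are always kept.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for one block of the common schema (placeholder labels).
COMMON = """
<pos>
  <category cat="Noun" tag="N" hin-cat="HIN-NOUN" brx-cat="BRX-NOUN" mal-cat="MAL-NOUN">
    <subtype subcat="Common" tag="NN" hin-cat="HIN-COMMON" brx-cat="BRX-COMMON" mal-cat="MAL-COMMON"/>
  </category>
</pos>
"""


def select_nodes(root, language, always=("hin",)):
    """Keep English attributes plus the labels of the selected languages; drop the rest."""
    keep = {f"{code}-cat" for code in (*always, language)}
    for node in root.iter():
        for name in list(node.attrib):          # copy the keys: we delete while iterating
            if name.endswith("-cat") and name not in keep:
                del node.attrib[name]
    return root


bodo_schema = select_nodes(ET.fromstring(COMMON), "brx")
print(ET.tostring(bodo_schema, encoding="unicode"))
```

Working on a parsed copy leaves the common schema itself untouched, so the same source can be transformed for each of the languages in turn.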

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>...</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
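A lookup over such a mapping table might be sketched as follows. The two rows shown are taken from the Marathi and Konkani columns of the tables; the language codes as dictionary keys, the function name label_for and the fallback to the English label when an entry is NA are assumptions made for illustration only.

```python
# Illustrative sketch of a one-to-one label lookup (not part of the standard).
NOT_AVAILABLE = "NA"   # marker used in the tables when a language has no label

# English label -> language code -> label in that language.
LABELS = {
    "Noun": {"mar": "नाम", "kok": "नाम"},
    "Participle Noun": {"mar": "NA", "kok": "NA"},
}


def label_for(english: str, language: str) -> str:
    """Return the mapped label, falling back to English when none is available."""
    label = LABELS.get(english, {}).get(language, NOT_AVAILABLE)
    return english if label == NOT_AVAILABLE else label
```

Treating NA as "no mapping" keeps the table one-to-one for every label that a language actually uses, while still giving every node some displayable name.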

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

Columns: SNo; English; Hindi; Punjabi; Urdu; Gujarati; Oriya; Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

Columns: SNo; English; Hindi; Assamese; Bodo; Kashmiri; Kashmiri (Hindi); Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

Columns: SNo; English; Hindi; Telugu; Malayalam; Tamil; Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …
    End if
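The node selection described in this section can be sketched in Python. This is a minimal illustration and not part of the standard: the `SCRIPT_LANGUAGES` table, the `select_nodes` function and the dictionary representation of a schema node are assumptions made for the example, and the label values are placeholders rather than real category names.

```python
# Hypothetical table: each script with the language codes this section lists under it.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Attributes shown for every language (the English nodes and the tag).
COMMON_ATTRIBUTES = {"cat", "subcat", "tag"}


def select_nodes(node, script, language):
    """Return the attributes of one schema node that should be displayed.

    English and Hindi labels are always kept, the label of the selected
    language is added, and every other language label is hidden.
    """
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = COMMON_ATTRIBUTES | {"hin-cat", f"{language}-cat"}
    return {name: value for name, value in node.items() if name in visible}


# Placeholder labels stand in for the language-specific category names.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "kok-cat": "<Konkani label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}

konkani_view = select_nodes(noun, "Devanagari", "kok")
hindi_view = select_nodes(noun, "Devanagari", "hin")
```

Selecting Hindi keeps only the English and Hindi attributes, which matches the "Display (English and Hindi Nodes)" step; any other language adds its own `-cat` attribute to that set.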


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
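The tagged sentences in this section attach a label such as N_NN or V_VM_VNF to each word. A minimal Python sketch for separating word and label follows. It assumes the label is the trailing run of uppercase ASCII letters and underscores with no separator before it; the names `split_tagged` and `parse_sentence` are hypothetical. If a corpus uses an explicit separator between word and label, splitting on that separator is simpler and safer.

```python
import re

# Assumption: a label is the trailing run of uppercase ASCII letters,
# optionally joined by underscores (N_NN, V_VM_VNF, PSP, ...).
LABEL_AT_END = re.compile(r"(.*?)([A-Z]+(?:_+[A-Z]+)*)")


def split_tagged(token):
    """Split one token into (word, label); label is None if none is found."""
    match = LABEL_AT_END.fullmatch(token)
    if match is None:
        return token, None
    return match.group(1), match.group(2)


def parse_sentence(line):
    """Split a whitespace-separated tagged sentence into (word, label) pairs."""
    return [split_tagged(token) for token in line.split()]


pairs = parse_sentence("ਧਰਮN_NN औरCC_CCD")
```

The pattern only works when the word itself does not end in uppercase ASCII letters, which holds for text in Indic scripts but not for Latin-script foreign words.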


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
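The language tags above are the prefixes of the per-language label attributes used in the schema examples (hin-cat, brx-cat, kok-cat and so on). A minimal Python sketch of that lookup follows; the names `LANGUAGE_TAGS` and `label_attribute` are hypothetical and chosen only for illustration.

```python
# ISO 639-3 tags for the 22 languages listed in the annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute name that holds this language's label."""
    return f"{LANGUAGE_TAGS[language_name]}-cat"


konkani_attribute = label_attribute("Konkani")
```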


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


naTattam naTanam

4.2    Auxiliary       VAUX   V__VAUX
4.2.1  Finite          VAUX   V__VAUX__VF
4.2.2  Non-finite      VNF    V__VAUX__VNF
4.2.3  Infinitive      VINF   V__VAUX__VINF
4.2.4  Gerund          VNG    V__VAUX__VNG
4.2.5  Participle noun VNP    V_VAUX_VNP
5      Adjective       JJ
6      Adverb          RB                    Only manner adverbs
7      Postposition    PSP
8      Conjunction     CC     CC
8.1    Co-ordinator    CCD    CC__CCD
8.2    Subordinator    CCS    CC__CCS
8.2.1  Quotative       UT     CC__CCS__UT
9      Particles       RP     RP
9.1    Default         RPD    RP__RPD
9.2    Classifier      CL     RP__CL
9.3    Interjection    INJ    RP__INJ
9.4    Intensifier     INTF   RP__INTF
9.5    Negation        NEG    RP__NEG
10     Quantifiers     QT     QT
10.1   General         QTF    QT__QTF
10.2   Cardinals       QTC    QT__QTC
10.3   Ordinals        QTO    QT__QTO
11     Residuals       RD     RD
11.1   Foreign word    RDF    RD__RDF        A word written in script other than the script of the original text
11.2   Symbol          SYM    RD__SYM        For symbols such as $, &, etc.
11.3   Punctuation     PUNC   RD__PUNC       Only for punctuations
11.4   Unknown         UNK    RD__UNK
11.5   Echowords       ECH    RD__ECH

POS for Hindi

Sl No  Category        Label  Annotation Convention  Examples  Remarks
(Category levels: top level, subtype level 1, subtype level 2)

1      Noun            N      N              ladakaa raajaa kitaaba
1.1    Common          NN     N__NN          kitaaba kalama cashmaa
1.2    Proper          NNP    N__NNP         Mohan ravi rashmi
1.4    Nloc            NST    N__NST         Uupara niice aage piiche
2      Pronoun         PR     PR             Yaha vaha jo
2.1    Personal        PRP    PR__PRP        Vaha main tuma ve
2.2    Reflexive       PRF    PR__PRF        Apanaa swayam khuda
2.3    Relative        PRL    PR__PRL        Jo jis jab jahaaM
2.4    Reciprocal      PRC    PR__PRC        Paraspara aapasa
2.5    Wh-word         PRQ    PR__PRQ        Kauna kab kahaaM
       Indefinite      PRI    PR__PRI        Koii kis
3      Demonstrative   DM     DM             Vaha jo yaha
3.1    Deictic         DMD    DM__DMD        Vaha yaha
3.2    Relative        DMR    DM__DMR        jo jis
3.3    Wh-word         DMQ    DM__DMQ        kis kaun
       Indefinite      DMI    DM__DMI        KoI kis
4      Verb            V      V              giraa gayaa sonaa haMstaa hai rahaa
4.1    Main            VM     V__VM          giraa gayaa sonaa haMstaa
4.2    Auxiliary       VAUX   V__VAUX        hai rahaa huaa
5      Adjective       JJ     JJ             sundara acchaa baRaa
6      Adverb          RB     RB             jaldii teza
7      Postposition    PSP    PSP            ne ko se mein
8      Conjunction     CC     CC             aur agar tathaa kyonki
8.1    Co-ordinator    CCD    CC__CCD        aur balki parantu
8.2    Subordinator    CCS    CC__CCS        Agar kyonki to ki
9      Particles       RP     RP             to bhii hii
9.1    Default         RPD    RP__RPD        to bhii hii
9.3    Interjection    INJ    RP__INJ        are he o
9.4    Intensifier     INTF   RP__INTF       bahuta behada
9.5    Negation        NEG    RP__NEG        nahiin mata binaa
10     Quantifiers     QT     QT             thoRaa bahuta kucha eka pahalaa
10.1   General         QTF    QT__QTF        thoRaa bahuta kucha
10.2   Cardinals       QTC    QT__QTC        eka do tiina
10.3   Ordinals        QTO    QT__QTO        pahalaa duusaraa
11     Residuals       RD     RD
11.1   Foreign word    RDF    RD__RDF                   A word written in script other than the script of the original text
11.2   Symbol          SYM    RD__SYM        $ & ( )    For symbols such as $, &, etc.
11.3   Punctuation     PUNC   RD__PUNC                  Only for punctuations
11.4   Unknown         UNK    RD__UNK
11.5   Echowords       ECH    RD__ECH        (Paanii-) vaanii (khaanaa-) vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
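Because the annotation convention joins the levels of the hierarchy with a double underscore (for example V__VM__VF), the higher-level tags can be derived from the lowest-level one. A minimal Python sketch follows; the function name `expand_tag` is hypothetical, and the sketch assumes every level is separated by exactly two underscores.

```python
def expand_tag(annotation):
    """Return the annotation at every level of the hierarchy, top level first."""
    parts = annotation.split("__")
    return ["__".join(parts[:depth]) for depth in range(1, len(parts) + 1)]


levels = expand_tag("V__VM__VF")
```

An annotator then only records the lowest-level annotation, and a tool can store the list returned by `expand_tag` alongside it.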

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI


ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz


ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka


ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)

Sl No  Category        Label  Annotation Convention  Remarks
(Category levels: top level, subtype level 1, subtype level 2)

1      Noun            N      N
1.1    Common          NN     N__NN
1.2    Proper          NNP    N__NNP
1.3    Nloc            NST    N__NST
2      Pronoun         PR     PR
2.1    Personal        PRP    PR__PRP
2.2    Reflexive       PRF    PR__PRF
2.3    Relative        PRL    PR__PRL
2.4    Reciprocal      PRC    PR__PRC
2.5    Wh-word         PRQ    PR__PRQ
3      Demonstrative   DM     DM
3.1    Deictic         DMD    DM__DMD
3.2    Relative        DMR    DM__DMR
3.3    Wh-word         DMQ    DM__DMQ
4      Verb            V      V
4.1    Main            VM     V__VM
4.1.1  Finite          VF     V__VM__VF
4.1.2  Non-finite      VNF    V__VM__VNF
4.1.3  Infinitive      VINF   V__VM__VINF
4.1.4  Gerund          VNG    V__VM__VNG
4.2    Verbal noun     NNV    N_NNV          Verbal noun
4.3    Auxiliary       VAUX   V__VAUX
4.3.1  Non-finite      VNF    V_VM_VNF
4.3.2  Infinitive      VINF   V_VM_VNF
5      Adjective       JJ
6      Adverb          RB                    Only manner adverbs
7      Postposition    PSP
8      Conjunction     CC     CC
8.1    Co-ordinator    CCD    CC__CCD
8.2    Subordinator    CCS    CC__CCS
8.2.1  Quotative       UT     CC__CCS__UT
9      Particles       RP     RP
9.1    Default         RPD    RP__RPD
9.2    Classifier      CL     RP__CL
9.3    Interjection    INJ    RP__INJ
9.4    Intensifier     INTF   RP__INTF
9.5    Negation        NEG    RP__NEG
10     Quantifiers     QT     QT
10.1   General         QTF    QT__QTF
10.2   Cardinals       QTC    QT__QTC
10.3   Ordinals        QTO    QT__QTO
11     Residuals       RD     RD
11.1   Foreign word    RDF    RD__RDF        A word written in script other than the script of the original text
11.2   Symbol          SYM    RD__SYM        For symbols such as $, &, etc.
11.3   Punctuation     PUNC   RD__PUNC       Only for punctuations
11.4   Unknown         UNK    RD__UNK
11.5   Echowords       ECH    RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Tamil

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR ituatuavan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V_VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ amp ( ) ruu

For symbols such as $ amp etc

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി


രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No  Category        Label  Annotation Convention  Examples  Remarks
(Category levels: top level, subtype level 1, subtype level 2)

1      Noun            N      N
1.1    Common          NN     N__NN          kalama cashmaa
1.2    Proper          NNP    N__NNP         Mohan ravi rashmi
1.4    Nloc            NST    N__NST         upare niche bhitara
2      Pronoun         PR     PR
2.1    Personal        PRP    PR__PRP        se tumi AmAra
2.2    Reflexive       PRF    PR__PRF        nijera
2.3    Relative        PRL    PR__PRL        ye yakhana yena yAra
2.4    Reciprocal      PRC    PR__PRC        paraspara
2.5    Wh-word         PRQ    PR__PRQ        ke kakhana kena kAra
2.6    Indefinite      PRI    PR__PRI        keu
3      Demonstrative   DM     DM             Vaha jo yaha
3.1    Deictic         DMD    DM__DMD        sei oi o se
3.2    Relative        DMR    DM__DMR        ye yei
3.3    Wh-word         DMQ    DM__DMQ        kono
3.4    Indefinite      DMI    DM__DMI        keu
4      Verb            V      V
4.1    Main            VM     V__VM
4.1.1  Finite          VF     V__VM__VF      karachhilAma yAba khAYa
4.1.2  Non-finite      VNF    V__VM__VNF     kare kheYe karale khete
4.1.3  Infinitive      VINF   V__VM__VINF    karate khete yete
4.1.4  Gerund          VNG    V__VM__VNG     yAoYa AsA khelA karA
4.2    Auxiliary       VAUX   V__VAUX        chhila habe chAi
5      Adjective       JJ                    sundara bhAla lAla
6      Adverb          RB                    tADAtADi Aste haThAt
7      Postposition    PSP                   theke abadhI madhye diYe
8      Conjunction     CC     CC
8.1    Co-ordinator    CCD    CC__CCD        Ara eban athabA kimbA
8.2    Subordinator    CCS    CC__CCS        ye kintu noile tAhale
8.2.1  Quotative       UT     CC__CCS__UT    ----       Not required
9      Particles       RP     RP
9.1    Default         RPD    RP__RPD        to ye
9.2    Classifier      CL     RP__CL         jana khAnA
9.3    Interjection    INJ    RP__INJ        Are ei hAya
9.4    Intensifier     INTF   RP__INTF       bhiShaNa khuba sA~NghAtika
9.5    Negation        NEG    RP__NEG        nA naYa chhADA
10     Quantifiers     QT     QT
10.1   General         QTF    QT__QTF        kichhu alpa aneka
10.2   Cardinals       QTC    QT__QTC        eka dui tina
10.3   Ordinals        QTO    QT__QTO        prathama paYalA dvitIYa
11     Residuals       RD     RD
11.1   Foreign word    RDF    RD__RDF                   A word written in script other than the script of the original text
11.2   Symbol          SYM    RD__SYM        $ & ( )    For symbols such as $, &, etc.
11.3   Punctuation     PUNC   RD__PUNC                  Only for punctuations
11.4   Unknown         UNK    RD__UNK
11.5   Echowords       ECH    RD__ECH        jala Tala khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text


112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No Category (Top level, Subtype (level 1), Subtype (level 2)) Label Annotation Convention Examples Remarks

1 Noun N N

1.1 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’

1.2 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’

1.3 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’

2 Pronoun PR PR

2.1 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’

2.2 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’

2.3 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’

2.4 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’

2.5 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’

2.6 Indefinite koI kaIMK kashuM ‘someone’ ‘something’

3 Demonstrative DM DM

3.1 Deictic DMD DM__DMD A ‘this’

3.2 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’

3.3 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’

3.4 Indefinite koI kaIMK kashuM ‘someone’ ‘something’

4 Verb V V

4.1 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’

4.2 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’

5 Adjective JJ

6 Adverb RB

7 Postposition PSP

8 Conjunction CC CC

8.1 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’

8.2 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’

9 Particles RP RP

9.1 Default RPD RP__RPD paNa ja tO ‘but’ emph topic

9.2 Interjection INJ RP__INJ hE arrrE O

9.3 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’

9.4 Negation NEG RP__NEG nahi na ‘no’

10 Quantifiers QT QT

10.1 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’

10.2 Cardinals QTC QT__QTC eka be traN ‘one/two/three’

10.3 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)

11 Residuals RD RD

11.1 Foreign word RDF RD__RDF tv perasitemol

11.2 Symbol SYM RD__SYM $ &

11.3 Punctuation PUNC RD__PUNC ()

11.4 Unknown UNK RD__UNK

11.5 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’

POS for Konkani

Sl No Category (Top level, Subtype (level 1), Subtype (level 2)) Label Annotation Convention Examples Remarks

1 Noun N N

1.1 Common NN N__NN पसत रख आबो माड

1.2 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

1.3 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

2.1 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

2.2 Reflexive PRF PR__PRF आपण सवा

2.3 Relative PRL PR__PRL जा जो

2.4 Reciprocal PRC PR__PRC एतामतात आपसा

2.5 Wh-word PRQ PR__PRQ तोण त खयचो

2.6 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

3.1 Deictic DMD DM__DMD ो हो

3.2 Relative DMR DM__DMR जो

3.3 Wh-word DMQ DM__DMQ तोण तसल

3.4 Indefinite तोणाचय तसलय

4 Verb V V

4.1 Main VM V__VM यवप

4.1.1 Finite VF V__VM__VF आयलो आयला आयललो

4.1.2 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

4.1.3 Infinitive VINF V__VM__VINF आस वहर तलयार

4.1.4 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

4.2 Auxiliary VAUX V__VAUX NA

4.2.1 Finite V__VAUX__VF तलल आस आयला आस

4.2.2 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

8.1 Co-ordinator CCD CC__CCD आनी वा

8.2 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

8.2.1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

9.1 Default RPD RP__RPD बी आद इतयाद

9.2 Classifier CL RP__CL (पाच) जाण

9.3 Interjection INJ RP__INJ आर चप

9.4 Intensifier INTF RP__INTF उपाट भरपर

9.5 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

10.1 General QTF QT__QTF थोड चड ताय खब

10.2 Cardinals QTC QT__QTC एत दोन

10.3 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

11.1 Foreign word RDF RD__RDF

11.2 Symbol SYM RD__SYM & $

11.3 Punctuation PUNC RD__PUNC -

11.4 Unknown UNK RD__UNK

11.5 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Sl No Category (Top level, Subtype (level 1), Subtype (level 2)) Label Annotation Convention Examples Remarks

1 Noun N N

1.1 Common NN N__NN पोथी तलम पड खवास

1.2 Proper NNP N__NNP अरण दनश अल

1.3 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह

2 Pronoun PR PR

2.1 Personal PRP PR__PRP हम ई ओ अहा

2.2 Reflexive PRF PR__PRF अपना अपन सवय सवयमव

2.3 Relative PRL PR__PRL ज िजनता िजनतर जतरा

2.4 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

2.5 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ तउछ तोनो

3 Demonstrative DM DM

3.1 Deictic DMD DM__DMD ओ ई ऊ

3.2 Relative DMR DM__DMR ज जाह

3.3 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ तउछ तोनो

4 Verb V V

4.1 Main VM V__VM चलब रौप पढइ खाइ स हस

4.2 Auxiliary VAUX V__VAUX अछ छल होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

8.1 Co-ordinator CCD CC__CCD आओर परच मदा वा

8.2 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

9.1 Default RPD RP__RPD भर यौ हौ रौ

9.2 Classifier CL RP__CL टा गोट गो

9.3 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

9.4 Intensifier INTF RP__INTF बह बसी खब नान

9.5 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

10.1 General QTF QT__QTF तनत बह तछ

10.2 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार

10.3 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text

11.2 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

11.3 Punctuation PUNC RD__PUNC Only for punctuations

11.4 Unknown UNK RD__UNK

11.5 Echowords ECH RD__ECH जलख (लख) मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No Category (Top level, Subtype (level 1), Subtype (level 2)) Label Annotation Convention Examples Remarks

1 Noun (ism, اسم) N N لڑکا (laRkaa) راجا (raajaa) کتاب (kitaab)

1.1 Common (nakeraa, نکره) NN N__NN کتاب (kitaab) قلم (qalam) چشمہ (cashma)

1.2 Proper (m‘aarefa, معرفہ) NNP N__NNP موہن (Mohan) رشمی (Rashmi) روی (Ravi)

1.3 Verbal (haasil-e-masdar, حاصل مصدر) NNV N__NNV جلن (jalan) چلن (calan) بہاؤ (bahaao) بناوٹ (banaavat) May be considered for Urdu-Hindi too

1.4 Nloc (zarf, ظرف) NST N__NST اوپر (upar) نيچے (niice) آگے (aage) پيچهے (piiche)

2 Pronoun (zamiir, ضمير) PR PR يہ (yih) وه (voh) جو (jo)

2.1 Personal (zamiir-e-shakhsii, ضمير شخصی) PRP PR__PRP وه (voh) تم (tum) ميں (maim) In Urdu, unlike Hindi, voh is used both for singular and plural

2.2 Reflexive (zamiir-e-m‘aakoosii, ضمير معکوسی) PRF PR__PRF اپنا (apnaa) خود (khud) اپنے آپ (apne aap)

2.3 Relative (zamiir-e-mausoolaa, ضمير موصولہ) PRL PR__PRL جو (jo) جب (jab) جس (jis) جہاں (jahaM)

2.4 Reciprocal (zamiir-e-raaje‘, ضمير راجع) PRC PR__PRC باہم (baaham) درميان (darmiyaan) آپس (aapas)

2.5 Wh-word (zamiir-e-istafhaamiyaa, ضمير استفہاميہ) PRQ PR__PRQ کون (kaun) کب (kab) کہاں (kahaaM)

3 Demonstrative (zamiir-e-ishaaraa, ضمير اشاره) DM DM يہ (yih) وه (voh) ان (inn) ان (unn)

3.1 Deictic (ishaare, اشارے) DMD DM__DMD يہ (yih) وه (voh)

3.2 Relative (zamiir-e-ishaaraa mausoolaa, ضمير اشاره موصولہ) DMR DM__DMR جو (jo) جس (jis)

3.3 Wh-word (zamiir-e-ishaaraa istafhaamiyaa, ضمير اشاره استفہاميہ) DMQ DM__DMQ کون (kaun) کس (kis) کتنا (kitnaa) According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb (f‘el, فعل) V V گرا (giraa) گيا (gayaa) سونا (sonaa) ہنستا (haMstaa)

4.1 Main VM V__VM گرا (giraa) گيا (gayaa) سونا (sonaa) ہنستا (haMstaa)

4.1.1 Finite (mahdood, محدود) VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

4.1.2 Nonfinite (ghair mahdood, غيرمحدود) VNF V__VM__VNF --do--

4.1.3 Infinitive (masdar, مصدر) VINF V__VM__VINF --do--

4.1.4 Gerund (haasil-e-masdar, حاصل مصدر) VNG V__VM__VNG --do--

4.2 Auxiliary (f‘el-e-imdaadi, فعل امدادی) VAUX V__VAUX ہے (hai) رہا (rahaa) ہوا (huaa)

5 Adjective (sifat, صفت) JJ دلکش (dilkash) سفيد (safed) سياه (siyaah) چوڑا (cauRaa) اونچا (uuMcaa)

6 Adverb (mut‘alliq-e-f‘el, متعلق فعل) RB تيز (tez) جلد (jald)

7 Postposition (jaar-e-moakkhar, جارموخر) PSP سے (se) نے (ne) کو (ko) ميں (meiM)

8 Conjunction (atf‘, عطف) CC CC اور (aur) اگر (agar) کيوں کہ (kyoMki)

8.1 Co-ordinator (harf-e-vasl, حرف وصل) CCD CC__CCD اور (aur) وه (voh) يا (yaa) کہ (ki) بلکہ (balki)

8.2 Subordinator (taab‘e kunindaa, تابع کننده) CCS CC__CCS اگر (agar) کيوں کہ (kyoMki) تو (to)

8.2.1 Quotative (iqtabaasii, اقتباسی) UT CC__CCS__UT Not required

9 Particles (haaliyaa, حاليہ) RP RP تو (to) ہی (hii) بهی (bhii)

9.1 Default (Default, ڈيفالٹ) RPD RP__RPD تو (to) ہی (hii) بهی (bhii)

9.2 Classifier (darja band, درجہ بند) CL RP__CL Not required

9.3 Interjection (fajaa’iyaa, فجائيہ) INJ RP__INJ اے (e) او (o) ارے (are) جی (jii) اہا (ahaa) واه (vaah)

9.4 Intensifier (harf-e-taakiid, حرف تاکيد) INTF RP__INTF بہت (bahut) بے حد (behad) البتہ (albattaa) ضرور (zaroor) خبردار (khabardaar)

9.5 Negation (harf-e-nahii, حرف نہی) NEG RP__NEG نہ (na) نہيں (nahiiM)

10 Quantifiers (kamiiyat numaa, کميت نما) QT QT چند (cand) متعدد (muta’addad) قليل (qaliil) کثير (kasiir)

10.1 General (aam‘, عام) QTF QT__QTF تهوڑا (thoRaa) بہت (bahut) کچه (kuch)

10.2 Cardinals (a‘adaad-e-mutlaq, اعداد مطلق) QTC QT__QTC ايک (Ek) دو (do) تين (tiin)

10.3 Ordinals (tartiibii a‘adaad, ترتيبی اعداد) QTO QT__QTO اول (avval) دوم (doam) پہلا (pahalaa) دوسرا (duusaraa)

11 Residuals (baaqi maandaa, باقی مانده) RD RD

11.1 Foreign word (bidesii lafz, بديسی لفظ) RDF RD__RDF A word written in script other than the script of the original text

11.2 Symbol (‘alaamat, علامت) SYM RD__SYM $ & ( ) & $ Such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound) etc.

11.3 Punctuation (auqaaf, اوقاف) PUNC RD__PUNC Only for punctuations

11.4 Unknown (naa-m‘aaloom, نامعلوم) UNK RD__UNK

11.5 Echowords (goonjdar lafz, گونج دار الفاظ) ECH RD__ECH (دل-) ول (dil-) vil (پيار-) ويار (pyaar-) vyaar (چائے-) وائے (caa‘e-) vaa‘e

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
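The rule above, annotate with the lowest-level tag and store the higher-level tags automatically, can be sketched in Python as follows. This is only an illustration: the names CONVENTION and expand_tag are hypothetical, and the sketch assumes that the "Annotation Convention" column (for example V__VM__VF) joins the hierarchy levels with a double underscore, as the tables suggest.

```python
from typing import List

# Lowest-level tag -> annotation convention string, copied from the tables.
# Only a handful of rows are listed; the dictionary name is hypothetical.
CONVENTION = {
    "NN": "N__NN",
    "NNP": "N__NNP",
    "PRP": "PR__PRP",
    "VM": "V__VM",
    "VF": "V__VM__VF",
    "VAUX": "V__VAUX",
    "CCD": "CC__CCD",
    "UT": "CC__CCS__UT",
    "QTC": "QT__QTC",
    "ECH": "RD__ECH",
}


def expand_tag(lowest_tag: str) -> List[str]:
    """Return all tags from the top level down to the annotated tag."""
    return CONVENTION[lowest_tag].split("__")


def annotate(token: str, lowest_tag: str) -> dict:
    """Store the chosen tag together with its automatically derived parents."""
    return {"token": token, "tag": lowest_tag, "hierarchy": expand_tag(lowest_tag)}


record = annotate("kitaab", "NN")
```

With this layout an annotator only ever chooses one tag per token; the parent categories follow from the convention string and never need to be entered by hand.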


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id; elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
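As an illustration of how a processor might read these attributes, the following Python sketch resolves the inherited xml:lang and its:dir values for every tagged word in a small sample document. The element names corpus, sentence and w, the tag attribute and the sample words are assumptions made for this example and are not part of the draft schema. For brevity the sample uses the local its:translate attribute rather than an its:translateRule element.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace

# Hypothetical POS-tagged sample; element names are illustrative only.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hi" its:dir="ltr">
  <sentence xml:id="s1">
    <w tag="NN">किताब</w>
    <w tag="RDF" xml:lang="en" its:translate="no">TV</w>
  </sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl">
    <w tag="NN">کتاب</w>
  </sentence>
</corpus>"""


def resolve(root):
    """Return (text, tag, language, direction) for each <w> element.

    xml:lang and its:dir are inherited from the nearest ancestor that sets them.
    """
    rows = []

    def walk(node, lang, direction):
        lang = node.get(f"{{{XML_NS}}}lang", lang)
        direction = node.get(f"{{{ITS_NS}}}dir", direction)
        if node.tag == "w":
            rows.append((node.text, node.get("tag"), lang, direction))
        for child in node:
            walk(child, lang, direction)

    walk(root, None, None)
    return rows


rows = resolve(ET.fromstring(SAMPLE))
```

Declaring the language and direction once on the root and overriding them only where they change keeps the mark-up small, which is the practice the list above recommends.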

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives the tag name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>

<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">

<globalInformation></globalInformation>

<languageSection>

<language>en</language>

<identifier> </identifier>

<version>100</version>

<registrationStatus>standard</registrationStatus> registered as a standard

<origin>ISO 12620:1999

<author></author>

<domain></domain>

</origin>

<creation>

<creationDate>1999-01-01</creationDate>

</creation>

<descriptionSection>

<definitionClass>

<definition xml:lang="en"></definition>

<source>ISO 12620:1999</source>

</definitionClass>

</descriptionSection>

</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to the labels for the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching particular branches of the tree structure "off". This is represented schematically under the next heading, POS schema block diagram.
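The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps can be pictured with a small Python sketch. It is a sketch under stated assumptions, not the procedure of the standard: the Node class, the applies_to field and the example tree are hypothetical, and the language codes follow ISO 639-3. The example data reflects the remark in the tables that the Finite and Non-finite subtypes are not used for Hindi, whereas the Konkani table gives examples for them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    applies_to: set                      # ISO 639-3 codes for which the node is "on"
    children: list = field(default_factory=list)


def select_nodes(root, lang):
    """Split the tree into (shown, hidden) tag lists for one language.

    A node is shown only if it and all of its ancestors apply to `lang`,
    so switching a branch off also hides everything below it.
    """
    shown, hidden = [], []

    def walk(node, visible):
        visible = visible and lang in node.applies_to
        (shown if visible else hidden).append(node.tag)
        for child in node.children:
            walk(child, visible)

    walk(root, True)
    return shown, hidden


# Hypothetical fragment of the common tree (Verb block only).
verb = Node("V", {"hin", "kok"}, [
    Node("VM", {"hin", "kok"}, [
        Node("VF", {"kok"}),
        Node("VNF", {"kok"}),
    ]),
    Node("VAUX", {"hin", "kok"}),
])

shown_hin, hidden_hin = select_nodes(verb, "hin")
shown_kok, hidden_kok = select_nodes(verb, "kok")
```

Keeping one common tree and filtering it per language means that a new language only needs its own "on/off" markings, while the tag inventory itself stays shared.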

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element>

</xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
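One way to picture such a mapping table in software is a lookup keyed by tag and language code, with a check that the mapping really is one to one. The Python sketch below is illustrative only: the names LABELS, label_for and is_one_to_one are hypothetical, just four top-level categories are shown, and the Hindi and Marathi labels are written in their standard spellings rather than copied character by character from the tables.

```python
# Tag -> {ISO 639-3 language code -> label}. Hypothetical excerpt.
LABELS = {
    "N":  {"eng": "Noun",      "hin": "संज्ञा",   "mar": "नाम"},
    "PR": {"eng": "Pronoun",   "hin": "सर्वनाम", "mar": "सर्वनाम"},
    "V":  {"eng": "Verb",      "hin": "क्रिया",   "mar": "क्रियापद"},
    "JJ": {"eng": "Adjective", "hin": "विशेषण",  "mar": "विशेषण"},
}


def label_for(tag, lang):
    """Look up the label of `tag` in language `lang`."""
    return LABELS[tag][lang]


def is_one_to_one(labels, lang):
    """True if every tag has a label in `lang` and no two tags share a label."""
    values = [entry.get(lang) for entry in labels.values()]
    return None not in values and len(set(values)) == len(values)


hindi_ok = is_one_to_one(LABELS, "hin")
bodo_ok = is_one_to_one(LABELS, "brx")   # no Bodo labels in this excerpt
```

A check of this kind would flag the "NA" cells of the tables, where a label is missing for a language, before the schema is generated.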

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ……
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ……
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ……
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ……
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ……
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ……
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ……
    End if
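
The node selection described in this section can be sketched in Python. This is only an illustration under stated assumptions, not part of the standard: `NODES`, `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical names, the script-to-language mapping covers only the cases listed above, and the label strings are copied from the examples as they appear in this text.

```python
# Hypothetical in-memory form of two POS schema nodes.
# Attribute names such as "hin-cat" and "brx-cat" follow the examples above.
NODES = [
    {"cat": "noun", "tag": "N", "hin-cat": "सा", "brx-cat": "ममा", "kok-cat": "नाम"},
    {"cat": "Pronoun", "tag": "PR", "hin-cat": "सवरनाम", "brx-cat": "मराइ", "kok-cat": "सवरनाम"},
]

# Script -> language codes handled under that script (assumed subset).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(script, language, nodes=NODES):
    """Return the nodes with only English, Hindi and target-language labels."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # "Display": English category, tag, Hindi label and the target label.
    visible = {"cat", "tag", "hin-cat", f"{language}-cat"}
    # "Hide": every other attribute is dropped from the returned view.
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]
```

For example, `select_nodes("Devanagari", "brx")` keeps the `brx-cat` labels and hides the `kok-cat` ones, while a Hindi request returns only the English and Hindi attributes.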

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
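
The tagged sentences above write the same hierarchical tag with different separators (for example `N_NN`, `N__NN` and `N-NNP`). A minimal sketch of bringing such tags to one spelling follows; the function names are hypothetical, and treating any run of underscores or hyphens as a level separator is an assumption, not a rule stated in this standard.

```python
import re


def split_levels(tag):
    """Split a hierarchical POS tag into its levels, top level first."""
    # Any run of "_" or "-" is treated as a level separator (assumption).
    return [level for level in re.split(r"[_-]+", tag) if level]


def normalise(tag, sep="__"):
    """Rewrite a tag with a single, consistent level separator."""
    return sep.join(split_levels(tag))
```

With this, `normalise("N-NNP")` and `normalise("N_NNP")` both give `N__NNP`, so tags from different annotators can be compared directly.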

16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tag according to ISO 639-3

1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
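
As an illustration, the language tags of this annexure can be held in a simple lookup table. The names `LANGUAGE_TAGS`, `CODE_TO_NAME` and `attribute_name` are hypothetical; the `<code>-cat` attribute pattern is taken from the examples in section 14 and is assumed here to apply to every language.

```python
# Language name -> ISO 639-3 code, as listed in this annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Reverse lookup: ISO 639-3 code -> language name.
CODE_TO_NAME = {code: name for name, code in LANGUAGE_TAGS.items()}


def attribute_name(language):
    """Build the schema attribute name for a language, e.g. 'hin-cat'."""
    return f"{LANGUAGE_TAGS[language]}-cat"
```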

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

as $, &, etc.

11.3 Punctuation PUNC RD__PUNC Only for punctuations

11.4 Unknown UNK RD__UNK

11.5 Echowords ECH RD__ECH

POS for Hindi

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun N N ladakaa, raajaa, kitaaba
1.1 Common NN N__NN kitaaba, kalama, cashmaa
1.2 Proper NNP N__NNP Mohan, ravi, rashmi
1.4 Nloc NST N__NST Uupara, niice, aage, piiche
2 Pronoun PR PR Yaha, vaha, jo
2.1 Personal PRP PR__PRP Vaha, main, tuma, ve
2.2 Reflexive PRF PR__PRF Apanaa, swayam, khuda
2.3 Relative PRL PR__PRL Jo, jis, jab, jahaaM
2.4 Reciprocal PRC PR__PRC Paraspara, aapasa
2.5 Wh-word PRQ PR__PRQ Kauna, kab, kahaaM
2.6 Indefinite PRI PR__PRI Koii, kis
3 Demonstrative DM DM Vaha, jo, yaha
3.1 Deictic DMD DM__DMD Vaha, yaha
3.2 Relative DMR DM__DMR jo, jis
3.3 Wh-word DMQ DM__DMQ kis, kaun
3.4 Indefinite DMI DM__DMI KoI, kis
4 Verb V V giraa, gayaa, sonaa, haMstaa, hai, rahaa
4.1 Main VM V__VM giraa, gayaa, sonaa, haMstaa
4.2 Auxiliary VAUX V__VAUX hai, rahaa, huaa
5 Adjective JJ JJ sundara, acchaa, baRaa
6 Adverb RB RB jaldii, teza
7 Postposition PSP PSP ne, ko, se, mein
8 Conjunction CC CC aur, agar, tathaa, kyonki
8.1 Co-ordinator CCD CC__CCD aur, balki, parantu
8.2 Subordinator CCS CC__CCS Agar, kyonki, to, ki
9 Particles RP RP to, bhii, hii
9.1 Default RPD RP__RPD to, bhii, hii
9.3 Interjection INJ RP__INJ are, he, o
9.4 Intensifier INTF RP__INTF bahuta, behada
9.5 Negation NEG RP__NEG nahiin, mata, binaa
10 Quantifiers QT QT thoRaa, bahuta, kucha, eka, pahalaa
10.1 General QTF QT__QTF thoRaa, bahuta, kucha
10.2 Cardinals QTC QT__QTC eka, do, tiina
10.3 Ordinals QTO QT__QTO pahalaa, duusaraa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $, &, (, ) For symbols such as $, &, etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH (Paanii-) vaanii, (khaanaa-) vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
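
The automatic storage of higher-level tags can be sketched as follows. This is a minimal illustration, assuming the annotation convention joins levels with a double underscore as in the table above; `expand_tag` is a hypothetical helper, not something defined by this standard.

```python
def expand_tag(lowest, sep="__"):
    """Return the tag at every level of the hierarchy, top level first."""
    parts = lowest.split(sep)
    # Rebuild each prefix: "N__NNP" -> ["N", "N__NNP"].
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]
```

An annotator who selects `N__NNP` for a proper noun would thus also have `N` stored, and a top-level tag such as `JJ` expands to itself only.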

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI

ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz

ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka

ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $, &, (, ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Remarks

1 Noun N N
1.1 Common NN N__NN
1.2 Proper NNP N__NNP
1.3 Nloc NST N__NST
2 Pronoun PR PR
2.1 Personal PRP PR__PRP
2.2 Reflexive PRF PR__PRF
2.3 Relative PRL PR__PRL
2.4 Reciprocal PRC PR__PRC
2.5 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD
3.2 Relative DMR DM__DMR
3.3 Wh-word DMQ DM__DMQ
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF
4.1.2 Non-finite VNF V__VM__VNF
4.1.3 Infinitive VINF V__VM__VINF
4.1.4 Gerund VNG V__VM__VNG
4.2 Verbal Noun Verbal noun NNV N_NNV Verbal Noun
4.3 Auxiliary VAUX V__VAUX
4.3.1 Non-finite VNF V_VM_VNF
4.3.2 Infinite VINF V_VM_VNF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD
8.2 Subordinator CCS CC__CCS
8.2.1 Quotative UT CC__CCS__UT
9 Particles RP RP
9.1 Default RPD RP__RPD
9.2 Classifier CL RP__CL
9.3 Interjection INJ RP__INJ
9.4 Intensifier INTF RP__INTF
9.5 Negation NEG RP__NEG
10 Quantifiers QT QT
10.1 General QTF QT__QTF
10.2 Cardinals QTC QT__QTC
10.3 Ordinals QTO QT__QTO
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM For symbols such as $, &, etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
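
In most rows of the tagset above, the annotation convention is the chain of labels from the top level down to the row itself. The sketch below derives it from the serial numbers; `ROWS` holds only a few rows of the table, `annotation` is a hypothetical helper, and the double-underscore separator is assumed from the table.

```python
# (serial number, label) pairs taken from a few rows of the tagset table.
ROWS = [
    ("4", "V"), ("4.1", "VM"), ("4.1.1", "VF"),
    ("8", "CC"), ("8.2", "CCS"), ("8.2.1", "UT"),
]


def annotation(number, rows=ROWS, sep="__"):
    """Join the labels of a row and all of its ancestors, top level first."""
    labels = dict(rows)
    parts = number.split(".")
    # "4.1.1" -> ancestors "4", "4.1", "4.1.1" -> "V__VM__VF".
    return sep.join(labels[".".join(parts[: i + 1])] for i in range(len(parts)))
```

For instance, `annotation("8.2.1")` yields `CC__CCS__UT`, which matches the Quotative row.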

POS for Tamil

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun N N paiyan, raajaa, puttakam
1.1 Common NN N__NN puttakam, kaNNaaTi, paTam
1.2 Proper NNP N__NNP moohan, ravi, maalati
1.3 Nloc NST N__NST meel, kiiz, mun, pin
2 Pronoun PR PR itu, atu, avan
2.1 Personal PRP PR__PRP naan, nii, avaL, avarkaL
2.2 Reflexive PRF PR__PRF taan
2.3 Relative PRL PR__PRL yaar, etu, eppootu, enkee
2.4 Reciprocal PRC PR__PRC oruvarukoruvar, avanavan, parasparam
2.5 Wh-word PRQ PR__PRQ yaarum, yaaraavatu, yaaroo, etuvum
3 Demonstrative DM DM a-, i-, e-
3.1 Deictic DMD DM__DMD anta, inta, enta
3.2 Relative DMR DM__DMR enta
3.3 Wh-word DMQ DM__DMQ enta, yaar, eetaavatu, yaaraavatu
4 Verb V V vizu, poo, tuunku, aaku
4.1 Main VM V__VM vizu, poo, tuunku, ciri
4.1.1 Finite VF V__VM__VF vizuntaan, pooneen, cirittaaL
4.1.2 Non-finite VNF V__VM__VNF vizunta, poonaal
4.1.3 Infinitive VINF V__VM__VINF viza, pooka, cirikka
4.1.4 Gerund VNG V__VM__VNG vizutal, cirittal, tuunkutal
4.2 Verbal VN V_VN paTippu, naTai, naTattai, ceykai
4.3 Auxiliary VAUX V__VAUX aakum, veeNTum, muTiyum
5 Adjective JJ iniya, periya, azakaana
6 Adverb RB veekamaaka, viraivaaka
7 Postposition PSP paRRi, kuRittu, viTa
8 Conjunction CC CC maRRum, eenenRaal, aanaal
8.1 Co-ordinator CCD CC__CCD -um (raamanum), maRRum, aanaal, allatu. -um is a co-ordinator which can be added to noun and verb
8.2 Subordinator CCS CC__CCS enRu, ena, enpatu, enRaal
8.2.1 Quotative UT CC__CCS__UT enRu, ena
9 Particles RP RP maTTUm, kuuTa
9.1 Default RPD RP__RPD maTTUm, kuuTa
9.2 Classifier CL RP__CL Not required
9.3 Interjection INJ RP__INJ ayyoo, teey, aamaam
9.4 Intensifier INTF RP__INTF ati, veku, mika
9.5 Negation NEG RP__NEG illai
10 Quantifiers QT QT koncam, niRaiya, oru, mutal
10.1 General QTF QT__QTF koncam, niRaiya
10.2 Cardinals QTC QT__QTC onRu, iraNTu
10.3 Ordinals QTO QT__QTO mutal, iraNTaam
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $, &, (, ), ruu For symbols such as $, &, etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH vaNTi kiNTi, paal kiil

POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി

രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി

കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ

ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം

102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun N N
1.1 Common NN N__NN kalama, cashmaa
1.2 Proper NNP N__NNP Mohan, ravi, rashmi
1.4 Nloc NST N__NST upare, niche, bhitara
2 Pronoun PR PR
2.1 Personal PRP PR__PRP se, tumi, AmAra
2.2 Reflexive PRF PR__PRF nijera
2.3 Relative PRL PR__PRL ye, yakhana, yena, yAra
2.4 Reciprocal PRC PR__PRC paraspara
2.5 Wh-word PRQ PR__PRQ ke, kakhana, kena, kAra
2.6 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha, jo, yaha
3.1 Deictic DMD DM__DMD sei, oi, o, se
3.2 Relative DMR DM__DMR ye, yei
3.3 Wh-word DMQ DM__DMQ kono
3.4 Indefinite DMI DM__DMI keu
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF karachhilAma, yAba, khAYa
4.1.2 Non-finite VNF V__VM__VNF kare, kheYe, karale, khete
4.1.3 Infinitive VINF V__VM__VINF karate, khete, yete
4.1.4 Gerund VNG V__VM__VNG yAoYa, AsA, khelA, karA
4.2 Auxiliary VAUX V__VAUX chhila, habe, chAi
5 Adjective JJ sundara, bhAla, lAla
6 Adverb RB tADAtADi, Aste, haThAt
7 Postposition PSP theke, abadhI, madhye, diYe
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD Ara, eban, athabA, kimbA
8.2 Subordinator CCS CC__CCS ye, kintu, noile, tAhale
8.2.1 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
9.1 Default RPD RP__RPD to, ye
9.2 Classifier CL RP__CL jana, khAnA
9.3 Interjection INJ RP__INJ Are, ei, hAya
9.4 Intensifier INTF RP__INTF bhiShaNa, khuba, sA~NghAtika
9.5 Negation NEG RP__NEG nA, naYa, chhADA
10 Quantifiers QT QT
10.1 General QTF QT__QTF kichhu, alpa, aneka
10.2 Cardinals QTC QT__QTC eka, dui, tina
10.3 Ordinals QTO QT__QTO prathama, paYalA, dvitIYa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $, &, (, ) For symbols such as $, &, etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH jala Tala, khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
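The rule above can be illustrated with a short sketch. It assumes that the Annotation Convention column (for example V__VM__VNF) encodes the full path from the top-level category down to the chosen tag, with levels separated by a double underscore. The function names expand_tag and annotate are hypothetical and are not part of the standard.

```python
from __future__ import annotations


def expand_tag(annotation: str) -> list[str]:
    """Split an annotation such as 'V__VM__VNF' into its levels, top level first."""
    levels = annotation.split("__")
    if not all(levels):
        raise ValueError(f"malformed annotation: {annotation!r}")
    return levels


def annotate(word: str, annotation: str) -> dict:
    """Store the lowest-level tag chosen by the annotator together with its ancestors."""
    levels = expand_tag(annotation)
    return {"word": word, "tag": levels[-1], "ancestors": levels[:-1]}


# The annotator only picks the lowest-level tag (VNF); V and VM follow automatically.
print(annotate("kare", "V__VM__VNF"))
```

A category without subtypes, such as JJ, simply yields an empty list of ancestors.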


POS for Marathi

Columns: Sl No; Category (top level, subtype level 1, subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N मुलगा (mulagaa-boy), राजा (raajaa-king), पुस्तक (pustaka-book)
1.1 Common NN N__NN पुस्तक (pustaka-book), लेखणी (lekhaNi-pen), चष्मा (chashmaa-goggles)
1.2 Proper NNP N__NNP मोहन (Mohan), रवी (Ravi), रश्मी (Rashmi)
1.3 Verbal NNV N__NNV NA Remarks: Not required
1.4 Nloc NST N__NST वर (var-up), खाली (khaalee-down), पुढे (pudhe-ahead), मागे (maage-back) Remarks: Where it is separate it is NST
2 Pronoun PR PR येथे (yethe-here), तेथे (tethe-there), जो (jo-who), तो (to-he)
2.1 Personal PRP PR__PRP तो (to-he), मी (mee-I), तू (tu-you), ते (te-they), तुम्ही (tumhi-you)
2.2 Reflexive PRF PR__PRF स्वतः (swatha-myself), आपण (aapana-ourselves)
2.3 Relative PRL PR__PRL जो (jo-who), ज्याने (jyaane-who), जेव्हा (jevhaa-while), जिथे (jeethe-where)
2.4 Reciprocal PRC PR__PRC परस्पर (Paraspara-reciprocally), एकमेक (ekmek-mutually)
2.5 Wh-word PRQ PR__PRQ कोण (kona-who), केव्हा (kevha-when), कुठे (kuthe-where)
2.6 Indefinite कोणी (kona
3 Demonstrative DM DM तो (to-he), हा (haa-this), जो (jo-who)
3.1 Deictic DMD DM__DMD इथे (ithe-here), तिथे (tithe-there)
3.2 Relative DMR DM__DMR जो (jo-who), ज्याने (jyane-who)
3.3 Wh-word DMQ DM__DMQ कोणता (konta-which), कोणी (kona-who)
4 Verb V V पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is)
4.1 Main VM V__VM पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is)
4.1.1 Finite VF V__VM__VF - Remarks: This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 Non-finite VNF V__VM__VNF - Remarks: --do--
4.1.3 Infinitive VINF V__VM__VINF - Remarks: --do--
4.1.4 Gerund VNG V__VM__VNG Remarks: --do--
4.2 Auxiliary VAUX V__VAUX आहे (is), लागला (started)
5 Adjective JJ सुंदर (sundara-beautiful), चांगला (chaangalaa-good), मोठा (moThaa-big)
6 Adverb RB लवकर (lavakar-fast), हळूहळू (haLuuhaLuu-slowly)
7 Postposition PSP Remarks: Not in Marathi
8 Conjunction CC CC आणि (aaNi-and), कारण (kaaraN-because)
8.1 Co-ordinator CCD CC__CCD आणि (aaNi-and), पण (paNa-but), परंतु (parantu-but)
8.2 Subordinator CCS CC__CCS कारण की (kaaraN-because of), का की (kaaraN kii-because of), जर-तर (jara-tara-if-then)
8.2.1 Quotative UT CC__CCS__UT असा, म्हणून
9 Particles RP RP तर (tara)
9.1 Default RPD RP__RPD तर (tara-then)
9.2 Classifier CL RP__CL Remarks: Not required
9.3 Interjection INJ RP__INJ अरेरे (arere), ओहो (oho-oh)
9.4 Intensifier INTF RP__INTF खूप (khoop-lot, very), बराच (baraach-too much), अतिशय (atishaya-too much, very)
9.5 Negation NEG RP__NEG नको (nako-not), न (na-Na)
10 Quantifiers QT QT थोडे (thode-few), जास्त (jaasta-lot), काही (kaahi-few), एक (eka-one), पहिला (pahilaa-first)
10.1 General QTF QT__QTF थोडे (thoDe-few), जास्त (jaasta-lot), काही (kaahi-few)
10.2 Cardinals QTC QT__QTC एक (eka-one), दोन (dona-two)
10.3 Ordinals QTO QT__QTO पहिला (pahilaa-first), दुसरा (dusaraa-second)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF Remarks: A word written in a script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) Remarks: For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Remarks: Only for punctuation
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जेवणबिवण (jevanbivaNa-meal/dinner), डोकेबिके (Dokebike-head), (Paanii-) vaanii, (khaanaa-) vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Columns: Sl No; Category (top level, subtype level 1, subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N
1.1 Common NN N__NN kalam, chashmA ('pen', 'spectacles')
1.2 Proper NNP N__NNP mohan, ravI ('Mohan', 'Ravi')
1.3 Nloc NST N__NST upar, nIche, ahIM ('up', 'down', 'in front')
2 Pronoun PR PR
2.1 Personal PRP PR__PRP huM, tuM, te ('me', 'you', 'he/she')
2.2 Reflexive PRF PR__PRF pote, jAte, svayam ('herself/himself')
2.3 Relative PRL PR__PRL je, te, jyAM ('who', 'where')
2.4 Reciprocal PRC PR__PRC aras-paras, paraspar ('mutually', 'each other')
2.5 Wh-word PRQ PR__PRQ koN, kyAre, kyAM ('who', 'when', 'where')
2.6 Indefinite koI, kaIMK, kashuM ('someone', 'something')
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD A ('this')
3.2 Relative DMR DM__DMR je, jeNe ('which/who', 'whom')
3.3 Wh-word DMQ DM__DMQ koN, shuM, kem ('who', 'what', 'why')
3.4 Indefinite koI, kaIMK, kashuM ('someone', 'something')
4 Verb V V
4.1 Main VM V__VM khAshe, khAdhu ('will eat', 'ate')
4.2 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM ('is', 'was', 'did')
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane, ke ('and', 'or')
8.2 Subordinator CCS CC__CCS tethI, evuM, kAraNke ('so', 'like that', 'because')
9 Particles RP RP
9.1 Default RPD RP__RPD paNa, ja, tO ('but', emph., topic)
9.2 Interjection INJ RP__INJ hE, arrrE, O
9.3 Intensifier INTF RP__INTF bahu, ghaNuM ('very', 'much')
9.4 Negation NEG RP__NEG nahi, na ('no')
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM, ghaNuM ('little', 'much')
10.2 Cardinals QTC QT__QTC eka, be, traN ('one', 'two', 'three')
10.3 Ordinals QTO QT__QTO paheluM, bIjI ('first' (neu), 'second' (fem))
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv, perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ( )
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm, pANi-bANi ('work and the like', 'water and the like')

POS for Konkani

Columns: Sl No; Category (top level, subtype level 1, subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Columns: Sl No; Category (top level, subtype level 1, subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

92 Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Columns: Sl No; Category (top level, subtype level 1, subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun (ism - اسم) N N لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab)
1.1 Common (nakeraa - نکره) NN N__NN کتاب (kitaab), قلم (qalam), چشمہ (cashma)
1.2 Proper (m'aarefa - معرفہ) NNP N__NNP موہن (Mohan), رشمی (Rashmi), روی (Ravi)
1.3 Verbal (haasil-e-masdar - حاصل مصدر) NNV N__NNV جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) Remarks: May be considered for Urdu-Hindi too
1.4 Nloc (zarf - ظرف) NST N__NST اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche)
2 Pronoun (zamiir - ضمير) PR PR يہ (yih), وه (voh), جو (jo)
2.1 Personal (zamiir-e-shakhsii - ضمير شخصی) PRP PR__PRP وه (voh), تم (tum), ميں (maim) Remarks: In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 Reflexive (zamiir-e-m'aakoosii - ضمير معکوسی) PRF PR__PRF اپنا (apnaa), خود (khud), اپنے آپ (apne aap)
2.3 Relative (zamiir-e-mausoolaa - ضمير موصولہ) PRL PR__PRL جو (jo), جب (jab), جس (jis), جہاں (jahaM)
2.4 Reciprocal (zamiir-e-raaje' - ضمير راجع) PRC PR__PRC باہم (baaham), درميان (darmiyaan), آپس (aapas)
2.5 Wh-word (zamiir-e-istafhaamiyaa - ضمير استفہاميہ) PRQ PR__PRQ کون (kaun), کب (kab), کہاں (kahaaM)
3 Demonstrative (zamiir-e-ishaaraa - ضمير اشاره) DM DM يہ (yih), وه (voh), ان (inn), ان (unn)
3.1 Deictic (ishaare - اشارے) DMD DM__DMD يہ (yih), وه (voh)
3.2 Relative (zamiir-e-ishaaraa mausoolaa - ضمير اشاره موصولہ) DMR DM__DMR جو (jo), جس (jis)
3.3 Wh-word (zamiir-e-ishaaraa istafhaamiyaa - ضمير اشاره استفہاميہ) DMQ DM__DMQ کون (kaun), کس (kis), کتنا (kitnaa) Remarks: According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb (f'el - فعل) V V گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1 Main VM V__VM گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1.1 Finite (mahdood - محدود) VF V__VM__VF Remarks: This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 Nonfinite (ghair mahdood - غيرمحدود) VNF V__VM__VNF Remarks: --do--
4.1.3 Infinitive (masdar - مصدر) VINF V__VM__VINF Remarks: --do--
4.1.4 Gerund (haasil-e-masdar - حاصل مصدر) VNG V__VM__VNG Remarks: --do--
4.2 Auxiliary (f'el-e-imdaadi - فعل امدادی) VAUX V__VAUX ہے (hai), رہا (rahaa), ہوا (huaa)
5 Adjective (sifat - صفت) JJ دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa)
6 Adverb (mut'alliq-e-f'el - متعلق فعل) RB تيز (tez), جلد (jald)
7 Postposition (jaar-e-moakkhar - جارموخر) PSP سے (se), نے (ne), کو (ko), ميں (meiM)
8 Conjunction (atf' - عطف) CC CC اور (aur), اگر (agar), کيوں کہ (kyoMki)
8.1 Co-ordinator (harf-e-vasl - حرف وصل) CCD CC__CCD اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki)
8.2 Subordinator (taab'e kunindaa - تابع کننده) CCS CC__CCS اگر (agar), کيوں کہ (kyoMki), تو (to)
8.2.1 Quotative (iqtabaasii - اقتباسی) UT CC__CCS__UT Remarks: Not required
9 Particles (haaliyaa - حاليہ) RP RP تو (to), ہی (hii), بهی (bhii)
9.1 Default (Default - ڈيفالٹ) RPD RP__RPD تو (to), ہی (hii), بهی (bhii)
9.2 Classifier (darja band - درجہ بند) CL RP__CL Remarks: Not required
9.3 Interjection (fajaa'iyaa - فجائيہ) INJ RP__INJ اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah)
9.4 Intensifier (harf-e-taakiid - حرف تاکيد) INTF RP__INTF بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)
9.5 Negation (harf-e-nahii - حرف نہی) NEG RP__NEG نہ (na), نہيں (nahiiM)
10 Quantifiers (kamiiyat numaa - کميت نما) QT QT چند (cand), متعدد (muta'addad), قليل (qaliil), کثير (kasiir)
10.1 General (aam' - عام) QTF QT__QTF تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (a'adaad-e-mutlaq - اعداد مطلق) QTC QT__QTC ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (tartiibii a'adaad - ترتيبی اعداد) QTO QT__QTO اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals (baaqi-maandaa - باقی مانده) RD RD
11.1 Foreign word (bidesii lafz - بديسی لفظ) RDF RD__RDF Remarks: A word written in a script other than the script of the original text
11.2 Symbol ('alaamat - علامت) SYM RD__SYM $ & ( ) Remarks: Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (auqaaf - اوقاف) PUNC RD__PUNC Remarks: Only for punctuation
11.4 Unknown (naa-m'aaloom - نامعلوم) UNK RD__UNK
11.5 Echowords (goonjdar lafz - گونج دار الفاظ) ECH RD__ECH (دل-) ول ((dil-) vil), (پيار-) ويار ((pyaar-) vyaar), (چائے-) وائے ((caa'e-) vaa'e)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of the document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of the document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements that indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
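As an illustration of the first two attributes, the following sketch reads xml:lang and its:dir from a small instance document and resolves the value in effect for each element, since both are inherited from the nearest ancestor that sets them. The element names corpus and sentence, the sample text and the function name effective_lang_dir are assumptions made for this example only; the two namespace URIs are the standard ones for the xml: prefix and for ITS.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS namespace

# Hypothetical instance document: the root declares Hindi, one sentence switches to Urdu.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hi" its:dir="ltr">
  <sentence>राजा</sentence>
  <sentence xml:lang="ur" its:dir="rtl">راجا</sentence>
</corpus>"""


def effective_lang_dir(root):
    """Return (tag, language, direction) for every element, applying inheritance."""
    results = []

    def walk(element, lang, direction):
        # An element overrides the inherited value only if it carries the attribute itself.
        lang = element.get(f"{{{XML_NS}}}lang", lang)
        direction = element.get(f"{{{ITS_NS}}}dir", direction)
        results.append((element.tag, lang, direction))
        for child in element:
            walk(child, lang, direction)

    walk(root, None, None)
    return results


for tag, lang, direction in effective_lang_dir(ET.fromstring(SAMPLE)):
    print(tag, lang, direction)
```

The first sentence carries no attributes of its own and therefore reports the root's language and direction, while the second reports its local override.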

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only supplies a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> registered as a standard
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. A language-specific POS schema can be obtained by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
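The idea of switching branches off per language can be sketched as follows. This is only an illustration under stated assumptions: the node list is a small excerpt of the common tag set, the language keys are ISO 639-3 codes, and the switched-off entries are taken from the "Not required" remarks in the Bangla, Marathi and Urdu tables of this document. The names NODES, SWITCHED_OFF and select_nodes are hypothetical and do not come from the standard.

```python
from __future__ import annotations

# Excerpt of the common tag tree, written in the annotation convention of the tables.
NODES = [
    "N__NN", "N__NNP", "N__NNV",
    "CC__CCD", "CC__CCS", "CC__CCS__UT",
    "RP__RPD", "RP__CL",
]

# Branches marked "Not required" in the language tables (keys are ISO 639-3 codes).
SWITCHED_OFF = {
    "ben": {"CC__CCS__UT"},
    "mar": {"N__NNV", "RP__CL"},
    "urd": {"CC__CCS__UT", "RP__CL"},
}


def select_nodes(language: str) -> list[str]:
    """Return the nodes to display for a language; all other nodes are hidden."""
    off = SWITCHED_OFF[language]  # raises KeyError for a language that is not configured
    return [
        node for node in NODES
        # Switching a branch off hides the node itself and everything below it.
        if not any(node == branch or node.startswith(branch + "__") for branch in off)
    ]


print(select_nodes("urd"))
```

Keeping one common node list and a per-language set of disabled branches mirrors the Display (Desired Nodes) and Hide (remaining nodes) steps of the flow above.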

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English, Hindi and Bodo nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English, Hindi and Konkani nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...

    End if

Else If script is Malayalam (orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)
        Display (English, Hindi and Malayalam nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)
        Display (English, Hindi and Kashmiri nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...

    End if


Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)
        Display (English, Hindi and Assamese nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)
        Display (English, Hindi and Gujarati nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...

    End if

End if
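
The selection logic of this section can be sketched in Python. This is a minimal illustration only, not part of the standard: the names `SCRIPT_LANGUAGES`, `STRUCTURAL`, `visible_labels` and `select_nodes` are hypothetical, a schema node is assumed to be a plain dictionary of its attributes, and only the script and language branches listed in the algorithm are covered.

```python
# Script -> language -> ISO 639-3 code, limited to the branches shown in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that are not language labels and are always kept (an assumption).
STRUCTURAL = ("name", "subcat", "tag")


def visible_labels(script, language):
    """Return the label attributes to display for a script/language pair."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError for an unsupported pair
    labels = ["POS cat", "hin-cat"]  # English and Hindi nodes are always displayed
    if code != "hin":
        labels.append(f"{code}-cat")
    return labels


def select_nodes(node, script, language):
    """Keep the structural attributes and the visible labels; hide the rest."""
    keep = set(STRUCTURAL) | set(visible_labels(script, language))
    return {key: value for key, value in node.items() if key in keep}
```

Calling `select_nodes(node, "Devanagari", "Bodo")` on a node that carries labels for every language would keep only the `POS cat`, `hin-cat` and `brx-cat` labels together with `name`, `subcat` and `tag`.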


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
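
The tagged sentences above write the same hierarchy with different separators (`N_NN`, `N__NN`, and occasionally `N-NNP`). The following sketch shows one way to normalise such annotations. It assumes the tag string has already been separated from its token, which this copy of the examples does not show, and the function names `split_levels` and `lowest_tag` are hypothetical.

```python
import re


def split_levels(annotation):
    """Split an annotation such as 'V_VM_VNF' or 'QT__QTC' into hierarchy levels."""
    # Tags themselves contain no underscores or hyphens, so both are treated as separators.
    levels = [part for part in re.split(r"[_-]+", annotation.strip()) if part]
    if not levels:
        raise ValueError("empty annotation")
    return levels


def lowest_tag(annotation):
    """Return the most specific (lowest-level) tag of an annotation."""
    return split_levels(annotation)[-1]
```

With this normalisation, `V_VM_VNF` and `V__VM__VNF` yield the same three levels, so annotations from different language teams can be compared directly.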


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No. | Language Name | Language Tag (ISO 639-3)
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
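
The language tags double as prefixes of the label attributes used in the schema (`hin-cat`, `brx-cat`, and so on). A minimal lookup sketch follows; the dictionary name `ISO_639_3` and the helper `label_attribute` are hypothetical, and the codes are the ISO 639-3 identifiers for the 22 languages listed in this annexure.

```python
# ISO 639-3 identifiers for the languages listed in this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute that carries the label for a language."""
    return f"{ISO_639_3[language]}-cat"
```

Keeping the codes in a single table means a new language needs one new entry rather than changes in every place an attribute name is built.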


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


3 | Demonstrative | DM | DM | vaha jo yaha |
3.1 | Deictic | DMD | DM__DMD | vaha yaha |
3.2 | Relative | DMR | DM__DMR | jo jis |
3.3 | Wh-word | DMQ | DM__DMQ | kis kaun |
 | Indefinite | DMI | DM__DMI | koI kis |
4 | Verb | V | V | giraa gayaa sonaa haMstaa hai rahaa |
4.1 | Main | VM | V__VM | giraa gayaa sonaa haMstaa |
4.2 | Auxiliary | VAUX | V__VAUX | hai rahaa huaa |
5 | Adjective | JJ | JJ | sundara acchaa baRaa |
6 | Adverb | RB | RB | jaldii teza |
7 | Postposition | PSP | PSP | ne ko se mein |
8 | Conjunction | CC | CC | aur agar tathaa kyonki |
8.1 | Co-ordinator | CCD | CC__CCD | aur balki parantu |
8.2 | Subordinator | CCS | CC__CCS | agar kyonki to ki |
9 | Particles | RP | RP | to bhii hii |
9.1 | Default | RPD | RP__RPD | to bhii hii |
9.3 | Interjection | INJ | RP__INJ | are he o |
9.4 | Intensifier | INTF | RP__INTF | bahuta behada |
9.5 | Negation | NEG | RP__NEG | nahiin mata binaa |
10 | Quantifiers | QT | QT | thoRaa bahuta kucha eka pahalaa |
10.1 | General | QTF | QT__QTF | thoRaa bahuta kucha |
10.2 | Cardinals | QTC | QT__QTC | eka do tiina |
10.3 | Ordinals | QTO | QT__QTO | pahalaa duusaraa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii (khaanaa-) vaanaa |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Punjabi

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Transliteration | Remarks
1 | Noun | N | N | ਘਰ ਿਕਤਾਬ ਕਹਾਣੀ ਸਡਕ | Gara kiwAba kahANI sadZaka |
1.1 | Common | NN | N__NN | ਘਰ ਿਕਤਾਬ ਕਹਾਣੀ ਸਡਕ | Gara kiwAba kahANI sadZaka |
1.2 | Proper | NNP | N__NNP | ਹਰਿਵਦਰ ਿਦਲੀ ਤਾਜਮਿਹਲ | haraviMxara xiYlI wAjamahila |
1.4 | Nloc | NST | N__NST | ਤ ਥਲ ਅਗ ਿਪਛ | uYwe WaYle aYge piYCe |
2 | Pronoun | PR | PR | ਮ ਤ ਉਹ ਇਹ | mEz wUM uha iha jo |
2.1 | Personal | PRP | PR__PRP | ਮ ਤ ਉਹ | mEz wuM uha |
2.2 | Reflexive | PRF | PR__PRF | ਆਪਣਾ ਆਪ ਖਦ | ApaNA Apa Kuxa |
2.3 | Relative | PRL | PR__PRL | ਜ ਿਜਸ ਿਜਹਡਾ ਜਦ | jo jisa jihadZA jaxoz |
2.4 | Reciprocal | PRC | PR__PRC | ਆਪਸ | Apasa |
2.5 | Wh-word | PRQ | PR__PRQ | ਕਣ ਕਦ ਿਕਥ | kONa kaxoz kiYWe |
2.6 | Indefinite | PRI | PR_PRI | ਕਈ ਿਕਸ | koI kisa |
3 | Demonstrative | DM | DM | ਉਹ ਜ ਇਹ | uha jo iha |
3.1 | Deictic | DMD | DM__DMD | ਇਹ ਉਹ | iha uha |
3.2 | Relative | DMR | DM__DMR | ਜ ਿਜਸ | jo jisa |
3.3 | Wh-word | DMQ | DM__DMQ | ਕਣ | kONa |
3.4 | Indefinite | DMI | DM_DMI | ਕਈ ਿਕਸ | koI kisa |
4 | Verb | V | V | ਆਇਆ ਜਾ ਕਰਦਾ ਮਾਰਗਾ ਰਿਹਦਾ | AiA jA karaxA mArAzgA rahiMxA |
4.1 | Main | VM | V__VM | ਆਇਆ ਜਾ ਕਰਦਾ ਮਾਰਗਾ ਰਿਹਦਾ | AiA jA karaxA mArAzgA rahiMxA |
4.1.2 | Non-finite | VNF | V__VM__VNF | ਜਿਦਆ ਆਿਦਆ ਕਰਿਦਆ ਖਾਕ ਜਾਕ | jAzxiAz AuzxiAz karaxiAz KAke jAke |
4.1.3 | Infinitive | VINF | V__VM__VINF | ਿਗਆ ਆਇਆ ਕਿਰਆ | giAz AiAz kariAz |
4.1.4 | Gerund | VNG | V__VM__VNG | ਜਾਣ ਖਾਣ ਪੀਣ ਮਰਨ | jANoz KANoz pINoz maranoz |
4.2 | Auxiliary | VAUX | V__VAUX | ਹ ਸੀ ਸਿਕਆ ਹਇਆ | hE sI sakiA hoiA |
5 | Adjective | JJ | | ਸਹਣਾ ਚਗਾ ਮਾਡਾ ਕਾਾਾ | sohaNA caMgA mAdZA kAA |
6 | Adverb | RB | | ਹਾੀ ਕਾਹਲੀ | hOI kAhalI |
7 | Postposition | PSP | | ਨ ਨ ਤ ਨਾਲ | ne nUM woz nAla |
8 | Conjunction | CC | CC | ਅਤ ਿਕਿਕ ਅਗਰ ਿਕ ਸਗ | awe kiuzki agara ki sagoz |
8.1 | Co-ordinator | CCD | CC__CCD | ਅਤ ਜ | awe jAz |
8.2 | Subordinator | CCS | CC__CCS | ਿਕਿਕ ਿਕ ਜ | kiuzki ki jo wAz |
9 | Particles | RP | RP | ਵੀ ਤ ਹੀ | vI wAz hI |
9.1 | Default | RPD | RP__RPD | ਵੀ ਤ ਹੀ | vI wAz hI |
9.2 | Classifier | CL | RP__CL | | | Not required
9.3 | Interjection | INJ | RP__INJ | ਉਏ ਅਿਡਆ ਨੀ ਜਨਾਬ | ue adZiA nI janAba |
9.4 | Intensifier | INTF | RP__INTF | ਬਹਤ ਬਡਾ | bahuwa badZA |
9.5 | Negation | NEG | RP__NEG | ਨਹ ਨਾ ਿਬਨ ਵਗਰ | nahIz nA binAz vagEra |
10 | Quantifiers | QT | QT | ਥਡਾ ਬਹਤਾ ਕਾਫੀ ਕਝ ਇਕ ਪਿਹਲਾ | WodZA bahuwA kAPI kuJa iYka pahilA |
10.1 | General | QTF | QT__QTF | ਥਡਾ ਬਹਤਾ ਕਾਫੀ ਕਝ | WodZA bahuwA kAPI kuJa |
10.2 | Cardinals | QTC | QT__QTC | ਇਕ ਦ ਿਤਨ | iYka xo wiMna |
10.3 | Ordinals | QTO | QT__QTO | ਪਿਹਲਾ ਦਜਾ | pahilA xUjA |
11 | Residuals | RD | RD | | |
11.1 | Foreign word | RDF | RD__RDF | | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | | |
11.5 | Echowords | ECH | RD__ECH | (ਪਾਣੀ-) ਧਾਣੀ (ਚਾਹ-) ਚਹ | (pANI-) XANI (cAha-) cUha |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu, Kannada, Malayalam and Tamil)

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal Noun | NNV | N_NNV | Verbal Noun
4.3 | Auxiliary | VAUX | V__VAUX |
4.3.1 | Non-finite | VNF | V_VM_VNF |
4.3.2 | Infinite | VINF | V_VM_VNF |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
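
Storing the higher-level tags automatically amounts to walking from the chosen label up to its top-level category. The sketch below illustrates this under stated assumptions: the `PARENT` table is transcribed from the label and annotation-convention columns of the Dravidian tagset above, the names `PARENT` and `full_annotation` are hypothetical, and the rows whose conventions are inconsistent in the table (4.2, 4.3.1 and 4.3.2) are left out.

```python
# Child label -> parent label, taken from the annotation conventions in the tagset.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into its full annotation, e.g. VNF -> V__VM__VNF."""
    chain = [label]
    while chain[-1] in PARENT:  # climb until a top-level category is reached
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))
```

An annotator would then pick only `VNF` or `UT`, and the tool would record `V__VM__VNF` or `CC__CCS__UT`; top-level labels without subtypes, such as `JJ`, are returned unchanged.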

POS for Tamil

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin

15

CopyrightTDIL

2 Pronoun PR PR ituatuavan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V_VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka

16

CopyrightTDIL

7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuation

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Columns: Sl No, Category (Top level, Subtype (level 1), Subtype (level 2)), Label, Annotation Convention, Examples, Examples in Malayalam

1 Noun N N avan mOhan vItu

11 Common NN N__NN vItu vellam pattam

12 Proper NNP N__NNP mOhan ravi sIta മോഹൻ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil മേലെ താഴെ മുൻപിൽ പിന്നിൽ

2 Pronoun PR PR avan aval atu itu അവൻ അവൾ അത് ഇത്

21 Personal PRP PR__PRP naan nii avaL avar ഞാൻ നീ അവൾ അവർ

22 Reflexive PRF PR__PRF tanne-taan തന്നെ താൻ

23 Relative PRL PR__PRL aaro ആരോ

24 Reciprocal PRC PR__PRC tammiltammil parasparam തമ്മിൽതമ്മിൽ പരസ്പരം

25 Wh-word PRQ PR__PRQ aaru evan ആര് എവൻ

3 Demonstrative DM DM aa- ii- ആ ഈ

31 Deictic DMD DM__DMD atu itu അത് ഇത്

32 Relative DMR DM__DMR eetu ഏത്

33 Wh-word DMQ DM__DMQ eetu ennane ഏത് എങ്ങനെ

4 Verb V V pO kazhi Annu (copula) ciri പോ കഴി ആണ് (copula) ചിരി

41 Main VM V__VM pO kazhi ciri Annu (copula) പോ കഴി ആണ് (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu (copula) പോയി ചിരിക്കും കഴിക്കുന്നു ആകുന്നു (copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca പോയ ചിരിച്ച കഴിച്ച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAn varuvAn ോക ചിരിക കയാി കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam പഠിത്തം നടത്തം നടനം

43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka കൊല്ലുക തല്ലുക കാണുക നോക്കുക

5 Adjective JJ valiya ceRiya azakulla വലിയ ചെറിയ അഴകുള്ള

6 Adverb RB veegam ativeegam kUtutal വേഗം അതിവേഗം കൂടുതൽ

7 Postposition PSP paRRi kUte പറ്റി കൂടെ

8 Conjunction CC CC pakshe enniTTum ennAl ennalum enkilum പക്ഷേ എന്നിട്ടും എന്നാൽ എന്നാലും എങ്കിലും

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe -ഉം (രാമനും) പക്ഷേ

82 Subordinator CCS CC__CCS ennu enna ennAl എന്ന് എന്ന എന്നാൽ

821 Quotative UT CC__CCS__UT ennu enna എന്ന് എന്ന

9 Particles RP RP kute mAtram കൂടെ മാത്രം

91 Default RPD RP__RPD mAtram മാത്രം

92 Classifier CL RP__CL peer പേർ

93 Interjection INJ RP__INJ ayyoo അയ്യോ

94 Intensifier INTF RP__INTF pala valare പല വളരെ

95 Negation NEG RP__NEG illa alla ഇല്ല അല്ല

10 Quantifiers QT QT kuracchu niraccu oru dharalam കുറച്ച് നിറച്ച് ഒരു ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam കുറച്ച് നിറച്ച് ധാരാളം


102 Cardinals QTC QT__QTC onnu rantu ഒന്ന് രണ്ട്

103 Ordinals QTO QT__QTO onnAm rantam ഒന്നാം രണ്ടാം

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) രൂ

113 Punctuation PUNC RD__PUNC

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

POS for Bangla

Columns: Sl No, Category (Top level, Subtype (level 1), Subtype (level 2)), Label, Annotation Convention, Examples, Remarks

1 Noun N N

11 Common NN N__NN kalama cashmaa

12 Proper NNP N__NNP Mohan ravi rashmi

14 Nloc NST N__NST upare niche bhitara

2 Pronoun PR PR

21 Personal PRP PR__PRP se tumi AmAra

22 Reflexive PRF PR__PRF nijera

23 Relative PRL PR__PRL ye yakhana yena yAra

24 Reciprocal PRC PR__PRC paraspara

25 Wh-word PRQ PR__PRQ ke kakhana kena kAra

26 Indefinite PRI PR__PRI keu

3 Demonstrative DM DM Vaha jo yaha

31 Deictic DMD DM__DMD sei oi o se

32 Relative DMR DM__DMR ye yei

33 Wh-word DMQ DM__DMQ kono

34 Indefinite DMI DM__DMI keu

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF karachhilAma yAba khAYa

412 Non-finite VNF V__VM__VNF kare kheYe karale khete

413 Infinitive VINF V__VM__VINF karate khete yete

414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA

42 Auxiliary VAUX V__VAUX chhila habe chAi

5 Adjective JJ sundara bhAla lAla

6 Adverb RB tADAtADi Aste haThAt

7 Postposition PSP theke abadhI madhye diYe

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA

82 Subordinator CCS CC__CCS ye kintu noile tAhale

821 Quotative UT CC__CCS__UT ---- Not required

9 Particles RP RP

91 Default RPD RP__RPD to ye

92 Classifier CL RP__CL jana khAnA

93 Interjection INJ RP__INJ Are ei hAya

94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika

95 Negation NEG RP__NEG nA naYa chhADA

10 Quantifiers QT QT

101 General QTF QT__QTF kichhu alpa aneka

102 Cardinals QTC QT__QTC eka dui tina

103 Ordinals QTO QT__QTO prathama paYalA dvitIYa

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuation

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Columns: Sl No, Category (Top level, Subtype (level 1), Subtype (level 2)), Label, Annotation Convention, Examples, Remarks

1 Noun N N मुलगा (mulagaa-boy) राजा (raajaa-king) पुस्तक (pustaka-book)

11 Common NN N__NN पुस्तक (pustaka-book) लेखणी (lekhaNi-pen) चष्मा (chashmaa-goggles)

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रश्मी (Rashmi)

13 Verbal NNV N__NNV NA Not Required

14 Nloc NST N__NST वर (var-up) खाली (khaalee-down) पुढे (pudhe-ahead) मागे (maage-back) Where it is separate it is NST

2 Pronoun PR PR येथे (yethe-here) तेथे (tethe-there) जो (jo-who) तो (to-he)

21 Personal PRP PR__PRP तो (to-he) मी (mee-I) तू (tu-you) ते (te-they) तुम्ही (tumhi-you)

22 Reflexive PRF PR__PRF स्वतः (swatha-myself) आपण (aapana-ourselves)

23 Relative PRL PR__PRL जो (jo-who) ज्याने (jyaane-who) जेव्हा (jevhaa-while) जिथे (jeethe-where)

24 Reciprocal PRC PR__PRC परस्पर (Paraspara-reciprocally) एकमेक (ekmek-mutually)

25 Wh-word PRQ PR__PRQ कोण (kona-who) केव्हा (kevha-when) कुठे (kuthe-where)

26 Indefinite कोणी (kona

3 Demonstrative DM DM तो (to-he) हा (haa-this) जो (jo-who)

31 Deictic DMD DM__DMD इथे (ithe-here) तिथे (tithe-there)

32 Relative DMR DM__DMR जो (jo-who) ज्याने (jyane-who)

33 Wh-word DMQ DM__DMQ कोणता (konta-which) कोणी (kona-who)

4 Verb V V पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)

41 Main VM V__VM पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आहे (is) लागला (started)

5 Adjective JJ सुंदर (sundara-beautiful) चांगला (chaangalaa-good) मोठा (moThaa-big)

6 Adverb RB लवकर (lavakar-fast) हळूहळू (haLuuhaLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आणि (aaNi-and) कारण (kaaraN-because)

81 Co-ordinator CCD CC__CCD आणि (aaNi-and) पण (paNa-but) परंतु (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-because of) ता त (kaaraN kii-because of) जर-तर (jara-tara-if-then)

821 Quotative UT CC__CCS__UT असा म्हणून

9 Particles RP RP तर (tara)

91 Default RPD RP__RPD तर (tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरेरे (arere) ओहो (oho-oh)

94 Intensifier INTF RP__INTF खूप (khoop-lot, very) बराच (baraach-too much) अतिशय (atishaya-too much, very)

95 Negation NEG RP__NEG नको (nako-not) न (na-Na)

10 Quantifiers QT QT थोडे (thode-few) जास्त (jaasta-lot) काही (kaahi-few) एक (eka-one) पहिला (pahilaa-first)

101 General QTF QT__QTF थोडे (thoDe-few) जास्त (jaasta-lot) काही (kaahi-few)

102 Cardinals QTC QT__QTC एक (eka-one) दोन (dona-two)

103 Ordinals QTO QT__QTO पहिला (pahilaa-first) दुसरा (dusaraa-second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuation

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जेवणबिवण (jevanbivaNa-meal/dinner) डोकेबिके (Dokebike-head) (Paanii-) vaanii (khaanaa-) vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Columns: Sl No, Category (Top level, Subtype (level 1), Subtype (level 2)), Label, Annotation Convention, Examples, Remarks

1 Noun N N

11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’

12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’

13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’

2 Pronoun PR PR

21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’

22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’

23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’

24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’

25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’

26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’

3 Demonstrative DM DM

31 Deictic DMD DM__DMD A ‘this’

32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’

33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’

34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’

4 Verb V V

41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’

42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’

5 Adjective JJ

6 Adverb RB

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’

82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’

9 Particles RP RP

91 Default RPD RP__RPD paNa ja tO ‘but’ emph topic

92 Interjection INJ RP__INJ hE arrrE O

93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’

94 Negation NEG RP__NEG nahi na ‘no’

10 Quantifiers QT QT

101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’

102 Cardinals QTC QT__QTC eka be traN ‘one/two/three’

103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)

11 Residuals RD RD

111 Foreign word RDF RD__RDF tv perasitemol

112 Symbol SYM RD__SYM $ &

113 Punctuation PUNC RD__PUNC ()

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’

POS for Konkani

Columns: Sl No, Category (Top level, Subtype (level 1), Subtype (level 2)), Label, Annotation Convention, Examples, Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411 Finite VF V__VM__VF आयलो आयला आयललो

412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

413 Infinitive VINF V__VM__VINF आस वहर तलयार

414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

421 Finite V__VAUX__VF तलल आस आयला आस

422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

821 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Columns: Sl No, Category (Top level, Subtype (level 1), Subtype (level 2)), Label, Annotation Convention, Examples, Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम पड खवास

12 Proper NNP N__NNP अरण दनश अल

13 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ अहा

22 Reflexive PRF PR__PRF अपना अपन सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप पढइ खाइ स हस

42 Auxiliary VAUX V__VAUX अछ छल होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

92 Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट तीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuation

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख) मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Columns: Sl No, Category (Top level, Subtype (level 1), Subtype (level 2)), Label, Annotation Convention, Examples, Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for indefinite persons. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound) etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text using)

[For more details: http://www.w3.org/TR/xml-i18n-bp]
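As a rough illustration of the attributes listed above, the following Python sketch parses a small XML fragment and reads back its language, direction and translatability markers. The element names `doc`, `label` and `tag` are hypothetical, the language code `ml` for Malayalam is an assumption about how xml:lang would be filled in, and the local attribute its:translate is used here in place of the global its:translateRule element mentioned above. The namespace URI is the one defined by ITS 1.0.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang and xml:id

# Hypothetical fragment: one localized label and one non-translatable tag.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="noun-ml" xml:lang="ml" its:locNote="Malayalam label for the noun category">നാമം</label>
  <tag its:translate="no">N</tag>
</doc>"""


def describe(xml_text: str) -> list[dict]:
    """List the internationalization attributes set directly on each element."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter():  # document order: doc, label, tag
        rows.append({
            "element": element.tag,
            "lang": element.get(f"{{{XML_NS}}}lang"),
            "dir": element.get(f"{{{ITS_NS}}}dir"),
            "translate": element.get(f"{{{ITS_NS}}}translate"),
        })
    return rows
```

The sketch only reports attributes written on each element; it does not resolve inherited values such as the language of a child element.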

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag to tell you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). “Sort of” because a tag only gives the tag name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
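A metadata record of this shape could be produced programmatically. The sketch below is an assumption-laden illustration rather than the standard's own tooling: the function name and its parameters are hypothetical, the element names are those that appear in the listing above, and the values passed in (including the version string) are only one reading of that listing.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # expands to xml:lang


def build_language_section(language, version, status, origin, creation_date, definition=""):
    """Build a languageSection element using the element names from the listing."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "version").text = version
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin

    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date

    description = ET.SubElement(section, "descriptionSection")
    definition_class = ET.SubElement(description, "definitionClass")
    definition_el = ET.SubElement(definition_class, "definition")
    definition_el.set(XML_LANG, language)   # language of the definition text
    definition_el.text = definition
    ET.SubElement(definition_class, "source").text = origin
    return section


# Illustrative values only.
section = build_language_section("en", "1.0.0", "standard", "ISO 12620:1999", "1999-01-01")
xml_text = ET.tostring(section, encoding="unicode")
```

Building the record through an XML library rather than by string concatenation keeps every element properly closed and escapes any special characters in the values.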

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework of XML-based POS schema in all 22 Indian languages, it is necessary that the labels defined in the POS Schema for English have a one-to-one mapping for Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is schematically represented under the next heading, i.e. POS schema block diagram.
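The idea of switching branches off per language can be sketched as a small tree filter. This is a minimal sketch under stated assumptions, not the schema's actual mechanism: the `Node` class, the `NOT_REQUIRED` table and the function names are hypothetical, the tree covers only three categories, and the per-language entries reflect only the rows marked "Not required" in the tag-set tables of this document, keyed by ISO 639-3 codes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    name: str
    children: list["Node"] = field(default_factory=list)


# Labels marked "Not required" for a language in the tag-set tables.
NOT_REQUIRED = {
    "tam": {"CL"},
    "mar": {"NNV", "CL"},
    "ben": {"UT"},
    "urd": {"UT", "CL"},
}


def select_nodes(node, language):
    """Return a copy of the tree with the branches switched off for `language` removed."""
    hidden = NOT_REQUIRED.get(language, set())
    if node.tag in hidden:
        return None  # hide this node together with its whole branch
    kept = []
    for child in node.children:
        selected = select_nodes(child, language)
        if selected is not None:
            kept.append(selected)
    return Node(node.tag, node.name, kept)


def tags(node):
    """Flatten the tree into a list of tags in display order."""
    return [node.tag] + [t for child in node.children for t in tags(child)]


# A fragment of the common tree (noun, conjunction and particle branches only).
COMMON = Node("POS", "root", [
    Node("N", "Noun", [Node("NN", "Common"), Node("NNP", "Proper"),
                       Node("NNV", "Verbal"), Node("NST", "Nloc")]),
    Node("CC", "Conjunction", [Node("CCD", "Co-ordinator"),
                               Node("CCS", "Subordinator", [Node("UT", "Quotative")])]),
    Node("RP", "Particles", [Node("RPD", "Default"), Node("CL", "Classifier")]),
])
```

Calling `tags(select_nodes(COMMON, "mar"))` would list the nodes to display for Marathi, with the Verbal noun and Classifier branches hidden; a language with no entry in the table keeps the full common tree.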

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

…

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

…

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

…

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

…

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

…

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

…

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

…

End if
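
The selection logic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `visible_attributes`, the `SCRIPT_LANGUAGES` grouping and the placeholder label strings are hypothetical, and the metadata display step for Hindi is omitted. It keeps the English attributes, the tag, the Hindi label and the label of the selected language, and hides every other language node.

```python
from typing import Dict, List, Set

# Assumed grouping of language codes by script, following the branches of the algorithm.
SCRIPT_LANGUAGES: Dict[str, List[str]] = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# Attributes shown for every language: English labels, the tag, and the Hindi label.
ALWAYS_SHOWN: Set[str] = {"name", "POS cat", "subcat", "tag", "hin-cat"}


def visible_attributes(node: Dict[str, str], script: str, language: str) -> Dict[str, str]:
    """Return only the attributes to display for the given script and language."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"language {language!r} is not listed for script {script!r}")
    shown = ALWAYS_SHOWN | {f"{language}-cat"}
    # Everything outside `shown` corresponds to "Hide (remaining nodes)".
    return {key: value for key, value in node.items() if key in shown}


# Placeholder labels stand in for the real language-specific strings.
noun = {
    "name": "cat",
    "POS cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}
print(visible_attributes(noun, "Devanagari", "brx"))
```

Keeping the script-to-language grouping in one table means a new language is added by extending the table rather than by adding another branch.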

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
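
A tagged sentence of this kind can be read back into (word, tag levels) pairs with a short parser. The sketch below is an assumption-laden illustration: the separator between a word and its tag is not visible in the examples, so a backslash is assumed, and the names `split_token` and `parse_sentence` as well as the transliterated sample words are hypothetical. Both the single-underscore (`N_NN`) and double-underscore (`N__NN`) conventions seen in the examples are accepted.

```python
import re
from typing import List, Tuple


def split_token(token: str, sep: str = "\\") -> Tuple[str, List[str]]:
    """Split one 'word<sep>TAG' token into the word and its tag levels."""
    word, found, tag = token.rpartition(sep)
    if not found:
        raise ValueError(f"no {sep!r} separator in {token!r}")
    # 'N_NN' and 'N__NN' both become ['N', 'NN'].
    return word, re.split(r"_+", tag)


def parse_sentence(line: str, sep: str = "\\") -> List[Tuple[str, List[str]]]:
    """Parse a whitespace-separated line of tagged tokens."""
    return [split_token(token, sep) for token in line.split()]


# Hypothetical transliterated input, not taken from the examples above.
example = "ghara\\N_NN meM\\PSP hE\\V__VAUX"
print(parse_sentence(example))
```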

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
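
The language tags are what the schema uses as prefixes for its language-specific label attributes (for example `mal-cat` for Malayalam). A minimal sketch of that lookup follows; the dictionary name `ISO_639_3` and the helper `label_attribute` are hypothetical names introduced for illustration.

```python
from typing import Dict

# ISO 639-3 tags for the 22 languages listed in this annexure.
ISO_639_3: Dict[str, str] = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name: str) -> str:
    """Build the schema attribute name that carries a language's label."""
    return f"{ISO_639_3[language_name]}-cat"


print(label_attribute("Malayalam"))
```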

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

10.1 | General | QTF | QT__QTF | thoRaa, bahuta, kucha |
10.2 | Cardinals | QTC | QT__QTC | eka, do, tiina |
10.3 | Ordinals | QTO | QT__QTO | pahalaa, duusaraa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | (Paanii-) vaanii, (khaanaa-) vaanaa |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
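
Storing the higher-level tags automatically amounts to walking from the chosen tag up to its top-level category. The sketch below illustrates this under assumptions: the `PARENT` map is a hand transcription of the label hierarchy in the tag tables of this document (with the quotative `UT` placed under `CCS`, as in the Dravidian tagset), and the function name `full_annotation` is hypothetical.

```python
from typing import Dict

# Child label -> parent label, transcribed from the tag tables (assumed complete enough for illustration).
PARENT: Dict[str, str] = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(lowest_tag: str) -> str:
    """Expand a lowest-level tag into the full annotation, e.g. VNF -> V__VM__VNF."""
    chain = [lowest_tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # The chain was built bottom-up; the annotation convention is top-down.
    return "__".join(reversed(chain))


print(full_annotation("VNF"))
```

Tags without a parent, such as `JJ` or `PSP`, are returned unchanged.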

POS for Punjabi

Sl No Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

11 Common NN N__NN ਘਰ ਿਕਤਾਬ

ਕਹਾਣੀ ਸਡਕ

Gara kiwAba kahANI sadZaka

12 Proper NNP N__NNP ਹਰਿਵਦਰ haraviMxara

xiYlI

ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 indefinite DMI DM_DMI ਕਈ ਿਕਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz

ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka

ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)

Sl No Category Label Annotation

Convention

Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN

12 Proper NNP N__NNP

13 Nloc NST N__NST

2 Pronoun PR PR

21 Personal PRP PR__PRP

22 Reflexive PRF PR__PRF


23 Relative PRL PR__PRL

24 Reciprocal PRC PR__PRC

25 Wh-word PRQ PR__PRQ

3 Demonstrative DM DM

31 Deictic DMD DM__DMD

32 Relative DMR DM__DMR

33 Wh-word DMQ DM__DMQ

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF

412 Non-finite VNF V__VM__VNF

413 Infinitive VINF V__VM__VINF

414 Gerund VNG V__VM__VNG

42 Verbal Noun Verbal noun NNV N_NNV Verbal Noun

43 Auxiliary VAUX V__VAUX

431 Non-finite VNF V_VM_VNF

432 Infinitive VINF V_VM_VNF

5 Adjective JJ

6 Adverb RB Only manner adverbs

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD

82 Subordinator CCS CC__CCS

821 Quotative UT CC__CCS__UT

9 Particles RP RP

91 Default RPD RP__RPD

92 Classifier CL RP__CL

93 Interjection INJ RP__INJ

94 Intensifier INTF RP__INTF


95 Negation NEG RP__NEG

10 Quantifiers QT QT

101 General QTF QT__QTF

102 Cardinals QTC QT__QTC

103 Ordinals QTO QT__QTO

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Tamil

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N paiyan raajaa puttakam

11 Common NN N__NN puttakam kaNNaaTi paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR itu atu avan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V_VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No Category Label Annotation Convention Examples Examples in Malayalam

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N avan mOhan vItu

11 Common NN N__NN vItu vellam pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avan aval atu itu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ

24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ

31 Deictic DMD DM__DMD atu itu അത ഇത

32 Relative DMR DM__DMR eetu ഏത

33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന

4 Verb V V pO kazhi Annu ciri ോ കഴി ആണി(Copula) ചിരി

41 Main VM V__VM pO kazhi cirriAnnu(copula) ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം

92 Classifier CL RP__CL peer േ൪

93 Interjection INJ RP__INJ ayyoo അേയാ

94 Intensifier INTF RP__INTF pala valare ല വളെര

95 Negation NEG RP__NEG illa alla ഇല അല

10 Quantifiers QT QT kuracchu niraccu oru dharalam കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnu rantu ഒന രണ

103 Ordinals QTO QT__QTO onnAm rantam ഒനാം രണാം

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര

113 Punctuation PUNC RD__PUNC

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

POS for Bangla

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN kalama cashmaa

12 Proper NNP N__NNP Mohan ravi

rashmi

14 Nloc NST N__NST upare

niche

bhitara

2 Pronoun PR PR

21 Personal PRP PR__PRP se tumi

AmAra

22 Reflexive PRF PR__PRF nijera

23 Relative PRL PR__PRL ye yakhana

yena yAra

24 Reciprocal PRC PR__PRC paraspara

25 Wh-word PRQ PR__PRQ ke kakhana


kena kAra

26 Indefinite PRI PR__PRI keu

3 Demonstrative DM DM Vaha jo

yaha

31 Deictic DMD DM__DMD sei oi o se

32 Relative DMR DM__DMR ye yei

33 Wh-word DMQ DM__DMQ kono

34 Indefinite DMI DM__DMI keu

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF karachhilAma yAba khAYa

412 Non-finite VNF V__VM__VNF kare kheYe karale khete

413 Infinitive VINF V__VM__VINF karate khete yete

414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA

42 Auxiliary VAUX V__VAUX chhila

habe chAi

5 Adjective JJ sundara

bhAla lAla

6 Adverb RB tADAtADi

Aste

haThAt

7 Postposition PSP theke

abadhI

madhye

diYe

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD Ara eban

athabA

kimbA

82 Subordinator CCS CC__CCS ye kintu

noile


tAhale

821 Quotative UT CC__CCS__UT ---- Not required

9 Particles RP RP

91 Default RPD RP__RPD to ye

92 Classifier CL RP__CL jana khAnA

93 Interjection INJ RP__INJ Are ei

hAya

94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika

95 Negation NEG RP__NEG nA naYa

chhADA

10 Quantifiers QT QT

101 General QTF QT__QTF kichhu

alpa aneka

102 Cardinals QTC QT__QTC eka dui

tina

103 Ordinals QTO QT__QTO prathama

paYalA

dvitIYa

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH jala Tala

khAbAra

dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

821 Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’

12 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’

13 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’

2 Pronoun PR PR

21 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’

22 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’

23 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’

24 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’

25 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’

26 Indefinite koI kaIMK kashuM ‘someone’ ‘something’

3 Demonstrative DM DM

31 Deictic DMD DM__DMD A ‘this’

32 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’

33 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’

34 Indefinite koI kaIMK kashuM ‘someone’ ‘something’

4 Verb V V

41 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’

42 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’

5 Adjective JJ

6 Adverb RB

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’

82 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’

9 Particles RP RP

91 Default RPD RP__RPD paNa ja tO ‘but’ emph topic

92 Interjection INJ RP__INJ hE arrrE O

93 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’

94 Negation NEG RP__NEG nahi na ‘no’

10 Quantifiers QT QT

101 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’

102 Cardinals QTC QT__QTC eka be traN ‘one/two/three’

103 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)

11 Residuals RD RD

111 Foreign word RDF RD__RDF tv perasitemol

112 Symbol SYM RD__SYM $ &

113 Punctuation PUNC RD__PUNC ()

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’

POS for Konkani

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411 Finite VF V__VM__VF आयलो आयला आयललो

412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

413 Infinitive VINF V__VM__VINF आस वहर तलयार

414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

421 Finite V__VAUX__VF तलल आस आयला आस

422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

821 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category/subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
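As a rough illustration of how some of these attributes might appear on tagged text, the following sketch builds a small XML fragment with Python's standard library. The element names `sentence` and `token`, the `tag` attribute and the language code "ur" are assumptions made for this example and are not defined by the draft; only xml:lang, xml:id, its:dir and the ITS 1.0 namespace come from the W3C specifications.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # reserved xml: prefix
ET.register_namespace("its", ITS_NS)


def build_sentence(lang, direction, tokens):
    """Build a <sentence> element carrying xml:lang and its:dir.

    'sentence', 'token' and 'tag' are hypothetical names for this sketch.
    tokens is a list of (token_id, text, pos_tag) tuples.
    """
    sentence = ET.Element("sentence")
    sentence.set(f"{{{XML_NS}}}lang", lang)
    sentence.set(f"{{{ITS_NS}}}dir", direction)
    for token_id, text, pos_tag in tokens:
        token = ET.SubElement(sentence, "token")
        token.set(f"{{{XML_NS}}}id", token_id)
        token.set("tag", pos_tag)
        token.text = text
    return sentence


def to_xml(element):
    """Serialise an element to a Unicode string."""
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    # Two Urdu example words taken from the Urdu table above.
    urdu = build_sentence("ur", "rtl", [("t1", "کتاب", "N__NN"), ("t2", "ہے", "V__VAUX")])
    print(to_xml(urdu))
```

Setting the direction once on the enclosing element, rather than on every token, follows the guideline quoted above that its:dir is defined on the root and on elements with text content.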

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term “instance document” is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). “Sort of” because a tag gives only its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
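A metadata record of this shape can be read back with a few lines of code. The sketch below is only an illustration: the root element name `metadata` is a placeholder, the record is a simplified subset of the listing above without the namespace declaration, and the function name `read_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET

# 'metadata' as a root element name is a placeholder for this sketch.
RECORD = """<?xml version="1.0"?>
<metadata>
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
  </languageSection>
</metadata>"""


def read_metadata(xml_text):
    """Collect the descriptive fields of one languageSection into a dict."""
    section = ET.fromstring(xml_text).find("languageSection")
    return {
        "language": section.findtext("language"),
        "status": section.findtext("registrationStatus"),
        "origin": section.findtext("origin"),
        "created": section.findtext("creation/creationDate"),
    }


if __name__ == "__main__":
    for key, value in read_metadata(RECORD).items():
        print(f"{key}: {value}")
```

With the real record, which declares a default namespace, each element name in the lookups would need to be qualified with that namespace.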

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, it is necessary that the labels defined in the POS Schema for English have a one-to-one mapping for the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
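The idea of switching branches ‘off’ for a language can be sketched as a simple filter over the common tag set. This is an illustration under stated assumptions, not the draft's algorithm: the tag list is a small excerpt, the per-language sets are read off the "Not required" remarks in the Tamil, Bangla and Urdu tables above, the ISO 639-3 codes are used as keys, and the names `COMMON_TAGS`, `SWITCHED_OFF` and `select_nodes` are hypothetical.

```python
# Common tag set (a small excerpt) shared by all languages.
COMMON_TAGS = ["CC__CCD", "CC__CCS", "CC__CCS__UT", "RP__RPD", "RP__CL", "RP__INJ"]

# Branches switched 'off' per language, following the "Not required"
# remarks in the tables above. Keys are ISO 639-3 language codes.
SWITCHED_OFF = {
    "tam": {"RP__CL"},                  # Tamil: Classifier not required
    "ben": {"CC__CCS__UT"},             # Bangla: Quotative not required
    "urd": {"CC__CCS__UT", "RP__CL"},   # Urdu: neither is required
}


def select_nodes(language):
    """Return (displayed, hidden) tags for one language.

    A language with no entry in SWITCHED_OFF keeps every common tag.
    """
    off = SWITCHED_OFF.get(language, set())
    displayed = [tag for tag in COMMON_TAGS if tag not in off]
    hidden = [tag for tag in COMMON_TAGS if tag in off]
    return displayed, hidden


if __name__ == "__main__":
    for code in ("mal", "tam", "urd"):
        shown, hidden = select_nodes(code)
        print(code, "display:", shown, "hide:", hidden)
```

Keeping one common list and a per-language exclusion set mirrors the "Display (Desired Nodes)" and "Hide (remaining nodes)" steps of the flow above.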

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">


<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">


----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


<xs:attribute name=subtype subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name=subtype subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name=subtype subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

---------- Adjective Block ----------

<xs:element name=cat POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------- Adverb Block ----------

<xs:element name=cat POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

---------- Post Position Block ----------

<xs:element name=cat POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

---------- Conjunction Block ----------

<xs:element name=cat POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">


<xs:attribute name=type subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name=type subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name=subtype subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

---------- Particles Block ----------

<xs:element name=cat POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name=type subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name=type subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name=type subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name=type subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">


<xs:attribute name=type subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

---------- Quantifiers Block ----------

<xs:element name=cat POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name=type subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name=type subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name=type subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

---------- Residuals Block ----------

<xs:element name=cat POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name=type subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name=type subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">


<xs:attribute name=type subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name=type subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name=type subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">


        <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">


        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...


    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.


        <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
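
The selection logic above can be sketched in Python. This is only an illustration under stated assumptions: each schema node is modelled as a dictionary of attributes, the names `SCRIPT_LANGUAGES`, `visible_label_attributes` and `select_nodes` are hypothetical, the label values are placeholders rather than real native-script labels, and the `Display (Metadata)` step is not modelled.

```python
# Script -> language -> ISO 639-3 code, as paired in the pseudocode above.
# This mapping and all function names are illustrative assumptions.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def visible_label_attributes(script, language):
    """Return the language-specific attribute names that stay displayed."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    # Hindi labels are always displayed next to the English ones.
    return {"hin-cat", f"{code}-cat"}


def select_nodes(nodes, script, language):
    """Hide every '<code>-cat' label except Hindi and the chosen language."""
    keep = visible_label_attributes(script, language)
    return [
        {name: value for name, value in node.items()
         if not name.endswith("-cat") or name in keep}
        for node in nodes
    ]


# Placeholder labels stand in for the native-script values of the schema.
noun = {"cat": "noun", "hin-cat": "<hin>", "brx-cat": "<brx>",
        "mal-cat": "<mal>", "tag": "N"}
bodo_view = select_nodes([noun], "Devanagari", "Bodo")
print(bodo_view)
```

English attributes (`cat`, `subcat`, `tag`) pass through unchanged because only names ending in `-cat` are treated as language labels; an unsupported script and language pair raises `ValueError` instead of silently showing nothing.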


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp/

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403/

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tags according to ISO 639-3

1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
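
The language tags can also be held as a simple lookup that pairs each language with its ISO 639-3 code. The following is a minimal Python sketch, assuming the `<code>-cat` attribute naming seen in the schema (`hin-cat`, `mal-cat`); `LANGUAGE_TAGS` and `label_attribute` are hypothetical names used only for illustration.

```python
# Language name -> ISO 639-3 code for the 22 languages of the annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Build the schema attribute name that carries a language's labels."""
    # Assumption: attribute names follow the '<code>-cat' pattern of the schema.
    return f"{LANGUAGE_TAGS[language]}-cat"


print(label_attribute("Malayalam"))
```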


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


ਿਦਲੀ

ਤਾਜਮਿਹਲ

wAjamahila

14 Nloc NST N__NST ਤ ਥਲ ਅਗ

ਿਪਛ

uYwe WaYle

aYge piYCe

2 Pronoun PR PR ਮ ਤ ਉਹ ਇਹ

mEz wUM uha

iha jo

21 Personal PRP PR__PRP ਮ ਤ ਉਹ mEz wuM uha

22 Reflexive PRF PR__PRF ਆਪਣਾ ਆਪ

ਖਦ

ApaNA Apa

Kuxa

23 Relative PRL PR__PRL ਜ ਿਜਸ

ਿਜਹਡਾ ਜਦ

jo jisa jihadZA

jaxoz

24 Reciprocal PRC PR__PRC ਆਪਸ Apasa

25 Wh-word PRQ PR__PRQ ਕਣ ਕਦ ਿਕਥ kONa kaxoz

kiYWe

26 Indefinite PRI PR_PRI ਕਈ ਿਕਸ koI kisa

3 Demonstrative DM DM ਉਹ ਜ ਇਹ uha jo iha

31 Deictic DMD DM__DMD ਇਹ ਉਹ iha uha

32 Relative DMR DM__DMR ਜ ਿਜਸ jo jisa

33 Wh-word DMQ DM__DMQ ਕਣ kONa

34 Indefinite DMI DM__DMI ਕੋਈ ਕਿਸ koI kisa

4 Verb V V ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

41 Main VM V__VM ਆਇਆ ਜਾ

ਕਰਦਾ

ਮਾਰਗਾ

ਰਿਹਦਾ

AiA jA karaxA

mArAzgA

rahiMxA

412 Non-finite VNF V__VM__VNF ਜਿਦਆ

ਆਿਦਆ

jAzxiAz

AuzxiAz

karaxiAz


ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka


ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
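The rule above can be illustrated with a minimal Python sketch. It assumes that the levels in the Annotation Convention column are joined by a double underscore (as in V__VM__VNF). The function name expand_tag is illustrative only and is not part of the standard.

```python
def expand_tag(annotation: str) -> list[str]:
    """Return the annotation string for every level, top level first."""
    parts = annotation.split("__")
    if not all(parts):
        # An empty label means the string was not of the form A__B__C.
        raise ValueError(f"malformed annotation: {annotation!r}")
    # Rebuild each ancestor by joining the first i labels.
    return ["__".join(parts[:i]) for i in range(1, len(parts) + 1)]


# The annotator records only the lowest-level tag; the ancestors are derived.
print(expand_tag("V__VM__VNF"))  # ['V', 'V__VM', 'V__VM__VNF']
print(expand_tag("JJ"))          # ['JJ']
```

Storing only the lowest-level tag and deriving the rest keeps the annotated corpus compact while still allowing queries at any level of the hierarchy.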

Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Remarks

1 Noun N N
1.1 Common NN N__NN
1.2 Proper NNP N__NNP
1.3 Nloc NST N__NST
2 Pronoun PR PR
2.1 Personal PRP PR__PRP
2.2 Reflexive PRF PR__PRF
2.3 Relative PRL PR__PRL
2.4 Reciprocal PRC PR__PRC
2.5 Wh-word PRQ PR__PRQ
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD
3.2 Relative DMR DM__DMR
3.3 Wh-word DMQ DM__DMQ
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF
4.1.2 Non-finite VNF V__VM__VNF
4.1.3 Infinitive VINF V__VM__VINF
4.1.4 Gerund VNG V__VM__VNG
4.2 Verbal Noun NNV N__NNV Verbal noun
4.3 Auxiliary VAUX V__VAUX
4.3.1 Non-finite VNF V__VM__VNF
4.3.2 Infinitive VINF V__VM__VINF
5 Adjective JJ
6 Adverb RB Only manner adverbs
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD
8.2 Subordinator CCS CC__CCS
8.2.1 Quotative UT CC__CCS__UT
9 Particles RP RP
9.1 Default RPD RP__RPD
9.2 Classifier CL RP__CL
9.3 Interjection INJ RP__INJ
9.4 Intensifier INTF RP__INTF
9.5 Negation NEG RP__NEG
10 Quantifiers QT QT
10.1 General QTF QT__QTF
10.2 Cardinals QTC QT__QTC
10.3 Ordinals QTO QT__QTO
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Tamil

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR itu, atu, avan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V__VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um (raamanum) maRRum aanaal allatu (-um is a co-ordinator which can be added to noun and verb)

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Examples in Malayalam

1 Noun N N avan, mOhan, vItu
1.1 Common NN N__NN vItu, vellam, pattam
1.2 Proper NNP N__NNP mOhan, ravi, sIta മോഹൻ, രവി, സീത
1.3 Nloc NST N__NST mEle, tAze, munpil, pinnil മേലെ, താഴെ, മുന്നിൽ, പിന്നിൽ
2 Pronoun PR PR avan, aval, atu, itu അവൻ, അവൾ, അത്, ഇത്
2.1 Personal PRP PR__PRP naan, nii, avaL, avar ഞാൻ, നീ, അവൾ, അവർ
2.2 Reflexive PRF PR__PRF tanne-taan തന്നെതാൻ
2.3 Relative PRL PR__PRL aaro ആരോ
2.4 Reciprocal PRC PR__PRC tammil tammil, parasparam തമ്മിൽ തമ്മിൽ, പരസ്പരം
2.5 Wh-word PRQ PR__PRQ aaru, evan ആര്, എവൻ
3 Demonstrative DM DM aa-, ii- ആ, ഈ
3.1 Deictic DMD DM__DMD atu, itu അത്, ഇത്
3.2 Relative DMR DM__DMR eetu ഏത്
3.3 Wh-word DMQ DM__DMQ eetu, ennane ഏത്, എങ്ങനെ
4 Verb V V pO, kazhi, Annu (copula), ciri പോ, കഴി, ആണ് (copula), ചിരി
4.1 Main VM V__VM pO, kazhi, ciri, Annu (copula) പോ, കഴി, ആണ് (copula), ചിരി
4.1.1 Finite VF V__VM__VF pOyi, cirikkum, kazhikkunnu, Akunnu (copula) പോയി, ചിരിക്കും, കഴിക്കുന്നു, ആകുന്നു (copula)
4.1.2 Non-finite VNF V__VM__VNF pOya, ciricca, kazhicca പോയ, ചിരിച്ച, കഴിച്ച
4.1.3 Infinitive VINF V__VM__VINF pOkku, cirikkukayAl, kazhikkee, varAn, varuvAn പോക്കു, ചിരിക്കുകയാൽ, കഴിക്ക, വരാൻ, വരുവാൻ
4.2 Verbal VN V__VN paTittam, naTattam, naTanam പഠിത്തം, നടത്തം, നടനം
4.3 Auxiliary VAUX V__VAUX kolluka, talluka, kAnuka, nOkkuka കൊല്ലുക, തല്ലുക, കാണുക, നോക്കുക
5 Adjective JJ valiya, ceRiya, azakulla വലിയ, ചെറിയ, അഴകുള്ള
6 Adverb RB veegam, ativeegam, kUtutal വേഗം, അതിവേഗം, കൂടുതൽ
7 Postposition PSP paRRi, kUte പറ്റി, കൂടെ
8 Conjunction CC CC pakshe, enniTTum, ennAl, ennalum, enkilum പക്ഷെ, എന്നിട്ടും, എന്നാൽ, എന്നാലും, എങ്കിലും
8.1 Co-ordinator CCD CC__CCD -um (rAmanum), pakshe ഉം (രാമനും), പക്ഷെ
8.2 Subordinator CCS CC__CCS ennu, enna, ennAl എന്നു, എന്ന, എന്നാൽ
8.2.1 Quotative UT CC__CCS__UT ennu, enna എന്നു, എന്ന
9 Particles RP RP kute, mAtram കൂടെ, മാത്രം
9.1 Default RPD RP__RPD mAtram മാത്രം
9.2 Classifier CL RP__CL peer പേർ
9.3 Interjection INJ RP__INJ ayyoo അയ്യോ
9.4 Intensifier INTF RP__INTF pala, valare പല, വളരെ
9.5 Negation NEG RP__NEG illa, alla ഇല്ല, അല്ല
10 Quantifiers QT QT kuracchu, niraccu, oru, dharalam കുറച്ചു, നിറച്ചു, ഒരു, ധാരാളം
10.1 General QTF QT__QTF kuraccu, niraccu, dharalam കുറച്ചു, നിറച്ചു, ധാരാളം
10.2 Cardinals QTC QT__QTC onnu, rantu ഒന്ന്, രണ്ട്
10.3 Ordinals QTO QT__QTO onnAm, rantam ഒന്നാം, രണ്ടാം
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF
11.2 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) രൂ
11.3 Punctuation PUNC RD__PUNC
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH

POS for Bangla

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N
1.1 Common NN N__NN kalama, cashmaa
1.2 Proper NNP N__NNP Mohan, ravi, rashmi
1.4 Nloc NST N__NST upare, niche, bhitara
2 Pronoun PR PR
2.1 Personal PRP PR__PRP se, tumi, AmAra
2.2 Reflexive PRF PR__PRF nijera
2.3 Relative PRL PR__PRL ye, yakhana, yena, yAra
2.4 Reciprocal PRC PR__PRC paraspara
2.5 Wh-word PRQ PR__PRQ ke, kakhana, kena, kAra
2.6 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha, jo, yaha
3.1 Deictic DMD DM__DMD sei, oi, o, se
3.2 Relative DMR DM__DMR ye, yei
3.3 Wh-word DMQ DM__DMQ kono
3.4 Indefinite DMI DM__DMI keu
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF karachhilAma, yAba, khAYa
4.1.2 Non-finite VNF V__VM__VNF kare, kheYe, karale, khete
4.1.3 Infinitive VINF V__VM__VINF karate, khete, yete
4.1.4 Gerund VNG V__VM__VNG yAoYa, AsA, khelA, karA
4.2 Auxiliary VAUX V__VAUX chhila, habe, chAi
5 Adjective JJ sundara, bhAla, lAla
6 Adverb RB tADAtADi, Aste, haThAt
7 Postposition PSP theke, abadhI, madhye, diYe
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD Ara, eban, athabA, kimbA
8.2 Subordinator CCS CC__CCS ye, kintu, noile, tAhale
8.2.1 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
9.1 Default RPD RP__RPD to, ye
9.2 Classifier CL RP__CL jana, khAnA
9.3 Interjection INJ RP__INJ Are, ei, hAya
9.4 Intensifier INTF RP__INTF bhiShaNa, khuba, sA~NghAtika
9.5 Negation NEG RP__NEG nA, naYa, chhADA
10 Quantifiers QT QT
10.1 General QTF QT__QTF kichhu, alpa, aneka
10.2 Cardinals QTC QT__QTC eka, dui, tina
10.3 Ordinals QTO QT__QTO prathama, paYalA, dvitIYa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH jala Tala, khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N मुलगा (mulagaa-boy), राजा (raajaa-king), पुस्तक (pustaka-book)
1.1 Common NN N__NN पुस्तक (pustaka-book), लेखणी (lekhaNi-pen), चष्मा (chashmaa-goggles)
1.2 Proper NNP N__NNP मोहन (Mohan), रवी (Ravi), रश्मी (Rashmi)
1.3 Verbal NNV N__NNV NA Not Required
1.4 Nloc NST N__NST वर (var-up), खाली (khaalee-down), पुढे (pudhe-ahead), मागे (maage-back) Where it is separate it is NST
2 Pronoun PR PR येथे (yethe-here), तेथे (tethe-there), जो (jo-who), तो (to-he)
2.1 Personal PRP PR__PRP तो (to-he), मी (mee-I), तू (tu-you), ते (te-they), तुम्ही (tumhi-you)
2.2 Reflexive PRF PR__PRF स्वतः (swatha-myself), आपण (aapana-ourselves)
2.3 Relative PRL PR__PRL जो (jo-who), ज्याने (jyaane-who), जेव्हा (jevhaa-while), जिथे (jeethe-where)
2.4 Reciprocal PRC PR__PRC परस्पर (Paraspara-reciprocally), एकमेक (ekmek-mutually)
2.5 Wh-word PRQ PR__PRQ कोण (kona-who), केव्हा (kevha-when), कुठे (kuthe-where)
2.6 Indefinite कोणी (kona)
3 Demonstrative DM DM तो (to-he), हा (haa-this), जो (jo-who)
3.1 Deictic DMD DM__DMD इथे (ithe-here), तिथे (tithe-there)
3.2 Relative DMR DM__DMR जो (jo-who), ज्याने (jyane-who)
3.3 Wh-word DMQ DM__DMQ कोणता (konta-which), कोणी (kona-who)
4 Verb V V पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is)
4.1 Main VM V__VM पडला (padalaa-fell down), गेला (gelaa-went), झोपला (jhopalaa-slept), आहे (aahe-is)
4.1.1 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 Non-finite VNF V__VM__VNF - --do--
4.1.3 Infinitive VINF V__VM__VINF - --do--
4.1.4 Gerund VNG V__VM__VNG --do--
4.2 Auxiliary VAUX V__VAUX आहे (is), लागला (started)
5 Adjective JJ सुंदर (sundara-beautiful), चांगला (chaangalaa-good), मोठा (moThaa-big)
6 Adverb RB लवकर (lavakar-fast), हळूहळू (haLuuhaLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आणि (aaNi-and), कारण (kaaraN-because)
8.1 Co-ordinator CCD CC__CCD आणि (aaNi-and), पण (paNa-but), परंतु (parantu-but)
8.2 Subordinator CCS CC__CCS कारण की (kaaraN-because of), कारण की (kaaraN kii-because of), जर-तर (jara-tara-if-then)
8.2.1 Quotative UT CC__CCS__UT असा, म्हणून
9 Particles RP RP तर (tara)
9.1 Default RPD RP__RPD तर (tara) (then)
9.2 Classifier CL RP__CL Not required
9.3 Interjection INJ RP__INJ अरेरे (arere), ओहो (oho-oh)
9.4 Intensifier INTF RP__INTF खूप (khoop-lot, very), बराच (baraach-too much), अतिशय (atishaya-too much, very)
9.5 Negation NEG RP__NEG नको (nako-not), न (na-Na)
10 Quantifiers QT QT थोडे (thode-few), जास्त (jaasta-lot), काही (kaahi-few), एक (eka-one), पहिला (pahilaa-first)
10.1 General QTF QT__QTF थोडे (thoDe-few), जास्त (jaasta-lot), काही (kaahi-few)
10.2 Cardinals QTC QT__QTC एक (eka-one), दोन (dona-two)
10.3 Ordinals QTO QT__QTO पहिला (pahilaa-first), दुसरा (dusaraa-second)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जेवणबिवण (jevanbivaNa-meal/dinner), डोकेबिके (Dokebike-head), (Paanii-) vaanii, (khaanaa-) vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N
1.1 Common NN N__NN kalam, chashmA ‘pen’, ‘spectacles’
1.2 Proper NNP N__NNP mohan, ravI ‘Mohan’, ‘Ravi’
1.3 Nloc NST N__NST upar, nIche, ahIM ‘up’, ‘down’, ‘in front’
2 Pronoun PR PR
2.1 Personal PRP PR__PRP huM, tuM, te ‘me’, ‘you’, ‘he/she’
2.2 Reflexive PRF PR__PRF pote, jAte, svayam ‘herself/himself’
2.3 Relative PRL PR__PRL je, te, jyAM ‘who’, ‘where’
2.4 Reciprocal PRC PR__PRC aras-paras, paraspar ‘mutually’, ‘each other’
2.5 Wh-word PRQ PR__PRQ koN, kyAre, kyAM ‘who’, ‘when’, ‘where’
2.6 Indefinite koI, kaIMK, kashuM ‘someone’, ‘something’
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD A ‘this’
3.2 Relative DMR DM__DMR je, jeNe ‘which/who’, ‘whom’
3.3 Wh-word DMQ DM__DMQ koN, shuM, kem ‘who’, ‘what’, ‘why’
3.4 Indefinite koI, kaIMK, kashuM ‘someone’, ‘something’
4 Verb V V
4.1 Main VM V__VM khAshe, khAdhu ‘will eat’, ‘ate’
4.2 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM ‘is’, ‘was’, ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane, ke ‘and’, ‘or’
8.2 Subordinator CCS CC__CCS tethI, evuM, kAraNke ‘so’, ‘like that’, ‘because’
9 Particles RP RP
9.1 Default RPD RP__RPD paNa, ja, tO ‘but’, emph., topic
9.2 Interjection INJ RP__INJ hE, arrrE, O
9.3 Intensifier INTF RP__INTF bahu, ghaNuM ‘very’, ‘much’
9.4 Negation NEG RP__NEG nahi, na ‘no’
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM, ghaNuM ‘little’, ‘much’
10.2 Cardinals QTC QT__QTC eka, be, traN ‘one’, ‘two’, ‘three’
10.3 Ordinals QTO QT__QTO paheluM, bIjI ‘first’ (neu.), ‘second’ (fem.)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv, perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ()
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm, pANi-bANi ‘work and the like’, ‘water and the like’

POS for Konkani

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411 Finite VF V__VM__VF आयलो आयला आयललो

412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

413 Infinitive VINF V__VM__VINF आस वहर तलयार

414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

421 Finite V__VAUX__VF तलल आस आयला आस

422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

821 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

92 Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun (ism-اسم) N N لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab)
1.1 Common (nakeraa-نکره) NN N__NN کتاب (kitaab), قلم (qalam), چشمہ (cashma)
1.2 Proper (m‘aarefa-معرفہ) NNP N__NNP موہن (Mohan), رشمی (Rashmi), روی (Ravi)
1.3 Verbal (haasil-e-masdar-حاصل مصدر) NNV N__NNV جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) May be considered for Urdu-Hindi too
1.4 Nloc (zarf-ظرف) NST N__NST اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche)
2 Pronoun (zamiir-ضمير) PR PR يہ (yih), وه (voh), جو (jo)
2.1 Personal (zamiir-e-shakhsii-ضمير شخصی) PRP PR__PRP وه (voh), تم (tum), ميں (maim) In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 Reflexive (zamiir-e-m‘aakoosii-ضمير معکوسی) PRF PR__PRF اپنا (apnaa), خود (khud), اپنے آپ (apne aap)
2.3 Relative (zamiir-e-mausoolaa-ضمير موصولہ) PRL PR__PRL جو (jo), جب (jab), جس (jis), جہاں (jahaM)
2.4 Reciprocal (zamiir-e-raaje‘-ضمير راجع) PRC PR__PRC باہم (baaham), درميان (darmiyaan), آپس (aapas)
2.5 Wh-word (zamiir-e-istafhaamiyaa-ضمير استفہاميہ) PRQ PR__PRQ کون (kaun), کب (kab), کہاں (kahaaM)
3 Demonstrative (zamiir-e-ishaaraa-ضمير اشاره) DM DM يہ (yih), وه (voh), ان (inn), ان (unn)
3.1 Deictic (ishaare-اشارے) DMD DM__DMD يہ (yih), وه (voh)
3.2 Relative (zamiir-e-ishaaraa mausoolaa-ضمير اشاره موصولہ) DMR DM__DMR جو (jo), جس (jis)
3.3 Wh-word (zamiir-e-ishaaraa istafhaamiyaa-ضمير اشاره استفہاميہ) DMQ DM__DMQ کون (kaun), کس (kis), کتنا (kitnaa) According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 Verb (f‘el-فعل) V V گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1 Main VM V__VM گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa)
4.1.1 Finite (mahdood-محدود) VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 Nonfinite (ghair mahdood-غير محدود) VNF V__VM__VNF -- do--
4.1.3 Infinitive (masdar-مصدر) VINF V__VM__VINF -- do--
4.1.4 Gerund (haasil-e-masdar-حاصل مصدر) VNG V__VM__VNG -- do--
4.2 Auxiliary (f‘el-e-imdaadi-فعل امدادی) VAUX V__VAUX ہے (hai), رہا (rahaa), ہوا (huaa)
5 Adjective (sifat-صفت) JJ دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa)
6 Adverb (mut‘alliq-e-f‘el-متعلق فعل) RB تيز (tez), جلد (jald)
7 Postposition (jaar-e-moakkhar-جارموخر) PSP سے (se), نے (ne), کو (ko), ميں (meiM)
8 Conjunction (atf‘-عطف) CC CC اور (aur), اگر (agar), کيوں کہ (kyoMki)
8.1 Co-ordinator (harf-e-vasl-حرف وصل) CCD CC__CCD اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki)
8.2 Subordinator (taab‘e kunindaa-تابع کننده) CCS CC__CCS اگر (agar), کيوں کہ (kyoMki), تو (to)
8.2.1 Quotative (iqtabaasii-اقتباسی) UT CC__CCS__UT Not required
9 Particles (haaliyaa-حاليہ) RP RP تو (to), ہی (hii), بهی (bhii)
9.1 Default (Default-ڈيفالٹ) RPD RP__RPD تو (to), ہی (hii), بهی (bhii)
9.2 Classifier (darja band-درجہ بند) CL RP__CL Not required
9.3 Interjection (fajaa’iyaa-فجائيہ) INJ RP__INJ اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah)
9.4 Intensifier (harf-e-taakiid-حرف تاکيد) INTF RP__INTF بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)
9.5 Negation (harf-e-nahii-حرف نہی) NEG RP__NEG نہ (na), نہيں (nahiiM)
10 Quantifiers (kamiiyat numaa-کميت نما) QT QT چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General (aam‘-عام) QTF QT__QTF تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (a‘adaad-e-mutlaq-اعداد مطلق) QTC QT__QTC ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (tartiibii a‘adaad-ترتيبی اعداد) QTO QT__QTO اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)
11 Residuals (baaqi-maandaa-باقی مانده) RD RD
11.1 Foreign word (bidesii lafz-بديسی لفظ) RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol (‘alaamat-علامت) SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound) etc.
11.3 Punctuation (auqaaf-اوقاف) PUNC RD__PUNC Only for punctuations
11.4 Unknown (naa-m‘aaloom-نامعلوم) UNK RD__UNK
11.5 Echowords (goonjdar lafz-گونج دار الفاظ) ECH RD__ECH (دل-) ول (dil-) vil, (پيار-) ويار (pyaar-) vyaar, (چائے-) وائے (caa‘e-) vaa‘e

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
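As a hedged illustration of the first two attributes, the following Python sketch builds a single tagged token carrying xml:lang and its:dir. The element name token, the attribute name pos and the helper make_tagged_word are hypothetical and are not defined by this draft. The ITS namespace URI is the one used by ITS 1.0. Using the ISO 639-3 codes from the annexure as xml:lang values is an assumption made for consistency with this document.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # reserved "xml" prefix
ET.register_namespace("its", ITS_NS)


def make_tagged_word(word, pos_tag, lang, direction="ltr"):
    """Build one <token> element (hypothetical name) with language and direction."""
    token = ET.Element("token")
    token.set(f"{{{XML_NS}}}lang", lang)       # serialized as xml:lang
    token.set(f"{{{ITS_NS}}}dir", direction)   # serialized as its:dir
    token.set("pos", pos_tag)                  # hypothetical attribute for the POS tag
    token.text = word
    return token


# Urdu is written right to left, so the direction is marked explicitly.
urdu = make_tagged_word("کتاب", "N__NN", "urd", direction="rtl")
print(ET.tostring(urdu, encoding="unicode"))
```

Marking direction on the element that holds the text lets a processor render mixed left-to-right and right-to-left corpora without guessing from the characters alone.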

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means ("sort of"). "Sort of" because a tag gives only its name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
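To show how such a metadata record might be read by a tool, here is a minimal Python sketch. It uses a small, well-formed sample that reuses the element names listed above. The root element name dataCategorySelection and the function read_metadata are assumptions made for illustration and are not taken from this draft.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the metadata listing; "d" is an arbitrary local prefix.
NS = {"d": "http://www.isocat.org/ns/dcif"}

# Hypothetical, simplified record; the root element name is an assumption.
SAMPLE = """<?xml version="1.0"?>
<dataCategorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</dataCategorySelection>"""


def read_metadata(xml_text):
    """Extract a few descriptive fields from the language section."""
    root = ET.fromstring(xml_text)
    section = root.find("d:languageSection", NS)
    return {
        "language": section.findtext("d:language", namespaces=NS),
        "status": section.findtext("d:registrationStatus", namespaces=NS),
        "origin": section.findtext("d:origin", namespaces=NS),
        "created": section.findtext("d:creation/d:creationDate", namespaces=NS),
    }


for field, value in read_metadata(SAMPLE).items():
    print(f"{field}: {value}")
```

Because the record declares a default namespace, every lookup has to be namespace-qualified; omitting the prefix is a common reason for such queries to return nothing.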

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the labels for each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure "off". This is represented schematically under the next heading, POS schema block diagram.
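The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps can be sketched in Python as follows. This is a minimal illustration and not the draft's algorithm. The node list is a small subset of the common tagset. The hidden sets follow the "Not required" and "Not in Marathi" remarks in the language tables. The names COMMON_NODES, HIDDEN and select_nodes are hypothetical.

```python
# A few nodes of the common schema, identified by their annotation convention.
COMMON_NODES = [
    "N__NN", "N__NNP", "N__NNV", "N__NST",
    "PSP", "CC__CCS__UT", "RP__CL", "RP__NEG",
]

# Branches switched "off" per language (ISO 639-3 code), per the table remarks.
HIDDEN = {
    "tam": {"RP__CL"},
    "mar": {"N__NNV", "PSP", "RP__CL"},
    "urd": {"CC__CCS__UT", "RP__CL"},
    "ben": {"CC__CCS__UT"},
}


def select_nodes(lang):
    """Return (displayed, hidden) node lists for the selected language."""
    off = HIDDEN.get(lang, set())
    displayed = [node for node in COMMON_NODES if node not in off]
    hidden = sorted(off & set(COMMON_NODES))
    return displayed, hidden


displayed, hidden = select_nodes("mar")
print("Display:", displayed)
print("Hide:", hidden)
```

Keeping one common node list and a per-language set of switched-off branches means a new language can be added without copying the whole schema.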

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

ltxml version=10 encoding=UTF-8gt

ltxsschema xmlnsxs=httpwwww3org2001XMLSchemagt

ltfile Descgt

lttitleStmtgt

lttitlegtPOS tag in multilingual languagelttitlegt

ltscriptgt ltscriptgt

ltlanguagegtmultilingualltlanguagegt

ltlabel languagegthelliphelliphelliphelliphellipltlabel languagegt

lttypegtmultimodallttypegt

[Languages taken Hindi Bodo Malayalam Kashmiri Assamese Konkani Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

50

CopyrightTDIL

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP"/>

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF"/>

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC"/>

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL"/>

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ"/>


----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD"/>

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR"/>

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ"/>

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX"/>

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM"/>

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF"/>

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF"/>

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG"/>

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF"/>

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD"/>

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS"/>

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT"/>

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD"/>

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL"/>

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ"/>

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG"/>

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF"/>

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF"/>

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC"/>

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO"/>

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF"/>

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM"/>

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK"/>

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC"/>

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH"/>

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN"/>
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP"/>
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP"/>
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF"/>
        …
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN"/>
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP"/>
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP"/>
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF"/>
        …
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN"/>
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP"/>
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP"/>
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF"/>
        …
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN"/>
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP"/>
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP"/>
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF"/>
        …
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN"/>
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP"/>
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP"/>
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF"/>
        …
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN"/>
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP"/>
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP"/>
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF"/>
        …
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN"/>
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP"/>
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP"/>
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF"/>
        …
    End if
End if
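The selection logic of this section can be sketched in Python as follows. This is only an illustrative sketch, not part of the draft standard: the dictionary representation of a node, the names `SCRIPT_OF`, `NODES` and `select_nodes`, and the placeholder label values are all assumptions made for the example. Only the attribute names (`cat`, `subcat`, `tag`, `hin-cat`, `brx-cat`, ...) and the script/language grouping are taken from the text above.

```python
# Script under which each language is listed in the algorithm above.
SCRIPT_OF = {
    "hin": "Devanagari", "brx": "Devanagari", "kok": "Devanagari",
    "mal": "Malayalam", "kas": "Perso-Arabic",
    "asm": "Bangla", "guj": "Gujarati",
}

# Hypothetical nodes: the label values are placeholders, not the real
# translations from the schema.
NODES = [
    {"cat": "noun", "hin-cat": "HIN-NOUN", "brx-cat": "BRX-NOUN",
     "mal-cat": "MAL-NOUN", "tag": "N"},
    {"subcat": "common", "hin-cat": "HIN-COMMON", "brx-cat": "BRX-COMMON",
     "mal-cat": "MAL-COMMON", "tag": "NN"},
]


def select_nodes(nodes, script, language):
    """Keep the English, Hindi and requested-language labels; hide the rest."""
    if SCRIPT_OF.get(language) != script:
        raise ValueError(f"language {language!r} is not listed under script {script!r}")
    # English labels live in cat/subcat; Hindi is always displayed.
    visible = {"cat", "subcat", "tag", "hin-cat", f"{language}-cat"}
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]


if __name__ == "__main__":
    for node in select_nodes(NODES, "Devanagari", "brx"):
        print(node)
```

For Hindi the requested-language label and the Hindi label coincide, so only the English and Hindi nodes remain, which matches the first branch of the algorithm.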


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
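The tags attached to the words above are hierarchical: each level is joined by an underscore (for example `V_VM_VNF`), and some samples use a doubled underscore (`N__NN`). A minimal Python sketch of splitting such a tag into its levels is given below. The function names `split_tag` and `describe` and the small `LEVEL_NAMES` table are assumptions made for illustration; the level names themselves are taken from the POS schema blocks of this document.

```python
# English labels for a few tag levels, as given in the POS schema blocks.
LEVEL_NAMES = {
    "N": "Noun", "NN": "common", "NNP": "Proper",
    "V": "Verb", "VM": "Main Verb", "VAUX": "Auxiliary Verb",
    "VF": "Finite", "VNF": "Non-Finite",
    "RD": "Residuals", "PUNC": "Punctuation",
}


def split_tag(tag):
    """Split a hierarchical tag into its levels, ignoring doubled underscores."""
    return [level for level in tag.split("_") if level]


def describe(tag):
    """Render a tag as a readable path; unknown levels are kept as raw codes."""
    return " > ".join(LEVEL_NAMES.get(level, level) for level in split_tag(tag))


if __name__ == "__main__":
    for tag in ("V_VM_VNF", "N__NN", "RD_PUNC", "PSP"):
        print(tag, "->", describe(tag))
```

Levels missing from the table, such as `PSP`, are passed through unchanged rather than guessed.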


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
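As an illustration of how the language tags in this annexure might be used by a tool, the following is a minimal Python sketch of a lookup table in both directions. The function names `language_tag` and `language_name` are hypothetical and are not part of the standard; the tags themselves are the ISO 639-3 codes for the 22 languages listed above.

```python
# ISO 639-3 tags for the 22 languages listed in the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Reverse mapping, built once, for tag -> language name lookups.
TAG_TO_NAME = {tag: name for name, tag in ISO_639_3.items()}


def language_tag(name: str) -> str:
    """Return the ISO 639-3 tag for a language name (case-insensitive)."""
    key = name.strip().title()
    if key not in ISO_639_3:
        raise KeyError(f"no ISO 639-3 tag recorded for {name!r}")
    return ISO_639_3[key]


def language_name(tag: str) -> str:
    """Return the language name for an ISO 639-3 tag (case-insensitive)."""
    return TAG_TO_NAME[tag.strip().lower()]
```

A caller would write, for example, `language_tag("Malayalam")` to obtain the tag used to select the Malayalam branch of the schema.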


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


ਕਰਿਦਆ ਖਾਕ

ਜਾਕ

KAke jAke

413 Infinitive VINF V__VM__VINF ਿਗਆ

ਆਇਆ

ਕਿਰਆ

giAz AiAz

kariAz

414 Gerund VNG V__VM__VNG ਜਾਣ ਖਾਣ ਪੀਣ

ਮਰਨ

jANoz KANoz

pINoz

maranoz

42 Auxiliary VAUX V__VAUX ਹ ਸੀ ਸਿਕਆ

ਹਇਆ

hE sI sakiA

hoiA

5 Adjective JJ ਸਹਣਾ ਚਗਾ

ਮਾਡਾ ਕਾਾਾ

sohaNA

caMgA

mAdZA kAA

6 Adverb RB ਹਾੀ ਕਾਹਲੀ hOI kAhalI

7 Postposition PSP ਨ ਨ ਤ ਨਾਲ ne nUM woz

nAla

8 Conjunction CC CC ਅਤ ਿਕਿਕ

ਅਗਰ ਿਕ ਸਗ

awe kiuzki

agara ki sagoz

81 Co-ordinator CCD CC__CCD ਅਤ ਜ awe jAz

82 Subordinator CCS CC__CCS ਿਕਿਕ ਿਕ ਜ

kiuzki ki jo

wAz

9 Particles RP RP ਵੀ ਤ ਹੀ vI wAz hI

91 Default RPD RP__RPD ਵੀ ਤ ਹੀ vI wAz hI

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ਉਏ ਅਿਡਆ

ਨੀ ਜਨਾਬ

ue adZiA nI

janAba

94 Intensifier INTF RP__INTF ਬਹਤ ਬਡਾ bahuwa

badZA

95 Negation NEG RP__NEG ਨਹ ਨਾ ਿਬਨ

ਵਗਰ

nahIz nA

binAz vagEra

10 Quantifiers QT QT ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ ਇਕ

WodZA

bahuwA kAPI

kuJa iYka


ਪਿਹਲਾ pahilA

101 General QTF QT__QTF ਥਡਾ ਬਹਤਾ

ਕਾਫੀ ਕਝ

WodZA

bahuwA kAPI

kuJa

102 Cardinals QTC QT__QTC ਇਕ ਦ ਿਤਨ iYka xo wiMna

103 Ordinals QTO QT__QTO ਪਿਹਲਾ ਦਜਾ pahilA xUjA

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH (ਪਾਣੀ-) ਧਾਣੀ

(ਚਾਹ-) ਚਹ

(pANI-) XANI

(cAha-) cUha

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN |
1.2 | Proper | NNP | N__NNP |
1.3 | Nloc | NST | N__NST |
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP |
2.2 | Reflexive | PRF | PR__PRF |
2.3 | Relative | PRL | PR__PRL |
2.4 | Reciprocal | PRC | PR__PRC |
2.5 | Wh-word | PRQ | PR__PRQ |
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD |
3.2 | Relative | DMR | DM__DMR |
3.3 | Wh-word | DMQ | DM__DMQ |
4 | Verb | V | V |
4.1 | Main | VM | V__VM |
4.1.1 | Finite | VF | V__VM__VF |
4.1.2 | Non-finite | VNF | V__VM__VNF |
4.1.3 | Infinitive | VINF | V__VM__VINF |
4.1.4 | Gerund | VNG | V__VM__VNG |
4.2 | Verbal Noun | NNV | N_NNV | Verbal noun
4.3 | Auxiliary | VAUX | V__VAUX |
4.3.1 | Non-finite | VNF | V_VM_VNF |
4.3.2 | Infinitive | VINF | V_VM_VNF |
5 | Adjective | JJ | |
6 | Adverb | RB | | Only manner adverbs
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD |
8.2 | Subordinator | CCS | CC__CCS |
8.2.1 | Quotative | UT | CC__CCS__UT |
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD |
9.2 | Classifier | CL | RP__CL |
9.3 | Interjection | INJ | RP__INJ |
9.4 | Intensifier | INTF | RP__INTF |
9.5 | Negation | NEG | RP__NEG |
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF |
10.2 | Cardinals | QTC | QT__QTC |
10.3 | Ordinals | QTO | QT__QTO |
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
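The rule above (annotate with the lowest-level tag, store the higher levels automatically) can be sketched as a small parent lookup. The following Python is only an illustration under stated assumptions: the `PARENT` map covers the labels common to the tagsets in this document, it ignores language-specific additions such as NNV, PRI or DMI, and the function name `expand_tag` is hypothetical. The double-underscore join follows the Annotation Convention column.

```python
# Parent of each subtype label; top-level labels have no entry here.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}

TOP_LEVEL = {"N", "PR", "DM", "V", "JJ", "RB", "PSP", "CC", "RP", "QT", "RD"}


def expand_tag(tag: str) -> str:
    """Expand a lowest-level label into its full annotation, e.g. VF -> V__VM__VF."""
    if tag not in PARENT and tag not in TOP_LEVEL:
        raise ValueError(f"unknown POS label: {tag!r}")
    chain = [tag]
    # Walk upwards until a top-level category is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))
```

With this sketch an annotator only records `VF`, and the stored form `V__VM__VF` is derived; labels without subtypes, such as `JJ`, are returned unchanged.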

POS for Tamil

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | paiyan, raajaa, puttakam |
1.1 | Common | NN | N__NN | puttakam, kaNNaaTi, paTam |
1.2 | Proper | NNP | N__NNP | moohan, ravi, maalati |
1.3 | Nloc | NST | N__NST | meel, kiiz, mun, pin |
2 | Pronoun | PR | PR | itu, atu, avan |
2.1 | Personal | PRP | PR__PRP | naan, nii, avaL, avarkaL |
2.2 | Reflexive | PRF | PR__PRF | taan |
2.3 | Relative | PRL | PR__PRL | yaar, etu, eppootu, enkee |
2.4 | Reciprocal | PRC | PR__PRC | oruvarukoruvar, avanavan, parasparam |
2.5 | Wh-word | PRQ | PR__PRQ | yaarum, yaaraavatu, yaaroo, etuvum |
3 | Demonstrative | DM | DM | a-, i-, e- |
3.1 | Deictic | DMD | DM__DMD | anta, inta, enta |
3.2 | Relative | DMR | DM__DMR | enta |
3.3 | Wh-word | DMQ | DM__DMQ | enta, yaar, eetaavatu, yaaraavatu |
4 | Verb | V | V | vizu, poo, tuunku, aaku |
4.1 | Main | VM | V__VM | vizu, poo, tuunku, ciri |
4.1.1 | Finite | VF | V__VM__VF | vizuntaan, pooneen, cirittaaL |
4.1.2 | Non-finite | VNF | V__VM__VNF | vizunta, poonaal |
4.1.3 | Infinitive | VINF | V__VM__VINF | viza, pooka, cirikka |
4.1.4 | Gerund | VNG | V__VM__VNG | vizutal, cirittal, tuunkutal |
4.2 | Verbal | VN | V_VN | paTippu, naTai, naTattai, ceykai |
4.3 | Auxiliary | VAUX | V__VAUX | aakum, veeNTum, muTiyum |
5 | Adjective | JJ | | iniya, periya, azakaana |
6 | Adverb | RB | | veekamaaka, viraivaaka |
7 | Postposition | PSP | | paRRi, kuRittu, viTa |
8 | Conjunction | CC | CC | maRRum, eenenRaal, aanaal |
8.1 | Co-ordinator | CCD | CC__CCD | -um (raamanum), maRRum, aanaal, allatu | -um is a co-ordinator which can be added to noun and verb
8.2 | Subordinator | CCS | CC__CCS | enRu, ena, enpatu, enRaal |
8.2.1 | Quotative | UT | CC__CCS__UT | enRu, ena |
9 | Particles | RP | RP | maTTUm, kuuTa |
9.1 | Default | RPD | RP__RPD | maTTUm, kuuTa |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | ayyoo, teey, aamaam |
9.4 | Intensifier | INTF | RP__INTF | ati, veku, mika |
9.5 | Negation | NEG | RP__NEG | illai |
10 | Quantifiers | QT | QT | koncam, niRaiya, oru, mutal |
10.1 | General | QTF | QT__QTF | koncam, niRaiya |
10.2 | Cardinals | QTC | QT__QTC | onRu, iraNTu |
10.3 | Ordinals | QTO | QT__QTO | mutal, iraNTaam |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) ruu | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | vaNTi kiNTi, paal kiil |


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി


രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, &, etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text


112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA (‘pen’, ‘spectacles’) |
1.2 | Proper | NNP | N__NNP | mohan, ravI (‘Mohan’, ‘Ravi’) |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM (‘up’, ‘down’, ‘in front’) |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te (‘me’, ‘you’, ‘he/she’) |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam (‘herself/himself’) |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM (‘who’, ‘where’) |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar (‘mutually’, ‘each other’) |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM (‘who’, ‘when’, ‘where’) |
2.6 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A (‘this’) |
3.2 | Relative | DMR | DM__DMR | je, jeNe (‘which/who’, ‘whom’) |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem (‘who’, ‘what’, ‘why’) |
3.4 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu (‘will eat’, ‘ate’) |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’) |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke (‘and’, ‘or’) |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’) |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO (‘but’, emph., topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM (‘very’, ‘much’) |
9.4 | Negation | NEG | RP__NEG | nahi, na (‘no’) |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM (‘little’, ‘much’) |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN (‘one’, ‘two’, ‘three’) |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI (‘first’ (neu.), ‘second’ (fem.)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’) |

POS for Konkani

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out as words: ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

- Defining mark-up for natural language labelling (xml:lang: defined for the root element of the document and for any element where a change of language may occur).
- Defining mark-up to specify text direction (its:dir: defined for the root element of the document and for any element that has text content).
- Indicating which elements and attributes should be translated (its:translateRule: elements that indicate which elements have non-translatable content).
- Providing information related to text segmentation (its:withinTextRule: elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text).
- Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
- Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
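To make the attributes above concrete, the following Python sketch builds a small XML fragment that carries them and serializes it with the standard library. The element names `posLabels`, `entry`, `tag` and `label` are hypothetical and are not taken from the draft schema; the namespace URI is the one defined by ITS 1.0. Note that xml:lang values follow BCP 47, where Urdu is written `ur`, whereas this document's own language tags follow ISO 639-3 (`urd`).

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Ask ElementTree to serialize ITS attributes with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def build_label_document() -> ET.Element:
    """Build a tiny document using xml:lang, its:dir, its:translate, xml:id, its:locNote."""
    root = ET.Element("posLabels", {
        f"{{{XML_NS}}}lang": "en",       # default language of the document
        f"{{{ITS_NS}}}version": "1.0",
        f"{{{ITS_NS}}}dir": "ltr",       # default text direction
    })
    entry = ET.SubElement(root, "entry", {f"{{{XML_NS}}}id": "noun"})

    # The POS label itself is a code and must not be translated.
    tag = ET.SubElement(entry, "tag", {f"{{{ITS_NS}}}translate": "no"})
    tag.text = "N"

    # A language-specific label: language and direction change on this element.
    label = ET.SubElement(entry, "label", {
        f"{{{XML_NS}}}lang": "ur",
        f"{{{ITS_NS}}}dir": "rtl",
        f"{{{ITS_NS}}}locNote": "Urdu name of the Noun category",
    })
    label.text = "اسم"
    return root


if __name__ == "__main__":
    print(ET.tostring(build_label_document(), encoding="unicode"))
```

The sketch uses only local ITS attributes; global rules such as its:translateRule would instead be collected in an its:rules element, which is not shown here.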

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means ("sort of"). "Sort of", because a tag gives only its name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
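As an illustration of how a tool might read such a metadata record, the following Python sketch parses a cut-down record and returns its main fields. It is a sketch under assumptions: the embedded sample is a simplified, well-formed variant of the listing above (the root element name is copied as printed there), and the function name `read_metadata` and the returned keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefix map for the default namespace used in the record.
DCIF_NS = {"d": "http://www.isocat.org/ns/dcif"}

# Simplified, well-formed sample modelled on the metadata listing (an assumption).
SAMPLE = """<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
  </languageSection>
</datasm-categorySelection>
"""


def read_metadata(xml_text: str) -> dict:
    """Extract a few metadata fields from a record shaped like SAMPLE."""
    root = ET.fromstring(xml_text)
    section = root.find("d:languageSection", DCIF_NS)
    if section is None:
        raise ValueError("record has no languageSection element")

    def text_of(path: str):
        node = section.find(path, DCIF_NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "dcif_version": root.get("dcif-version"),
        "language": text_of("d:language"),
        "registration_status": text_of("d:registrationStatus"),
        "origin": text_of("d:origin"),
        "creation_date": text_of("d:creation/d:creationDate"),
    }


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```

Fields that are absent or empty in a record come back as `None`, so a caller can display only the metadata that was actually declared.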

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. POS schema block diagram.
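The "display desired nodes, hide remaining nodes" step of the flow above can be sketched as follows. This Python is only an illustration under assumptions: the in-memory `SCHEMA` list, the function name `select_nodes` and the placeholder label `kok-verbal-label` are hypothetical, and a node is treated as switched ‘off’ for a language when the schema carries no label for that language (as with the Verbal noun entry, which has no Malayalam label in the draft schema).

```python
# Each node: POS tag, English category name, and labels keyed by ISO 639-3 tag.
SCHEMA = [
    {"tag": "N", "cat": "noun", "labels": {"mal": "നാമം", "kok": "नाम"}},
    # Placeholder label text; the real label would come from the POS schema.
    {"tag": "NNV", "cat": "Verbal", "labels": {"kok": "kok-verbal-label"}},
]


def select_nodes(schema, lang):
    """Split schema nodes into those shown for `lang` and the tags hidden for it."""
    shown, hidden = [], []
    for node in schema:
        label = node["labels"].get(lang)
        if label is None:
            # No label for this language: the branch is switched 'off'.
            hidden.append(node["tag"])
        else:
            shown.append({"tag": node["tag"], "cat": node["cat"], "label": label})
    return shown, hidden


if __name__ == "__main__":
    shown, hidden = select_nodes(SCHEMA, "mal")
    print("shown:", [n["tag"] for n in shown], "hidden:", hidden)
```

Keeping the hidden tags in the return value, rather than dropping them silently, lets a caller report which branches were disabled for the selected language.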

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>.....</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>
</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
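The selection logic of this section can be sketched in Python as follows. This is an illustration only: the `SCRIPT_LANGUAGES` table lists just the script and language pairs named in this section, the XML fragment is a hypothetical well-formed simplification of the draft schema (the element names `schema`, `cat` and `type`, and the placeholder labels `H1`, `B1` and so on, are assumptions), and `select_nodes` is an invented helper.

```python
import xml.etree.ElementTree as ET

# Script -> {language name: ISO 639-3 code}, limited to the pairs named in this section.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Hypothetical well-formed fragment with placeholder labels (H1, B1, M1, ...).
FRAGMENT = """
<schema>
  <cat name="noun" hin-cat="H1" brx-cat="B1" mal-cat="M1" tag="N">
    <type subcat="common" hin-cat="H2" brx-cat="B2" mal-cat="M2" tag="NN"/>
  </cat>
</schema>
"""


def select_nodes(root, script, language):
    """Keep the English, Hindi and selected-language labels; hide all other labels."""
    code = SCRIPT_LANGUAGES[script][language]    # KeyError if the pair is not listed
    keep = {"hin-cat", f"{code}-cat"}
    for elem in root.iter():
        for attr in list(elem.attrib):
            # English labels live in name/subcat, which do not end in "-cat".
            if attr.endswith("-cat") and attr not in keep:
                del elem.attrib[attr]            # "Hide (remaining nodes)"
    return root


root = select_nodes(ET.fromstring(FRAGMENT), "Devanagari", "Bodo")
for elem in root.iter("cat"):
    print(elem.attrib)
```

Looking the language up through its script mirrors the nested If blocks above: an unsupported combination fails before any node is displayed.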

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization: http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0: http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes: http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No  Language Name  Language Tag according to ISO 639-3

1     Assamese       asm
2     Bangla         ben
3     Bodo           brx
4     Dogri          doi
5     Gujarati       guj
6     Hindi          hin
7     Kannada        kan
8     Kashmiri       kas
9     Konkani        kok
10    Maithili       mai
11    Malayalam      mal
12    Manipuri       mni
13    Marathi        mar
14    Nepali         nep
15    Oriya          ori
16    Punjabi        pan
17    Sanskrit       san
18    Santhali       sat
19    Sindhi         snd
20    Tamil          tam
21    Telugu         tel
22    Urdu           urd
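The language tags of Annexure-1 lend themselves to a simple lookup table. The following is a minimal Python sketch of such a table, not part of the draft standard; the names `LANGUAGE_TAGS`, `language_tag` and `language_name` are hypothetical and chosen only for illustration.

```python
# ISO 639-3 codes for the 22 languages listed in Annexure-1.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 code for a language name (case-insensitive)."""
    return LANGUAGE_TAGS[name.strip().title()]


def language_name(code):
    """Return the language name for an ISO 639-3 code."""
    for name, tag in LANGUAGE_TAGS.items():
        if tag == code.strip().lower():
            return name
    raise KeyError(code)


if __name__ == "__main__":
    print(language_tag("malayalam"), language_name("kok"))
```

Keeping the table in one place means that a tagging tool can check a language code before it selects the language-specific branch of the POS schema.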

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

ਪਿਹਲਾ pahilA

10.1    General         QTF    QT__QTF        ਥਡਾ ਬਹਤਾ ਕਾਫੀ ਕਝ (WodZA, bahuwA, kAPI, kuJa)
10.2    Cardinals       QTC    QT__QTC        ਇਕ ਦ ਿਤਨ (iYka, xo, wiMna)
10.3    Ordinals        QTO    QT__QTO        ਪਿਹਲਾ ਦਜਾ (pahilA, xUjA)
11      Residuals       RD     RD
11.1    Foreign word    RDF    RD__RDF
        Remarks: A word written in script other than the script of the original text
11.2    Symbol          SYM    RD__SYM        $ & ( )
        Remarks: For symbols such as $, &, etc.
11.3    Punctuation     PUNC   RD__PUNC
        Remarks: Only for punctuations
11.4    Unknown         UNK    RD__UNK
11.5    Echowords       ECH    RD__ECH        (ਪਾਣੀ-) ਧਾਣੀ, (ਚਾਹ-) ਚਹ ((pANI-) XANI, (cAha-) cUha)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

Tagset for Dravidian Languages (Telugu Kannada Malayalam and Tamil)

Sl No   Category        Label  Annotation Convention  Remarks
        (top level: n; subtype level 1: n.n; subtype level 2: n.n.n)

1       Noun            N      N
1.1     Common          NN     N__NN
1.2     Proper          NNP    N__NNP
1.3     Nloc            NST    N__NST
2       Pronoun         PR     PR
2.1     Personal        PRP    PR__PRP
2.2     Reflexive       PRF    PR__PRF
2.3     Relative        PRL    PR__PRL
2.4     Reciprocal      PRC    PR__PRC
2.5     Wh-word         PRQ    PR__PRQ
3       Demonstrative   DM     DM
3.1     Deictic         DMD    DM__DMD
3.2     Relative        DMR    DM__DMR
3.3     Wh-word         DMQ    DM__DMQ
4       Verb            V      V
4.1     Main            VM     V__VM
4.1.1   Finite          VF     V__VM__VF
4.1.2   Non-finite      VNF    V__VM__VNF
4.1.3   Infinitive      VINF   V__VM__VINF
4.1.4   Gerund          VNG    V__VM__VNG
4.2     Verbal Noun     NNV    N__NNV                 Verbal noun
4.3     Auxiliary       VAUX   V__VAUX
4.3.1   Non-finite      VNF    V__VM__VNF
4.3.2   Infinitive      VINF   V__VM__VNF
5       Adjective       JJ
6       Adverb          RB                            Only manner adverbs
7       Postposition    PSP
8       Conjunction     CC     CC
8.1     Co-ordinator    CCD    CC__CCD
8.2     Subordinator    CCS    CC__CCS
8.2.1   Quotative       UT     CC__CCS__UT
9       Particles       RP     RP
9.1     Default         RPD    RP__RPD
9.2     Classifier      CL     RP__CL
9.3     Interjection    INJ    RP__INJ
9.4     Intensifier     INTF   RP__INTF
9.5     Negation        NEG    RP__NEG
10      Quantifiers     QT     QT
10.1    General         QTF    QT__QTF
10.2    Cardinals       QTC    QT__QTC
10.3    Ordinals        QTO    QT__QTO
11      Residuals       RD     RD
11.1    Foreign word    RDF    RD__RDF                A word written in script other than the script of the original text
11.2    Symbol          SYM    RD__SYM                For symbols such as $, &, etc.
11.3    Punctuation     PUNC   RD__PUNC               Only for punctuations
11.4    Unknown         UNK    RD__UNK
11.5    Echowords       ECH    RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
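One way to read this rule: because the annotation convention joins the levels with a double underscore (for example V__VM__VF), the higher-level tags can be derived from the lowest-level tag instead of being entered by the annotator. The Python sketch below illustrates this reading; the function names `tag_levels` and `annotate` and the record layout are assumptions made for the example and are not defined by the draft standard.

```python
def tag_levels(annotation):
    """Return every level of a '__'-joined annotation, top level first."""
    parts = annotation.split("__")
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


def annotate(token, annotation):
    """Build a record that stores the lowest-level tag and all higher levels."""
    levels = tag_levels(annotation)
    return {
        "token": token,
        "tag": annotation,        # lowest-level tag chosen by the annotator
        "top_level": levels[0],   # derived automatically
        "levels": levels,         # full chain, e.g. V, V__VM, V__VM__VF
    }


if __name__ == "__main__":
    # 'vizuntaan' is listed as a finite verb in the Tamil table.
    print(annotate("vizuntaan", "V__VM__VF"))
```

With this layout the annotator only ever selects one tag, and a query for all verbs can match on `top_level` without knowing the subtypes.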

POS for Tamil

Sl No   Category        Label  Annotation Convention  Examples
        (top level: n; subtype level 1: n.n; subtype level 2: n.n.n; remarks follow the row they apply to)

1       Noun            N      N              paiyan, raajaa, puttakam
1.1     Common          NN     N__NN          puttakam, kaNNaaTi, paTam
1.2     Proper          NNP    N__NNP         moohan, ravi, maalati
1.3     Nloc            NST    N__NST         meel, kiiz, mun, pin
2       Pronoun         PR     PR             itu, atu, avan
2.1     Personal        PRP    PR__PRP        naan, nii, avaL, avarkaL
2.2     Reflexive       PRF    PR__PRF        taan
2.3     Relative        PRL    PR__PRL        yaar, etu, eppootu, enkee
2.4     Reciprocal      PRC    PR__PRC        oruvarukoruvar, avanavan, parasparam
2.5     Wh-word         PRQ    PR__PRQ        yaarum, yaaraavatu, yaaroo, etuvum
3       Demonstrative   DM     DM             a-, i-, e-
3.1     Deictic         DMD    DM__DMD        anta, inta, enta
3.2     Relative        DMR    DM__DMR        enta
3.3     Wh-word         DMQ    DM__DMQ        enta, yaar, eetaavatu, yaaraavatu
4       Verb            V      V              vizu, poo, tuunku, aaku
4.1     Main            VM     V__VM          vizu, poo, tuunku, ciri
4.1.1   Finite          VF     V__VM__VF      vizuntaan, pooneen, cirittaaL
4.1.2   Non-finite      VNF    V__VM__VNF     vizunta, poonaal
4.1.3   Infinitive      VINF   V__VM__VINF    viza, pooka, cirikka
4.1.4   Gerund          VNG    V__VM__VNG     vizutal, cirittal, tuunkutal
4.2     Verbal          VN     V__VN          paTippu, naTai, naTattai, ceykai
4.3     Auxiliary       VAUX   V__VAUX        aakum, veeNTum, muTiyum
5       Adjective       JJ                    iniya, periya, azakaana
6       Adverb          RB                    veekamaaka, viraivaaka
7       Postposition    PSP                   paRRi, kuRittu, viTa
8       Conjunction     CC     CC             maRRum, eenenRaal, aanaal
8.1     Co-ordinator    CCD    CC__CCD        -um (raamanum), maRRum, aanaal, allatu
        Remarks: -um is a co-ordinator which can be added to noun and verb
8.2     Subordinator    CCS    CC__CCS        enRu, ena, enpatu, enRaal
8.2.1   Quotative       UT     CC__CCS__UT    enRu, ena
9       Particles       RP     RP             maTTUm, kuuTa
9.1     Default         RPD    RP__RPD        maTTUm, kuuTa
9.2     Classifier      CL     RP__CL
        Remarks: Not required
9.3     Interjection    INJ    RP__INJ        ayyoo, teey, aamaam
9.4     Intensifier     INTF   RP__INTF       ati, veku, mika
9.5     Negation        NEG    RP__NEG        illai
10      Quantifiers     QT     QT             koncam, niRaiya, oru, mutal
10.1    General         QTF    QT__QTF        koncam, niRaiya
10.2    Cardinals       QTC    QT__QTC        onRu, iraNTu
10.3    Ordinals        QTO    QT__QTO        mutal, iraNTaam
11      Residuals       RD     RD
11.1    Foreign word    RDF    RD__RDF
        Remarks: A word written in script other than the script of the original text
11.2    Symbol          SYM    RD__SYM        $ & ( ) ruu
        Remarks: For symbols such as $, &, etc.
11.3    Punctuation     PUNC   RD__PUNC
        Remarks: Only for punctuations
11.4    Unknown         UNK    RD__UNK
11.5    Echowords       ECH    RD__ECH        vaNTi kiNTi, paal kiil

POS for Malayalam

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Examples in Malayalam

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി

രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി

കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ

ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം

102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No   Category        Label  Annotation Convention  Examples
        (top level: n; subtype level 1: n.n; subtype level 2: n.n.n; remarks follow the row they apply to)

1       Noun            N      N
1.1     Common          NN     N__NN          kalama, cashmaa
1.2     Proper          NNP    N__NNP         Mohan, ravi, rashmi
1.4     Nloc            NST    N__NST         upare, niche, bhitara
2       Pronoun         PR     PR
2.1     Personal        PRP    PR__PRP        se, tumi, AmAra
2.2     Reflexive       PRF    PR__PRF        nijera
2.3     Relative        PRL    PR__PRL        ye, yakhana, yena, yAra
2.4     Reciprocal      PRC    PR__PRC        paraspara
2.5     Wh-word         PRQ    PR__PRQ        ke, kakhana, kena, kAra
2.6     Indefinite      PRI    PR__PRI        keu
3       Demonstrative   DM     DM             Vaha, jo, yaha
3.1     Deictic         DMD    DM__DMD        sei, oi, o, se
3.2     Relative        DMR    DM__DMR        ye, yei
3.3     Wh-word         DMQ    DM__DMQ        kono
3.4     Indefinite      DMI    DM__DMI        keu
4       Verb            V      V
4.1     Main            VM     V__VM
4.1.1   Finite          VF     V__VM__VF      karachhilAma, yAba, khAYa
4.1.2   Non-finite      VNF    V__VM__VNF     kare, kheYe, karale, khete
4.1.3   Infinitive      VINF   V__VM__VINF    karate, khete, yete
4.1.4   Gerund          VNG    V__VM__VNG     yAoYa, AsA, khelA, karA
4.2     Auxiliary       VAUX   V__VAUX        chhila, habe, chAi
5       Adjective       JJ                    sundara, bhAla, lAla
6       Adverb          RB                    tADAtADi, Aste, haThAt
7       Postposition    PSP                   theke, abadhI, madhye, diYe
8       Conjunction     CC     CC
8.1     Co-ordinator    CCD    CC__CCD        Ara, eban, athabA, kimbA
8.2     Subordinator    CCS    CC__CCS        ye, kintu, noile, tAhale
8.2.1   Quotative       UT     CC__CCS__UT    ----
        Remarks: Not required
9       Particles       RP     RP
9.1     Default         RPD    RP__RPD        to, ye
9.2     Classifier      CL     RP__CL         jana, khAnA
9.3     Interjection    INJ    RP__INJ        Are, ei, hAya
9.4     Intensifier     INTF   RP__INTF       bhiShaNa, khuba, sA~NghAtika
9.5     Negation        NEG    RP__NEG        nA, naYa, chhADA
10      Quantifiers     QT     QT
10.1    General         QTF    QT__QTF        kichhu, alpa, aneka
10.2    Cardinals       QTC    QT__QTC        eka, dui, tina
10.3    Ordinals        QTO    QT__QTO        prathama, paYalA, dvitIYa
11      Residuals       RD     RD
11.1    Foreign word    RDF    RD__RDF
        Remarks: A word written in script other than the script of the original text
11.2    Symbol          SYM    RD__SYM        $ & ( )
        Remarks: For symbols such as $, &, etc.
11.3    Punctuation     PUNC   RD__PUNC
        Remarks: Only for punctuations
11.4    Unknown         UNK    RD__UNK
11.5    Echowords       ECH    RD__ECH        jala Tala, khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Marathi

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)

जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No   Category        Label  Annotation Convention  Examples (glosses in parentheses)
        (top level: n; subtype level 1: n.n; subtype level 2: n.n.n)

1       Noun            N      N
1.1     Common          NN     N__NN          kalam, chashmA ('pen', 'spectacles')
1.2     Proper          NNP    N__NNP         mohan, ravI ('Mohan', 'Ravi')
1.3     Nloc            NST    N__NST         upar, nIche, ahIM ('up', 'down', 'in front')
2       Pronoun         PR     PR
2.1     Personal        PRP    PR__PRP        huM, tuM, te ('me', 'you', 'he/she')
2.2     Reflexive       PRF    PR__PRF        pote, jAte, svayam ('herself/himself')
2.3     Relative        PRL    PR__PRL        je, te, jyAM ('who', 'where')
2.4     Reciprocal      PRC    PR__PRC        aras-paras, paraspar ('mutually', 'each other')
2.5     Wh-word         PRQ    PR__PRQ        koN, kyAre, kyAM ('who', 'when', 'where')
2.6     Indefinite                            koI, kaIMK, kashuM ('someone', 'something')
3       Demonstrative   DM     DM
3.1     Deictic         DMD    DM__DMD        A ('this')
3.2     Relative        DMR    DM__DMR        je, jeNe ('which/who', 'whom')
3.3     Wh-word         DMQ    DM__DMQ        koN, shuM, kem ('who', 'what', 'why')
3.4     Indefinite                            koI, kaIMK, kashuM ('someone', 'something')
4       Verb            V      V
4.1     Main            VM     V__VM          khAshe, khAdhu ('will eat', 'ate')
4.2     Auxiliary       VAUX   V__VAUX        chhe, hatuM, karyuM ('is', 'was', 'did')
5       Adjective       JJ
6       Adverb          RB
7       Postposition    PSP
8       Conjunction     CC     CC
8.1     Co-ordinator    CCD    CC__CCD        ane, ke ('and', 'or')
8.2     Subordinator    CCS    CC__CCS        tethI, evuM, kAraNke ('so', 'like that', 'because')
9       Particles       RP     RP
9.1     Default         RPD    RP__RPD        paNa, ja, tO ('but', emph, topic)
9.2     Interjection    INJ    RP__INJ        hE, arrrE, O
9.3     Intensifier     INTF   RP__INTF       bahu, ghaNuM ('very', 'much')
9.4     Negation        NEG    RP__NEG        nahi, na ('no')
10      Quantifiers     QT     QT
10.1    General         QTF    QT__QTF        thoduM, ghaNuM ('little', 'much')
10.2    Cardinals       QTC    QT__QTC        eka, be, traN ('one', 'two', 'three')
10.3    Ordinals        QTO    QT__QTO        paheluM, bIjI ('first' (neu), 'second' (fem))
11      Residuals       RD     RD
11.1    Foreign word    RDF    RD__RDF        tv, perasitemol
11.2    Symbol          SYM    RD__SYM        $ &
11.3    Punctuation     PUNC   RD__PUNC       ( )
11.4    Unknown         UNK    RD__UNK
11.5    Echowords       ECH    RD__ECH        kAm-bAm, pANi-bANi ('work and the like', 'water and the like')

POS for Konkani

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in a new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognise for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

- Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
- Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
- Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
- Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
- Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
- Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
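As an illustration of how some of these attributes could appear on a tagged word, the Python sketch below builds one element with the standard library. It is a sketch under stated assumptions, not the schema of this draft: the element name `w`, the `tag` attribute and the function `tagged_word` are hypothetical, and the local ITS attributes its:dir, its:translate and its:locNote are used here in place of the global rule elements named above. The value `urd` follows the ISO 639-3 code of the Annexure; BCP 47, which governs xml:lang, would normally use the two-letter form `ur`.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # predefined 'xml' prefix
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def tagged_word(text, pos_tag, lang, direction):
    """Build a hypothetical <w> element carrying a POS tag and ITS markup."""
    word = ET.Element("w")
    word.set("{%s}lang" % XML_NS, lang)            # xml:lang
    word.set("{%s}id" % XML_NS, "w1")              # xml:id
    word.set("{%s}version" % ITS_NS, "1.0")        # its:version
    word.set("{%s}dir" % ITS_NS, direction)        # its:dir (ltr or rtl)
    word.set("{%s}translate" % ITS_NS, "no")       # its:translate
    word.set("{%s}locNote" % ITS_NS, "POS tag: " + pos_tag)
    word.set("tag", pos_tag)                       # hypothetical attribute
    word.text = text
    return word


if __name__ == "__main__":
    # 'kitaab' (common noun) is taken from the Urdu table.
    element = tagged_word("کتاب", "N__NN", lang="urd", direction="rtl")
    print(ET.tostring(element, encoding="unicode"))
```

Using the predefined `xml` namespace for xml:lang and xml:id means no extra namespace declaration is needed for them; only the `its` prefix is declared on the element.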

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of", because a tag only gives the element name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English need a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
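The "display desired nodes, hide remaining nodes" step can be sketched in a few lines of Python. This is an illustrative reading of the flow, not the selection algorithm of the standard: the names `COMMON_NODES`, `SWITCHED_OFF`, `is_off` and `select_nodes` are hypothetical, the node list is only a small subset of the common tagset, and the switched-off branches are those marked "Not required" or "Not in Marathi" in the Tamil, Bangla, Marathi and Urdu tables.

```python
# A small subset of the common tagset, written as annotation conventions.
COMMON_NODES = [
    "N", "N__NN", "N__NNP", "N__NNV", "N__NST",
    "PSP",
    "CC", "CC__CCD", "CC__CCS", "CC__CCS__UT",
    "RP", "RP__RPD", "RP__CL", "RP__INJ",
]

# Branches switched 'off' per language (ISO 639-3 code), taken from the
# "Not required" / "Not in Marathi" remarks in the language tables.
SWITCHED_OFF = {
    "tam": {"RP__CL"},
    "ben": {"CC__CCS__UT"},
    "mar": {"N__NNV", "PSP", "RP__CL"},
    "urd": {"CC__CCS__UT", "RP__CL"},
}


def is_off(node, off):
    """A node is hidden if it, or any ancestor branch, is switched off."""
    parts = node.split("__")
    return any("__".join(parts[: i + 1]) in off for i in range(len(parts)))


def select_nodes(language):
    """Return (shown, hidden) node lists for the given language code."""
    off = SWITCHED_OFF.get(language, set())
    shown = [node for node in COMMON_NODES if not is_off(node, off)]
    hidden = [node for node in COMMON_NODES if is_off(node, off)]
    return shown, hidden


if __name__ == "__main__":
    shown, hidden = select_nodes("mar")
    print("display:", shown)
    print("hide:", hidden)
```

Checking ancestors as well as the node itself means that switching off a top-level branch also hides every subtype beneath it, which matches the idea of turning a whole branch of the tree off.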

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>
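The category and subcategory tags declared in the draft schema form a small two-level hierarchy. The following Python sketch records that hierarchy and looks up the top-level category of a tag. The tag values are taken from the `tag` attributes above; the names `TAGSET` and `category_of`, and the choice to list the verb subtypes directly under `V`, are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Top-level categories and their subcategory tags, transcribed from the tag
# values of the draft schema. Verb subtypes (VF, VINF, VNG, VNF) are listed
# directly under "V"; that flattening is an assumption of this sketch.
TAGSET: Dict[str, List[str]] = {
    "N": ["NN", "NNP", "NNV", "NST"],
    "PR": ["PRP", "PRF", "PRC", "PRL", "PRQ"],
    "DM": ["DMD", "DMR", "DMQ"],
    "V": ["VAUX", "VM", "VF", "VINF", "VNG", "VNF"],
    "JJ": [],
    "RB": [],
    "PSP": [],
    "CC": ["CCD", "CCS", "UT"],
    "RP": ["RPD", "CL", "INJ", "NEG", "INTF"],
    "QT": ["QTF", "QTC", "QTO"],
    "RD": ["RDF", "SYM", "UNK", "PUNC", "ECH"],
}


def category_of(tag: str) -> Optional[str]:
    """Return the top-level category of a tag, or None if the tag is not in the set."""
    for category, subtags in TAGSET.items():
        if tag == category or tag in subtags:
            return category
    return None


print(category_of("NNP"))   # top-level category of the proper-noun tag
print(category_of("PUNC"))  # top-level category of the punctuation tag
```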

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No   English   Hindi   Punjabi   Urdu   Gujarati   Oriya   Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

S.No   English   Hindi   Assamese   Bodo   Kashmiri (Urdu script)   Kashmiri (Hindi script)   Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

S.No   English   Hindi   Telugu   Malayalam   Tamil   Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

  If language is Hindi then

    Display (Metadata)

    Call (POS Schema)

    Display (English and Hindi Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

    ……………

  End if

  If language is Bodo then

    Call (POS Schema)

    Display (English, Hindi and Bodo Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

    ……………

  End if

  If language is Konkani then

    Call (POS Schema)

    Display (English, Hindi and Konkani Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

    ……………

  End if

Else If script is Malayalam (Orthographic variation) then

  If language is Malayalam then

    Call (POS Schema)

    Display (English, Hindi and Malayalam Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

    ……………

  End if

Else If script is Perso-Arabic then

  If language is Kashmiri then

    Call (POS Schema)

    Display (English, Hindi and Kashmiri Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

    ……………

  End if

Else If script is Bangla then

  If language is Assamese then

    Call (POS Schema)

    Display (English, Hindi and Assamese Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

    ……………

  End if

Else If script is Gujarati then

  If language is Gujarati then

    Call (POS Schema)

    Display (English, Hindi and Gujarati Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

    ……………

  End if

End if
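The nested script and language test in the algorithm above can be sketched as a table lookup. This Python fragment is an illustration under stated assumptions rather than a reference implementation: the table `SCRIPT_LANGUAGES` covers only the script and language pairs that appear in the algorithm, and the function name `attributes_to_display` is hypothetical. As in the algorithm, the English and Hindi labels are always displayed alongside the selected language.

```python
from typing import Dict, List

# Script -> {language name -> ISO 639-3 code}, limited to the pairs used in
# the algorithm of this section.
SCRIPT_LANGUAGES: Dict[str, Dict[str, str]] = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def attributes_to_display(script: str, language: str) -> List[str]:
    """List the schema attributes to display; every other attribute is hidden."""
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None or language not in languages:
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    code = languages[language]
    shown = ["cat", "subcat", "hin-cat"]  # English and Hindi nodes
    if code != "hin":
        shown.append(f"{code}-cat")       # nodes of the selected language
    shown.append("tag")
    return shown


print(attributes_to_display("Devanagari", "Bodo"))
```

A lookup table keeps the branches in one place, so supporting a further language means adding one entry instead of another `If` block.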

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag (ISO 639-3)

1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


23 Relative PRL PR__PRL

24 Reciprocal PRC PR__PRC

25 Wh-word PRQ PR__PRQ

3 Demonstrative DM DM

31 Deictic DMD DM__DMD

32 Relative DMR DM__DMR

33 Wh-word DMQ DM__DMQ

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF

412 Non-finite VNF V__VM__VNF

413 Infinitive VINF V__VM__VINF

414 Gerund VNG V__VM__VNG

42 Verbal Noun NNV N_NNV

43 Auxiliary VAUX V__VAUX

431 Non-finite VNF V_VM_VNF

432 Infinitive VINF V_VM_VNF

5 Adjective JJ

6 Adverb RB Only manner

adverbs

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD

82 Subordinator CCS CC__CCS

821 Quotative UT CC__CCS__UT

9 Particles RP RP

91 Default RPD RP__RPD

92 Classifier CL RP__CL

93 Interjection INJ RP__INJ

94 Intensifier INTF RP__INTF


95 Negation NEG RP__NEG

10 Quantifiers QT QT

101 General QTF QT__QTF

102 Cardinals QTC QT__QTC

103 Ordinals QTO QT__QTO

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
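The rule above (annotate with the lowest-level tag, store the higher levels automatically) can be sketched as a lookup in a parent table. This is only an illustrative sketch: the PARENT mapping is read off the labels in the tag-set table, while the function name full_annotation and the separator parameter are assumptions, not part of the standard.

```python
# Parent label of each subtype, read off the tag-set table (subset; assumed layout).
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label: str, sep: str = "__") -> str:
    """Expand a lowest-level label into the full annotation convention.

    Walks up the PARENT table until a top-level category is reached,
    then joins the chain from the top level down.
    """
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


if __name__ == "__main__":
    for leaf in ("VF", "UT", "NNP", "JJ"):
        print(leaf, "->", full_annotation(leaf))
```

With this layout an annotator only records the leaf label (for example VF), and the stored form (V__VM__VF) is derived rather than typed, which avoids mismatches between a label and its annotation convention.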

POS for Tamil

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR ituatuavan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V_VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) ruu For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Examples in Malayalam

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ

24 Reciprocal PRC PR__PRC tammiltammil parasparam തമിിിതമിി രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ

31 Deictic DMD DM__DMD atu itu അത ഇത

32 Relative DMR DM__DMR eetu ഏത

33 Wh-word DMQ DM__DMQ eetu ennane ഏത എങെന

4 Verb V V pO kazhi Annu ciri ോ കഴി ആണി(Copula) ചിരി

41 Main VM V__VM pO kazhi cirri Annu(copula) ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം

92 Classifier CL RP__CL peer േ൪

93 Interjection INJ RP__INJ ayyoo അേയാ

94 Intensifier INTF RP__INTF pala valare ല വളെര

95 Negation NEG RP__NEG illa alla ഇല അല

10 Quantifiers QT QT kuracchu niraccu oru dharalam കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) ര

113 Punctuation PUNC RD__PUNC

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

POS for Bangla

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN kalama cashmaa

12 Proper NNP N__NNP Mohan ravi

rashmi

14 Nloc NST N__NST upare

niche

bhitara

2 Pronoun PR PR

21 Personal PRP PR__PRP se tumi

AmAra

22 Reflexive PRF PR__PRF nijera

23 Relative PRL PR__PRL ye yakhana

yena yAra

24 Reciprocal PRC PR__PRC paraspara

25 Wh-word PRQ PR__PRQ ke kakhana


kena kAra

26 Indefinite PRI PR__PRI keu

3 Demonstrative DM DM Vaha jo

yaha

31 Deictic DMD DM__DMD sei oi o se

32 Relative DMR DM__DMR ye yei

33 Wh-word DMQ DM__DMQ kono

34 Indefinite DMI DM__DMI keu

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF karachhilAma yAba khAYa

412 Non-finite VNF V__VM__VNF kare kheYe karale khete

413 Infinitive VINF V__VM__VINF karate khete yete

414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA

42 Auxiliary VAUX V__VAUX chhila

habe chAi

5 Adjective JJ sundara

bhAla lAla

6 Adverb RB tADAtADi

Aste

haThAt

7 Postposition PSP theke

abadhI

madhye

diYe

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD Ara eban

athabA

kimbA

82 Subordinator CCS CC__CCS ye kintu

noile


tAhale

821 Quotative UT CC__CCS__UT ---- Not required

9 Particles RP RP

91 Default RPD RP__RPD to ye

92 Classifier CL RP__CL jana khAnA

93 Interjection INJ RP__INJ Are ei

hAya

94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika

95 Negation NEG RP__NEG nA naYa

chhADA

10 Quantifiers QT QT

101 General QTF QT__QTF kichhu

alpa aneka

102 Cardinals QTC QT__QTC eka dui

tina

103 Ordinals QTO QT__QTO prathama

paYalA

dvitIYa

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH jala Tala

khAbAra

dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


POS for Marathi

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-ourselves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN kalam, chashmA ('pen', 'spectacles')

12 Proper NNP N__NNP mohan, ravI ('Mohan', 'Ravi')

13 Nloc NST N__NST upar, nIche, ahIM ('up', 'down', 'in front')

2 Pronoun PR PR

21 Personal PRP PR__PRP huM, tuM, te ('me', 'you', 'he/she')

22 Reflexive PRF PR__PRF pote, jAte, svayam ('herself/himself')

23 Relative PRL PR__PRL je, te, jyAM ('who', 'where')

24 Reciprocal PRC PR__PRC aras-paras, paraspar ('mutually', 'each other')

25 Wh-word PRQ PR__PRQ koN, kyAre, kyAM ('who', 'when', 'where')

26 Indefinite koI, kaIMK, kashuM ('someone', 'something')

3 Demonstrative DM DM

31 Deictic DMD DM__DMD A ('this')

32 Relative DMR DM__DMR je, jeNe ('which/who', 'whom')

33 Wh-word DMQ DM__DMQ koN, shuM, kem ('who', 'what', 'why')

34 Indefinite koI, kaIMK, kashuM ('someone', 'something')

4 Verb V V

41 Main VM V__VM khAshe, khAdhu ('will eat', 'ate')

42 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM ('is', 'was', 'did')

5 Adjective JJ

6 Adverb RB

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD ane, ke ('and', 'or')

82 Subordinator CCS CC__CCS tethI, evuM, kAraNke ('so', 'like that', 'because')

9 Particles RP RP

91 Default RPD RP__RPD paNa, ja, tO ('but', emph, topic)

92 Interjection INJ RP__INJ hE, arrrE, O

93 Intensifier INTF RP__INTF bahu, ghaNuM ('very', 'much')

94 Negation NEG RP__NEG nahi, na ('no')

10 Quantifiers QT QT

101 General QTF QT__QTF thoduM, ghaNuM ('little', 'much')

102 Cardinals QTC QT__QTC eka, be, traN ('one', 'two', 'three')

103 Ordinals QTO QT__QTO paheluM, bIjI ('first' (neu), 'second' (fem))

11 Residuals RD RD

111 Foreign word RDF RD__RDF tv perasitemol

112 Symbol SYM RD__SYM $ &

113 Punctuation PUNC RD__PUNC ()

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH kAm-bAm, pANi-bANi ('work and the like', 'water and the like')

POS for Konkani

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411 Finite VF V__VM__VF आयलो आयला आयललो

412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

413 Infinitive VINF V__VM__VINF आस वहर तलयार

414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

421 Finite V__VAUX__VF तलल आस आयला आस

422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words, e.g. (dollar) ڈالر, (pound) پاونڈ, etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in a new schema (also called the host vocabulary). Using these names makes the concepts they represent easier for both schema users and processors to recognize. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.

Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.

Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.

Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
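As a rough illustration of how the language, direction and identifier attributes described above behave on a document, the following sketch parses a small sample and resolves the language and text direction in effect for each identified element. The element names corpus and sentence, the sample content and the function name resolve are hypothetical; only the xml:lang, xml:id and its:dir attributes and their namespaces come from the XML and ITS specifications.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: a root-level language and direction, overridden on one child.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hi" its:dir="ltr">
  <sentence xml:id="s1">...</sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl">...</sentence>
</corpus>"""


def resolve(elem, lang=None, direction=None, out=None):
    """Map each xml:id to the (language, direction) in effect on that element.

    Values set on an ancestor apply to descendants until overridden.
    """
    out = {} if out is None else out
    lang = elem.get(f"{{{XML_NS}}}lang", lang)
    direction = elem.get(f"{{{ITS_NS}}}dir", direction)
    ident = elem.get(f"{{{XML_NS}}}id")
    if ident is not None:
        out[ident] = (lang, direction)
    for child in elem:
        resolve(child, lang, direction, out)
    return out


if __name__ == "__main__":
    print(resolve(ET.fromstring(SAMPLE)))
```

Declaring the language and direction once on the root and overriding them only where they change keeps a mixed-script corpus compact while still letting a processor determine the setting for any element.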

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, though only "sort of": a tag gives just a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
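A metadata record of this kind can be read back with a few lines of code. The sketch below is a simplified illustration, not the standard's own tooling: the root element name categorySelection, the flattened origin element and the function name summarize are assumptions, while the field names language, version, registrationStatus and creationDate follow the record above.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical record using the field names from the metadata above.
RECORD = """<categorySelection>
  <languageSection>
    <language>en</language>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</categorySelection>"""


def summarize(xml_text: str) -> dict:
    """Collect the descriptive fields of the language section into a dict."""
    section = ET.fromstring(xml_text).find("languageSection")
    return {
        "language": section.findtext("language"),
        "version": section.findtext("version"),
        "status": section.findtext("registrationStatus"),
        "origin": section.findtext("origin"),
        "created": section.findtext("creation/creationDate"),
    }


if __name__ == "__main__":
    for key, value in summarize(RECORD).items():
        print(f"{key}: {value}")
```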

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
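The selection step described above (display the nodes a language needs, hide the rest, and show each node under its language-specific label) might look roughly like the following. This is a sketch under stated assumptions: the pos and node element names, the COMMON sample and the select_nodes function are hypothetical stand-ins for the common schema, and the NOT_REQUIRED table lists only the subtypes that the per-language tables mark as "Not required".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the common schema: one node per tag,
# with an English label (cat) and optional per-language labels (<code>-cat).
COMMON = """<pos>
  <node tag="N" cat="noun" kok-cat="नाम" mal-cat="നാമം"/>
  <node tag="NN" cat="common"/>
  <node tag="NNV" cat="verbal"/>
  <node tag="CL" cat="classifier"/>
</pos>"""

# Subtypes marked "Not required" in the per-language tables (ISO 639-3 codes).
NOT_REQUIRED = {
    "mar": {"NNV", "CL"},
    "tam": {"CL"},
    "ben": {"UT"},
    "urd": {"UT", "CL"},
}


def select_nodes(xml_text: str, lang: str):
    """Split the common tag set into displayed and hidden nodes for one language.

    Displayed nodes carry the language-specific label when one is defined,
    falling back to the English label otherwise.
    """
    shown, hidden = [], []
    for node in ET.fromstring(xml_text):
        tag = node.get("tag")
        if tag in NOT_REQUIRED.get(lang, set()):
            hidden.append(tag)  # this branch is switched 'off'
        else:
            shown.append((tag, node.get(f"{lang}-cat", node.get("cat"))))
    return shown, hidden


if __name__ == "__main__":
    print(select_nodes(COMMON, "mar"))
    print(select_nodes(COMMON, "kok"))
```

Keeping a single common tree and hiding branches per language means the label-to-tag mapping stays one to one across languages; a language never gets its own divergent copy of the schema.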

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>...</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat POS" cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा"
mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক"
kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा"
mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat=""
guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत"
brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ"
asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा"
mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়"
kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat POS" cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया"
brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت"
asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा"
mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া"
kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम"
brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা"
kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन"
brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা"
kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत"
brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া"
kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम"
brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو"
asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat POS" cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण"
guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat POS" cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat POS" cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর"
kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat POS" cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय"
guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत"
brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت"
asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat POS" cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत"
brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत"
brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت"
asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय"
guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय"
guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat POS" cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند"
asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat POS" cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद"
brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ"
asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च"
brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद"
brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ"
asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>
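
The hierarchy encoded by the draft schema can be pictured as a small tree in which every node carries a tag and a set of per-language labels. The following Python sketch is only an illustration under stated assumptions: the class `PosNode`, the function `full_tags` and the nesting of Finite under Main Verb are hypothetical and not part of the draft. The underscore-joined forms such as N_NN and V_VM_VF follow the style of the tagged sample sentences in the reference based implementation.

```python
from dataclasses import dataclass, field


@dataclass
class PosNode:
    """One category of the POS hierarchy (hypothetical in-memory form)."""
    cat: str                                      # English name, e.g. "noun"
    tag: str                                      # tag from the schema, e.g. "N"
    labels: dict = field(default_factory=dict)    # ISO 639-3 code -> label
    children: list = field(default_factory=list)  # sub-categories


def full_tags(node, prefix=""):
    """Yield the underscore-joined tag of `node` and of every node below it."""
    tag = f"{prefix}_{node.tag}" if prefix else node.tag
    yield tag
    for child in node.children:
        yield from full_tags(child, tag)


# Labels are copied as they appear in the draft; only two languages are shown.
noun = PosNode("noun", "N", {"mal": "നാമം", "kok": "नाम"}, [
    PosNode("common", "NN"),
    PosNode("Proper", "NNP"),
])

# Assumption: "Finite" (a subtype) is nested under "Main Verb" (a type).
verb = PosNode("Verb", "V", children=[
    PosNode("Main Verb", "VM", children=[PosNode("Finite", "VF")]),
    PosNode("Auxiliary Verb", "VAUX"),
])

print(list(full_tags(noun)))  # ['N', 'N_NN', 'N_NNP']
print(list(full_tags(verb)))  # ['V', 'V_VM', 'V_VM_VF', 'V_VAUX']
```

Keeping the labels in a dictionary keyed by language code means a further language can be added without changing the structure of the tree.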

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
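
A one-to-one mapping table of this kind lends itself to a simple keyed lookup. The Python sketch below is a minimal illustration, not part of the draft: the names `LABELS` and `label_for` are hypothetical, only two rows are shown, and falling back to the English label where a cell reads NA is an assumption. The draft does not say how NA cells should be displayed.

```python
# Hypothetical excerpt of the mapping table, keyed by POS tag.
# Inner keys are ISO 639-3 codes; None stands for a cell marked "NA".
LABELS = {
    "N":   {"eng": "Noun", "mal": "നാമം", "kok": "नाम"},
    "NNV": {"eng": "Verbal", "mal": None},  # the Malayalam cell reads NA
}


def label_for(tag, lang):
    """Return the label for `tag` in `lang`, or the English label if missing."""
    row = LABELS[tag]
    return row.get(lang) or row["eng"]


print(label_for("N", "mal"))    # നാമം
print(label_for("NNV", "mal"))  # Verbal
```

Keying the table by tag rather than by English name keeps the lookup stable if the English wording of a category is revised.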

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat POS" cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

…………………………………………

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

…………………………………………

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat POS" cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

…………………………………………

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat POS" cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

…………………………………………

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat POS" cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

…………………………………………

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat POS" cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

…………………………………………

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat POS" cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

…………………………………………

End if
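
The selection steps above (check the script, check the language, show the English, Hindi and target-language nodes, hide the rest) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical, each schema entry is modelled as a plain dictionary of attribute names and values, and "hiding" a node is modelled by leaving it out of the result.

```python
# Script -> languages, following the branches of the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}


def select_nodes(entry, script, lang):
    """Keep the English, Hindi and `lang` labels of one entry; drop the others."""
    if lang not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{lang!r} is not listed under script {script!r}")
    shown = {"hin-cat", f"{lang}-cat"}
    # Attributes that do not end in "-cat" (cat, subcat, tag) are always kept.
    return {name: value for name, value in entry.items()
            if not name.endswith("-cat") or name in shown}


# Noun entry with labels copied as they appear in the draft schema.
noun = {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा",
        "mal-cat": "നാമം", "kok-cat": "नाम", "tag": "N"}

print(select_nodes(noun, "Devanagari", "kok"))
# {'cat': 'noun', 'hin-cat': 'सा', 'kok-cat': 'नाम', 'tag': 'N'}
```

Driving the script check from a table rather than from nested conditions means a new language needs only one more table entry.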

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
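Each token in the tagged sentences above carries its POS label. A minimal Python sketch of how such tokens could be split back into word and tag follows. It assumes the tag is the trailing run of capital Latin letters joined by underscores or hyphens, as in the examples above; the function names `split_token` and `split_sentence` are hypothetical and not part of the standard.

```python
import re

# Trailing tag: groups of capital Latin letters joined by "_" or "-"
# (e.g. N_NN, V_VM_VNF, N-NNP, N__NN). Everything before it is the word.
TAG_RE = re.compile(r"(.*?)([A-Z]+(?:[_-]+[A-Z]+)*)")


def split_token(token):
    """Split a tagged token into (word, tag); return (token, None) if untagged."""
    match = TAG_RE.fullmatch(token)
    if match is None:
        return token, None
    return match.group(1), match.group(2)


def split_sentence(sentence):
    """Split a whitespace-separated tagged sentence into (word, tag) pairs."""
    return [split_token(tok) for tok in sentence.split()]
```

A token that consists only of a tag (such as a bare PUNC) comes back with an empty word, so a caller can decide how to treat it.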


16 REFERENCE

1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No  Language Name  Language Tag (ISO 639-3)

1     Assamese       asm
2     Bangla         ben
3     Bodo           brx
4     Dogri          doi
5     Gujarati       guj
6     Hindi          hin
7     Kannada        kan
8     Kashmiri       kas
9     Konkani        kok
10    Maithili       mai
11    Malayalam      mal
12    Manipuri       mni
13    Marathi        mar
14    Nepali         nep
15    Oriya          ori
16    Punjabi        pan
17    Sanskrit       san
18    Santhali       sat
19    Sindhi         snd
20    Tamil          tam
21    Telugu         tel
22    Urdu           urd
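The language tags can be held in a simple lookup table when a tool needs to choose the language branch of the schema. The sketch below is only an illustration: the dictionary holds the ISO 639-3 codes for the 22 languages of this annexure, and the helper name `language_tag` is hypothetical.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 tag for a language name (case-insensitive)."""
    return LANGUAGE_TAGS[name.strip().title()]
```

An unknown name raises `KeyError`, which keeps a misspelt language from silently selecting the wrong branch.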


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


9.5   Negation      NEG   RP__NEG
10    Quantifiers   QT    QT
10.1  General       QTF   QT__QTF
10.2  Cardinals     QTC   QT__QTC
10.3  Ordinals      QTO   QT__QTO
11    Residuals     RD    RD
11.1  Foreign word  RDF   RD__RDF   A word written in script other than the script of the original text
11.2  Symbol        SYM   RD__SYM   For symbols such as $, & etc.
11.3  Punctuation   PUNC  RD__PUNC  Only for punctuations
11.4  Unknown       UNK   RD__UNK
11.5  Echowords     ECH   RD__ECH

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
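The rule above (annotate with the lowest level tag and store the higher levels automatically) can be sketched as a walk up a parent table. This is a minimal illustration, not the standard's implementation: the `PARENT` table lists only part of the tag set from the tables in this document, and the names `PARENT` and `full_annotation` are hypothetical.

```python
# Parent of each label in the type hierarchy; top-level labels map to None.
# Only part of the tag set is listed here, for brevity.
PARENT = {
    "N": None, "NN": "N", "NNP": "N", "NST": "N",
    "PR": None, "PRP": "PR", "PRF": "PR", "PRL": "PR",
    "V": None, "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CC": None, "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RD": None, "SYM": "RD", "PUNC": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into the annotation convention string."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]  # KeyError for a label missing from the table
    return "__".join(reversed(chain))
```

With this table an annotator only has to choose `VF`, and the stored annotation becomes `V__VM__VF`, matching the Annotation Convention column.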

POS for Tamil

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N paiyan

raajaa

puttakam

11 Common NN N__NN puttakam

kaNNaaTi

paTam

12 Proper NNP N__NNP moohan ravi maalati

13 Nloc NST N__NST meel kiiz mun pin


2 Pronoun PR PR ituatuavan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V__VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka


7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ amp ( ) ruu

For symbols such as $ amp etc

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി


രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No  Category       Label  Annotation Convention  Examples / Remarks

(Category levels: Top level, Subtype (level 1), Subtype (level 2))

1      Noun           N      N
1.1    Common         NN     N__NN         kalama, cashmaa
1.2    Proper         NNP    N__NNP        Mohan, ravi, rashmi
1.4    Nloc           NST    N__NST        upare, niche, bhitara
2      Pronoun        PR     PR
2.1    Personal       PRP    PR__PRP       se, tumi, AmAra
2.2    Reflexive      PRF    PR__PRF       nijera
2.3    Relative       PRL    PR__PRL       ye, yakhana, yena, yAra
2.4    Reciprocal     PRC    PR__PRC       paraspara
2.5    Wh-word        PRQ    PR__PRQ       ke, kakhana, kena, kAra
2.6    Indefinite     PRI    PR__PRI       keu
3      Demonstrative  DM     DM            Vaha, jo, yaha
3.1    Deictic        DMD    DM__DMD       sei, oi, o, se
3.2    Relative       DMR    DM__DMR       ye, yei
3.3    Wh-word        DMQ    DM__DMQ       kono
3.4    Indefinite     DMI    DM__DMI       keu
4      Verb           V      V
4.1    Main           VM     V__VM
4.1.1  Finite         VF     V__VM__VF     karachhilAma, yAba, khAYa
4.1.2  Non-finite     VNF    V__VM__VNF    kare, kheYe, karale, khete
4.1.3  Infinitive     VINF   V__VM__VINF   karate, khete, yete
4.1.4  Gerund         VNG    V__VM__VNG    yAoYa, AsA, khelA, karA
4.2    Auxiliary      VAUX   V__VAUX       chhila, habe, chAi
5      Adjective      JJ                   sundara, bhAla, lAla
6      Adverb         RB                   tADAtADi, Aste, haThAt
7      Postposition   PSP                  theke, abadhI, madhye, diYe
8      Conjunction    CC     CC
8.1    Co-ordinator   CCD    CC__CCD       Ara, eban, athabA, kimbA
8.2    Subordinator   CCS    CC__CCS       ye, kintu, noile, tAhale
8.2.1  Quotative      UT     CC__CCS__UT   ---- (Remarks: Not required)
9      Particles      RP     RP
9.1    Default        RPD    RP__RPD       to, ye
9.2    Classifier     CL     RP__CL        jana, khAnA
9.3    Interjection   INJ    RP__INJ       Are, ei, hAya
9.4    Intensifier    INTF   RP__INTF      bhiShaNa, khuba, sA~NghAtika
9.5    Negation       NEG    RP__NEG       nA, naYa, chhADA
10     Quantifiers    QT     QT
10.1   General        QTF    QT__QTF       kichhu, alpa, aneka
10.2   Cardinals      QTC    QT__QTC       eka, dui, tina
10.3   Ordinals       QTO    QT__QTO       prathama, paYalA, dvitIYa
11     Residuals      RD     RD
11.1   Foreign word   RDF    RD__RDF       (Remarks: A word written in script other than the script of the original text)
11.2   Symbol         SYM    RD__SYM       $ & ( ) (Remarks: For symbols such as $, & etc.)
11.3   Punctuation    PUNC   RD__PUNC      (Remarks: Only for punctuations)
11.4   Unknown        UNK    RD__UNK
11.5   Echowords      ECH    RD__ECH       jala Tala, khAbAra dAbAra

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.


POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)


जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

4.1.1 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

4.1.2 Non-finite VNF V__VM__VNF - --do--

4.1.3 Infinitive VINF V__VM__VINF - --do--

4.1.4 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text


112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Sl No  Category       Label  Annotation Convention  Examples / Remarks

(Category levels: Top level, Subtype (level 1), Subtype (level 2))

1      Noun           N      N
1.1    Common         NN     N__NN        kalam, chashmA 'pen', 'spectacles'
1.2    Proper         NNP    N__NNP       mohan, ravI 'Mohan', 'Ravi'
1.3    Nloc           NST    N__NST       upar, nIche, ahIM 'up', 'down', 'in front'
2      Pronoun        PR     PR
2.1    Personal       PRP    PR__PRP      huM, tuM, te 'me', 'you', 'he/she'
2.2    Reflexive      PRF    PR__PRF      pote, jAte, svayam 'herself/himself'
2.3    Relative       PRL    PR__PRL      je, te, jyAM 'who', 'where'
2.4    Reciprocal     PRC    PR__PRC      aras-paras, paraspar 'mutually', 'each other'
2.5    Wh-word        PRQ    PR__PRQ      koN, kyAre, kyAM 'who', 'when', 'where'
2.6    Indefinite                         koI, kaIMK, kashuM 'someone', 'something'
3      Demonstrative  DM     DM
3.1    Deictic        DMD    DM__DMD      A 'this'
3.2    Relative       DMR    DM__DMR      je, jeNe 'which/who', 'whom'
3.3    Wh-word        DMQ    DM__DMQ      koN, shuM, kem 'who', 'what', 'why'
3.4    Indefinite                         koI, kaIMK, kashuM 'someone', 'something'
4      Verb           V      V
4.1    Main           VM     V__VM        khAshe, khAdhu 'will eat', 'ate'
4.2    Auxiliary      VAUX   V__VAUX      chhe, hatuM, karyuM 'is', 'was', 'did'
5      Adjective      JJ
6      Adverb         RB
7      Postposition   PSP
8      Conjunction    CC     CC
8.1    Co-ordinator   CCD    CC__CCD      ane, ke 'and', 'or'
8.2    Subordinator   CCS    CC__CCS      tethI, evuM, kAraNke 'so', 'like that', 'because'
9      Particles      RP     RP
9.1    Default        RPD    RP__RPD      paNa, ja, tO 'but', emph, topic
9.2    Interjection   INJ    RP__INJ      hE, arrrE, O
9.3    Intensifier    INTF   RP__INTF     bahu, ghaNuM 'very', 'much'
9.4    Negation       NEG    RP__NEG      nahi, na 'no'
10     Quantifiers    QT     QT
10.1   General        QTF    QT__QTF      thoduM, ghaNuM 'little', 'much'
10.2   Cardinals      QTC    QT__QTC      eka, be, traN 'one', 'two', 'three'
10.3   Ordinals       QTO    QT__QTO      paheluM, bIjI 'first' (neu), 'second' (fem)
11     Residuals      RD     RD
11.1   Foreign word   RDF    RD__RDF      tv, perasitemol
11.2   Symbol         SYM    RD__SYM      $ &
11.3   Punctuation    PUNC   RD__PUNC     ( )
11.4   Unknown        UNK    RD__UNK
11.5   Echowords      ECH    RD__ECH      kAm-bAm, pANi-bANi 'work and the like', 'water and the like'

POS for Konkani

Sl No Category Label Annotation Convention

Examples Remarks

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound) etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
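As an illustration of the attributes listed above, the following Python sketch builds one element that carries xml:id, xml:lang, its:dir and its:translate, using only the standard library. The element name `sentence`, the function name `make_sentence` and the sample values are hypothetical; the ITS namespace URI is the one defined by ITS 1.0.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)  # serialize with the conventional "its" prefix


def make_sentence(sent_id, lang, direction, text):
    """Build a (hypothetical) <sentence> element carrying ITS-style attributes."""
    elem = ET.Element("sentence")
    elem.set(f"{{{XML_NS}}}id", sent_id)       # xml:id, unique identifier
    elem.set(f"{{{XML_NS}}}lang", lang)        # xml:lang, language of the content
    elem.set(f"{{{ITS_NS}}}dir", direction)    # its:dir, "ltr" or "rtl"
    elem.set(f"{{{ITS_NS}}}translate", "no")   # its:translate, corpus text is not translated
    elem.text = text
    return elem


if __name__ == "__main__":
    print(ET.tostring(make_sentence("s1", "kok", "ltr", "नाम"), encoding="unicode"))
```

Marking the corpus text as non-translatable is a design choice of this sketch: a tagged sentence is data to be annotated, not interface text to be localized.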

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. XML Schemas provide a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is metadata built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because the tag gives only a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
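To show how a tool might read such a metadata record, here is a minimal Python sketch using the standard library parser. The XML string is a simplified fragment written for this illustration (no namespace, only a few of the fields above), and the function name `read_metadata` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment of the metadata record, for illustration only.
METADATA = """\
<languageSection>
  <language>en</language>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
</languageSection>
"""


def read_metadata(xml_text):
    """Return the main metadata fields of a languageSection as a dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "language": root.findtext("language"),
        "status": root.findtext("registrationStatus"),
        "origin": (root.findtext("origin") or "").strip(),
        "created": root.findtext("creation/creationDate"),
    }
```

Calling `read_metadata(METADATA)` yields a plain dictionary, which corresponds to the Display (Metadata) step of the flow described in the next section.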

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian Languages, the labels defined in the POS Schema for English need to have a one-to-one mapping for the Indian Languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


1. Start (Raw Corpora)
2. Declare Metadata
3. Declare POS Schema
4. Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)
5. Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)
6. Display (Metadata)
7. Call (POS Schema)
8. Display (Desired Nodes)
9. Hide (remaining nodes)
10. End

The common XML Schema would select a particular Indian Language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
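The selection step described above (display the nodes wanted for one language, hide the rest) can be sketched in Python as follows. This is an illustration under stated assumptions, not the schema's actual processing model: the small XML fragment, the element names `posSchema`, `cat` and `subcat`, and the function `select_nodes` are hypothetical, while the `hin-cat` / `mal-cat` attribute pattern follows the draft schema in this document. A node with no label for the chosen language is treated as switched off.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: each node carries one label per language code.
# The verbal noun (NNV) has no Malayalam label, so it is hidden for "mal".
SCHEMA = """\
<posSchema>
  <cat tag="N" cat="noun" hin-cat="संज्ञा" mal-cat="നാമം">
    <subcat tag="NN" cat="common" hin-cat="जातिवाचक" mal-cat="സാമാന്യ നാമം"/>
    <subcat tag="NNV" cat="verbal" hin-cat="क्रियामूलक"/>
  </cat>
</posSchema>
"""


def select_nodes(schema_xml, lang):
    """Return {tag: label} for the nodes that have a label in language `lang`."""
    key = f"{lang}-cat"
    visible = {}
    for node in ET.fromstring(schema_xml).iter():
        label = node.get(key)
        if label:  # a missing or empty label means the branch is 'off'
            visible[node.get("tag")] = label
    return visible
```

For example, `select_nodes(SCHEMA, "mal")` keeps the N and NN nodes and hides NNV, while `select_nodes(SCHEMA, "hin")` keeps all three.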

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
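The mapping tables above can be read as a lookup keyed by English category and language code. The following is only a sketch under stated assumptions: the names `MAPPING`, `label_for` and `NOT_AVAILABLE` are hypothetical, the label strings are placeholders rather than the real table entries, and falling back to the English label when a cell is marked NA is an assumption, not something the draft specifies.

```python
# Marker used in the mapping tables for a label that a language does not have.
NOT_AVAILABLE = "NA"

# Illustrative rows only: the label strings are placeholders, not real entries.
# Keys are ISO 639-3 codes ("eng" for English is an assumption of this sketch).
MAPPING = {
    "Noun": {
        "eng": "Noun",
        "hin": "<Hindi label>",
        "pan": "<Punjabi label>",
    },
    "Participle Noun": {
        "eng": "Participle Noun",
        "hin": "<Hindi label>",
        "pan": NOT_AVAILABLE,  # the table lists NA for Punjabi in this row
    },
}


def label_for(category, lang_code, table=MAPPING):
    """Return the label of `category` in `lang_code`.

    Falls back to the English label when the language has no entry or the
    entry is marked NA (an assumed policy for this sketch).
    """
    row = table[category]
    label = row.get(lang_code, NOT_AVAILABLE)
    return row["eng"] if label == NOT_AVAILABLE else label
```

For example, `label_for("Participle Noun", "pan")` falls back to the English label because the Punjabi cell is NA.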

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...

    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...

    End if

Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...

    End if
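The selection algorithm above can be sketched in Python as a filter over one schema node's attributes. This is an illustration under stated assumptions, not the draft's implementation: a node is modelled as a plain dictionary, the names `LANG_CODES`, `ALWAYS_SHOWN` and `select_nodes` are hypothetical, and the script and language pairs are only the ones the algorithm lists.

```python
# (script, language) pairs named in the algorithm, mapped to the ISO 639-3
# prefix used by the schema's "<code>-cat" attributes.
LANG_CODES = {
    ("Devanagari", "Hindi"): "hin",
    ("Devanagari", "Bodo"): "brx",
    ("Devanagari", "Konkani"): "kok",
    ("Malayalam", "Malayalam"): "mal",
    ("Perso-Arabic", "Kashmiri"): "kas",
    ("Bangla", "Assamese"): "asm",
    ("Gujarati", "Gujarati"): "guj",
}

# Attributes that carry the English label and the tag; they are always shown.
ALWAYS_SHOWN = {"name", "cat", "subcat", "tag"}


def select_nodes(node, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    code = LANG_CODES[(script, language)]
    keep = ALWAYS_SHOWN | {"hin-cat", f"{code}-cat"}
    return {key: value for key, value in node.items() if key in keep}


# Placeholder labels stand in for the real script-specific strings.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}
bodo_view = select_nodes(noun, "Devanagari", "Bodo")
```

Here `bodo_view` keeps `cat`, `hin-cat`, `brx-cat` and `tag`, and drops `mal-cat`, which mirrors the "Display (English Hindi and Bodo Nodes)" and "Hide (remaining nodes)" steps.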

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
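The tags in the annotated sentences above are hierarchical: `V_VM_VNF`, for instance, reads as category V, type VM, subtype VNF. A minimal sketch of splitting such a tag into its levels follows. The function names are hypothetical, and treating a single underscore, a double underscore and a hyphen as equivalent separators is an assumption based on the variants that appear in these samples (`N_NN`, `N__NN`, `N-NNP`).

```python
import re


def split_tag(tag):
    """Split a hierarchical POS tag into its levels, most general first."""
    # One or more underscores or hyphens are treated as a single separator.
    return [level for level in re.split(r"[_-]+", tag) if level]


def most_specific(tag):
    """Return the finest-grained level of a hierarchical tag."""
    return split_tag(tag)[-1]
```

With this, `split_tag("V_VM_VNF")` yields the three levels V, VM and VNF, and a flat tag such as `JJ` yields a single level.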

16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tags according to ISO 639-3

1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
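The language tags in this annexure are the same prefixes that the schema uses in attribute names such as `hin-cat` or `mal-cat`. As a hedged illustration, the check below recovers the language code from an attribute name and rejects anything that is not in the annexure. The names `ANNEXURE_CODES` and `language_code_of` are hypothetical and not part of the draft.

```python
# The 22 ISO 639-3 codes listed in the annexure.
ANNEXURE_CODES = {
    "asm", "ben", "brx", "doi", "guj", "hin", "kan", "kas", "kok", "mai", "mal",
    "mni", "mar", "nep", "ori", "pan", "san", "sat", "snd", "tam", "tel", "urd",
}


def language_code_of(attribute_name):
    """Return the language code of a '<code>-cat' attribute, or None."""
    prefix, separator, suffix = attribute_name.partition("-")
    if separator and suffix == "cat" and prefix in ANNEXURE_CODES:
        return prefix
    return None
```

A check like this could flag attribute names that do not follow the `<code>-cat` pattern, since they return `None` instead of a language code.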

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi

2 Prof Girish Nath Jha, JNU, New Delhi

3 Dr Somnath Chandra, Department of Information Technology, New Delhi

4 Dipti Misra Sharma, LTRC, IIIT-H

5 Somi Ram, CDAC, NOIDA

6 Prof Uma Maheswara Rao G, University of Hyderabad

7 Dr Sobha L, AU-KBC, Chennai

8 Menak S

9 Kalika Bali, Microsoft, Bangalore

10 Prof Pushpak Bhattacharyya, IIT-Bombay

11 Prof Malhar Kulkarni, IIT-Bombay

12 Lata Popale, IIT-Bombay

13 Kirtida Shah, Gujarati University, Ahmedabad

14 Mona Parakh, LDCIL, Mysore

15 Jyoti Pawar, Goa University

16 Madhavi Sardesai, Goa University

17 Ramnath

18 Aadil Kak, University of Kashmir

19 Nazima, University of Kashmir

20 Dr Richa, LDCIL, Mysore

21 Mazhar Mehdi Hussain, JNU, New Delhi

22 Mr Prashant Verma, W3C India, New Delhi

23 Swati Arora, W3C India, New Delhi

2 Pronoun PR PR itu atu avan

21 Personal PRP PR__PRP naan nii avaL avarkaL

22 Reflexive PRF PR__PRF taan

23 Relative PRL PR__PRL yaar etu eppootu enkee

24 Reciprocal PRC PR__PRC oruvarukoruvar avanavan parasparam

25 Wh-word PRQ PR__PRQ yaarum yaaraavatu yaaroo etuvum

3 Demonstrative DM DM a- i- e-

31 Deictic DMD DM__DMD anta inta enta

32 Relative DMR DM__DMR enta

33 Wh-word DMQ DM__DMQ enta yaar eetaavatu yaaraavatu

4 Verb V V vizu poo tuunku aaku

41 Main VM V__VM vizu poo tuunku ciri

411 Finite VF V__VM__VF vizuntaan pooneen cirittaaL

412 Non-finite VNF V__VM__VNF vizunta poonaal

413 Infinitive VINF V__VM__VINF viza pooka cirikka

414 Gerund VNG V__VM__VNG vizutal cirittal tuunkutal

42 Verbal VN V__VN paTippu naTai naTattai ceykai

43 Auxiliary VAUX V__VAUX aakum veeNTum muTiyum

5 Adjective JJ iniya periya azakaana

6 Adverb RB veekamaaka viraivaaka

7 Postposition PSP paRRi kuRittu viTa

8 Conjunction CC CC maRRum eenenRaal aanaal

81 Co-ordinator CCD CC__CCD -um(raamanum) maRRum aanaal allatu

-um is a co-ordinator which can be added to noun and verb

82 Subordinator CCS CC__CCS enRu ena enpatu enRaal

821 Quotative UT CC__CCS__UT enRu ena

9 Particles RP RP maTTUm kuuTa

91 Default RPD RP__RPD maTTUm kuuTa

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ ayyoo teey aamaam

94 Intensifier INTF RP__INTF ati veku mika

95 Negation NEG RP__NEG illai

10 Quantifiers QT QT koncam niRaiya oru mutal

101 General QTF QT__QTF koncam niRaiya

102 Cardinals QTC QT__QTC onRu iraNTu

103 Ordinals QTO QT__QTO mutal iraNTaam

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) ruu

For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH vaNTi kiNTi paal kiil

POS for Malayalam

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Examples in Malayalam

1 Noun N N avan mOhan vItu

11 Common NN N__NN vItu vellam pattam

12 Proper NNP N__NNP mOhan ravi sIta മോഹൻ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil മേലെ താഴെ മുൻപിൽ പിന്നിൽ

2 Pronoun PR PR avan aval atu itu അവൻ അവൾ അത് ഇത്

21 Personal PRP PR__PRP naan nii avaL avar ഞാൻ നീ അവൾ അവർ

22 Reflexive PRF PR__PRF tanne-taan തന്നെ-താൻ

23 Relative PRL PR__PRL aaro ആരോ

24 Reciprocal PRC PR__PRC tammil tammil parasparam തമ്മിൽ തമ്മിൽ പരസ്പരം

25 Wh-word PRQ PR__PRQ aaru evan ആര് എവൻ

3 Demonstrative DM DM aa- ii- ആ ഈ

31 Deictic DMD DM__DMD atu itu അത് ഇത്

32 Relative DMR DM__DMR eetu ഏത്

33 Wh-word DMQ DM__DMQ eetu ennane ഏത് എങ്ങനെ

4 Verb V V pO kazhi Annu(copula) ciri പോ കഴി ആണ്(copula) ചിരി

41 Main VM V__VM pO kazhi ciri Annu(copula) പോ കഴി ആണ്(copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula) പോയി ചിരിക്കും കഴിക്കുന്നു ആകുന്നു(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca പോയ ചിരിച്ച കഴിച്ച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി

കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam പഠിത്തം നടത്തം നടനം

43 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka കൊല്ലുക തല്ലുക കാണുക നോക്കുക

5 Adjective JJ valiya ceRiya azakulla വലിയ ചെറിയ അഴകുള്ള

6 Adverb RB veegam ativeegam kUtutal വേഗം അതിവേഗം കൂടുതൽ

7 Postposition PSP paRRi kUte പറ്റി കൂടെ

8 Conjunction CC CC pakshe enniTTum ennAl ennalum enkilum പക്ഷേ എന്നിട്ടും എന്നാൽ എന്നാലും എങ്കിലും

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe -ഉം (രാമനും) പക്ഷേ

82 Subordinator CCS CC__CCS ennu enna ennAl എന്ന് എന്ന എന്നാൽ

821 Quotative UT CC__CCS__UT ennu enna എന്ന് എന്ന

9 Particles RP RP kute mAtram കൂടെ മാത്രം

91 Default RPD RP__RPD mAtram മാത്രം

92 Classifier CL RP__CL peer പേർ

93 Interjection INJ RP__INJ ayyoo അയ്യോ

94 Intensifier INTF RP__INTF pala valare പല വളരെ

95 Negation NEG RP__NEG illa alla ഇല്ല അല്ല

10 Quantifiers QT QT kuracchu niraccu oru dharalam കുറച്ച് നിറച്ച് ഒരു ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam കുറച്ച് നിറച്ച് ധാരാളം

102 Cardinals QTC QT__QTC onnu rantu ഒന്ന് രണ്ട്

103 Ordinals QTO QT__QTO onnAm rantam ഒന്നാം രണ്ടാം

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM $ & ( ) ruu $ & ( ) രൂ

113 Punctuation PUNC RD__PUNC

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH

POS for Bangla

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN kalama cashmaa

12 Proper NNP N__NNP Mohan ravi rashmi

14 Nloc NST N__NST upare niche bhitara

2 Pronoun PR PR

21 Personal PRP PR__PRP se tumi AmAra

22 Reflexive PRF PR__PRF nijera

23 Relative PRL PR__PRL ye yakhana yena yAra

24 Reciprocal PRC PR__PRC paraspara

25 Wh-word PRQ PR__PRQ ke kakhana kena kAra

26 Indefinite PRI PR__PRI keu

3 Demonstrative DM DM Vaha jo yaha

31 Deictic DMD DM__DMD sei oi o se

32 Relative DMR DM__DMR ye yei

33 Wh-word DMQ DM__DMQ kono

34 Indefinite DMI DM__DMI keu

4 Verb V V

41 Main VM V__VM

411 Finite VF V__VM__VF karachhilAma yAba khAYa

412 Non-finite VNF V__VM__VNF kare kheYe karale khete

413 Infinitive VINF V__VM__VINF karate khete yete

414 Gerund VNG V__VM__VNG yAoYa AsA khelA karA

42 Auxiliary VAUX V__VAUX chhila habe chAi

5 Adjective JJ sundara bhAla lAla

6 Adverb RB tADAtADi Aste haThAt

7 Postposition PSP theke abadhI madhye diYe

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD Ara eban athabA kimbA

82 Subordinator CCS CC__CCS ye kintu noile tAhale

821 Quotative UT CC__CCS__UT ---- Not required

9 Particles RP RP

91 Default RPD RP__RPD to ye

92 Classifier CL RP__CL jana khAnA

93 Interjection INJ RP__INJ Are ei hAya

94 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika

95 Negation NEG RP__NEG nA naYa chhADA

10 Quantifiers QT QT

101 General QTF QT__QTF kichhu alpa aneka

102 Cardinals QTC QT__QTC eka dui tina

103 Ordinals QTO QT__QTO prathama paYalA dvitIYa

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH jala Tala khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
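One possible way to store the higher-level tags automatically is to derive them from the annotation convention itself, since labels such as `V__VM__VF` already spell out the path from the top level to the lowest level. The following is a minimal sketch under that assumption; the function names `expand_tag` and `tag_record` are hypothetical and not part of the standard.

```python
def expand_tag(annotation):
    """Split an annotation such as 'V__VM__VF' into its hierarchy levels,
    ordered from the top-level category to the lowest-level tag."""
    parts = annotation.split("__")
    if not all(parts):
        raise ValueError(f"malformed annotation: {annotation!r}")
    return parts


def tag_record(word, annotation):
    """Build a record holding the lowest-level tag chosen by the annotator
    together with the higher-level tags derived from it."""
    levels = expand_tag(annotation)
    return {"word": word, "tag": levels[-1], "ancestors": levels[:-1]}
```

With this approach the annotator only ever chooses the lowest-level tag, and a top-level-only label such as `JJ` simply yields an empty list of ancestors.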

POS for Marathi

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N मुलगा (mulagaa-boy) राजा (raajaa-king) पुस्तक (pustaka-book)

11 Common NN N__NN पुस्तक (pustaka-book) लेखणी (lekhaNi-pen) चष्मा (chashmaa-goggles)

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रश्मी (Rashmi)

13 Verbal NNV N__NNV NA Not Required

14 Nloc NST N__NST वर (var-up) खाली (khaalee-down) पुढे (pudhe-ahead) मागे (maage-back) Where it is separate it is NST

2 Pronoun PR PR येथे (yethe-here) तेथे (tethe-there) जो (jo-who) तो (to-he)

21 Personal PRP PR__PRP तो (to-he) मी (mee-I) तू (tu-you) ते (te-they) तुम्ही (tumhi-you)

22 Reflexive PRF PR__PRF स्वतः (swatha-myself) आपण (aapana-ourselves)

23 Relative PRL PR__PRL जो (jo-who) ज्याने (jyaane-who) जेव्हा (jevhaa-while) जिथे (jeethe-where)

24 Reciprocal PRC PR__PRC परस्पर (Paraspara-reciprocally) एकमेक (ekmek-mutually)

25 Wh-word PRQ PR__PRQ कोण (kona-who) केव्हा (kevha-when) कुठे (kuthe-where)

26 Indefinite कोणी (kona

3 Demonstrative DM DM तो (to-he) हा (haa-this) जो (jo-who)

31 Deictic DMD DM__DMD इथे (ithe-here) तिथे (tithe-there)

32 Relative DMR DM__DMR जो (jo-who) ज्याने (jyane-who)

33 Wh-word DMQ DM__DMQ कोणता (konta-which) कोणी (kona-who)

4 Verb V V पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)

41 Main VM V__VM पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आहे (is) लागला (started)

5 Adjective JJ सुंदर (sundara-beautiful) चांगला (chaangalaa-good) मोठा (moThaa-big)

6 Adverb RB लवकर (lavakar-fast) हळूहळू (haLuuhaLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आणि (aaNi-and) कारण (kaaraN-because)

81 Co-ordinator CCD CC__CCD आणि (aaNi-and) पण (paNa-but) परंतु (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-because of) ता त (kaaraN kii-because of) जर-तर (jara-tara-if-then)

821 Quotative UT CC__CCS__UT असा म्हणून

9 Particles RP RP तर (tara)

91 Default RPD RP__RPD तर (tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरेरे (arere) ओहो (oho-oh)

94 Intensifier INTF RP__INTF खूप (khoop-lot, very) बराच (baraach-too much) अतिशय (atishaya-too much, very)

95 Negation NEG RP__NEG नको (nako-not) न (na-Na)

10 Quantifiers QT QT थोडे (thode-few) जास्त (jaasta-lot) काही (kaahi-few) एक (eka-one) पहिला (pahilaa-first)

101 General QTF QT__QTF थोडे (thoDe-few) जास्त (jaasta-lot) काही (kaahi-few)

102 Cardinals QTC QT__QTC एक (eka-one) दोन (dona-two)

103 Ordinals QTO QT__QTO पहिला (pahilaa-first) दुसरा (dusaraa-second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जेवणबिवण (jevanbivaNa-meal/dinner) डोकेबिके (Dokebike-head) (Paanii-)vaanii (khaanaa-)vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN kalam, chashmA ('pen', 'spectacles')

12 Proper NNP N__NNP mohan, ravI ('Mohan', 'Ravi')

13 Nloc NST N__NST upar, nIche, ahIM ('up', 'down', 'in front')

2 Pronoun PR PR

21 Personal PRP PR__PRP huM, tuM, te ('me', 'you', 'he/she')

22 Reflexive PRF PR__PRF pote, jAte, svayam ('herself/himself')

23 Relative PRL PR__PRL je, te, jyAM ('who', 'where')

24 Reciprocal PRC PR__PRC aras-paras, paraspar ('mutually', 'each other')

25 Wh-word PRQ PR__PRQ koN, kyAre, kyAM ('who', 'when', 'where')

26 Indefinite koI, kaIMK, kashuM ('someone', 'something')

3 Demonstrative DM DM

31 Deictic DMD DM__DMD A ('this')

32 Relative DMR DM__DMR je, jeNe ('which/who', 'whom')

33 Wh-word DMQ DM__DMQ koN, shuM, kem ('who', 'what', 'why')

34 Indefinite koI, kaIMK, kashuM ('someone', 'something')

4 Verb V V

41 Main VM V__VM khAshe, khAdhu ('will eat', 'ate')

42 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM ('is', 'was', 'did')

5 Adjective JJ

6 Adverb RB

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD ane, ke ('and', 'or')

82 Subordinator CCS CC__CCS tethI, evuM, kAraNke ('so', 'like that', 'because')

9 Particles RP RP

91 Default RPD RP__RPD paNa, ja, tO ('but', emph, topic)

92 Interjection INJ RP__INJ hE, arrrE, O

93 Intensifier INTF RP__INTF bahu, ghaNuM ('very', 'much')

94 Negation NEG RP__NEG nahi, na ('no')

10 Quantifiers QT QT

101 General QTF QT__QTF thoduM, ghaNuM ('little', 'much')

102 Cardinals QTC QT__QTC eka, be, traN ('one', 'two', 'three')

103 Ordinals QTO QT__QTO paheluM, bIjI ('first' (neu), 'second' (fem))

11 Residuals RD RD

111 Foreign word RDF RD__RDF tv, perasitemol

112 Symbol SYM RD__SYM $ &

113 Punctuation PUNC RD__PUNC ()

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH kAm-bAm, pANi-bANi ('work and the like', 'water and the like')

POS for Konkani

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

421 Finite V__VAUX__VF तलल आस आयला आस

422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM & $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

92 Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख) मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No   Category (Top level / Subtype level 1 / Subtype level 2)   Label   Annotation Convention   Examples   Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( )

& $

Such symbols are not used in Urdu. They are written out as words: (dollar) ڈالر, (pound) پاونڈ, etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
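To make the attributes above more concrete, the sketch below parses a small sample document and reads the language, direction, translate and identifier attributes from each element. The element names `corpus`, `sentence` and `tag`, and the sample sentences, are invented for illustration only; the sketch reads only the attributes written on each element and does not model inheritance from parent elements.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical sample: a Hindi corpus containing one Urdu (right-to-left) sentence.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its"
        xml:lang="hi" its:dir="ltr">
  <sentence xml:id="s1">राम घर गया</sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl">وہ گھر گیا</sentence>
  <tag its:translate="no">N__NN</tag>
</corpus>"""


def describe(root):
    """List the internationalization attributes set directly on each element."""
    rows = []
    for elem in root.iter():
        rows.append({
            "name": elem.tag,
            "lang": elem.get(f"{{{XML_NS}}}lang"),
            "dir": elem.get(f"{{{ITS_NS}}}dir"),
            "translate": elem.get(f"{{{ITS_NS}}}translate"),
            "id": elem.get(f"{{{XML_NS}}}id"),
        })
    return rows


rows = describe(ET.fromstring(SAMPLE))
```

Marking the POS label with `its:translate="no"` reflects the idea that tag labels such as `N__NN` are identifiers rather than translatable text.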

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used for an XML document that conforms to a particular schema. A schema provides a means of defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is metadata built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag supplies nothing beyond its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
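A metadata record of this shape could also be generated programmatically, which keeps the output well-formed. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function name `build_metadata` and its parameters are hypothetical, and only a subset of the elements shown above is produced.

```python
import xml.etree.ElementTree as ET


def build_metadata(language="en", origin="ISO 12620:1999",
                   creation_date="1999-01-01"):
    """Build a simplified languageSection element for the POS metadata."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "registrationStatus").text = "standard"
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date
    return section


# Serialize to text; every element opened above is closed automatically.
xml_text = ET.tostring(build_metadata(), encoding="unicode")
```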

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema first selects a particular Indian language; the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
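The idea of switching branches off for a given language can be sketched as a simple filter over the common tag tree. The example below is an illustration under stated assumptions, not the schema's actual mechanism: `COMMON_TREE` is a hypothetical, partial excerpt, and its `off` sets are taken from the "Not required" and "Not in Marathi" remarks in the language tables of this document.

```python
# Hypothetical excerpt of the common tag tree. Each node lists the languages
# (ISO 639-3 codes) for which the branch is switched off.
COMMON_TREE = [
    {"tag": "N", "label": "Noun", "off": set()},
    {"tag": "NNV", "label": "Verbal noun", "off": {"mar"}},
    {"tag": "PSP", "label": "Postposition", "off": {"mar"}},
    {"tag": "CL", "label": "Classifier", "off": {"mar", "urd"}},
    {"tag": "UT", "label": "Quotative", "off": {"ben", "urd"}},
]


def select_nodes(tree, language):
    """Return (displayed, hidden) tag lists for one language code."""
    displayed = [node["tag"] for node in tree if language not in node["off"]]
    hidden = [node["tag"] for node in tree if language in node["off"]]
    return displayed, hidden


displayed, hidden = select_nodes(COMMON_TREE, "mar")
```

The two returned lists correspond to the "Display (Desired Nodes)" and "Hide (remaining nodes)" steps of the flow above.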

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat=" ناوت " asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat=" عام " asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat=" خاص " asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat=" کراوتٲوۍ " asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat=" ناوتہ جايہ ہاو " asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">
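The one-to-one mapping carried by the `cat` and `xxx-cat` attributes can be read back with a small lookup. The sketch below assumes a simplified, well-formed element (named `element` here purely for illustration) that carries only the English, Malayalam and Konkani labels of the Noun block; the function name `label_for` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the Noun block: English label in "cat",
# language-specific labels in "<ISO 639-3 code>-cat" attributes.
NOUN = ET.fromstring(
    '<element cat="noun" mal-cat="നാമം" kok-cat="नाम" tag="N"/>'
)


def label_for(node, language=None):
    """Return the category label for a language code, or the English
    label when no language is given."""
    attr = "cat" if language is None else f"{language}-cat"
    label = node.get(attr)
    if label is None:
        raise KeyError(f"no label for language {language!r}")
    return label
```

Because every language label hangs off the same element as the shared `tag` value, the tag itself stays identical across languages while only the displayed label changes.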

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To support this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
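
The selection logic in this section can be sketched in Python. This is a minimal illustration and not part of the draft standard. It assumes each schema node has already been parsed into a dictionary of attributes. The names SCRIPT_LANGUAGES, COMMON_ATTRIBUTES and select_nodes are hypothetical. The language prefixes follow the hin-cat, brx-cat, mal-cat attribute names used in the examples.

```python
# Script -> language -> attribute prefix, following the branches of the
# algorithm. The dictionary layout itself is an assumption for illustration.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that are not language-specific labels and always stay visible
# (this set is an assumption; it includes the English cat/subcat labels).
COMMON_ATTRIBUTES = {"name", "cat", "subcat", "tag"}


def select_nodes(node, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    code = SCRIPT_LANGUAGES[script][language]
    visible = COMMON_ATTRIBUTES | {"hin-cat", f"{code}-cat"}
    return {key: value for key, value in node.items() if key in visible}


# A noun node carrying labels for several languages.
noun = {
    "name": "cat",
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "brx-cat": "ममा",
    "mal-cat": "നാമം",
    "tag": "N",
}

print(select_nodes(noun, "Devanagari", "Bodo"))
```

For Hindi the selected prefix and the always-visible Hindi prefix coincide, so only the English and Hindi labels remain, as in the first branch of the algorithm.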

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
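
The schema attribute names used in this document (hin-cat, mal-cat and so on) are built from these ISO 639-3 codes. The Python sketch below shows one way to derive an attribute name from a language name. It is an assumption about usage, not part of the standard, and the names ISO_639_3 and category_attribute are hypothetical.

```python
# ISO 639-3 codes for the 22 languages listed in this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal",
    "Manipuri": "mni", "Marathi": "mar", "Nepali": "nep", "Oriya": "ori",
    "Punjabi": "pan", "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd",
    "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}


def category_attribute(language):
    """Return the schema attribute name that holds this language's label."""
    return f"{ISO_639_3[language]}-cat"


print(category_attribute("Malayalam"))
```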

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

Page 16: Tdil Mal Tags

7 | Postposition | PSP | | paRRi, kuRittu, viTa
8 | Conjunction | CC | CC | maRRum, eenenRaal, aanaal
8.1 | Co-ordinator | CCD | CC__CCD | -um (raamanum), maRRum, aanaal, allatu. -um is a co-ordinator which can be added to noun and verb
8.2 | Subordinator | CCS | CC__CCS | enRu, ena, enpatu, enRaal
8.2.1 | Quotative | UT | CC__CCS__UT | enRu, ena
9 | Particles | RP | RP | maTTUm, kuuTa
9.1 | Default | RPD | RP__RPD | maTTUm, kuuTa
9.2 | Classifier | CL | RP__CL | Not required
9.3 | Interjection | INJ | RP__INJ | ayyoo, teey, aamaam
9.4 | Intensifier | INTF | RP__INTF | ati, veku, mika
9.5 | Negation | NEG | RP__NEG | illai
10 | Quantifiers | QT | QT | koncam, niRaiya, oru, mutal
10.1 | General | QTF | QT__QTF | koncam, niRaiya
10.2 | Cardinals | QTC | QT__QTC | onRu, iraNTu
10.3 | Ordinals | QTO | QT__QTO | mutal, iraNTaam
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) ruu. For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | Only for punctuations
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH | vaNTi kiNTi, paal kiil

POS for Malyalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan mOhan vItu
1.1 Common NN N__NN vItu vellam pattam
1.2 Proper NNP N__NNP mOhan ravi sIta [മോഹൻ രവി സീത]
1.3 Nloc NST N__NST mEle tAze munpil pinnil [മേലെ താഴെ മുൻപിൽ പിന്നിൽ]
2 Pronoun PR PR avan aval atu itu [അവൻ അവൾ അത് ഇത്]
2.1 Personal PRP PR__PRP naan nii avaL avar [ഞാൻ നീ അവൾ അവർ]
2.2 Reflexive PRF PR__PRF tanne-taan [തന്നെ-താൻ]
2.3 Relative PRL PR__PRL aaro [ആരോ]
2.4 Reciprocal PRC PR__PRC tammiltammil parasparam [തമ്മിൽതമ്മിൽ പരസ്പരം]
2.5 Wh-word PRQ PR__PRQ aaru evan [ആര് എവൻ]

3 Demonstrative DM DM aa- ii- [ആ ഈ]
3.1 Deictic DMD DM__DMD atu itu [അത് ഇത്]
3.2 Relative DMR DM__DMR eetu [ഏത്]
3.3 Wh-word DMQ DM__DMQ eetu ennane [ഏത് എങ്ങനെ]
4 Verb V V pO kazhi Annu ciri [പോ കഴി ആണ് (copula) ചിരി]
4.1 Main VM V__VM pO kazhi ciri Annu (copula) [പോ കഴി ആണ് (copula) ചിരി]
4.1.1 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu (copula) [പോയി ചിരിക്കും കഴിക്കുന്നു ആകുന്നു (copula)]
4.1.2 Non-finite VNF V__VM__VNF pOya ciricca kazhicca [പോയ ചിരിച്ച കഴിച്ച]
4.1.3 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAn varuvAn [ോക ചിരിക്കുകയാൽ കഴിക വരാൻ വരുവാൻ]
4.2 Verbal VN V__VN paTittam naTattam naTanam [പഠിത്തം നടത്തം നടനം]
4.3 Auxiliary VAUX V__VAUX kolluka talluka kAnuka nOkkuka [കൊല്ലുക തല്ലുക കാണുക നോക്കുക]
5 Adjective JJ valiya ceRiya azakulla [വലിയ ചെറിയ അഴകുള്ള]
6 Adverb RB veegam ativeegam kUtutal [വേഗം അതിവേഗം കൂടുതൽ]
7 Postposition PSP paRRi kUte [പറ്റി കൂടെ]

8 Conjunction CC CC pakshe enniTTum ennAl ennalum enkilum [പക്ഷെ എന്നിട്ടും എന്നാൽ എന്നാലും എങ്കിലും]
8.1 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe [ഉം (രാമനും) പക്ഷെ]
8.2 Subordinator CCS CC__CCS ennu enna ennAl [എന്ന് എന്ന എന്നാൽ]
8.2.1 Quotative UT CC__CCS__UT ennu enna [എന്ന് എന്ന]
9 Particles RP RP kute mAtram [കൂടെ മാത്രം]
9.1 Default RPD RP__RPD mAtram [മാത്രം]
9.2 Classifier CL RP__CL peer [പേർ]
9.3 Interjection INJ RP__INJ ayyoo [അയ്യോ]
9.4 Intensifier INTF RP__INTF pala valare [പല വളരെ]
9.5 Negation NEG RP__NEG illa alla [ഇല്ല അല്ല]
10 Quantifiers QT QT kuracchu niraccu oru dharalam [കുറച്ച് നിറച്ച് ഒരു ധാരാളം]
10.1 General QTF QT__QTF kuraccu niraccu dharalam [കുറച്ച് നിറച്ച് ധാരാളം]
10.2 Cardinals QTC QT__QTC onnu rantu [ഒന്ന് രണ്ട്]
10.3 Ordinals QTO QT__QTO onnAm rantam [ഒന്നാം രണ്ടാം]
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF
11.2 Symbol SYM RD__SYM $ & ( ) ruu [$ & ( ) രൂ]
11.3 Punctuation PUNC RD__PUNC
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH

POS for Bangla

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN kalama cashmaa

12 Proper NNP N__NNP Mohan ravi

rashmi

14 Nloc NST N__NST upare

niche

bhitara

2 Pronoun PR PR

21 Personal PRP PR__PRP se tumi

AmAra

22 Reflexive PRF PR__PRF nijera

23 Relative PRL PR__PRL ye yakhana

yena yAra

24 Reciprocal PRC PR__PRC paraspara

25 Wh-word PRQ PR__PRQ ke kakhana

kena kAra

26 Indefinite PRI PR__PRI keu

3 Demonstrative DM DM Vaha jo

yaha

31 Deictic DMD DM__DMD sei oi o se

32 Relative DMR DM__DMR ye yei

33 Wh-word DMQ DM__DMQ kono

34 Indefinite DMI DM__DMI keu

4 Verb V V

41 Main VM V__VM

41

1

Finite VF V__VM__VF karachhilAm

a yAba

khAYa

41

2

Non-finite VNF V__VM__VNF kare

kheYe

karale

khete

41

3

Infinitive VINF V__VM__VINF karate

khete yete

41

4

Gerund VNG V__VM__VNG yAoYa

AsA khelA

karA

42 Auxiliary VAUX V__VAUX chhila

habe chAi

5 Adjective JJ sundara

bhAla lAla

6 Adverb RB tADAtADi

Aste

haThAt

7 Postposition PSP theke

abadhI

madhye

diYe

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD Ara eban

athabA

kimbA

82 Subordinator CCS CC__CCS ye kintu

noile

tAhale

82

1

Quotative UT CC__CCS__UT ---- Not required

9 Particles RP RP

91 Default RPD RP__RPD to ye

92 Classifier CL RP__CL jana khAnA

93 Interjection INJ RP__INJ Are ei

hAya

94 Intensifier INTF RP__INTF bhiShaNa

khuba

sA~NghAtik

a

95 Negation NEG RP__NEG nA naYa

chhADA

10 Quantifiers QT QT

101 General QTF QT__QTF kichhu

alpa aneka

102 Cardinals QTC QT__QTC eka dui

tina

103 Ordinals QTO QT__QTO prathama

paYalA

dvitIYa

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written

in script other

than the script

of the original

text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH jala Tala

khAbAra

dAbAra

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
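The rule above can be illustrated with a minimal Python sketch. It assumes that the double underscore in the Annotation Convention column (for example V__VM__VF) separates the levels of the hierarchy; the function names `expand_tag` and `annotate` are hypothetical and are not part of the standard.

```python
def expand_tag(annotation: str, sep: str = "__") -> list[str]:
    """Return every level implied by a lowest-level annotation, top level first.

    Assumption: levels are joined by a double underscore, as in the
    Annotation Convention column of the tables (e.g. V__VM__VF).
    """
    parts = annotation.split(sep)
    if not all(parts):
        raise ValueError(f"malformed annotation: {annotation!r}")
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


def annotate(token: str, annotation: str) -> dict:
    """Store the selected lowest-level tag together with its higher-level tags."""
    levels = expand_tag(annotation)
    return {"token": token, "tag": levels[-1], "ancestors": levels[:-1]}


if __name__ == "__main__":
    print(expand_tag("V__VM__VF"))
    print(annotate("kichhu", "QT__QTF"))
```

The annotator only ever supplies the lowest-level tag; the higher levels are derived from it, so they cannot disagree with the selected tag.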

POS for Marathi

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)

जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--

4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN kalamchashmA

lsquopenrsquo lsquospectaclesrsquo

12 Proper NNP N__NNP mohanravI

lsquoMohanrsquo lsquoRavirsquo

13 Nloc NST N__NST upar nIche ahIM

lsquouprsquo lsquodownrsquo lsquoin frontrsquo

2 Pronoun PR PR

21 Personal PRP PR__PRP huMtuMte

lsquomersquo lsquoyoursquo

lsquoheshersquo 22 Reflexive PRF PR__PRF pote

jAtesvayam

lsquoherselfhimselfrsquo

23 Relative PRL PR__PRL je te jyAM

lsquowhorsquo lsquowherersquo

24 Reciprocal PRC PR__PRC aras-paras paraspar

lsquomutuallyrsquolsquoeach otherrsquo

25 Wh-word PRQ PR__PRQ koN kyAre kyAM

lsquowhorsquo lsquowhenrsquo lsquowherersquo

26 Indefinite koI kaIMK kashuM

lsquosomeonersquo lsquosomethingrsquo

3 Demonstrative DM DM

31 Deictic DMD DM__DMD A

lsquothisrsquo

32 Relative DMR DM__DMR je jeNe

lsquowhichwhorsquo lsquowhomrsquo

33 Wh-word DMQ DM__DMQ koNshuMkem

lsquowhorsquo lsquowhatrsquo lsquowhyrsquo

34 Indefinite koI kaIMK kashuM

lsquosomeonersquo lsquosomethingrsquo

4 Verb V V

41 Main VM V__VM khAshekhAdhu

lsquowill eatrsquo

lsquoatersquo 42 Auxiliary VAUX V__VAUX chhehatuMk

aryuM

lsquoisrsquo rsquowasrsquo lsquodidrsquo

5 Adjective JJ

6 Adverb RB

7 Postposition PSP

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD aneke

lsquoandrsquo lsquoorrsquo

82 Subordinator CCS CC__CCS tethI evuM kAraNke

lsquosorsquo lsquolike thatrsquo lsquobecausersquo

9 Particles RP RP

91 Default RPD RP__RPD paNajatO

lsquobutrsquo emph topic

92 Interjection INJ RP__INJ hE arrrE O

93 Intensifier INTF RP__INTF bahughaNuM

lsquoveryrsquo lsquomuchrsquo

94 Negation NEG RP__NEG nahina

lsquonorsquo

10 Quantifiers QT QT

101 General QTF QT__QTF thoduMghaNuM

lsquolittlersquo lsquomuchrsquo

102 Cardinals QTC QT__QTC ekabe traN

lsquoonetwothreersquo

103 Ordinals QTO QT__QTO paheluMbIjI

lsquofirstrsquo(neu)

lsquosecondrsquo (fem)

11 Residuals RD RD

111 Foreign word RDF RD__RDF tv perasitemol

112 Symbol SYM RD__SYM $ amp

113 Punctuation PUNC RD__PUNC ()

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH kAm-bAmpANi-bANi

lsquowork and the likersquo water and the likersquo

POS for Konkani

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Urdu

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used both for singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes it easier for both schema users and processors to recognize the concepts represented. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).
[For more details: http://www.w3.org/TR/xml-i18n-bp/]
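As an illustration of the attributes listed above, the following Python sketch reads xml:lang, xml:id and two local ITS 1.0 attributes (its:translate and its:dir) from a small tagged sample. The element names corpus, sentence and w, and the tag attribute, are hypothetical and chosen only for this example; the standard itself names the global rule elements (its:translateRule, its:withinTextRule), for which the local attributes used here are the simpler counterparts.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document; only xml:lang, xml:id, its:translate and
# its:dir come from the W3C specifications.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="ml">
  <sentence xml:id="s1">
    <w tag="N__NNP">mOhan</w>
    <w tag="RD__RDF" xml:lang="en" its:translate="no">tv</w>
  </sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl">
    <w tag="PR__PRP">voh</w>
  </sentence>
</corpus>"""


def effective_languages(root):
    """Pair each <w> token with the xml:lang value it inherits from its ancestors."""
    result = []

    def walk(elem, lang):
        lang = elem.get(f"{{{XML_NS}}}lang", lang)
        if elem.tag == "w":
            result.append((elem.text, lang))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return result


def non_translatable(root):
    """Tokens explicitly marked its:translate="no"."""
    return [w.text for w in root.iter("w") if w.get(f"{{{ITS_NS}}}translate") == "no"]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(effective_languages(root))
    print(non_translatable(root))
```

Because xml:lang is inherited, it only needs to be declared on the root element and wherever the language changes, which matches the guidance in the list above.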

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schemas provide a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of", because a tag gives only the element's name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
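A minimal sketch of how a subset of the metadata fields listed above could be generated programmatically is given below. It uses only element names that appear in the listing (languageSection, language, registrationStatus, origin, creation, creationDate, descriptionSection, definitionClass, definition, source); the function name `build_language_section` and the definition text passed in the usage example are hypothetical, and the namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_language_section(language, status, origin, creation_date, definition, source):
    """Build a <languageSection> holding a subset of the fields in the listing above."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date
    description = ET.SubElement(section, "descriptionSection")
    definition_class = ET.SubElement(description, "definitionClass")
    # xml:lang on the definition records the language of the definition text.
    definition_elem = ET.SubElement(
        definition_class, "definition", {f"{{{XML_NS}}}lang": language}
    )
    definition_elem.text = definition
    ET.SubElement(definition_class, "source").text = source
    return section


if __name__ == "__main__":
    section = build_language_section(
        language="en",
        status="standard",
        origin="ISO 12620:1999",
        creation_date="1999-01-01",
        definition="Part-of-speech tag set metadata",  # hypothetical text
        source="ISO 12620:1999",
    )
    print(ET.tostring(section, encoding="unicode"))
```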

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)
Declare Metadata
Declare POS Schema
Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)
Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)
Display (Metadata)
Call (POS Schema)
Display (Desired Nodes)
Hide (remaining nodes)
End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
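The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps of the flow above can be sketched in Python as follows. This is an illustration under stated assumptions, not the standard's implementation: the tree is a small subset of the common tag set, the `Node` class and function names are hypothetical, and the per-language sets are taken from the "Not required" and "Not in Marathi" remarks in the Bangla, Marathi and Urdu tables.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    label: str
    children: list["Node"] = field(default_factory=list)


# Illustrative subset of the common POS tree (not the full tag set).
SCHEMA = Node("POS", "root", [
    Node("N", "Noun", [
        Node("NN", "Common"), Node("NNP", "Proper"),
        Node("NNV", "Verbal"), Node("NST", "Nloc"),
    ]),
    Node("PSP", "Postposition"),
    Node("CC", "Conjunction", [
        Node("CCD", "Co-ordinator"),
        Node("CCS", "Subordinator", [Node("UT", "Quotative")]),
    ]),
    Node("RP", "Particles", [
        Node("RPD", "Default"), Node("CL", "Classifier"), Node("INJ", "Interjection"),
    ]),
])

# Branches switched 'off' per language, per the remarks in the tables.
NOT_REQUIRED = {
    "Bangla": {"UT"},
    "Marathi": {"NNV", "PSP", "CL"},
    "Urdu": {"UT", "CL"},
}


def select_nodes(node, hidden):
    """Return a copy of the tree without hidden nodes and their subtrees."""
    if node.tag in hidden:
        return None
    kept = [c for c in (select_nodes(ch, hidden) for ch in node.children) if c is not None]
    return Node(node.tag, node.label, kept)


def visible_tags(language):
    """Tags displayed for a language, in tree order, excluding the root."""
    tree = select_nodes(SCHEMA, NOT_REQUIRED.get(language, set()))
    tags = []

    def walk(node):
        tags.append(node.tag)
        for child in node.children:
            walk(child)

    walk(tree)
    return tags[1:]


if __name__ == "__main__":
    for language in ("Hindi", "Bangla", "Marathi", "Urdu"):
        print(language, visible_tags(language))
```

Keeping one common tree and hiding branches per language means the one-to-one label mapping stays in a single place, while each language sees only the nodes it uses.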

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

ltxsattributegt

ltxselementgt ltxsschemagt


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To provide this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt


ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-

cat=rdquoजावाचत नामrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-

cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo

tag=rdquoPRrdquogt


ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश

सवरनामrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-

cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-

cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-

cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo

tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo

tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo شخصيٲتی rdquo

tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo ماکوسی rdquo

tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-

cat=rdquoজািতবাচকrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-

cat=rdquoআতবাচকrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg


ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if
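The selection steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the standard: the function name `select_nodes`, the dictionary representation of a schema node, and the placeholder labels are all hypothetical. The script and language grouping follows the branches of the algorithm, and the attribute names (`cat`, `hin-cat`, `brx-cat`, `tag`) follow the examples shown in it. The Display (Metadata) step of the Hindi branch is left out.

```python
# Script -> languages, mirroring the branches of the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Attributes that are not language-specific labels and are always kept.
COMMON_KEYS = {"name", "cat", "subcat", "tag"}


def select_nodes(node, script, language):
    """Return only the attributes to display for one script and language.

    The English label, the tag and the Hindi label are always shown, the
    label of the selected language is shown as well, and the labels of all
    remaining languages are hidden.
    """
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = COMMON_KEYS | {"hin-cat", f"{language}-cat"}
    return {key: value for key, value in node.items() if key in visible}


# Hypothetical node with placeholder labels instead of native-script text.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}
print(select_nodes(noun, "Devanagari", "brx"))
```

For Bodo, the call keeps the English, Hindi and Bodo labels together with the tag and drops the Malayalam label, which corresponds to the Display and Hide steps of that branch.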


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No. Language Name, Language Tag according to ISO 639-3

1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
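The language tags are also used as prefixes of the per-language label attributes in the schema (for example hin-cat and mal-cat). A small lookup, sketched here in Python, shows how a language name could be turned into such an attribute name. The names `LANGUAGE_CODES` and `label_attribute` are hypothetical and only illustrate the idea; the codes are the ISO 639-3 codes for the languages in this annexure.

```python
# ISO 639-3 codes for the 22 languages listed in this annexure.
LANGUAGE_CODES = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute that holds the label for a language."""
    return f"{LANGUAGE_CODES[language]}-cat"


print(label_attribute("Malayalam"))
```

An unknown language name raises a KeyError, which is a deliberate choice in this sketch so that a misspelt name is not silently mapped to a wrong attribute.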


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


POS for Malayalam

Sl No

Category Label Annotation Convention

Examples Examples in Malayalam

Top level Subtype (level 1)

Subtype (level 2)

1 Noun N N avan

mOhan

vItu

11 Common NN N__NN vItu

vellam

pattam

12 Proper NNP N__NNP mOhan ravi sIta

േമാഹ൯ രവി സീത

13 Nloc NST N__NST mEle tAze munpil pinnil

േമെല താെഴ മനിി ിനിി

2 Pronoun PR PR avanavalatuitu

അവ൯ അവള അത ഇത

21 Personal PRP PR__PRP naan nii avaL avar

ഞാ൯നീ അവള അവ൪

22 Reflexive PRF PR__PRF tanne-taan തെനതാ൯

23 Relative PRL PR__PRL aaro ആേരാ 24 Reciprocal PRC PR__PRC tammiltammi

l parasparam

തമിിിതമിി


രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി


കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ


ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം


102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
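The rule of storing the higher-level tags automatically can be sketched in Python as a walk up a parent table. This is an illustration under stated assumptions: the names `PARENT` and `full_annotation` are hypothetical, and the table simply encodes the label hierarchy implied by the Annotation Convention column (for example V__VM__VF), with tags joined by a double underscore as in that column.

```python
# Parent of each label in the type hierarchy; top-level labels have no entry.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into its full annotation string."""
    chain = [label]
    # Walk upwards until a top-level label (one without a parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


print(full_annotation("VF"))
print(full_annotation("UT"))
print(full_annotation("JJ"))
```

With such a table the annotator only chooses the lowest-level label, and labels without subtypes, such as JJ, are returned unchanged.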


POS for Marathi

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun N N मुलगा (mulagaa-boy) राजा (raajaa-king) पुस्तक (pustaka-book)
1.1 Common NN N__NN पुस्तक (pustaka-book) लेखणी (lekhaNi-pen) चष्मा (chashmaa-goggles)
1.2 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रश्मी (Rashmi)
1.3 Verbal NNV N__NNV NA Not Required
1.4 Nloc NST N__NST वर (var-up) खाली (khaalee-down) पुढे (pudhe-ahead) मागे (maage-back) Where it is separate it is NST
2 Pronoun PR PR येथे (yethe-here) तेथे (tethe-there) जो (jo-who) तो (to-he)
2.1 Personal PRP PR__PRP तो (to-he) मी (mee-I) तू (tu-you) ते (te-they) तुम्ही (tumhi-you)
2.2 Reflexive PRF PR__PRF स्वतः (swatha-myself) आपण (aapana-ourselves)
2.3 Relative PRL PR__PRL जो (jo-who) ज्याने (jyaane-who) जेव्हा (jevhaa-while) जिथे (jeethe-where)
2.4 Reciprocal PRC PR__PRC परस्पर (Paraspara-reciprocally) एकमेक (ekmek-mutually)
2.5 Wh-word PRQ PR__PRQ कोण (kona-who) केव्हा (kevha-when) कुठे (kuthe-where)
2.6 Indefinite कोणी (kona
3 Demonstrative DM DM तो (to-he) हा (haa-this) जो (jo-who)
3.1 Deictic DMD DM__DMD इथे (ithe-here) तिथे (tithe-there)
3.2 Relative DMR DM__DMR जो (jo-who) ज्याने (jyane-who)
3.3 Wh-word DMQ DM__DMQ कोणता (konta-which) कोणी (kona-who)
4 Verb V V पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)
4.1 Main VM V__VM पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)
4.1.1 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 Non-finite VNF V__VM__VNF - --do--
4.1.3 Infinitive VINF V__VM__VINF - --do--
4.1.4 Gerund VNG V__VM__VNG --do--
4.2 Auxiliary VAUX V__VAUX आहे (is) लागला (started)
5 Adjective JJ सुंदर (sundara-beautiful) चांगला (chaangalaa-good) मोठा (moThaa-big)
6 Adverb RB लवकर (lavakar-fast) हळूहळू (haLuuhaLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आणि (aaNi-and) कारण (kaaraN-because)
8.1 Co-ordinator CCD CC__CCD आणि (aaNi-and) पण (paNa-but) परंतु (parantu-but)
8.2 Subordinator CCS CC__CCS तारण त (kaaraN-because of) ता त (kaaraN kii-because of) जर-तर (jara-tara-if-then)
8.2.1 Quotative UT CC__CCS__UT असा म्हणून
9 Particles RP RP तर (tara)
9.1 Default RPD RP__RPD तर (tara) (then)
9.2 Classifier CL RP__CL Not required
9.3 Interjection INJ RP__INJ अरेरे (arere) ओहो (oho-oh)
9.4 Intensifier INTF RP__INTF खूप (khoop-lot very) बराच (baraach-too much) अतिशय (atishaya-too much very)
9.5 Negation NEG RP__NEG नको (nako-not) न (na-Na)
10 Quantifiers QT QT थोडे (thode-few) जास्त (jaasta-lot) काही (kaahi-few) एक (eka-one) पहिला (pahilaa-first)
10.1 General QTF QT__QTF थोडे (thoDe-few) जास्त (jaasta-lot) काही (kaahi-few)
10.2 Cardinals QTC QT__QTC एक (eka-one) दोन (dona-two)
10.3 Ordinals QTO QT__QTO पहिला (pahilaa-first) दुसरा (dusaraa-second)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जेवणबिवण (jevanbivaNa-mealdinner) डोकेबिके (Dokebike-head) (Paanii-)vaanii (khaanaa-)vaanaa

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun N N
1.1 Common NN N__NN kalam chashmA ‘pen’ ‘spectacles’
1.2 Proper NNP N__NNP mohan ravI ‘Mohan’ ‘Ravi’
1.3 Nloc NST N__NST upar nIche ahIM ‘up’ ‘down’ ‘in front’
2 Pronoun PR PR
2.1 Personal PRP PR__PRP huM tuM te ‘me’ ‘you’ ‘he/she’
2.2 Reflexive PRF PR__PRF pote jAte svayam ‘herself/himself’
2.3 Relative PRL PR__PRL je te jyAM ‘who’ ‘where’
2.4 Reciprocal PRC PR__PRC aras-paras paraspar ‘mutually’ ‘each other’
2.5 Wh-word PRQ PR__PRQ koN kyAre kyAM ‘who’ ‘when’ ‘where’
2.6 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD A ‘this’
3.2 Relative DMR DM__DMR je jeNe ‘which/who’ ‘whom’
3.3 Wh-word DMQ DM__DMQ koN shuM kem ‘who’ ‘what’ ‘why’
3.4 Indefinite koI kaIMK kashuM ‘someone’ ‘something’
4 Verb V V
4.1 Main VM V__VM khAshe khAdhu ‘will eat’ ‘ate’
4.2 Auxiliary VAUX V__VAUX chhe hatuM karyuM ‘is’ ‘was’ ‘did’
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane ke ‘and’ ‘or’
8.2 Subordinator CCS CC__CCS tethI evuM kAraNke ‘so’ ‘like that’ ‘because’
9 Particles RP RP
9.1 Default RPD RP__RPD paNa ja tO ‘but’ emph topic
9.2 Interjection INJ RP__INJ hE arrrE O
9.3 Intensifier INTF RP__INTF bahu ghaNuM ‘very’ ‘much’
9.4 Negation NEG RP__NEG nahi na ‘no’
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM ghaNuM ‘little’ ‘much’
10.2 Cardinals QTC QT__QTC eka be traN ‘one’ ‘two’ ‘three’
10.3 Ordinals QTO QT__QTO paheluM bIjI ‘first’ (neu) ‘second’ (fem)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ()
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm pANi-bANi ‘work and the like’ ‘water and the like’

POS for Konkani

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

4.1.1 Finite VF V__VM__VF आयलो आयला आयललो

4.1.2 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

4.1.3 Infinitive VINF V__VM__VINF आस वहर तलयार

4.1.4 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

4.2 Auxiliary VAUX V__VAUX NA

4.2.1 Finite V__VAUX__VF तलल आस आयला आस

4.2.2 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

9.2 Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text

11.2 Symbol SYM RD__SYM $ ( ) For symbols such as $ & etc

11.3 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Urdu

Sl No Category (Top level, Subtype level 1, Subtype level 2) Label Annotation Convention Examples Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category/subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

11.1 Foreign word (بديسی لفظ - bidesii lafz)

RDF RD__RDF A word written in script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound) etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Users will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur)

Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content)

Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content)

Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)

Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier)

Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
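As a small illustration of the first and fifth attributes in the list above, the following Python sketch builds one tagged token carrying xml:lang and xml:id, using only the standard library. The element name w, the attribute name pos, the identifier w1 and the language code bn are assumptions made for this example; they are not defined by this standard.

```python
import xml.etree.ElementTree as ET

# Namespace URI reserved by the XML specification for the xml: prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def make_token(text: str, lang: str, pos: str, token_id: str) -> ET.Element:
    """Build one token element labelled with language, identifier and POS tag."""
    token = ET.Element("w")                     # hypothetical element name
    token.set(f"{{{XML_NS}}}lang", lang)        # serialized as xml:lang
    token.set(f"{{{XML_NS}}}id", token_id)      # serialized as xml:id
    token.set("pos", pos)                       # hypothetical attribute name
    token.text = text
    return token


token = make_token("sundara", "bn", "JJ", "w1")
xml_text = ET.tostring(token, encoding="unicode")
print(xml_text)
```

Because xml:lang applies to an element and everything nested inside it, a corpus file in one language only needs the attribute on its root element, with overrides where the language changes.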

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is metadata built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). “Sort of” because a tag only gives the tag name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. POS schema block diagram.
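The display and hide steps of the flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the tag list is abbreviated to four nodes, the language codes mar, ben and urd are ISO 639-3 codes, and the not-required sets are read off the remarks in the Marathi, Bengali and Urdu tables above (for example "Not in Marathi" for PSP). The names ALL_TAGS, NOT_REQUIRED and select_nodes are hypothetical.

```python
# Abbreviated list of nodes from the common schema (illustration only).
ALL_TAGS = ["N__NST", "PSP", "CC__CCS__UT", "RP__CL"]

# Branches switched 'off' per language, taken from the table remarks.
NOT_REQUIRED = {
    "mar": {"PSP", "RP__CL"},
    "ben": {"CC__CCS__UT"},
    "urd": {"CC__CCS__UT", "RP__CL"},
}


def select_nodes(language: str) -> tuple[list[str], list[str]]:
    """Split the common tag list into displayed and hidden nodes."""
    off = NOT_REQUIRED.get(language, set())
    displayed = [tag for tag in ALL_TAGS if tag not in off]
    hidden = [tag for tag in ALL_TAGS if tag in off]
    return displayed, hidden


displayed, hidden = select_nodes("mar")
print(displayed)  # ['N__NST', 'CC__CCS__UT']
print(hidden)     # ['PSP', 'RP__CL']
```

Keeping one common tag list and a per-language set of switched-off branches means that a new language needs only a new set, and the common schema stays unchanged.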

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
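The one-to-one mapping in Tables 1 to 3 can be read as a lookup from a tag and a language code to a label. The following is a minimal Python sketch under stated assumptions: the `LABELS` dictionary, the `label_for` function and the fallback to the English label for `NA` cells are hypothetical and not part of the standard. The two sample rows copy labels exactly as they appear in the tables above.

```python
# Hypothetical in-memory form of the mapping tables:
# tag -> {ISO 639-3 language code -> label}; None stands for an "NA" cell.
LABELS = {
    "N": {"eng": "Noun", "mar": "नाम", "kok": "नाम"},
    "UT": {"eng": "Quotative", "guj": None},
}


def label_for(tag: str, lang: str) -> str:
    """Return the label of `tag` in `lang`.

    Falls back to the English label when the table has no entry
    (missing or "NA") for that language. The fallback is an assumption
    of this sketch, not a rule stated in the standard.
    """
    row = LABELS[tag]
    label = row.get(lang)
    return label if label is not None else row["eng"]


print(label_for("N", "kok"))   # label from the Konkani column
print(label_for("UT", "guj"))  # "NA" in the Gujarati column, so English is used
```

Keeping one row per tag mirrors the one-to-one design of the tables: every tag has exactly one label slot per language, and an empty slot is explicit rather than silently missing.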

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …

    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …

    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …

    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …

    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …

    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …

    End if
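The selection steps above (check the script, check the language, show the English, Hindi and target-language nodes, hide the rest) can be sketched in Python as follows. This is an illustration only: the dictionary form of a schema node, the names `SCRIPT_LANGUAGES`, `ALWAYS_SHOWN` and `select_nodes`, and the placeholder labels are assumptions of this sketch, not part of the standard.

```python
# Script -> languages, following the branches of the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Attributes that are always displayed: English names, the tag and Hindi.
ALWAYS_SHOWN = {"name", "cat", "subcat", "tag", "hin-cat"}


def select_nodes(node: dict, script: str, lang: str) -> dict:
    """Keep the English, Hindi and `lang` labels of one node; hide the rest."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not handled for script {script!r}")
    shown = ALWAYS_SHOWN | {f"{lang}-cat"}
    return {key: value for key, value in node.items() if key in shown}


# Hypothetical dict form of one schema element; labels are placeholders.
noun = {
    "cat": "noun",
    "hin-cat": "<Hindi>",
    "brx-cat": "<Bodo>",
    "mal-cat": "<Malayalam>",
    "guj-cat": "<Gujarati>",
    "tag": "N",
}

print(select_nodes(noun, "Devanagari", "brx"))
```

For Hindi the target-language attribute coincides with `hin-cat`, so only the English and Hindi nodes remain, which matches the first branch of the algorithm.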

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
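The sample sentences above attach a tag to every word, but the extracted text no longer shows how word and tag were delimited. The following is a minimal Python sketch, assuming (this is not stated in the source) that each token is written as `word\TAG` and that tag levels are joined by one or two underscores. The function name `parse_tagged` and the sample string, built from transliterated Bangla examples in this document, are illustrative only.

```python
def parse_tagged(sentence: str, sep: str = "\\") -> list:
    """Split a tagged sentence into (word, [tag levels]) pairs.

    `sep` is the assumed word/tag delimiter; tag levels may be joined
    by "_" or "__", so empty parts are dropped after splitting.
    """
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition(sep)
        if not word:
            raise ValueError(f"token {token!r} has no {sep!r}-separated tag")
        levels = [part for part in tag.split("_") if part]
        pairs.append((word, levels))
    return pairs


sample = r"kalama\N_NN bhAla\JJ chhila\V__VAUX"
for word, levels in parse_tagged(sample):
    print(word, levels)
```

Splitting the tag into levels makes it straightforward to check a token at any depth of the hierarchy, for example all nouns regardless of subtype.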

16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tag (ISO 639-3)

1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
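The language tags combine with the `-cat` suffix to form the per-language attribute names used in the schema (for example `hin-cat` and `mal-cat`). A small Python sketch of that lookup follows; the names `LANGUAGE_TAGS` and `label_attribute` are hypothetical, and the codes are the standard ISO 639-3 codes for the 22 languages listed in the annexure.

```python
# Language name -> ISO 639-3 code for the 22 languages of Annexure-1.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language: str) -> str:
    """Name of the schema attribute that holds labels for `language`."""
    return f"{LANGUAGE_TAGS[language]}-cat"


print(label_attribute("Malayalam"))
print(label_attribute("Kashmiri"))
```

An unknown language name raises `KeyError`, which is a deliberate choice in this sketch so that a misspelt name is not silently mapped to a wrong attribute.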

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarat University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

രസരം

25 Wh-word PRQ PR__PRQ aaru evan ആര എവ൯

3 Demonstrative DM DM aa- ii- ആ ഈ 31 Deictic DMD DM__DMD atu itu അത

ഇത 32 Relative DMR DM__DMR eetu ഏത 33 Wh-word DMQ DM__DMQ eetu ennane ഏത

എങെന 4 Verb V V pO kazhi

Annuciri ോ കഴി ആണി(Cop

ula) ചിരി 41 Main VM V__VM pO kazhi

cirriAnnu(copula)

ോ കഴി ആണി (copula) ചിരി

411 Finite VF V__VM__VF pOyi cirikkum kazhikkunnu Akunnu(copula)

ോയി ചിരികം കഴികന ആകന(copula)

412 Non-finite VNF V__VM__VNF pOya ciricca kazhicca

ോയ ചിരിച കഴിച

413 Infinitive VINF V__VM__VINF pOkku cirikkukayAl kazhikkee varAnvaruvAn

ോക ചിരിക കയാി

കഴിക വരാ൯വരവാ൯

42 Verbal VN V__VN paTittam naTattam naTanam

ഠിതം നടതം നടനം

43 Auxiliary VAUX V_VAUX kolluka talluka kAnuka nOkkuka

െകാലക തലക കാണക േനാകക

5 Adjective JJ valiya ceRiya azakulla

വലിയ െചറിയ അഴകള

6 Adverb RB veegam ativeegam kUtutal

േവഗം അതിേവഗം കടതി

7 Postposition PSP paRRi kUte റി കെട

8 Conjunction CC CC pakshe enniTTum ennAlennalum enkilum

െക എനിനം എനാി എനാ

ലം എങിലം

81 Co-ordinator CCD CC__CCD -um (rAmanum) pakshe

ഉംി(രാമനം) െക

82 Subordinator CCS CC__CCS ennu enna ennAl

എന എന എനാി

821 Quotative UT CC__CCS__UT ennu enna എന എന

9 Particles RP RP kutemAtram കെട മാതം

91 Default RPD RP__RPD mAtram മാതം 92 Classifier C RP__CL peer േ൪ 93 Interjection INJ RP__INJ ayyoo അേയാ 94 Intensifier INTF RP__INTF pala valare ല

വളെര 95 Negation NEG RP__NEG illa alla ഇല

അല 10 Quantifiers QT QT kuracchu

niraccu oru dharalam

കറച നിറച ഒര ധാരാളം

101 General QTF QT__QTF kuraccu niraccu dharalam

കറച നിറച ധാരാളം

102 Cardinals QTC QT__QTC onnurantu ഒന രണ

103 Ordinals QTO QT__QTO onnAmrantam

ഒനാം രണാം

11 Residuals RD RD 111 Foreign word RDF RD__RDF 112 Symbol SYM RD__SYM $ amp ( )

ruu $ amp ( ) ര

113 Punctuation PUNC RD__PUNC 114 Unknown UNK RD__UNK 115 Echowords ECH RD__ECH

POS for Bangla

Sl No Category Label Annotation Convention Examples Remarks
(Category levels: Top level, Subtype (level 1), Subtype (level 2))

1 Noun N N
1.1 Common NN N__NN kalama, cashmaa
1.2 Proper NNP N__NNP Mohan, ravi, rashmi
1.4 Nloc NST N__NST upare, niche, bhitara
2 Pronoun PR PR
2.1 Personal PRP PR__PRP se, tumi, AmAra
2.2 Reflexive PRF PR__PRF nijera
2.3 Relative PRL PR__PRL ye, yakhana, yena, yAra
2.4 Reciprocal PRC PR__PRC paraspara
2.5 Wh-word PRQ PR__PRQ ke, kakhana, kena, kAra
2.6 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha, jo, yaha
3.1 Deictic DMD DM__DMD sei, oi, o, se
3.2 Relative DMR DM__DMR ye, yei
3.3 Wh-word DMQ DM__DMQ kono
3.4 Indefinite DMI DM__DMI keu
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF karachhilAma, yAba, khAYa
4.1.2 Non-finite VNF V__VM__VNF kare, kheYe, karale, khete
4.1.3 Infinitive VINF V__VM__VINF karate, khete, yete
4.1.4 Gerund VNG V__VM__VNG yAoYa, AsA, khelA, karA
4.2 Auxiliary VAUX V__VAUX chhila, habe, chAi
5 Adjective JJ sundara, bhAla, lAla
6 Adverb RB tADAtADi, Aste, haThAt
7 Postposition PSP theke, abadhI, madhye, diYe
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD Ara, eban, athabA, kimbA
8.2 Subordinator CCS CC__CCS ye, kintu, noile, tAhale
8.2.1 Quotative UT CC__CCS__UT ---- Remarks: Not required
9 Particles RP RP
9.1 Default RPD RP__RPD to, ye
9.2 Classifier CL RP__CL jana, khAnA
9.3 Interjection INJ RP__INJ Are, ei, hAya
9.4 Intensifier INTF RP__INTF bhiShaNa, khuba, sA~NghAtika
9.5 Negation NEG RP__NEG nA, naYa, chhADA
10 Quantifiers QT QT
10.1 General QTF QT__QTF kichhu, alpa, aneka
10.2 Cardinals QTC QT__QTC eka, dui, tina
10.3 Ordinals QTO QT__QTO prathama, paYalA, dvitIYa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF Remarks: A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) Remarks: For symbols such as $, & etc.
11.3 Punctuation PUNC RD__PUNC Remarks: Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH jala Tala, khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
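The rule that higher-level tags follow automatically from the lowest-level tag can be sketched in Python. The `PARENT` table below is read off the Annotation Convention column of the Bangla table above; the names `PARENT` and `full_annotation` are hypothetical, and the sketch assumes every tag has a single parent.

```python
# Child tag -> parent tag, read off the "Annotation Convention" column.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(tag: str) -> str:
    """Expand a lowest-level tag to its full annotation, e.g. VF -> V__VM__VF."""
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


print(full_annotation("VF"))
print(full_annotation("UT"))
print(full_annotation("JJ"))
```

Tags without a parent, such as `JJ`, `RB` and `PSP`, are returned unchanged, which matches the rows of the table that list no annotation convention beyond the label itself.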

POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)

जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--

4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No Category Label Annotation Convention Examples Remarks
(Category levels: Top level, Subtype (level 1), Subtype (level 2))

1 Noun N N
1.1 Common NN N__NN kalam, chashmA ('pen', 'spectacles')
1.2 Proper NNP N__NNP mohan, ravI ('Mohan', 'Ravi')
1.3 Nloc NST N__NST upar, nIche, ahIM ('up', 'down', 'in front')
2 Pronoun PR PR
2.1 Personal PRP PR__PRP huM, tuM, te ('me', 'you', 'he/she')
2.2 Reflexive PRF PR__PRF pote, jAte, svayam ('herself/himself')
2.3 Relative PRL PR__PRL je, te, jyAM ('who', 'where')
2.4 Reciprocal PRC PR__PRC aras-paras, paraspar ('mutually', 'each other')
2.5 Wh-word PRQ PR__PRQ koN, kyAre, kyAM ('who', 'when', 'where')
2.6 Indefinite koI, kaIMK, kashuM ('someone', 'something')
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD A ('this')
3.2 Relative DMR DM__DMR je, jeNe ('which/who', 'whom')
3.3 Wh-word DMQ DM__DMQ koN, shuM, kem ('who', 'what', 'why')
3.4 Indefinite koI, kaIMK, kashuM ('someone', 'something')
4 Verb V V
4.1 Main VM V__VM khAshe, khAdhu ('will eat', 'ate')
4.2 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM ('is', 'was', 'did')
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane, ke ('and', 'or')
8.2 Subordinator CCS CC__CCS tethI, evuM, kAraNke ('so', 'like that', 'because')
9 Particles RP RP
9.1 Default RPD RP__RPD paNa, ja, tO ('but', emph, topic)
9.2 Interjection INJ RP__INJ hE, arrrE, O
9.3 Intensifier INTF RP__INTF bahu, ghaNuM ('very', 'much')
9.4 Negation NEG RP__NEG nahi, na ('no')
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM, ghaNuM ('little', 'much')
10.2 Cardinals QTC QT__QTC eka, be, traN ('one', 'two', 'three')
10.3 Ordinals QTO QT__QTO paheluM, bIjI ('first' (neu), 'second' (fem))
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv, perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ( )
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm, pANi-bANi ('work and the like', 'water and the like')

POS for Konkani

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पसत रख आबो माड |
1.2 | Proper | NNP | N__NNP | रामायण बायबल तराण गय ततणी तपला |
1.3 | Nloc | NST | N__NST | भायर भीर वयर सतयल |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हाव ो तयो मच आमच ाच |
2.2 | Reflexive | PRF | PR__PRF | आपण सवा |
2.3 | Relative | PRL | PR__PRL | जा जो |
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा |
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो |
2.6 | Indefinite | | | तोणय त य खयचय |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ो हो |
3.2 | Relative | DMR | DM__DMR | जो |
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल |
3.4 | Indefinite | | | तोणाचय तसलय |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | यवप |
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो |
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच |
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार |
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी |
4.2 | Auxiliary | VAUX | V__VAUX | NA |
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस |
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी |
5 | Adjective | JJ | | सोबी सदर |
6 | Adverb | RB | | फालया सवतास अश |
7 | Postposition | PSP | | खाीर पास बगर तडन लागी |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा |
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन |
8.2.1 | Quotative | UT | CC__CCS__UT | अश त |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | बी आद इतयाद |
9.2 | Classifier | CL | RP__CL | (पाच) जाण |
9.3 | Interjection | INJ | RP__INJ | आर चप |
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर |
9.5 | Negation | NEG | RP__NEG | ना नयह |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | थोड चड ताय खब |
10.2 | Cardinals | QTC | QT__QTC | एत दोन |
10.3 | Ordinals | QTO | QT__QTO | पयल दसर |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | & $ |
11.3 | Punctuation | PUNC | RD__PUNC | - |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण |

POS for Maithili

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
2.6 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
3.4 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
9.2 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख) मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
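The rule above can be sketched in a few lines of Python. This is only an illustration, assuming that the double underscore in the Annotation Convention column (for example V__VM__VF) separates the levels of the hierarchy; the function name expand_tag is hypothetical and not part of the standard.

```python
def expand_tag(annotation: str) -> list:
    """Return every tag from the top level down to the annotated (lowest) level.

    Assumption: levels are joined with a double underscore, as in the
    Annotation Convention column of the tag tables.
    """
    parts = annotation.split("__")
    if not all(parts):
        # Rejects empty input and stray separators such as "V__"
        raise ValueError(f"malformed annotation: {annotation!r}")
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


# The annotator records only the lowest-level tag; the ancestors are derived.
print(expand_tag("V__VM__VF"))
```

Deriving the higher-level tags from the stored string means an annotator never has to enter them by hand, which is one way to satisfy "stored automatically".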

POS for Urdu

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (ism, اسم) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (nakeraa, نکره) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (m‘aarefa, معرفہ) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (haasil-e-masdar, حاصل مصدر) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (zarf, ظرف) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (zamiir, ضمير) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (zamiir-e-shakhsii, ضمير شخصی) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used for both singular and plural
2.2 | Reflexive (zamiir-e-m‘aakoosii, ضمير معکوسی) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (zamiir-e-mausoolaa, ضمير موصولہ) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (zamiir-e-raaje‘, ضمير راجع) | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (zamiir-e-istafhaamiyaa, ضمير استفہاميہ) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (zamiir-e-ishaaraa, ضمير اشاره) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (ishaare, اشارے) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (zamiir-e-ishaaraa mausoolaa, ضمير اشاره موصولہ) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (zamiir-e-ishaaraa istafhaamiyaa, ضمير اشاره استفہاميہ) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (f‘el, فعل) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (mahdood, محدود) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (ghair mahdood, غيرمحدود) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (masdar, مصدر) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (haasil-e-masdar, حاصل مصدر) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (f‘el-e-imdaadi, فعل امدادی) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (sifat, صفت) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (mut‘alliq-e-f‘el, متعلق فعل) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (jaar-e-moakkhar, جارموخر) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (atf‘, عطف) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (harf-e-vasl, حرف وصل) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (taab‘e kunindaa, تابع کننده) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (iqtabaasii, اقتباسی) | UT | CC__CCS__UT | | Not required
9 | Particles (haaliyaa, حاليہ) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (Default, ڈيفالٹ) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (darja band, درجہ بند) | CL | RP__CL | | Not required
9.3 | Interjection (fajaa’iyaa, فجائيہ) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (harf-e-taakiid, حرف تاکيد) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (harf-e-nahii, حرف نہی) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (kamiiyat numaa, کميت نما) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (aam‘, عام) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (a‘adaad-e-mutlaq, اعداد مطلق) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (tartiibii a‘adaad, ترتيبی اعداد) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (baaqi-maandaa, باقی مانده) | RD | RD | |
11.1 | Foreign word (bidesii lafz, بديسی لفظ) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol (‘alaamat, علامت) | SYM | RD__SYM | $ & ( ) | Such symbols are not used in Urdu. They are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.
11.3 | Punctuation (auqaaf, اوقاف) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (naa-m‘aaloom, نامعلوم) | UNK | RD__UNK | |
11.5 | Echowords (goonjdar lafz, گونج دار الفاظ) | ECH | RD__ECH | (دل-) ول, (dil-) vil; (پيار-) ويار, (pyaar-) vyaar; (چائے-) وائے, (caa‘e-) vaa‘e |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). This makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

- Defining mark-up for natural language labelling (xml:lang, defined for the root element of your document and for any element where a change of language may occur).
- Defining mark-up to specify text direction (its:dir, defined for the root element of your document and for any element that has text content).
- Indicating which elements and attributes should be translated (its:translateRule elements, to indicate which elements have non-translatable content).
- Providing information related to text segmentation (its:withinTextRule elements, to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
- Defining mark-up for unique identifiers (xml:id; elements with translatable content can be associated with a unique identifier).
- Defining mark-up for notes to localizers (its:locNote, which allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
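As a rough illustration of the attributes listed above, the following Python sketch builds one labelled element carrying xml:lang, its:dir and its:translate. The element name label and the helper make_label are hypothetical, the ITS namespace URI is the one used by the W3C ITS 1.0 Recommendation, and the language tag "ur" is an assumed BCP 47 value for Urdu; none of these is prescribed by this draft standard.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang / xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Build a hypothetical <label> element annotated with ITS attributes."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)            # natural language of the content
    label.set(f"{{{ITS_NS}}}dir", direction)        # text direction: ltr or rtl
    label.set(f"{{{ITS_NS}}}translate", translate)  # whether the content is translatable
    label.text = text
    return label


# An Urdu category label is written right-to-left.
urdu = make_label("اسم", "ur", direction="rtl")
print(ET.tostring(urdu, encoding="unicode"))
```

Here the local attributes are set directly on the element; the global rule elements named in the list (its:translateRule, its:withinTextRule) would instead be declared once and selected by XPath.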

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag gives only its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
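The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps can be sketched as a simple filter over the common tree. The sketch below is illustrative only: the Node type, the abridged NODES list and select_nodes are hypothetical names, and the language sets are an assumption read off the Gujarati and Konkani tag tables earlier in this document (the Gujarati table lists no Classifier and no Finite subtype, while the Konkani table lists both). Languages are identified by ISO 639-3 codes.

```python
from typing import NamedTuple


class Node(NamedTuple):
    annotation: str       # annotation convention, e.g. "RP__CL"
    category: str         # English category name
    languages: frozenset  # ISO 639-3 codes of languages that use this node


# Abridged, hypothetical excerpt of the common tree.
NODES = [
    Node("N__NN", "Common noun", frozenset({"guj", "kok"})),
    Node("V__VM__VF", "Finite main verb", frozenset({"kok"})),
    Node("RP__CL", "Classifier", frozenset({"kok"})),
    Node("RP__INJ", "Interjection", frozenset({"guj", "kok"})),
]


def select_nodes(language, nodes=NODES):
    """Split the common tree into nodes to display and nodes to switch off."""
    shown = [n.annotation for n in nodes if language in n.languages]
    hidden = [n.annotation for n in nodes if language not in n.languages]
    return shown, hidden


shown, hidden = select_nodes("guj")
print("display:", shown)
print("hide:", hidden)
```

Keeping a single common node list and filtering it per language mirrors the idea of one common schema with branches switched off, rather than maintaining 22 separate schemas.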

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
        ...
    End if


Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
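The branching above can be read as a lookup from (script, language) to the set of label attributes that stay visible. The following is a minimal Python sketch of that reading, not part of the standard: the names SCRIPT_LANGUAGES, select_nodes and visible_attributes are hypothetical, the table covers only the branches shown in this section, and keeping the "tag" attribute visible in every case is an assumption.

```python
# Script -> {language name -> ISO 639-3 code}, limited to the branches shown above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script: str, language: str) -> list[str]:
    """Return the label attributes to display; every other node is hidden."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"no branch for script={script!r}, language={language!r}")
    code = languages[language]
    visible = ["cat", "hin-cat"]          # English and Hindi labels are always shown
    if code != "hin":
        visible.append(f"{code}-cat")     # plus the label of the selected language
    return visible


def visible_attributes(attrs: dict[str, str], script: str, language: str) -> dict[str, str]:
    """Filter one schema node down to the attributes that should be displayed."""
    keep = set(select_nodes(script, language)) | {"tag"}   # assumption: tag always shown
    return {name: value for name, value in attrs.items() if name in keep}


if __name__ == "__main__":
    noun = {"cat": "noun", "hin-cat": "<Hindi label>", "brx-cat": "<Bodo label>",
            "kok-cat": "<Konkani label>", "tag": "N"}
    print(visible_attributes(noun, "Devanagari", "Bodo"))
```

A table-driven form like this keeps the per-language branches identical in behaviour, so adding a language means adding one dictionary entry rather than another If block.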


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


കഴിക വരാ൯വരവാ൯

4.2 | Verbal | VN | V__VN | paTittam naTattam naTanam | ഠിതം നടതം നടനം
4.3 | Auxiliary | VAUX | V__VAUX | kolluka talluka kAnuka nOkkuka | െകാലക തലക കാണക േനാകക
5 | Adjective | JJ | | valiya ceRiya azakulla | വലിയ െചറിയ അഴകള
6 | Adverb | RB | | veegam ativeegam kUtutal | േവഗം അതിേവഗം കടതി
7 | Postposition | PSP | | paRRi kUte | റി കെട
8 | Conjunction | CC | CC | pakshe enniTTum ennAlennalum enkilum | െക എനിനം എനാി എനാ ലം എങിലം
8.1 | Co-ordinator | CCD | CC__CCD | -um (rAmanum) pakshe | ഉംി(രാമനം) െക
8.2 | Subordinator | CCS | CC__CCS | ennu enna ennAl | എന എന എനാി
8.2.1 | Quotative | UT | CC__CCS__UT | ennu enna | എന എന
9 | Particles | RP | RP | kutemAtram | കെട മാതം
9.1 | Default | RPD | RP__RPD | mAtram | മാതം
9.2 | Classifier | CL | RP__CL | peer | േ൪
9.3 | Interjection | INJ | RP__INJ | ayyoo | അേയാ
9.4 | Intensifier | INTF | RP__INTF | pala valare | ല വളെര
9.5 | Negation | NEG | RP__NEG | illa alla | ഇല അല
10 | Quantifiers | QT | QT | kuracchu niraccu oru dharalam | കറച നിറച ഒര ധാരാളം
10.1 | General | QTF | QT__QTF | kuraccu niraccu dharalam | കറച നിറച ധാരാളം
10.2 | Cardinals | QTC | QT__QTC | onnurantu | ഒന രണ
10.3 | Ordinals | QTO | QT__QTO | onnAmrantam | ഒനാം രണാം
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | $ & ( ) ruu | $ & ( ) ര
11.3 | Punctuation | PUNC | RD__PUNC | |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | |

POS for Bangla

Category levels: top level (n), subtype level 1 (n.n), subtype level 2 (n.n.n)

Sl No | Category | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
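The rule above implies that the full annotation string can be derived from the lowest-level label alone. A minimal Python sketch of that derivation follows, assuming the label hierarchy of the Bangla table; the names PARENT, TOP_LEVEL and full_annotation are hypothetical, and the separator is a parameter because the tables write "__" while the tagged sentences write "_".

```python
# Child label -> parent label, read off the Label and Annotation Convention columns.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}

# Labels that have no parent in the hierarchy.
TOP_LEVEL = {"N", "PR", "DM", "V", "JJ", "RB", "PSP", "CC", "RP", "QT", "RD"}


def full_annotation(label: str, sep: str = "__") -> str:
    """Expand a lowest-level label into its full path, e.g. VF -> V__VM__VF."""
    if label not in PARENT and label not in TOP_LEVEL:
        raise KeyError(f"unknown label: {label}")
    path = [label]
    while path[0] in PARENT:              # walk up until a top-level label is reached
        path.insert(0, PARENT[path[0]])
    return sep.join(path)


if __name__ == "__main__":
    for label in ("VF", "UT", "JJ"):
        print(label, "->", full_annotation(label))
```

With this approach an annotator only ever chooses the leaf label, and the stored value always agrees with the Annotation Convention column.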


POS for Marathi

Category levels: top level (n), subtype level 1 (n.n), subtype level 2 (n.n.n)

Sl No | Category | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | मलगा (mulagaa-boy), राजा (raajaa-king), पसत (pustaka-book) |
1.1 | Common | NN | N__NN | पसत (pustaka-book), लखणी (lekhaNi-pen), चषमा (chashmaa-goggles) |
1.2 | Proper | NNP | N__NNP | मोहन (Mohan), रवी (Ravi), रशमी (Rashmi) |
1.3 | Verbal | NNV | N__NNV | NA | Not Required
1.4 | Nloc | NST | N__NST | वर (var-up), खाल (khaalee-down), पढ (pudhe-ahead), माग (maage-back) | Where it is separate it is NST
2 | Pronoun | PR | PR | यथ (yethe-here), थ (tethe-there), जो (jo-who), ो (to-he) |
2.1 | Personal | PRP | PR__PRP | ो (to-he), मी (mee-I), (tu-you), (te-they), मह (tumhi-you) |
2.2 | Reflexive | PRF | PR__PRF | सवत (swatha-myself), आपण (aapana-ourselves) |
2.3 | Relative | PRL | PR__PRL | जो (jo-who), जयान (jyaane-who), जवहा (jevhaa-while), िजथ (jeethe-where) |
2.4 | Reciprocal | PRC | PR__PRC | परसपर (Paraspara-reciprocally), एतमत (ekmek-mutually) |
2.5 | Wh-word | PRQ | PR__PRQ | तोण (kona-who), तवहा (kevha-when), तठ (kuthe-where) |
2.6 | Indefinite | | | तोणी (kona |
3 | Demonstrative | DM | DM | ो (to-he), हा (haa-this), जो (jo-who) |
3.1 | Deictic | DMD | DM__DMD | इथ (ithe-here), थ (tithe-there) |
3.2 | Relative | DMR | DM__DMR | जो (jo-who), जयान (jyane-who) |
3.3 | Wh-word | DMQ | DM__DMQ | तोणा (konta-which), तोणी (kona-who) |
4 | Verb | V | V | (padalaa-fell down), गला (gelaa-went), झोपला (jhopalaa-slept), आह (aahe-is) |
4.1 | Main | VM | V__VM | पडला (padalaa-fell down), गला (gelaa-went), झोपला (jhopalaa-slept), आह (aahe-is) |
4.1.1 | Finite | VF | V__VM__VF | - | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Non-finite | VNF | V__VM__VNF | - | --do--
4.1.3 | Infinitive | VINF | V__VM__VINF | - | --do--
4.1.4 | Gerund | VNG | V__VM__VNG | | --do--
4.2 | Auxiliary | VAUX | V__VAUX | आह (is), लागला (started) |
5 | Adjective | JJ | | सदर (sundara-beautiful), चागला (chaangalaa-good), मोठा (moThaa-big) |
6 | Adverb | RB | | लवतर (lavakar-fast), हळहळ (haLuuhaLuu-slowly) |
7 | Postposition | PSP | | | Not in Marathi
8 | Conjunction | CC | CC | आण (aaNi-and), तारण (kaaraN-because) |
8.1 | Co-ordinator | CCD | CC__CCD | आण (aaNi-and), पण (paNa-but), पर (parantu-but) |
8.2 | Subordinator | CCS | CC__CCS | तारण त (kaaraN-because of), ता त (kaaraN kii-because of), जर-र (jara-tara-if-then) |
8.2.1 | Quotative | UT | CC__CCS__UT | असा महणन |
9 | Particles | RP | RP | र (tara) |
9.1 | Default | RPD | RP__RPD | र (tara) (then) |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | अरर (arere), ओहो (oho-oh) |
9.4 | Intensifier | INTF | RP__INTF | खप (khoop-lot, very), बराच (baraach-too much), अशय (atishaya-too much, very) |
9.5 | Negation | NEG | RP__NEG | नतो (nako-not), न (na-Na) |
10 | Quantifiers | QT | QT | थोड (thode-few), जास (jaasta-lot), ताह (kaahi-few), एत (eka-one), पहला (pahilaa-first) |
10.1 | General | QTF | QT__QTF | थोड (thoDe-few), जास (jaasta-lot), ताह (kaahi-few) |
10.2 | Cardinals | QTC | QT__QTC | एत (eka-one), दोन (dona-two) |
10.3 | Ordinals | QTO | QT__QTO | पहला (pahilaa-first), दसरा (dusaraa-second) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जवणबवण (jevanbivaNa-mealdinner), डोतबत (Dokebike-head), (Paanii-)vaanii, (khaanaa-)vaanaa |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Category levels: top level (n), subtype level 1 (n.n), subtype level 2 (n.n.n)

Sl No | Category | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA (‘pen’, ‘spectacles’) |
1.2 | Proper | NNP | N__NNP | mohan, ravI (‘Mohan’, ‘Ravi’) |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM (‘up’, ‘down’, ‘in front’) |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te (‘me’, ‘you’, ‘he/she’) |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam (‘herself/himself’) |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM (‘who’, ‘where’) |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar (‘mutually’, ‘each other’) |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM (‘who’, ‘when’, ‘where’) |
2.6 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A (‘this’) |
3.2 | Relative | DMR | DM__DMR | je, jeNe (‘which/who’, ‘whom’) |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem (‘who’, ‘what’, ‘why’) |
3.4 | Indefinite | | | koI, kaIMK, kashuM (‘someone’, ‘something’) |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu (‘will eat’, ‘ate’) |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM (‘is’, ‘was’, ‘did’) |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke (‘and’, ‘or’) |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke (‘so’, ‘like that’, ‘because’) |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO (‘but’, emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM (‘very’, ‘much’) |
9.4 | Negation | NEG | RP__NEG | nahi, na (‘no’) |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM (‘little’, ‘much’) |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN (‘one’, ‘two’, ‘three’) |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI (‘first’ (neu), ‘second’ (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’) |

POS for Konkani

Category levels: top level (n), subtype level 1 (n.n), subtype level 2 (n.n.n)

Sl No | Category | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पसत रख आबो माड |
1.2 | Proper | NNP | N__NNP | रामायण बायबल तराण गय ततणी तपला |
1.3 | Nloc | NST | N__NST | भायर भीर वयर सतयल |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हाव ो तयो मच आमच ाच |
2.2 | Reflexive | PRF | PR__PRF | आपण सवा |
2.3 | Relative | PRL | PR__PRL | जा जो |
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा |
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो |
2.6 | Indefinite | | | तोणय त य खयचय |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ो हो |
3.2 | Relative | DMR | DM__DMR | जो |
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल |
3.4 | Indefinite | | | तोणाचय तसलय |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | यवप |
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो |
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच |
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार |
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी |
4.2 | Auxiliary | VAUX | V__VAUX | NA |
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस |
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी |
5 | Adjective | JJ | | सोबी सदर |
6 | Adverb | RB | | फालया सवतास अश |
7 | Postposition | PSP | | खाीर पास बगर तडन लागी |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा |
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन |
8.2.1 | Quotative | UT | CC__CCS__UT | अश त |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | बी आद इतयाद |
9.2 | Classifier | CL | RP__CL | (पाच) जाण |
9.3 | Interjection | INJ | RP__INJ | आर चप |
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर |
9.5 | Negation | NEG | RP__NEG | ना नयह |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | थोड चड ताय खब |
10.2 | Cardinals | QTC | QT__QTC | एत दोन |
10.3 | Ordinals | QTO | QT__QTO | पयल दसर |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | & $ |
11.3 | Punctuation | PUNC | RD__PUNC | - |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण |


POS for Maithili

Category levels: top level (n), subtype level 1 (n.n), subtype level 2 (n.n.n)

Sl No | Category | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख), मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word

-بديسی لفظ(bidesii lafz(

RDF RD__RDF A word written in script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( ) & $

Such symbols are not used in Urdu. They are written out, e.g. (dollar) ڈالر, (pound) پاونڈ etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
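The rule above, storing the higher-level tags automatically once the lowest-level tag is chosen, can be sketched in Python as follows. The parent links are read off the Annotation Convention column of the tables (for example N__NN, V__VM__VF, CC__CCS__UT); the names PARENT and full_annotation are illustrative assumptions and are not defined by the standard.

```python
# Parent of each subtype label, taken from the Annotation Convention
# column (e.g. V__VM__VF means VF is under VM, which is under V).
# PARENT and full_annotation are hypothetical names used for illustration.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(tag: str) -> str:
    """Expand a lowest-level tag into its full hierarchy string."""
    chain = [tag]
    # Walk upwards until a top-level category (no parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # The convention writes the top level first, joined by "__".
    return "__".join(reversed(chain))


print(full_annotation("VF"))   # V__VM__VF
print(full_annotation("NNP"))  # N__NNP
print(full_annotation("JJ"))   # JJ
```

An annotator then only records the lowest-level label (say VF), and the stored annotation is derived from it.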

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these proposals makes the concepts represented easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document, and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir - defined for the root element of your document, and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents, or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
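As a rough illustration of the first two attributes, the following Python sketch builds a single label element carrying xml:lang and its:dir using the standard library. The element name label, the helper make_label and the language code "ur" are assumptions made for this example only; the namespace URIs are the ones defined by the XML and ITS 1.0 specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the XML and ITS 1.0 specifications.
XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)


def make_label(text: str, lang: str, direction: str) -> ET.Element:
    """Build a <label> element (hypothetical name) with language and
    text-direction mark-up attached."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)      # serialized as xml:lang
    label.set(f"{{{ITS_NS}}}dir", direction)  # serialized as its:dir
    label.text = text
    return label


# An Urdu label: right-to-left text, tagged with its language.
urdu_noun = make_label("اسم", "ur", "rtl")
xml_text = ET.tostring(urdu_noun, encoding="unicode")
print(xml_text)
```

A processor that understands ITS can then pick the right font and directionality for the label without inspecting the text itself.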

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives the tag name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that particular language. The language-specific POS schema could be enabled by turning a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
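The flow above (select script, select language, display the desired nodes, hide the rest) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the data layout, the names SCRIPT_LANGUAGES, NODES and select_nodes, and the placeholder label strings are all hypothetical; only the language codes (hin, brx, kok, mal) and the attribute naming pattern (cat, hin-cat, brx-cat, ...) follow the draft schema. It also assumes that the English and Hindi labels are always kept alongside the selected language.

```python
# Script -> language -> language code. Only four of the 22 languages are
# listed here; this table is an illustrative assumption.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
}

# Each schema node is modelled as a dict of label attributes.
# The label values are placeholders, not the real labels.
NODES = [
    {"tag": "N", "cat": "noun", "hin-cat": "<hin noun>",
     "brx-cat": "<brx noun>", "kok-cat": "<kok noun>", "mal-cat": "<mal noun>"},
    {"tag": "PR", "cat": "Pronoun", "hin-cat": "<hin pronoun>",
     "brx-cat": "<brx pronoun>", "kok-cat": "<kok pronoun>",
     "mal-cat": "<mal pronoun>"},
]


def select_nodes(script: str, language: str, nodes: list) -> list:
    """Keep the tag plus the English, Hindi and selected-language labels;
    every other language branch is hidden (dropped)."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError if unsupported
    keep = {"tag", "cat", "hin-cat", f"{code}-cat"}
    return [{k: v for k, v in node.items() if k in keep} for node in nodes]


bodo_view = select_nodes("Devanagari", "Bodo", NODES)
print(sorted(bodo_view[0]))  # ['brx-cat', 'cat', 'hin-cat', 'tag']
```

Filtering a copy of the nodes, rather than deleting from the common schema, keeps the full multilingual tree intact so that another language can be selected later.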

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>...............</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

ltxsattributegt

ltxselementgt ltxsschemagt

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

...

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

...

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-

cat=rdquoजावाचत नामrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-

cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo

tag=rdquoPRrdquogt

66

CopyrightTDIL

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश

सवरनामrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-

cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Malyalam (Orthographic variation) then

If language is Malyalam then

Call (POS Schema)

Display (English Hindi and Malyalam Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-

cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-

cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

67

CopyrightTDIL

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo

tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo

tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo

lttag=rdquoPRP rdquoشخصيٲتی

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo

lttag=rdquoPRF rdquoماکوسی

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

68

CopyrightTDIL

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-

cat=rdquoজািতবাচকrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-

cat=rdquoআতবাচকrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg

69

CopyrightTDIL

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if
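The selection logic above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the standard: the names `SCRIPT_LANGUAGES`, `visible_attributes` and `select_node` are hypothetical, and a schema node is assumed to be a plain dictionary of attribute names to label values. The script-to-language grouping and the `xxx-cat` attribute naming are taken from the examples in this section.

```python
# Script -> language -> ISO 639-3 code, following the branches of the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def visible_attributes(script, language):
    """Return the attribute names to display; all other nodes are hidden."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    # English labels (cat, subcat), Hindi labels and the tag are always shown.
    attrs = ["cat", "subcat", "hin-cat", "tag"]
    if code != "hin":
        # Add the label attribute of the selected language before the tag.
        attrs.insert(3, f"{code}-cat")
    return attrs


def select_node(node, script, language):
    """Keep only the attributes of a schema node that should be displayed."""
    keep = set(visible_attributes(script, language))
    return {name: value for name, value in node.items() if name in keep}


# A hypothetical node carrying labels for several languages at once.
noun = {"cat": "noun", "hin-cat": "संज्ञा", "brx-cat": "ममा",
        "kok-cat": "नाम", "tag": "N"}
print(select_node(noun, "Devanagari", "Konkani"))
```

For Konkani, the Bodo label is dropped while the English, Hindi and Konkani labels and the tag are kept, which mirrors the "Display ... / Hide (remaining nodes)" steps.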

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

71

CopyrightTDIL

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

72

CopyrightTDIL

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

73

CopyrightTDIL

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization: http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0: http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes: http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S. No. | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
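As a small illustration of how the language tags might be used, the sketch below stores the table as a Python dictionary and derives the per-language label attribute name (for example `kok-cat`) seen in the algorithm examples. The names `LANGUAGE_TAGS` and `label_attribute` are hypothetical; the codes are the standard ISO 639-3 codes for the 22 languages listed.

```python
# Language name -> ISO 639-3 code for the 22 languages of the annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal",
    "Manipuri": "mni", "Marathi": "mar", "Nepali": "nep", "Oriya": "ori",
    "Punjabi": "pan", "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd",
    "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute that holds labels in the given language."""
    return f"{LANGUAGE_TAGS[language]}-cat"


print(label_attribute("Konkani"))
```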

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

ലം എങിലം

8.1 | Co-ordinator | CCD | CC__CCD | -um (rAmanum), pakshe | ഉം (രാമനും), പക്ഷെ
8.2 | Subordinator | CCS | CC__CCS | ennu, enna, ennAl | എന്ന്, എന്ന, എന്നാൽ
8.2.1 | Quotative | UT | CC__CCS__UT | ennu, enna | എന്ന്, എന്ന
9 | Particles | RP | RP | kute, mAtram | കൂടെ, മാത്രം
9.1 | Default | RPD | RP__RPD | mAtram | മാത്രം
9.2 | Classifier | CL | RP__CL | peer | പേർ
9.3 | Interjection | INJ | RP__INJ | ayyoo | അയ്യോ
9.4 | Intensifier | INTF | RP__INTF | pala, valare | പല, വളരെ
9.5 | Negation | NEG | RP__NEG | illa, alla | ഇല്ല, അല്ല
10 | Quantifiers | QT | QT | kuracchu, niraccu, oru, dharalam | കുറച്ച്, നിറച്ച്, ഒരു, ധാരാളം
10.1 | General | QTF | QT__QTF | kuraccu, niraccu, dharalam | കുറച്ച്, നിറച്ച്, ധാരാളം
10.2 | Cardinals | QTC | QT__QTC | onnu, rantu | ഒന്ന്, രണ്ട്
10.3 | Ordinals | QTO | QT__QTO | onnAm, rantam | ഒന്നാം, രണ്ടാം
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | $ & ( ), ruu | $ & ( ), രൂ
11.3 | Punctuation | PUNC | RD__PUNC | |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | |

POS for Bangla

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
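One way to store the higher-level tags automatically is to keep a child-to-parent map of labels and walk it upwards from the label the annotator selects. The sketch below is a minimal illustration under that assumption: `PARENT` and `full_annotation` are hypothetical names, while the labels and the double-underscore convention (for example V__VM__VF) are those of the tables in this document.

```python
# Child label -> parent label, following the type hierarchy of the tag set.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into its full annotation string."""
    chain = [label]
    # Climb to the top-level category, collecting every ancestor on the way.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


print(full_annotation("VF"))   # a level-2 subtype of Verb
print(full_annotation("JJ"))   # a top-level category with no subtypes
```

With this layout the annotator only ever picks the most specific label, and the stored annotation always carries the complete path.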

POS for Marathi

Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)

25

CopyrightTDIL

जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

26

CopyrightTDIL

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--

27

CopyrightTDIL

4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

28

CopyrightTDIL

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

29

CopyrightTDIL

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam 'pen', chashmA 'spectacles' |
1.2 | Proper | NNP | N__NNP | mohan 'Mohan', ravI 'Ravi' |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM: 'up', 'down', 'in front' |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM 'me', tuM 'you', te 'he/she' |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam: 'herself/himself' |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM: 'who', 'where' |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar: 'mutually', 'each other' |
2.5 | Wh-word | PRQ | PR__PRQ | koN 'who', kyAre 'when', kyAM 'where' |
2.6 | Indefinite | | | koI, kaIMK, kashuM: 'someone', 'something' |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A 'this' |
3.2 | Relative | DMR | DM__DMR | je 'which/who', jeNe 'whom' |
3.3 | Wh-word | DMQ | DM__DMQ | koN 'who', shuM 'what', kem 'why' |
3.4 | Indefinite | | | koI, kaIMK, kashuM: 'someone', 'something' |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe 'will eat', khAdhu 'ate' |
4.2 | Auxiliary | VAUX | V__VAUX | chhe 'is', hatuM 'was', karyuM 'did' |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane 'and', ke 'or' |
8.2 | Subordinator | CCS | CC__CCS | tethI 'so', evuM 'like that', kAraNke 'because' |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa 'but', ja (emph), tO (topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu 'very', ghaNuM 'much' |
9.4 | Negation | NEG | RP__NEG | nahi, na: 'no' |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM 'little', ghaNuM 'much' |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN: 'one', 'two', 'three' |
10.3 | Ordinals | QTO | QT__QTO | paheluM 'first' (neu), bIjI 'second' (fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm 'work and the like', pANi-bANi 'water and the like' |

POS for Konakani Sl

No Category Label Annotation

Convention Examples Remark

s

Top level Subtype

(level 1) Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

33

CopyrightTDIL

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

34

CopyrightTDIL

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

35

CopyrightTDIL

POS for Maithili Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

36

CopyrightTDIL

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

37

CopyrightTDIL

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

38

CopyrightTDIL

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

39

CopyrightTDIL

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar words like koi kisi kuch do not come under Wh-word they are used for indefinite person For them another category (subtype) ietankiir (indefinitive) is used Under this category

40

CopyrightTDIL

following words are also placed chand

blsquoaaz fulaan sab bahut Can we have a category

subtype like indefinitive demonstrative (DMI)

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level

41

CopyrightTDIL

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

42

CopyrightTDIL

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

43

CopyrightTDIL

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

44

CopyrightTDIL

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in a new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
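As an illustration of the first three attributes, the following sketch uses Python's standard `xml.etree.ElementTree` module to build one element carrying `xml:lang`, `its:dir` and `its:translate`. The element name `label` and the helper `make_label` are hypothetical and not taken from the POS schema. The ITS namespace URI is the one defined by ITS 1.0. The sketch passes an ISO 639-3 code to `xml:lang` to match the language tags used in this document, which is an assumption about how the schema would be deployed.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # predefined xml: namespace

# Serialize ITS attributes with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Build a hypothetical <label> element with language and ITS mark-up."""
    element = ET.Element("label")
    element.set(f"{{{XML_NS}}}lang", lang)           # natural language of the content
    element.set(f"{{{ITS_NS}}}dir", direction)       # text direction: ltr or rtl
    element.set(f"{{{ITS_NS}}}translate", translate)  # whether content is translatable
    element.text = text
    return element


# An Urdu label is written right to left, so its:dir is set to "rtl".
urdu_noun = make_label("اسم", "urd", direction="rtl")
print(ET.tostring(urdu_noun, encoding="unicode"))
```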

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag supplies nothing more than a name, which has meaning only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
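A record of this shape could be produced programmatically. The sketch below assembles a language section with the element names used in the listing above; the function name and the choice of which fields to fill are assumptions for illustration, and the namespace declaration is left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_language_section(language, version, status, origin, creation_date):
    """Assemble a metadata fragment using the element names from the listing."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "version").text = version
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date
    return section


section = build_language_section(
    "en", "100", "standard", "ISO 12620:1999", "1999-01-01"
)

if __name__ == "__main__":
    print(ET.tostring(section, encoding="unicode"))
```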

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework of XML-based POS schema in all 22 Indian languages, it is necessary that the labels defined in the POS Schema for English have a one-to-one mapping for Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
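One possible reading of switching a branch 'off' is to drop every subcategory for which the selected language has no label. The sketch below works under that assumption; the `Node` class and the `prune` function are hypothetical, while the tags and the absence of a Malayalam label for the Verbal noun follow the draft schema in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    name: str
    languages: set                      # language codes that have a label here
    children: list = field(default_factory=list)


def prune(node, lang):
    """Return a copy of the tree without branches that lack a label for `lang`."""
    if lang not in node.languages:
        return None                     # this branch is switched off
    kept = [c for c in (prune(child, lang) for child in node.children) if c is not None]
    return Node(node.tag, node.name, node.languages, kept)


ALL = {"hin", "brx", "mal", "kas", "asm", "kok", "guj"}
noun = Node("N", "Noun", ALL, [
    Node("NN", "common", ALL),
    Node("NNP", "Proper", ALL),
    Node("NNV", "Verbal", ALL - {"mal"}),   # no mal-cat in the draft schema
    Node("NST", "Nloc", ALL),
])
malayalam_noun = prune(noun, "mal")

if __name__ == "__main__":
    print([child.tag for child in malayalam_noun.children])  # ['NN', 'NNP', 'NST']
```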

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

Columns: S.No, English, Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

Columns: S.No, English, Hindi, Assamese, Bodo, Kashmiri, Kashmiri (Hindi), Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

Columns: S.No, English, Hindi, Telugu, Malayalam, Tamil, Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

……

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

……

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

……

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

……

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

……

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

……

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

……

End if
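The selection algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the script-to-language table lists only the pairs that appear in the pseudocode, the language codes are the attribute prefixes used in the draft schema, and a node is modelled as a plain dictionary of attributes. The names `SCRIPT_LANGUAGES` and `select_attributes` are hypothetical.

```python
# Script -> {language name: code}, limited to the pairs shown in the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_attributes(node, script, language):
    """Display the English, Hindi and selected-language labels; hide the rest."""
    code = SCRIPT_LANGUAGES[script][language]   # KeyError for an unsupported pair
    shown = {"name", "cat", "subcat", "tag", "hin-cat", f"{code}-cat"}
    return {key: value for key, value in node.items() if key in shown}


noun = {
    "name": "cat", "cat": "noun",
    "hin-cat": "सा", "brx-cat": "ममा", "mal-cat": "നാമം", "kok-cat": "नाम",
    "tag": "N",
}

if __name__ == "__main__":
    print(select_attributes(noun, "Malayalam", "Malayalam"))
    print(select_attributes(noun, "Devanagari", "Hindi"))
```

Selecting Hindi yields only the English and Hindi labels, which matches the first branch of the algorithm.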


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No | Language Name | Language Tag according to ISO 639-3
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
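
For tooling, the language tags of this annexure can be held in a lookup table. The sketch below is illustrative only: `LANGUAGE_TAGS` and `label_attribute` are hypothetical names, and the `<code>-cat` attribute pattern is taken from the schema examples in this document (`hin-cat`, `guj-cat` and so on).

```python
# ISO 639-3 tags for the 22 languages listed in the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name: str) -> str:
    """Return the schema attribute that holds this language's label (assumed pattern)."""
    return f"{LANGUAGE_TAGS[language_name]}-cat"


print(label_attribute("Gujarati"))
```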


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


10.2 | Cardinals | QTC | QT__QTC | onnu, rantu ഒന രണ |
10.3 | Ordinals | QTO | QT__QTO | onnAm, rantam ഒനാം രണാം |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | $ & ( ); ruu $ & ( ) ര |
11.3 | Punctuation | PUNC | RD__PUNC | |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | |

POS for Bangla

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalama, cashmaa |
1.2 | Proper | NNP | N__NNP | Mohan, ravi, rashmi |
1.4 | Nloc | NST | N__NST | upare, niche, bhitara |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | se, tumi, AmAra |
2.2 | Reflexive | PRF | PR__PRF | nijera |
2.3 | Relative | PRL | PR__PRL | ye, yakhana, yena, yAra |
2.4 | Reciprocal | PRC | PR__PRC | paraspara |
2.5 | Wh-word | PRQ | PR__PRQ | ke, kakhana, kena, kAra |
2.6 | Indefinite | PRI | PR__PRI | keu |
3 | Demonstrative | DM | DM | Vaha, jo, yaha |
3.1 | Deictic | DMD | DM__DMD | sei, oi, o, se |
3.2 | Relative | DMR | DM__DMR | ye, yei |
3.3 | Wh-word | DMQ | DM__DMQ | kono |
3.4 | Indefinite | DMI | DM__DMI | keu |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | |
4.1.1 | Finite | VF | V__VM__VF | karachhilAma, yAba, khAYa |
4.1.2 | Non-finite | VNF | V__VM__VNF | kare, kheYe, karale, khete |
4.1.3 | Infinitive | VINF | V__VM__VINF | karate, khete, yete |
4.1.4 | Gerund | VNG | V__VM__VNG | yAoYa, AsA, khelA, karA |
4.2 | Auxiliary | VAUX | V__VAUX | chhila, habe, chAi |
5 | Adjective | JJ | | sundara, bhAla, lAla |
6 | Adverb | RB | | tADAtADi, Aste, haThAt |
7 | Postposition | PSP | | theke, abadhI, madhye, diYe |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | Ara, eban, athabA, kimbA |
8.2 | Subordinator | CCS | CC__CCS | ye, kintu, noile, tAhale |
8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
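
The rule above (annotate with the lowest-level tag, store the higher levels automatically) can be sketched as a walk up a parent table. This is a hedged illustration: `PARENT` and `full_tag` are hypothetical names, the parent relations are read off the Label and Annotation Convention columns of the tables in this document, and language-specific variants such as an auxiliary-level finite tag are not modelled.

```python
# Parent of each label in the type hierarchy (assumed from the POS tables).
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR", "PRI": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM", "DMI": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_tag(label: str) -> str:
    """Expand a lowest-level label into its full annotation convention."""
    chain = [label]
    while chain[0] in PARENT:          # climb until a top-level category is reached
        chain.insert(0, PARENT[chain[0]])
    return "__".join(chain)


for label in ("VF", "NN", "UT", "JJ"):
    print(label, "->", full_tag(label))
```

An annotator then only needs to pick `VF`; the stored form `V__VM__VF` follows from the table.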


POS for Marathi

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | मलगा (mulagaa-boy), राजा (raajaa-king), पसत (pustaka-book) |
1.1 | Common | NN | N__NN | पसत (pustaka-book), लखणी (lekhaNi-pen), चषमा (chashmaa-goggles) |
1.2 | Proper | NNP | N__NNP | मोहन (Mohan), रवी (Ravi), रशमी (Rashmi) |
1.3 | Verbal | NNV | N__NNV | NA | Not Required
1.4 | Nloc | NST | N__NST | वर (var-up), खाल (khaalee-down), पढ (pudhe-ahead), माग (maage-back) | Where it is separate it is NST
2 | Pronoun | PR | PR | यथ (yethe-here), थ (tethe-there), जो (jo-who), ो (to-he) |
2.1 | Personal | PRP | PR__PRP | ो (to-he), मी (mee-I), (tu-you), (te-they), मह (tumhi-you) |
2.2 | Reflexive | PRF | PR__PRF | सवत (swatha-myself), आपण (aapana-ourselves) |
2.3 | Relative | PRL | PR__PRL | जो (jo-who), जयान (jyaane-who), जवहा (jevhaa-while), िजथ (jeethe-where) |
2.4 | Reciprocal | PRC | PR__PRC | परसपर (Paraspara-reciprocally), एतमत (ekmek-mutually) |
2.5 | Wh-word | PRQ | PR__PRQ | तोण (kona-who), तवहा (kevha-when), तठ (kuthe-where) |
2.6 | Indefinite | | | तोणी(kona |
3 | Demonstrative | DM | DM | ो (to-he), हा (haa-this), जो (jo-who) |
3.1 | Deictic | DMD | DM__DMD | इथ (ithe-here), थ (tithe-there) |
3.2 | Relative | DMR | DM__DMR | जो (jo-who), जयान (jyane-who) |
3.3 | Wh-word | DMQ | DM__DMQ | तोणा (konta-which), तोणी (kona-who) |
4 | Verb | V | V | (padalaa-fell down), गला (gelaa-went), झोपला (jhopalaa-slept), आह (aahe-is) |
4.1 | Main | VM | V__VM | पडला (padalaa-fell down), गला (gelaa-went), झोपला (jhopalaa-slept), आह (aahe-is) |
4.1.1 | Finite | VF | V__VM__VF | - | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Non-finite | VNF | V__VM__VNF | - | --do--
4.1.3 | Infinitive | VINF | V__VM__VINF | - | --do--
4.1.4 | Gerund | VNG | V__VM__VNG | | --do--
4.2 | Auxiliary | VAUX | V__VAUX | आह (is), लागला (started) |
5 | Adjective | JJ | | सदर (sundara-beautiful), चागला (chaangalaa-good), मोठा (moThaa-big) |
6 | Adverb | RB | | लवतर (lavakar-fast), हळहळ (haLuuhaLuu-slowly) |
7 | Postposition | PSP | | | Not in Marathi
8 | Conjunction | CC | CC | आण (aaNi-and), तारण (kaaraN-because) |
8.1 | Co-ordinator | CCD | CC__CCD | आण (aaNi-and), पण (paNa-but), पर (parantu-but) |
8.2 | Subordinator | CCS | CC__CCS | तारण त (kaaraN-because of), ता त (kaaraN kii-because of), जर-र (jara-tara-if-then) |
8.2.1 | Quotative | UT | CC__CCS__UT | असा महणन |
9 | Particles | RP | RP | र (tara) |
9.1 | Default | RPD | RP__RPD | र (tara) (then) |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | अरर (arere), ओहो (oho-oh) |
9.4 | Intensifier | INTF | RP__INTF | खप (khoop-lot, very), बराच (baraach-too much), अशय (atishaya-too much, very) |
9.5 | Negation | NEG | RP__NEG | नतो (nako-not), न (na-Na) |
10 | Quantifiers | QT | QT | थोड (thode-few), जास (jaasta-lot), ताह (kaahi-few), एत (eka-one), पहला (pahilaa-first) |
10.1 | General | QTF | QT__QTF | थोड (thoDe-few), जास (jaasta-lot), ताह (kaahi-few) |
10.2 | Cardinals | QTC | QT__QTC | एत (eka-one), दोन (dona-two) |
10.3 | Ordinals | QTO | QT__QTO | पहला (pahilaa-first), दसरा (dusaraa-second) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जवणबवण (jevanbivaNa-meal/dinner), डोतबत (Dokebike-head), (Paanii-) vaanii, (khaanaa-) vaanaa |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA ('pen', 'spectacles') |
1.2 | Proper | NNP | N__NNP | mohan, ravI ('Mohan', 'Ravi') |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM ('up', 'down', 'in front') |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te ('me', 'you', 'he/she') |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam ('herself/himself') |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM ('who', 'where') |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar ('mutually', 'each other') |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM ('who', 'when', 'where') |
2.6 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A ('this') |
3.2 | Relative | DMR | DM__DMR | je, jeNe ('which/who', 'whom') |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem ('who', 'what', 'why') |
3.4 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu ('will eat', 'ate') |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM ('is', 'was', 'did') |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke ('and', 'or') |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke ('so', 'like that', 'because') |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO ('but', emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM ('very', 'much') |
9.4 | Negation | NEG | RP__NEG | nahi, na ('no') |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM ('little', 'much') |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN ('one', 'two', 'three') |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI ('first' (neu), 'second' (fem)) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi ('work and the like', 'water and the like') |

POS for Konkani

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पसत रख आबो माड |
1.2 | Proper | NNP | N__NNP | रामायण बायबल तराण गय ततणी तपला |
1.3 | Nloc | NST | N__NST | भायर भीर वयर सतयल |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हाव ो तयो मच आमच ाच |
2.2 | Reflexive | PRF | PR__PRF | आपण सवा |
2.3 | Relative | PRL | PR__PRL | जा जो |
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा |
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो |
2.6 | Indefinite | | | तोणय त य खयचय |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ो हो |
3.2 | Relative | DMR | DM__DMR | जो |
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल |
3.4 | Indefinite | | | तोणाचय तसलय |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | यवप |
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो |
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच |
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार |
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी |
4.2 | Auxiliary | VAUX | V__VAUX | NA |
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस |
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी |
5 | Adjective | JJ | | सोबी सदर |
6 | Adverb | RB | | फालया सवतास अश |
7 | Postposition | PSP | | खाीर पास बगर तडन लागी |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा |
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन |
8.2.1 | Quotative | UT | CC__CCS__UT | अश त |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | बी आद इतयाद |
9.2 | Classifier | CL | RP__CL | (पाच) जाण |
9.3 | Interjection | INJ | RP__INJ | आर चप |
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर |
9.5 | Negation | NEG | RP__NEG | ना नयह |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | थोड चड ताय खब |
10.2 | Cardinals | QTC | QT__QTC | एत दोन |
10.3 | Ordinals | QTO | QT__QTO | पयल दसर |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | & $ |
11.3 | Punctuation | PUNC | RD__PUNC | - |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण |


POS for Maithili

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
 | Classifier | CL | RP_CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख), मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (اسم, ism) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (نکره, nakeraa) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (معرفہ, m'aarefa) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (حاصل مصدر, haasil-e-masdar) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (ظرف, zarf) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (ضمير, zamiir) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (ضمير شخصی, zamiir-e-shakhsii) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 | Reflexive (ضمير معکوسی, zamiir-e-m'aakoosii) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (ضمير موصولہ, zamiir-e-mausoolaa) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (ضمير راجع, zamiir-e-raaje') | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (ضمير استفہاميہ, zamiir-e-istafhaamiyaa) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (ضمير اشاره, zamiir-e-ishaaraa) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (اشارے, ishaare) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (ضمير اشاره موصولہ, zamiir-e-ishaaraa mausoolaa) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (ضمير اشاره استفہاميہ, zamiir-e-ishaaraa istafhaamiyaa) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (فعل, f'el) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (محدود, mahdood) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (غيرمحدود, ghair mahdood) | VNF | V__VM__VNF | | --do--
4.1.3 | Infinitive (مصدر, masdar) | VINF | V__VM__VINF | | --do--
4.1.4 | Gerund (حاصل مصدر, haasil-e-masdar) | VNG | V__VM__VNG | | --do--
4.2 | Auxiliary (فعل امدادی, f'el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (صفت, sifat) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (متعلق فعل, mut'alliq-e-f'el) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (جارموخر, jaar-e-moakkhar) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (عطف, 'atf) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (حرف وصل, harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (تابع کننده, taab'e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (اقتباسی, iqtabaasii) | UT | CC__CCS__UT | | Not required
9 | Particles (حاليہ, haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (ڈيفالٹ, Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (درجہ بند, darja band) | CL | RP__CL | | Not required
9.3 | Interjection (فجائيہ, fajaa'iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (حرف تاکيد, harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (حرف نہی, harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta'addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (عام, 'aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (اعداد مطلق, a'adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (ترتيبی اعداد, tartiibii a'adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (باقی مانده, baaqi-maandaa) | RD | RD | |
11.1 | Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol (علامت, 'alaamat) | SYM | RD__SYM | $ & ( ) | Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound), etc.
11.3 | Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (نامعلوم, naa-m'aaloom) | UNK | RD__UNK | |
11.5 | Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | (دل-) ول ((dil-) vil), (پيار-) ويار ((pyaar-) vyaar), (چائے-) وائے ((caa'e-) vaa'e) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
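
As a rough illustration of the attributes listed above, the following Python sketch builds a small XML fragment carrying `xml:lang`, `xml:id`, `its:dir` and `its:translate`. The element names `posLabels`, `label` and `tag` and the identifier `lbl-noun-urd` are hypothetical; the local `its:translate` attribute is used here in place of the global `its:translateRule` element for brevity; and `xml:lang` takes the BCP 47 tag `ur` rather than the ISO 639-3 code `urd` used elsewhere in this document.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)

# Root element: default language plus the ITS version marker.
root = ET.Element("posLabels", {
    f"{{{XML_NS}}}lang": "en",
    f"{{{ITS_NS}}}version": "1.0",
})

# A right-to-left Urdu label with its own language, direction and identifier.
label = ET.SubElement(root, "label", {
    f"{{{XML_NS}}}id": "lbl-noun-urd",
    f"{{{XML_NS}}}lang": "ur",
    f"{{{ITS_NS}}}dir": "rtl",
})
label.text = "اسم"

# The tag itself is a code, so it is marked as non-translatable.
tag = ET.SubElement(root, "tag", {f"{{{ITS_NS}}}translate": "no"})
tag.text = "N__NN"

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```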

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only up to a point: a tag supplies nothing more than a name, so it has meaning only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> registered as a standard
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
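
A record of this shape can also be generated programmatically. The sketch below is only an assumption about how one might do so with Python's standard library: the helper `build_language_section` is hypothetical, it covers just a few of the elements shown in the listing, and it makes no claim to produce a complete or valid ISO 12620 record.

```python
import xml.etree.ElementTree as ET


def build_language_section(language: str, origin: str, creation_date: str) -> ET.Element:
    """Build a minimal languageSection element mirroring the metadata listing."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "registrationStatus").text = "standard"
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = creation_date
    return section


section = build_language_section("en", "ISO 12620:1999", "1999-01-01")
print(ET.tostring(section, encoding="unicode"))
```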

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>...</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">


----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

ltxsattributegt

ltxselementgt ltxsschemagt

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1: Languages Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2: Languages Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3: Languages Telugu, Malayalam, Tamil, Konkani

SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
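The one-to-one mapping tables above can be held in memory as a lookup keyed by POS tag and language code. The following is a minimal sketch, not part of the standard: the names `LABELS` and `label_for` are hypothetical, only the Noun row is filled in for illustration, and the use of "NA" for a missing label follows the convention of the tables.

```python
# Hypothetical in-memory form of the one-to-one mapping table.
# Outer keys are POS tags; inner keys are ISO 639-3 language codes.
# Only the Noun row is filled in, purely as an illustration.
LABELS = {
    "N": {
        "eng": "Noun",
        "kok": "नाम",   # Konkani
        "mal": "നാമം",  # Malayalam
        "mar": "नाम",   # Marathi
    },
}


def label_for(tag, lang):
    """Return the label of `tag` in language `lang`, or "NA" if none is listed."""
    return LABELS.get(tag, {}).get(lang, "NA")


print(label_for("N", "mal"))
print(label_for("N", "tam"))
```

Keeping the tag as the outer key mirrors the tables, where each row is one tag and each column is one language.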

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

...

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

...

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

...

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

...

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

...

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

...

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

...

End if
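The selection algorithm above can be sketched in Python as a filter over the attributes of one schema node. This is only an illustration under stated assumptions: a node is modelled as a plain dictionary of attribute names and values, the names `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical, and languages are identified by their ISO 639-3 codes. The sample labels are the Hindi, Konkani and Malayalam words for "noun".

```python
# Scripts and languages named in the algorithm, keyed by ISO 639-3 code.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(attributes, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English labels live in cat/subcat; Hindi is always displayed.
    shown = {"cat", "subcat", "tag", "hin-cat", f"{language}-cat"}
    return {name: value for name, value in attributes.items() if name in shown}


# Illustrative node only; a real node would carry one label per language.
noun = {
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "kok-cat": "नाम",
    "mal-cat": "നാമം",
    "tag": "N",
}
print(select_nodes(noun, "Malayalam", "mal"))
```

For Hindi the selected-language attribute coincides with `hin-cat`, so only the English and Hindi nodes remain, as the algorithm requires.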

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tag (ISO 639-3)

1 Hindi hin

2 Assamese asm

3 Bangla ben

4 Bodo brx

5 Dogri doi

6 Gujarati guj

7 Kannada kan

8 Kashmiri kas

9 Konkani kok

10 Maithili mai

11 Malayalam mal

12 Manipuri mni

13 Marathi mar

14 Nepali nep

15 Oriya ori

16 Punjabi pan

17 Sanskrit san

18 Santhali sat

19 Sindhi snd

20 Tamil tam

21 Telugu tel

22 Urdu urd
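The language tags connect directly to the attribute names used in the draft schema, where each language-specific label is stored in an attribute named after its code (for example `mal-cat`). A minimal sketch of that link follows; the names `ISO_639_3` and `label_attribute` are hypothetical, and only the seven languages that appear in the draft schema are listed.

```python
# ISO 639-3 codes for the languages that carry labels in the draft schema.
ISO_639_3 = {
    "Hindi": "hin",
    "Assamese": "asm",
    "Bodo": "brx",
    "Gujarati": "guj",
    "Kashmiri": "kas",
    "Konkani": "kok",
    "Malayalam": "mal",
}


def label_attribute(language_name):
    """Return the schema attribute name that holds this language's label."""
    return f"{ISO_639_3[language_name]}-cat"


print(label_attribute("Malayalam"))
```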

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi

2 Prof Girish Nath Jha, JNU, New Delhi

3 Dr Somnath Chandra, Department of Information Technology, New Delhi

4 Dipti Misra Sharma, LTRC, IIIT-H

5 Somi Ram, CDAC NOIDA

6 Prof Uma Maheswara Rao G, University of Hyderabad

7 Dr Sobha L, AU-KBC, Chennai

8 Menak S

9 Kalika Bali, Microsoft, Bangalore

10 Prof Pushpak Bhattacharyya, IIT-Bombay

11 Prof Malhar Kulkarni, IIT-Bombay

12 Lata Popale, IIT-Bombay

13 Kirtida Shah, Gujarati University, Ahmedabad

14 Mona Parakh, LDCIL, Mysore

15 Jyoti Pawar, Goa University

16 Madhavi Sardesai, Goa University

17 Ramnath

18 Aadil Kak, University of Kashmir

19 Nazima, University of Kashmir

20 Dr Richa, LDCIL, Mysore

21 Mazhar Mehdi Hussain, JNU, New Delhi

22 Mr Prashant Verma, W3C India, New Delhi

23 Swati Arora, W3C India, New Delhi


kena kAra

2.6 Indefinite PRI PR__PRI keu
3 Demonstrative DM DM Vaha jo yaha
3.1 Deictic DMD DM__DMD sei oi o se
3.2 Relative DMR DM__DMR ye yei
3.3 Wh-word DMQ DM__DMQ kono
3.4 Indefinite DMI DM__DMI keu
4 Verb V V
4.1 Main VM V__VM
4.1.1 Finite VF V__VM__VF karachhilAma yAba khAYa
4.1.2 Non-finite VNF V__VM__VNF kare kheYe karale khete
4.1.3 Infinitive VINF V__VM__VINF karate khete yete
4.1.4 Gerund VNG V__VM__VNG yAoYa AsA khelA karA
4.2 Auxiliary VAUX V__VAUX chhila habe chAi
5 Adjective JJ sundara bhAla lAla
6 Adverb RB tADAtADi Aste haThAt
7 Postposition PSP theke abadhI madhye diYe
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD Ara eban athabA kimbA
8.2 Subordinator CCS CC__CCS ye kintu noile tAhale
8.2.1 Quotative UT CC__CCS__UT ---- Not required
9 Particles RP RP
9.1 Default RPD RP__RPD to ye
9.2 Classifier CL RP__CL jana khAnA
9.3 Interjection INJ RP__INJ Are ei hAya
9.4 Intensifier INTF RP__INTF bhiShaNa khuba sA~NghAtika
9.5 Negation NEG RP__NEG nA naYa chhADA
10 Quantifiers QT QT
10.1 General QTF QT__QTF kichhu alpa aneka
10.2 Cardinals QTC QT__QTC eka dui tina
10.3 Ordinals QTO QT__QTO prathama paYalA dvitIYa
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH jala Tala khAbAra dAbAra

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
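The rule above can be illustrated with a short Python sketch. It assumes only what the tables show: an annotation such as V__VM__VF joins the labels of each hierarchy level with a double underscore. The function name tag_hierarchy is hypothetical.

```python
def tag_hierarchy(annotation):
    """Return every tag implied by a lowest-level annotation, top level first.

    Assumption: hierarchy levels are joined with a double underscore,
    as in the Annotation Convention column (e.g. V__VM__VF).
    """
    parts = annotation.split("__")
    return ["__".join(parts[:i]) for i in range(1, len(parts) + 1)]


# The annotator picks only the lowest-level tag; the rest is derived.
print(tag_hierarchy("V__VM__VF"))
print(tag_hierarchy("JJ"))
```

Because the higher levels are derived from the string itself, an annotation tool only needs to store the lowest-level tag and can recover the top-level category on demand.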


POS for Marathi

Columns: Sl No, Category (Top level, Subtype level 1, Subtype level 2), Label, Annotation Convention, Examples, Remarks

1 Noun N N मुलगा (mulagaa-boy) राजा (raajaa-king) पुस्तक (pustaka-book)
1.1 Common NN N__NN पुस्तक (pustaka-book) लेखणी (lekhaNi-pen) चष्मा (chashmaa-goggles)
1.2 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रश्मी (Rashmi)
1.3 Verbal NNV N__NNV NA Not Required
1.4 Nloc NST N__NST वर (var-up) खाली (khaalee-down) पुढे (pudhe-ahead) मागे (maage-back) Where it is separate it is NST
2 Pronoun PR PR येथे (yethe-here) तेथे (tethe-there) जो (jo-who) तो (to-he)
2.1 Personal PRP PR__PRP तो (to-he) मी (mee-I) तू (tu-you) ते (te-they) तुम्ही (tumhi-you)
2.2 Reflexive PRF PR__PRF स्वतः (swatha-myself) आपण (aapana-ourselves)
2.3 Relative PRL PR__PRL जो (jo-who) ज्याने (jyaane-who) जेव्हा (jevhaa-while) जिथे (jeethe-where)
2.4 Reciprocal PRC PR__PRC परस्पर (Paraspara-reciprocally) एकमेक (ekmek-mutually)
2.5 Wh-word PRQ PR__PRQ कोण (kona-who) केव्हा (kevha-when) कुठे (kuthe-where)
2.6 Indefinite कोणी (kona
3 Demonstrative DM DM तो (to-he) हा (haa-this) जो (jo-who)
3.1 Deictic DMD DM__DMD इथे (ithe-here) तिथे (tithe-there)
3.2 Relative DMR DM__DMR जो (jo-who) ज्याने (jyane-who)
3.3 Wh-word DMQ DM__DMQ कोणता (konta-which) कोणी (kona-who)
4 Verb V V पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)
4.1 Main VM V__VM पडला (padalaa-fell down) गेला (gelaa-went) झोपला (jhopalaa-slept) आहे (aahe-is)
4.1.1 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 Non-finite VNF V__VM__VNF - --do--
4.1.3 Infinitive VINF V__VM__VINF - --do--
4.1.4 Gerund VNG V__VM__VNG --do--
4.2 Auxiliary VAUX V__VAUX आहे (is) लागला (started)
5 Adjective JJ सुंदर (sundara-beautiful) चांगला (chaangalaa-good) मोठा (moThaa-big)
6 Adverb RB लवकर (lavakar-fast) हळूहळू (haLuuhaLuu-slowly)
7 Postposition PSP Not in Marathi
8 Conjunction CC CC आणि (aaNi-and) कारण (kaaraN-because)
8.1 Co-ordinator CCD CC__CCD आणि (aaNi-and) पण (paNa-but) परंतु (parantu-but)
8.2 Subordinator CCS CC__CCS कारण (kaaraN-because of) कारण की (kaaraN kii-because of) जर-तर (jara-tara-if-then)
8.2.1 Quotative UT CC__CCS__UT असा म्हणून
9 Particles RP RP तर (tara)
9.1 Default RPD RP__RPD तर (tara) (then)
9.2 Classifier CL RP__CL Not required
9.3 Interjection INJ RP__INJ अरेरे (arere) ओहो (oho-oh)
9.4 Intensifier INTF RP__INTF खूप (khoop-lot very) बराच (baraach-too much) अतिशय (atishaya-too much very)
9.5 Negation NEG RP__NEG नको (nako-not) न (na-Na)
10 Quantifiers QT QT थोडे (thode-few) जास्त (jaasta-lot) काही (kaahi-few) एक (eka-one) पहिला (pahilaa-first)
10.1 General QTF QT__QTF थोडे (thoDe-few) जास्त (jaasta-lot) काही (kaahi-few)
10.2 Cardinals QTC QT__QTC एक (eka-one) दोन (dona-two)
10.3 Ordinals QTO QT__QTO पहिला (pahilaa-first) दुसरा (dusaraa-second)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF A word written in script other than the script of the original text
11.2 Symbol SYM RD__SYM $ & ( ) For symbols such as $ & etc
11.3 Punctuation PUNC RD__PUNC Only for punctuations
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जेवणबिवण (jevanbivaNa-mealdinner) डोकेबिके (Dokebike-head) (Paanii-) vaanii (khaanaa-) vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Columns: Sl No, Category (Top level, Subtype level 1, Subtype level 2), Label, Annotation Convention, Examples, Remarks

1 Noun N N
1.1 Common NN N__NN kalam chashmA 'pen' 'spectacles'
1.2 Proper NNP N__NNP mohan ravI 'Mohan' 'Ravi'
1.3 Nloc NST N__NST upar nIche ahIM 'up' 'down' 'in front'
2 Pronoun PR PR
2.1 Personal PRP PR__PRP huM tuM te 'me' 'you' 'he/she'
2.2 Reflexive PRF PR__PRF pote jAte svayam 'herself/himself'
2.3 Relative PRL PR__PRL je te jyAM 'who' 'where'
2.4 Reciprocal PRC PR__PRC aras-paras paraspar 'mutually' 'each other'
2.5 Wh-word PRQ PR__PRQ koN kyAre kyAM 'who' 'when' 'where'
2.6 Indefinite koI kaIMK kashuM 'someone' 'something'
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD A 'this'
3.2 Relative DMR DM__DMR je jeNe 'which/who' 'whom'
3.3 Wh-word DMQ DM__DMQ koN shuM kem 'who' 'what' 'why'
3.4 Indefinite koI kaIMK kashuM 'someone' 'something'
4 Verb V V
4.1 Main VM V__VM khAshe khAdhu 'will eat' 'ate'
4.2 Auxiliary VAUX V__VAUX chhe hatuM karyuM 'is' 'was' 'did'
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane ke 'and' 'or'
8.2 Subordinator CCS CC__CCS tethI evuM kAraNke 'so' 'like that' 'because'
9 Particles RP RP
9.1 Default RPD RP__RPD paNa ja tO 'but' emph topic
9.2 Interjection INJ RP__INJ hE arrrE O
9.3 Intensifier INTF RP__INTF bahu ghaNuM 'very' 'much'
9.4 Negation NEG RP__NEG nahi na 'no'
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM ghaNuM 'little' 'much'
10.2 Cardinals QTC QT__QTC eka be traN 'one' 'two' 'three'
10.3 Ordinals QTO QT__QTO paheluM bIjI 'first' (neu) 'second' (fem)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ( )
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm pANi-bANi 'work and the like' 'water and the like'

POS for Konkani

Columns: Sl No, Category (Top level, Subtype level 1, Subtype level 2), Label, Annotation Convention, Examples, Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Columns: Sl No, Category (Top level, Subtype level 1, Subtype level 2), Label, Annotation Convention, Examples, Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Columns: Sl No, Category (Top level, Subtype level 1, Subtype level 2), Label, Annotation Convention, Examples, Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written as ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
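As an illustration of the xml:lang, its:dir and xml:id attributes described above, the following Python sketch reads them from a small sample document using the standard library. The corpus and sentence element names and the sample content are hypothetical, and the language values follow the ISO 639-3 tags used in this document; only the XML and ITS 1.0 namespace URIs are taken from the W3C specifications.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: language and direction are declared on the root
# and overridden on the element where the language changes.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hin" its:dir="ltr">
  <sentence xml:id="s1">...</sentence>
  <sentence xml:id="s2" xml:lang="urd" its:dir="rtl">...</sentence>
</corpus>"""


def sentence_settings(xml_text):
    """Return (id, language, direction) per sentence, inheriting from the root."""
    root = ET.fromstring(xml_text)
    default_lang = root.get("{%s}lang" % XML_NS)
    default_dir = root.get("{%s}dir" % ITS_NS)
    rows = []
    for sentence in root.findall("sentence"):
        rows.append((
            sentence.get("{%s}id" % XML_NS),
            sentence.get("{%s}lang" % XML_NS, default_lang),
            sentence.get("{%s}dir" % ITS_NS, default_dir),
        ))
    return rows


for row in sentence_settings(SAMPLE):
    print(row)
```

Declaring the defaults once on the root element and overriding them only where the language changes keeps a mixed-script corpus compact while still letting each sentence report its own language and direction.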

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives the tag name, so it only has meaning to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that particular language. The language-specific POS schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
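The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps can be sketched in Python as follows. This is only an illustration under stated assumptions: a node is treated as enabled for a language when the draft schema lists a label attribute for that language (for example, the Verbal noun entry in the draft schema carries no mal-cat label). The names SCHEMA_NODES and select_nodes, and the reduced three-node list, are hypothetical.

```python
ALL_SEVEN = {"hin", "brx", "mal", "kas", "asm", "kok", "guj"}

# A reduced view of the Noun block: each node records its tag and the
# languages for which the draft schema supplies a label.
SCHEMA_NODES = [
    {"tag": "N", "cat": "noun", "langs": set(ALL_SEVEN)},
    {"tag": "NN", "cat": "common", "langs": set(ALL_SEVEN)},
    {"tag": "NNV", "cat": "Verbal", "langs": ALL_SEVEN - {"mal"}},
]


def select_nodes(nodes, language):
    """Split node tags into (displayed, hidden) for one ISO 639-3 language code."""
    displayed = [n["tag"] for n in nodes if language in n["langs"]]
    hidden = [n["tag"] for n in nodes if language not in n["langs"]]
    return displayed, hidden


shown, hidden = select_nodes(SCHEMA_NODES, "mal")
print("Display:", shown)
print("Hide:", hidden)
```

Keeping one common node list and filtering it per language mirrors the idea of a single schema whose branches are switched off, rather than maintaining a separate schema per language.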

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">


<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if
    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">


        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if
    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">


        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...


    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.


        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
End if
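
The selection logic above can be sketched in Python. This is only an illustration, not part of the standard: the names SCRIPT_LANGUAGES, visible_keys and select_nodes are hypothetical, a schema node is assumed to be available as a plain dictionary of its attributes, and the "Display (Metadata)" step is left out.

```python
from typing import Dict, List

# Script -> {language name -> ISO 639-3 code}, as enumerated in the algorithm.
SCRIPT_LANGUAGES: Dict[str, Dict[str, str]] = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes carrying the English label and the tag; these are always shown.
ENGLISH_KEYS = ("cat", "subcat", "tag")


def visible_keys(script: str, language: str) -> List[str]:
    """Return the attribute names to display for a script/language pair."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    keys = list(ENGLISH_KEYS) + ["hin-cat"]
    target = f"{code}-cat"
    if target not in keys:  # Hindi itself needs no extra attribute
        keys.append(target)
    return keys


def select_nodes(node: Dict[str, str], script: str, language: str) -> Dict[str, str]:
    """Keep the English, Hindi and target-language labels; hide the rest."""
    keep = set(visible_keys(script, language))
    return {key: value for key, value in node.items() if key in keep}


# Hypothetical node holding the noun labels quoted in the examples above.
noun = {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा", "kok-cat": "नाम", "tag": "N"}
bodo_view = select_nodes(noun, "Devanagari", "Bodo")
```

A table lookup is used instead of nested If blocks so that adding a language only needs one new entry in SCRIPT_LANGUAGES.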


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
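
In the tagged sentences above, each word is immediately followed by its tag, with no separator between them in this copy of the text. The sketch below shows one way to split such a token into the word and its tag levels. It assumes that a tag consists of upper-case ASCII labels joined by one or more underscores (N_NN, V_VM_VF, N__NN) and that the word itself does not end in an upper-case ASCII letter. The function name split_token is hypothetical.

```python
import re
from typing import List, Tuple

# Shortest possible word, followed by a tag such as N_NN or V__VM__VF.
_TOKEN = re.compile(r"(.*?)([A-Z]+(?:_+[A-Z]+)*)")


def split_token(token: str) -> Tuple[str, List[str]]:
    """Split a fused word+tag token into (word, [tag levels])."""
    match = _TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"no tag found in token: {token!r}")
    word, tag = match.groups()
    # Both "_" and "__" occur as level separators in the examples.
    levels = [part for part in tag.split("_") if part]
    return word, levels


word, levels = split_token("दशरनN_NN")
```

Tokens written with a hyphen instead of an underscore (for example N-NNP) or with a space between word and tag are not handled by this sketch.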


16 REFERENCE

1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


tAhale

8.2.1 | Quotative | UT | CC__CCS__UT | ---- | Not required
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | to, ye |
9.2 | Classifier | CL | RP__CL | jana, khAnA |
9.3 | Interjection | INJ | RP__INJ | Are, ei, hAya |
9.4 | Intensifier | INTF | RP__INTF | bhiShaNa, khuba, sA~NghAtika |
9.5 | Negation | NEG | RP__NEG | nA, naYa, chhADA |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | kichhu, alpa, aneka |
10.2 | Cardinals | QTC | QT__QTC | eka, dui, tina |
10.3 | Ordinals | QTO | QT__QTO | prathama, paYalA, dvitIYa |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | jala Tala, khAbAra dAbAra |

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
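
Storing the higher level tags automatically can be sketched as a parent lookup. The PARENT table below is assembled from the Label and Annotation Convention columns of the POS tables in this document. The names PARENT, tag_path and annotation are hypothetical, and the sketch assumes one parent per label, so a language-specific variant such as the Konkani V__VAUX__VF is not covered.

```python
from typing import Dict, List

# Child label -> parent label, read off the Annotation Convention column.
PARENT: Dict[str, str] = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def tag_path(tag: str) -> List[str]:
    """Return the labels from the top level down to the given tag."""
    path = [tag]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def annotation(tag: str) -> str:
    """Build the full annotation string from a lowest level tag."""
    return "__".join(tag_path(tag))


finite = annotation("VF")
```

With this lookup the annotator only chooses the lowest level tag, and the full label is derived rather than typed.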


POS for Marathi

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | मलगा (mulagaa-boy), राजा (raajaa-king), पसत (pustaka-book) |
1.1 | Common | NN | N__NN | पसत (pustaka-book), लखणी (lekhaNi-pen), चषमा (chashmaa-goggles) |
1.2 | Proper | NNP | N__NNP | मोहन (Mohan), रवी (Ravi), रशमी (Rashmi) |
1.3 | Verbal | NNV | N__NNV | NA | Not Required
1.4 | Nloc | NST | N__NST | वर (var-up), खाल (khaalee-down), पढ (pudhe-ahead), माग (maage-back) | Where it is separate it is NST
2 | Pronoun | PR | PR | यथ (yethe-here), थ (tethe-there), जो (jo-who), ो (to-he) |
2.1 | Personal | PRP | PR__PRP | ो (to-he), मी (mee-I), (tu-you), (te-they), मह (tumhi-you) |
2.2 | Reflexive | PRF | PR__PRF | सवत (swatha-myself), आपण (aapana-ourselves) |
2.3 | Relative | PRL | PR__PRL | जो (jo-who), जयान (jyaane-who), जवहा (jevhaa-while), िजथ (jeethe-where) |
2.4 | Reciprocal | PRC | PR__PRC | परसपर (Paraspara-reciprocally), एतमत (ekmek-mutually) |
2.5 | Wh-word | PRQ | PR__PRQ | तोण (kona-who), तवहा (kevha-when), तठ (kuthe-where) |
2.6 | Indefinite | | | तोणी (kona |
3 | Demonstrative | DM | DM | ो (to-he), हा (haa-this), जो (jo-who) |
3.1 | Deictic | DMD | DM__DMD | इथ (ithe-here), थ (tithe-there) |
3.2 | Relative | DMR | DM__DMR | जो (jo-who), जयान (jyane-who) |
3.3 | Wh-word | DMQ | DM__DMQ | तोणा (konta-which), तोणी (kona-who) |
4 | Verb | V | V | (padalaa-fell down), गला (gelaa-went), झोपला (jhopalaa-slept), आह (aahe-is) |
4.1 | Main | VM | V__VM | पडला (padalaa-fell down), गला (gelaa-went), झोपला (jhopalaa-slept), आह (aahe-is) |
4.1.1 | Finite | VF | V__VM__VF | - | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Non-finite | VNF | V__VM__VNF | - | --do--
4.1.3 | Infinitive | VINF | V__VM__VINF | - | --do--
4.1.4 | Gerund | VNG | V__VM__VNG | | --do--
4.2 | Auxiliary | VAUX | V__VAUX | आह (is), लागला (started) |
5 | Adjective | JJ | | सदर (sundara-beautiful), चागला (chaangalaa-good), मोठा (moThaa-big) |
6 | Adverb | RB | | लवतर (lavakar-fast), हळहळ (haLuuhaLuu-slowly) |
7 | Postposition | PSP | | | Not in Marathi
8 | Conjunction | CC | CC | आण (aaNi-and), तारण (kaaraN-because) |
8.1 | Co-ordinator | CCD | CC__CCD | आण (aaNi-and), पण (paNa-but), पर (parantu-but) |
8.2 | Subordinator | CCS | CC__CCS | तारण त (kaaraN-because of), ता त (kaaraN kii-because of), जर-र (jara-tara-if-then) |
8.2.1 | Quotative | UT | CC__CCS__UT | असा महणन |
9 | Particles | RP | RP | र (tara) |
9.1 | Default | RPD | RP__RPD | र (tara) (then) |
9.2 | Classifier | CL | RP__CL | | Not required
9.3 | Interjection | INJ | RP__INJ | अरर (arere), ओहो (oho-oh) |
9.4 | Intensifier | INTF | RP__INTF | खप (khoop-lot, very), बराच (baraach-too much), अशय (atishaya-too much, very) |
9.5 | Negation | NEG | RP__NEG | नतो (nako-not), न (na-Na) |
10 | Quantifiers | QT | QT | थोड (thode-few), जास (jaasta-lot), ताह (kaahi-few), एत (eka-one), पहला (pahilaa-first) |
10.1 | General | QTF | QT__QTF | थोड (thoDe-few), जास (jaasta-lot), ताह (kaahi-few) |
10.2 | Cardinals | QTC | QT__QTC | एत (eka-one), दोन (dona-two) |
10.3 | Ordinals | QTO | QT__QTO | पहला (pahilaa-first), दसरा (dusaraa-second) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जवणबवण (jevanbivaNa-mealdinner), डोतबत (Dokebike-head), (Paanii-)vaanii, (khaanaa-)vaanaa |

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA 'pen', 'spectacles' |
1.2 | Proper | NNP | N__NNP | mohan, ravI 'Mohan', 'Ravi' |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM 'up', 'down', 'in front' |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te 'me', 'you', 'he/she' |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself' |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM 'who', 'where' |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other' |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where' |
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A 'this' |
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom' |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why' |
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate' |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did' |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or' |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because' |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph, topic |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much' |
9.4 | Negation | NEG | RP__NEG | nahi, na 'no' |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much' |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one, two, three' |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu), 'second' (fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | () |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like' |

POS for Konakani Sl

No Category Label Annotation

Convention Examples Remark

s

Top level Subtype

(level 1) Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411 Finite VF V__VM__VF आयलो आयला आयललो

412 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

413 Infinitive VINF V__VM__VINF आस वहर तलयार

414 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

421 Finite V__VAUX__VF तलल आस आयला आस

422 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

821 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM & $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

92 Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF Remarks: A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) Remarks: For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Remarks: Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
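The rule above can be sketched in code. The following is only an illustration, assuming that the annotation convention strings in the tables (for example N__NN or V__VM__VF) always list the levels from top to bottom, separated by a double underscore. The function names tag_levels and annotate are hypothetical and are not part of the standard.

```python
def tag_levels(annotation):
    """Split an annotation convention such as 'V__VM__VF' into its levels,
    ordered from the top-level category down to the lowest-level tag."""
    levels = annotation.split("__")
    if not all(levels):
        raise ValueError("malformed annotation convention: %r" % annotation)
    return levels


def annotate(word, annotation):
    """Store the lowest-level tag for a word and derive the higher-level
    tags automatically from the annotation convention."""
    levels = tag_levels(annotation)
    return {
        "word": word,
        "tag": levels[-1],         # lowest-level tag chosen by the annotator
        "ancestors": levels[:-1],  # higher-level tags, stored automatically
    }


if __name__ == "__main__":
    print(annotate("kitaab", "N__NN"))
    print(annotate("giraa", "V__VM__VF"))
```

Keeping the full convention string as the input avoids ambiguity for labels such as VF, which appear under both V__VM and V__VAUX in some of the tables.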

POS for Urdu

Sl No Category Label Annotation Convention Examples Remarks

Top level Subtype (level 1) Subtype (level 2)

1 Noun (ism, اسم) N N لڑکا (laRkaa) راجا (raajaa) کتاب (kitaab)

11 Common (nakeraa, نکره) NN N__NN کتاب (kitaab) قلم (qalam) چشمہ (cashma)

12 Proper (m‘aarefa, معرفہ) NNP N__NNP موہن (Mohan) رشمی (Rashmi) روی (Ravi)

13 Verbal (haasil-e-masdar, حاصل مصدر) NNV N__NNV جلن (jalan) چلن (calan) بہاؤ (bahaao) بناوٹ (banaavat) Remarks: May be considered for Urdu-Hindi too

14 Nloc (zarf, ظرف) NST N__NST اوپر (upar) نيچے (niice) آگے (aage) پيچهے (piiche)

2 Pronoun (zamiir, ضمير) PR PR يہ (yih) وه (voh) جو (jo)

21 Personal (zamiir-e-shakhsii, ضمير شخصی) PRP PR__PRP وه (voh) تم (tum) ميں (maim) Remarks: In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive (zamiir-e-m‘aakoosii, ضمير معکوسی) PRF PR__PRF اپنا (apnaa) خود (khud) اپنے آپ (apne aap)

23 Relative (zamiir-e-mausoolaa, ضمير موصولہ) PRL PR__PRL جو (jo) جب (jab) جس (jis) جہاں (jahaM)

24 Reciprocal (zamiir-e-raaje‘, ضمير راجع) PRC PR__PRC باہم (baaham) درميان (darmiyaan) آپس (aapas)

25 Wh-word (zamiir-e-istafhaamiyaa, ضمير استفہاميہ) PRQ PR__PRQ کون (kaun) کب (kab) کہاں (kahaaM)

3 Demonstrative (zamiir-e-ishaaraa, ضمير اشاره) DM DM يہ (yih) وه (voh) ان (inn) ان (unn)

31 Deictic (ishaare, اشارے) DMD DM__DMD يہ (yih) وه (voh)

32 Relative (zamiir-e-ishaaraa mausoolaa, ضمير اشاره موصولہ) DMR DM__DMR جو (jo) جس (jis)

33 Wh-word (zamiir-e-ishaaraa istafhaamiyaa, ضمير اشاره استفہاميہ) DMQ DM__DMQ کون (kaun) کس (kis) کتنا (kitnaa) Remarks: According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb (f‘el, فعل) V V گرا (giraa) گيا (gayaa) سونا (sonaa) ہنستا (haMstaa)

41 Main VM V__VM گرا (giraa) گيا (gayaa) سونا (sonaa) ہنستا (haMstaa)

411 Finite (mahdood, محدود) VF V__VM__VF Remarks: This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Nonfinite (ghair mahdood, غيرمحدود) VNF V__VM__VNF Remarks: -- do --

413 Infinitive (masdar, مصدر) VINF V__VM__VINF Remarks: -- do --

414 Gerund (haasil-e-masdar, حاصل مصدر) VNG V__VM__VNG Remarks: -- do --

42 Auxiliary (f‘el-e-imdaadi, فعل امدادی) VAUX V__VAUX ہے (hai) رہا (rahaa) ہوا (huaa)

5 Adjective (sifat, صفت) JJ دلکش (dilkash) سفيد (safed) سياه (siyaah) چوڑا (cauRaa) اونچا (uuMcaa)

6 Adverb (mut‘alliq-e-f‘el, متعلق فعل) RB تيز (tez) جلد (jald)

7 Postposition (jaar-e-moakkhar, جار موخر) PSP سے (se) نے (ne) کو (ko) ميں (meiM)

8 Conjunction (atf‘, عطف) CC CC اور (aur) اگر (agar) کيوں کہ (kyoMki)

81 Co-ordinator (harf-e-vasl, حرف وصل) CCD CC__CCD اور (aur) وه (voh) يا (yaa) کہ (ki) بلکہ (balki)

82 Subordinator (taab‘e kunindaa, تابع کننده) CCS CC__CCS اگر (agar) کيوں کہ (kyoMki) تو (to)

821 Quotative (iqtabaasii, اقتباسی) UT CC__CCS__UT Not required

9 Particles (haaliyaa, حاليہ) RP RP تو (to) ہی (hii) بهی (bhii)

91 Default (Default, ڈيفالٹ) RPD RP__RPD تو (to) ہی (hii) بهی (bhii)

92 Classifier (darja band, درجہ بند) CL RP__CL Not required

93 Interjection (fajaa’iyaa, فجائيہ) INJ RP__INJ اے (e) او (o) ارے (are) جی (jii) اہا (ahaa) واه (vaah)

94 Intensifier (harf-e-taakiid, حرف تاکيد) INTF RP__INTF بہت (bahut) بے حد (behad) البتہ (albattaa) ضرور (zaroor) خبردار (khabardaar)

95 Negation (harf-e-nahii, حرف نہی) NEG RP__NEG نہ (na) نہيں (nahiiM)

10 Quantifiers (kamiiyat numaa, کميت نما) QT QT چند (cand) متعدد (muta’addad) قليل (qaliil) کثير (kasiir)

101 General (aam‘, عام) QTF QT__QTF تهوڑا (thoRaa) بہت (bahut) کچه (kuch)

102 Cardinals (a‘adaad-e-mutlaq, اعداد مطلق) QTC QT__QTC ايک (Ek) دو (do) تين (tiin)

103 Ordinals (tartiibii a‘adaad, ترتيبی اعداد) QTO QT__QTO اول (avval) دوم (doam) پہلا (pahalaa) دوسرا (duusaraa)

11 Residuals (baaqi maandaa, باقی مانده) RD RD

111 Foreign word (bidesii lafz, بديسی لفظ) RDF RD__RDF Remarks: A word written in script other than the script of the original text

112 Symbol (‘alaamat, علامت) SYM RD__SYM $ & ( ) Remarks: Such symbols are not used in Urdu. They are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation (auqaaf, اوقاف) PUNC RD__PUNC Remarks: Only for punctuations

114 Unknown (naa-m‘aaloom, نامعلوم) UNK RD__UNK

115 Echowords (goonjdar lafz, گونج دار الفاظ) ECH RD__ECH دل-ول (dil-vil) پيار-ويار (pyaar-vyaar) چائے-وائے (caa‘e-vaa‘e)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang): defined for the root element of your document and for any element where a change of language may occur.

Defining mark-up to specify text direction (its:dir): defined for the root element of your document and for any element that has text content.

Indicating which elements and attributes should be translated (its:translateRule): elements to indicate which elements have non-translatable content.

Providing information related to text segmentation (its:withinTextRule): elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text.

Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
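As an illustration of the first two attributes, the sketch below builds a small tagged sentence with Python's standard xml.etree.ElementTree module and sets xml:lang and its:dir on the root and on one token. The element names sentence and token, the tag attribute and the language codes are assumptions made for this example only; they are not taken from the draft schema.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_token(parent, text, lang, direction, tag):
    """Append a token element (hypothetical name) that carries its own
    language and text direction, plus a POS annotation."""
    token = ET.SubElement(parent, "token")
    token.set("{%s}lang" % XML_NS, lang)
    token.set("{%s}dir" % ITS_NS, direction)
    token.set("tag", tag)
    token.text = text
    return token


def build_sample():
    """Root element declares the default language and direction; the Urdu
    token overrides both because the language changes at that point."""
    root = ET.Element("sentence")
    root.set("{%s}lang" % XML_NS, "hi")
    root.set("{%s}dir" % ITS_NS, "ltr")
    make_token(root, "کتاب", "ur", "rtl", "N__NN")
    return root


if __name__ == "__main__":
    print(ET.tostring(build_sample(), encoding="unicode"))
```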

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>
</datasm-categorySelection>
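A metadata record of this kind can be read back with a few lines of code. The sketch below is an assumption-laden illustration: it uses a simplified, well-formed sample with a hypothetical root element name (categorySelection) and only the fields shown above, and it relies on Python's standard xml.etree.ElementTree module rather than any DCIF-specific tooling.

```python
import xml.etree.ElementTree as ET

# Namespace used by the metadata listing; the prefix "d" is local to this sketch.
DCIF = {"d": "http://www.isocat.org/ns/dcif"}

# Simplified sample record. The root element name is hypothetical.
SAMPLE = """<?xml version="1.0"?>
<categorySelection xmlns="http://www.isocat.org/ns/dcif">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</categorySelection>"""


def read_metadata(xml_text):
    """Return the main metadata fields of the language section as a dict."""
    root = ET.fromstring(xml_text)
    section = root.find("d:languageSection", DCIF)
    if section is None:
        raise ValueError("no languageSection element found")
    meta = {}
    for name in ("language", "registrationStatus", "origin"):
        value = section.findtext("d:" + name, default="", namespaces=DCIF)
        meta[name] = value.strip()
    meta["creationDate"] = section.findtext(
        "d:creation/d:creationDate", default="", namespaces=DCIF
    ).strip()
    return meta


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```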

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping for the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is represented schematically under the next heading, i.e. the POS schema block diagram.
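The "display desired nodes, hide remaining nodes" steps can be sketched as a simple filter over a common tag tree. This is a minimal illustration under stated assumptions: the tree is a small, flattened excerpt of the tag set, and the per-language switches are inferred from the "Not required" remarks and missing rows in the tables, not taken from an official list. The names COMMON_TREE, SWITCHED_OFF and select_nodes are hypothetical.

```python
# Excerpt of the common tag tree: top-level category -> subtype labels.
# UT (Quotative) is really a level-2 subtype of CCS; it is flattened here for brevity.
COMMON_TREE = {
    "N": ["NN", "NNP", "NNV", "NST"],
    "CC": ["CCD", "CCS", "UT"],
    "RP": ["RPD", "CL", "INJ", "INTF", "NEG"],
}

# Branches switched 'off' per language (ISO 639-3 codes). Illustrative only.
SWITCHED_OFF = {
    "urd": {"UT", "CL"},
    "guj": {"UT", "CL"},
}


def select_nodes(language, tree=COMMON_TREE, switched_off=SWITCHED_OFF):
    """Return the language-specific view of the tree: nodes that remain
    displayed after the branches switched off for that language are hidden."""
    hidden = switched_off.get(language, set())
    return {
        top: [sub for sub in subtypes if sub not in hidden]
        for top, subtypes in tree.items()
    }


if __name__ == "__main__":
    for top, subtypes in select_nodes("urd").items():
        print(top, subtypes)
```

A language with no entry in SWITCHED_OFF simply sees the full common tree, which matches the idea that the common schema is the starting point and languages only turn branches off.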

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
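A one-to-one mapping table of this kind can be held as a simple lookup keyed by tag. The sketch below is only an illustration: it contains a handful of entries (the Urdu and Marathi labels shown are copied from the tables in this document), uses ISO 639-3 codes as keys, and the names LABELS, label_for and is_one_to_one are hypothetical.

```python
# Tag -> {language code: label}. A tiny excerpt for illustration only.
LABELS = {
    "N": {"eng": "Noun", "urd": "اسم", "mar": "नाम"},
    "PR": {"eng": "Pronoun", "urd": "ضمير"},
    "V": {"eng": "Verb", "urd": "فعل"},
    "JJ": {"eng": "Adjective", "urd": "صفت"},
}


def label_for(tag, language, fallback="eng"):
    """Return the label of a tag in the requested language, falling back
    to the English label when no mapping has been recorded."""
    entry = LABELS[tag]
    return entry.get(language, entry[fallback])


def is_one_to_one(labels, language):
    """Check that no two tags share the same label in the given language."""
    seen = [entry[language] for entry in labels.values() if language in entry]
    return len(seen) == len(set(seen))


if __name__ == "__main__":
    print(label_for("N", "urd"))
    print(label_for("V", "mar"))  # falls back to the English label
    print(is_one_to_one(LABELS, "urd"))
```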

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …

    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …

    End if

Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …

    End if
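
The selection logic above can be sketched in Python. This is only an illustration under stated assumptions, not the reference implementation: the SCRIPT_LANGUAGES table, the select_nodes function and the dictionary representation of a schema node are hypothetical names introduced for the example. Only the attribute naming pattern (cat, tag, hin-cat, kok-cat and so on) and the script/language pairs are taken from the pseudocode. The Display (Metadata) and Call (POS Schema) steps are left out.

```python
# Script -> languages handled under it, as listed in the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(node, script, lang):
    """Return the attributes to display; all other language labels are hidden."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    # English ("cat") and Hindi labels are always shown; for Hindi itself
    # the second entry of the set is simply "hin-cat" again.
    visible = {"hin-cat", f"{lang}-cat"}
    return {
        name: value
        for name, value in node.items()
        if not name.endswith("-cat") or name in visible
    }


# Hypothetical node carrying labels for several languages at once
# (placeholder strings stand in for the real labels).
noun = {
    "cat": "noun",
    "tag": "N",
    "hin-cat": "hindi-label",
    "kok-cat": "konkani-label",
    "mal-cat": "malayalam-label",
}

print(select_nodes(noun, "Devanagari", "kok"))
```

Keeping the script-to-language table as data rather than nested conditionals means a new language needs one table entry instead of a new branch.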

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
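
In most of the listings above, each token is a word with its tag label attached directly to the end (for example a word followed by N_NN or V_VM_VNF). A minimal Python sketch of how such a token could be split back into word and tag levels is shown below. The function name split_token and the regular expression are assumptions made for this illustration; the sketch also assumes the word itself contains no trailing upper-case Latin letters and that tag levels are joined by a single underscore, so listings that use a hyphen or a double underscore would need the pattern adjusted.

```python
import re

# A tag is one or more upper-case labels joined by single underscores,
# e.g. "PSP", "N_NN" or "V_VM_VNF". Anything before it is the word.
TOKEN_PATTERN = re.compile(r"(.*?)([A-Z]+(?:_[A-Z]+)*)")


def split_token(token):
    """Split 'wordTAG' into (word, [tag levels]); the word may be empty."""
    match = TOKEN_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError(f"no tag found at the end of {token!r}")
    word, tag = match.groups()
    return word, tag.split("_")


# Tokens copied from the Punjabi listing, plus a bare punctuation tag.
for token in ["ਧਰਮN_NN", "ਿਵਚPSP", "RD_PUNC"]:
    print(split_token(token))
```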

16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

POS for Marathi

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N मलगा (mulagaa-boy)

राजा (raajaa-king)

पसत (pustaka-book)

11 Common NN N__NN पसत (pustaka-book) लखणी (lekhaNi-pen) चषमा (chashmaa-goggles )

12 Proper NNP N__NNP मोहन (Mohan) रवी (Ravi) रशमी (Rashmi)

13 Verbal NNV N__NNV NA Not

Required

14 Nloc NST N__NST वर(var- up)

खाल(khaalee-

down)

पढ(pudhe-

ahead)

माग(maage-

back)

Where it is

separate it is

NST

2 Pronoun PR PR यथ(yethe-

here) थ (tethe-there)

जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

411 Finite VF V__VM__VF - This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Non-finite VNF V__VM__VNF - --do--

413 Infinitive VINF V__VM__VINF - --do--

414 Gerund VNG V__VM__VNG --do--

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA 'pen', 'spectacles' |
1.2 | Proper | NNP | N__NNP | mohan, ravI 'Mohan', 'Ravi' |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM 'up', 'down', 'in front' |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te 'me', 'you', 'he/she' |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself' |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM 'who', 'where' |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other' |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where' |
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A 'this' |
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom' |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why' |
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate' |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did' |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or' |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because' |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph, topic |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much' |
9.4 | Negation | NEG | RP__NEG | nahi, na 'no' |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much' |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one', 'two', 'three' |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu), 'second' (fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like' |

POS for Konkani

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

92 Classifier CL RP__CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
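
The rule that higher-level tags are stored automatically can be sketched as a lookup from each label to its parent. The following Python example is illustrative only: the PARENT table is transcribed from the Label and Annotation Convention columns of the tables in this section, while the function name full_annotation is an assumption made for the example. It also assumes that VF, VNF, VINF and VNG sit under VM, although the Konkani table additionally lists finite and non-finite forms under VAUX.

```python
# Child label -> parent label, read off the Annotation Convention column
# (e.g. N__NN means NN is a subtype of N).
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into the full hierarchy string."""
    chain = [label]
    # Walk upwards until a top-level category (no parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


for label in ["NN", "VF", "UT", "JJ"]:
    print(label, "->", full_annotation(label))
```

With this arrangement an annotator records only the lowest-level label, and the stored form is derived from the table.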

POS for Urdu

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite (محدود - mahdood) VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level

412 Nonfinite (غير محدود - ghair mahdood) VNF V__VM__VNF --do--

413 Infinitive (مصدر - masdar) VINF V__VM__VINF --do--

414 Gerund (حاصل مصدر - haasil-e-masdar) VNG V__VM__VNG --do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word (بديسی لفظ - bidesii lafz) RDF RD__RDF A word written in script other than the script of the original text

112 Symbol (علامت - 'alaamat) SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
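As a rough illustration of how some of these attributes look on actual elements, the following Python sketch builds a tiny document with the standard library's ElementTree. The element names `posLabels` and `label`, the id value and the note text are hypothetical; only `xml:lang`, `xml:id`, `its:dir`, `its:locNote` and the ITS 1.0 namespace URI come from the W3C specifications.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id
ET.register_namespace("its", ITS_NS)


def build_sample():
    """Build a small host document carrying local ITS and xml: attributes."""
    # Root element: language and text direction declared once for the document.
    root = ET.Element("posLabels", {
        f"{{{XML_NS}}}lang": "en",
        f"{{{ITS_NS}}}dir": "ltr",
    })
    # Child element: language and direction change, so both are declared again.
    label = ET.SubElement(root, "label", {
        f"{{{XML_NS}}}id": "label-ur-noun",
        f"{{{XML_NS}}}lang": "ur",
        f"{{{ITS_NS}}}dir": "rtl",
        f"{{{ITS_NS}}}locNote": "Urdu label for the top-level category Noun",
    })
    label.text = "اسم"
    return root


sample = build_sample()
xml_text = ET.tostring(sample, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Declaring direction and language again on the child mirrors the guidance above: both are set on the root and repeated wherever they change.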

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag conveys only its name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation> <creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

S.No English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

  If language is Hindi then

    Display (Metadata)

    Call (POS Schema)

    Display (English and Hindi Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

    ………………

  End if

  If language is Bodo then

    Call (POS Schema)

    Display (English, Hindi and Bodo Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

    ………………

  End if

  If language is Konkani then

    Call (POS Schema)

    Display (English, Hindi and Konkani Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

    ………………

  End if

Else If script is Malayalam (Orthographic variation) then

  If language is Malayalam then

    Call (POS Schema)

    Display (English, Hindi and Malayalam Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

    ………………

  End if

Else If script is Perso-Arabic then

  If language is Kashmiri then

    Call (POS Schema)

    Display (English, Hindi and Kashmiri Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

    ………………

  End if

Else If script is Bangla then

  If language is Assamese then

    Call (POS Schema)

    Display (English, Hindi and Assamese Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

    ………………

  End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg

69

CopyrightTDIL

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

…………………………………………

End if
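
The branching above (pick a script, pick a language, show the English, Hindi and target-language nodes, hide the rest) can be sketched in Python. This is only an illustration under stated assumptions: each schema node is held as a dictionary keyed by attribute name, and the names NOUN_NODE and select_nodes are hypothetical, not part of the draft standard.

```python
# Hypothetical in-memory form of one schema node: each language label sits
# under "<ISO 639-3 code>-cat", the English label under "cat".
NOUN_NODE = {
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "asm-cat": "বিশেষ্য",
    "guj-cat": "સંજ્ઞા",
    "tag": "N",
}


def select_nodes(node, language_code):
    """Keep the English, Hindi and target-language labels; hide the rest."""
    visible = {"cat", "tag", "hin-cat", f"{language_code}-cat"}
    return {key: value for key, value in node.items() if key in visible}


assamese_view = select_nodes(NOUN_NODE, "asm")
print(sorted(assamese_view))
```

Filtering a copy rather than deleting keys leaves the common schema intact, so the same node can be reused for every language.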

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
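
In the tagged sentences above, each word carries its tag as a suffix, with the levels joined by "_", "__" or "-". A minimal sketch for splitting such tokens follows. It assumes the tag is always a trailing run of uppercase ASCII letters and that an optional "\" or "/" may separate word and tag; neither assumption is stated in the draft, and the name split_token is hypothetical.

```python
import re

# Assumption: a tag is a trailing run of uppercase ASCII letters whose levels
# are joined by "_", "__" or "-" (all three spellings occur in the samples).
TAG_RE = re.compile(r"^(?P<word>.*?)[\\/]?(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Return (word, [tag levels]); the list is empty when no tag is found."""
    match = TAG_RE.match(token)
    if match is None:
        return token, []
    levels = re.split(r"[_-]+", match.group("tag"))
    return match.group("word"), levels


for sample in ["ਮਖN_NN", "RD_PUNC", "N-NNP", "V__VAUX"]:
    print(split_token(sample))
```

A word written in uppercase Latin letters would be mistaken for part of the tag, so real data should use an explicit separator between word and tag.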

16 REFERENCE

1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No | Language Name | Language Tag (ISO 639-3)
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
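
For tooling, the 22 language tags can be held in a simple lookup table. The sketch below assumes the standard ISO 639-3 assignment of each code to its language; the names LANGUAGE_CODES and language_name are illustrative only.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
LANGUAGE_CODES = {
    "asm": "Assamese", "ben": "Bangla", "brx": "Bodo", "doi": "Dogri",
    "guj": "Gujarati", "hin": "Hindi", "kan": "Kannada", "kas": "Kashmiri",
    "kok": "Konkani", "mai": "Maithili", "mal": "Malayalam", "mni": "Manipuri",
    "mar": "Marathi", "nep": "Nepali", "ori": "Oriya", "pan": "Punjabi",
    "san": "Sanskrit", "sat": "Santhali", "snd": "Sindhi", "tam": "Tamil",
    "tel": "Telugu", "urd": "Urdu",
}


def language_name(code):
    """Return the language name for an ISO 639-3 code, or None if unlisted."""
    return LANGUAGE_CODES.get(code.lower())


print(language_name("mal"))
```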

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

जो(jo-who)

ो(to-he)

21 Personal PRP PR__PRP ो(to-he)

मी(mee-I)

(tu-you)

(te-they)

मह(tumhi-

you)

22 Reflexive PRF PR__PRF सवत(swatha-

myself)

आपण(aapana-

oursleves)

23 Relative PRL PR__PRL जो(jo-who)

जयान(jyaane-

who)

जवहा(jevhaa-

while)

िजथ(jeethe-

where)

24 Reciprocal PRC PR__PRC परसपर(Parasp

ara-

reciprocally )

एतमत(ekmek

- mutually)

25 Wh-word PRQ PR__PRQ तोण(kona-

who)

तवहा(kevha-

when)

तठ(kuthe-

where)

26 Indefinite तोणी(kona

3 Demonstrative DM DM ो(to-he)

हा(haa-this)

जो(jo-who)

31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--

4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category | Label | Annotation Convention | Examples | Remarks
(Category levels follow the serial number: 1 = top level, 1.1 = subtype level 1, 1.1.1 = subtype level 2)
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA 'pen', 'spectacles' |
1.2 | Proper | NNP | N__NNP | mohan, ravI 'Mohan', 'Ravi' |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM 'up', 'down', 'in front' |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te 'me', 'you', 'he/she' |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself' |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM 'who', 'where' |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other' |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where' |
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A 'this' |
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom' |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why' |
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something' |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate' |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did' |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or' |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because' |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph, topic |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much' |
9.4 | Negation | NEG | RP__NEG | nahi, na 'no' |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much' |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one', 'two', 'three' |
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu), 'second' (fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $, & |
11.3 | Punctuation | PUNC | RD__PUNC | ( ) |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like' |

POS for Konkani

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

4.1.1 Finite VF V__VM__VF आयलो आयला आयललो

4.1.2 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच

4.1.3 Infinitive VINF V__VM__VINF आस वहर तलयार

4.1.4 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

4.2 Auxiliary VAUX V__VAUX NA

4.2.1 Finite V__VAUX__VF तलल आस आयला आस

4.2.2 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
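
The rule above means an annotator records only the lowest-level tag, and the higher levels are derived from it. Because the annotation convention joins levels with a double underscore (for example V__VM__VF), the derivation can be sketched as a string split. The function names below are hypothetical and only illustrate the idea.

```python
def expand_tag(annotation):
    """Split an annotation such as 'V__VM__VF' into its levels, top level first."""
    return annotation.split("__")


def annotate(word, lowest_level_annotation):
    """Store the lowest-level tag together with its automatically derived ancestors."""
    levels = expand_tag(lowest_level_annotation)
    return {"word": word, "tag": levels[-1], "ancestors": levels[:-1]}


print(annotate("kalam", "N__NN"))
```

Storing the ancestors alongside the tag lets a search for all nouns (N) also find tokens annotated as NN, NNP or NST.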

POS for Urdu

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar words like koi kisi kuch do not come under Wh-word they are used for indefinite person For them another category (subtype) ietankiir (indefinitive) is used Under this category

following words are also placed chand

blsquoaaz fulaan sab bahut Can we have a category

subtype like indefinitive demonstrative (DMI)

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS schema for Indian languages fully interoperable, extensible and web-enabled, the framework above adopts the W3C XML Internationalization best-practice guidelines and the ISO metadata standard.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in a new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Text direction (its:dir): defined for the root element of the document and for any element that has text content.

Translatability (its:translateRule): elements that indicate which elements and attributes should be translated and which have non-translatable content.

Text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.

Unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
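
As an illustration of the language, direction and identifier attributes described above, the following sketch reads them from a small document with Python's standard ElementTree parser. The sample document and the function name describe are assumptions made for this example; only the two namespace URIs and the attribute names come from the XML and ITS 1.0 specifications. Each element inherits the language and direction of its parent unless it declares its own.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Hypothetical two-label document: an English default and one Urdu label.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <p xml:id="p1">Noun</p>
  <p xml:id="p2" xml:lang="ur" its:dir="rtl">اسم</p>
</doc>"""


def describe(element, inherited_lang=None, inherited_dir=None):
    """Return (tag, id, language, direction) for an element and its descendants."""
    lang = element.get(f"{{{XML_NS}}}lang", inherited_lang)
    direction = element.get(f"{{{ITS_NS}}}dir", inherited_dir)
    rows = [(element.tag, element.get(f"{{{XML_NS}}}id"), lang, direction)]
    for child in element:
        rows.extend(describe(child, lang, direction))
    return rows


for row in describe(ET.fromstring(SAMPLE)):
    print(row)
```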

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document: every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag supplies just a name, which has meaning only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
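
A record shaped like the metadata above can be read back with Python's standard ElementTree parser. The sketch below is illustrative only: the shortened sample record, the root element name as it appears in this draft, and the function name read_metadata are assumptions, and a real Data Category Interchange Format record may carry more fields.

```python
import xml.etree.ElementTree as ET

# Prefix "d" is a local alias for the namespace declared on the record.
DCIF_NS = {"d": "http://www.isocat.org/ns/dcif"}

# Shortened, hypothetical record following the layout shown in this section.
RECORD = """<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</datasm-categorySelection>"""


def read_metadata(xml_text):
    """Collect a few descriptive fields from the language section of a record."""
    section = ET.fromstring(xml_text).find("d:languageSection", DCIF_NS)
    wanted = ("language", "registrationStatus", "origin")
    fields = {name: section.findtext(f"d:{name}", namespaces=DCIF_NS) for name in wanted}
    fields["creationDate"] = section.findtext("d:creation/d:creationDate", namespaces=DCIF_NS)
    return fields


print(read_metadata(RECORD))
```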

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema covering all 22 Indian languages, every label defined in English in the POS schema must have a one-to-one mapping to a label in each Indian language. The XML schema needs a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.
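
The idea of switching branches 'off' can be sketched as follows. The script-to-language table is a small hypothetical subset (the draft speaks of 12 scripts and 22 languages), the function name branch_states is an assumption, and keeping the Hindi branch on follows the worked examples in the node-selection algorithm rather than any stated rule.

```python
# Hypothetical subset of the script -> language choices named in the flow.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Bangla": ["asm", "ben"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas", "urd"],
}


def branch_states(script, language, all_languages):
    """Return {language_code: 'on' or 'off'} for one script/language choice."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English labels live in the plain 'cat' attribute, so only the language
    # branches are switched; Hindi stays on, as in the draft's examples.
    return {
        code: ("on" if code in (language, "hin") else "off")
        for code in all_languages
    }


print(branch_states("Bangla", "asm", ["hin", "asm", "guj", "kas"]))
```

Rejecting a language that does not belong to the chosen script mirrors the nested "If script ... If language ..." tests of the selection algorithm.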

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

ltxsattributegt

ltxselementgt ltxsschemagt


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    ...
  End if

  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    ...
  End if

  If language is Konkani then
    Call (POS Schema)
    Display (English, Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    ...
  End if

Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English, Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    ...
  End if

Else If script is Perso-Arabic then
  If language is Kashmiri then
    Call (POS Schema)
    Display (English, Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
    ...
  End if


Else If script is Bangla then
  If language is Assamese then
    Call (POS Schema)
    Display (English, Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    ...
  End if

Else If script is Gujarati then
  If language is Gujarati then
    Call (POS Schema)
    Display (English, Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    ...
  End if
End if
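
The selection logic above can be sketched in Python. This is only an illustration under stated assumptions, not part of the standard: the `Node` class, the `SCRIPT_LANGUAGES` table and the `select_nodes` function are hypothetical names, and the labels are placeholder strings. It keeps the English category and the Hindi label, adds the label of the requested language, and hides the rest.

```python
from dataclasses import dataclass, field

# Script -> languages handled under that script, following the branches of
# the algorithm above (ISO 639-3 codes, as used in the schema attributes).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


@dataclass
class Node:
    """One schema node: a tag, its English category and per-language labels."""
    tag: str
    cat: str
    labels: dict = field(default_factory=dict)  # e.g. {"hin": "...", "brx": "..."}


def select_nodes(nodes, script, language):
    """Return copies of the nodes that show only English, Hindi and `language`."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"no branch for script={script!r}, language={language!r}")
    visible = {"hin", language}  # English lives in `cat` and is always shown
    return [
        Node(n.tag, n.cat, {k: v for k, v in n.labels.items() if k in visible})
        for n in nodes
    ]


# Placeholder labels stand in for the native-script strings of the schema.
nodes = [Node("N", "noun", {"hin": "hin-label", "brx": "brx-label", "mal": "mal-label"})]
print(select_nodes(nodes, "Devanagari", "brx")[0].labels)
# {'hin': 'hin-label', 'brx': 'brx-label'}
```

Returning filtered copies rather than mutating the schema keeps the full node set available for the next language request.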


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
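
The tags in the sample sentences above are hierarchical: `V_VM_VF` names the top-level category `V`, the subtype `VM` and the second-level subtype `VF`. A minimal Python sketch of splitting such a tag into its levels follows. The function name `split_tag` is hypothetical, and accepting both `_` and `__` as separators is an assumption based on the mixed usage in the samples.

```python
import re


def split_tag(tag):
    """Split a hierarchical POS tag such as 'V_VM_VF' into its levels."""
    # One or more underscores separate levels (both N_NN and N__NN occur above).
    levels = [part for part in re.split(r"_+", tag.strip()) if part]
    if not levels:
        raise ValueError("empty tag")
    return levels


for tag in ("N_NN", "V_VM_VF", "QT__QTC", "JJ"):
    levels = split_tag(tag)
    print(f"{tag}: top={levels[0]} lowest={levels[-1]} depth={len(levels)}")
```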


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No. | Language Name | Language Tag (ISO 639-3)
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
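
The schema names each per-language label attribute after the language tag (`hin-cat`, `brx-cat`, `mal-cat` and so on). A small Python sketch of that lookup is given here for illustration. `LANGUAGE_TAGS` and `label_attribute` are hypothetical names, and the codes are the ISO 639-3 identifiers for the 22 languages of this annexure.

```python
# ISO 639-3 tags for the languages listed in this annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute that carries this language's label."""
    return f"{LANGUAGE_TAGS[language_name]}-cat"


print(label_attribute("Hindi"))      # hin-cat
print(label_attribute("Malayalam"))  # mal-cat
```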


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

Page 26: Tdil Mal Tags


31 Deictic DMD DM__DMD इथ(ithe-here)

थ(tithe-

there)

32 Relative DMR DM__DMR जो(jo-who)

जयान(jyane-

who)

33 Wh-word DMQ DM__DMQ तोणा(konta-

which)

तोणी(kona-

who)

4 Verb V V (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41 Main VM V__VM पडला (padalaa-fell

down)

गला(gelaa-

went)

झोपला(jhopala

a-slept)

आह(aahe-is)

41

1

Finite VF V__VM__VF - This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information

at the word

level

41

2

Non-finite VNF V__VM__VNF - --do--

41

3

Infinitive VINF V__VM__VINF - --do--

41 Gerund VNG V__VM__VNG --do--

27

CopyrightTDIL

4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)

28

CopyrightTDIL

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ & ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | kalam, chashmA ('pen', 'spectacles') |
1.2 | Proper | NNP | N__NNP | mohan, ravI ('Mohan', 'Ravi') |
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM ('up', 'down', 'in front') |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te ('me', 'you', 'he/she') |
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam ('herself/himself') |
2.3 | Relative | PRL | PR__PRL | je, te, jyAM ('who', 'where') |
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar ('mutually', 'each other') |
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM ('who', 'when', 'where') |
2.6 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | A ('this') |
3.2 | Relative | DMR | DM__DMR | je, jeNe ('which/who', 'whom') |
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem ('who', 'what', 'why') |
3.4 | Indefinite | | | koI, kaIMK, kashuM ('someone', 'something') |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | khAshe, khAdhu ('will eat', 'ate') |
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM ('is', 'was', 'did') |
5 | Adjective | JJ | | |
6 | Adverb | RB | | |
7 | Postposition | PSP | | |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke ('and', 'or') |
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke ('so', 'like that', 'because') |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO ('but', emph, topic) |
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O |
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM ('very', 'much') |
9.4 | Negation | NEG | RP__NEG | nahi, na ('no') |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM ('little', 'much') |
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN ('one', 'two', 'three') |
10.3 | Ordinals | QTO | QT__QTO | paheluM ('first', neu), bIjI ('second', fem) |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol |
11.2 | Symbol | SYM | RD__SYM | $ & |
11.3 | Punctuation | PUNC | RD__PUNC | () |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi ('work and the like', 'water and the like') |

POS for Konkani

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

4.1 | Main | VM | V__VM | यवप |
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो |
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच |
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार |
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी |
4.2 | Auxiliary | VAUX | V__VAUX | NA |
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस |
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी |

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

11.1 | Foreign word | RDF | RD__RDF | | A word written in a script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

4.1.1 | Finite (محدود - mahdood) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (غير محدود - ghair mahdood) | VNF | V__VM__VNF | | --do--
4.1.3 | Infinitive (مصدر - masdar) | VINF | V__VM__VINF | | --do--
4.1.4 | Gerund (حاصل مصدر - haasil-e-masdar) | VNG | V__VM__VNG | | --do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

11.1 | Foreign word (بديسی لفظ - bidesii lafz) | RDF | RD__RDF | | A word written in a script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words: ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
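The storage rule stated above can be illustrated with a short sketch. It assumes that an annotation convention always joins the labels of the type hierarchy with a double underscore, as in V__VM__VF; the function names expand_annotation and store_token are hypothetical and are not defined by this standard.

```python
def expand_annotation(annotation):
    """Split an annotation such as 'V__VM__VF' into its labels, top level first."""
    labels = annotation.split("__")
    if not all(labels):
        # An empty component means the string does not follow the convention.
        raise ValueError(f"malformed annotation: {annotation!r}")
    return labels


def store_token(word, annotation):
    """Keep the annotator's lowest-level tag and derive the higher levels from it."""
    labels = expand_annotation(annotation)
    return {
        "word": word,
        "tag": labels[-1],         # lowest-level tag, selected by the annotator
        "ancestors": labels[:-1],  # higher-level tags, stored automatically
        "annotation": annotation,
    }


# 'giraa' is listed as a main verb (V__VM) in the Urdu table.
record = store_token("giraa", "V__VM")
```

Because the full path is encoded in the annotation string, the annotator only ever chooses one tag, and the higher levels never need to be entered by hand.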


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization Best Practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.

Indicating which elements and attributes should be translated (its:translateRule): elements to indicate which elements have non-translatable content.

Providing information related to text segmentation (its:withinTextRule): elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text.

Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
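As an illustration of how these attributes appear in a tagged corpus, the sketch below parses a small, hypothetical XML sample with Python's standard xml.etree.ElementTree module. The element names corpus, sentence and w, the sample words, and the use of the language codes "hi" and "ur" are assumptions made for the example. Only the local attributes its:translate, its:dir and its:locNote are shown; the rule elements its:translateRule and its:withinTextRule are not.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace

# Hypothetical sample: the root sets the default language and direction,
# and the second sentence overrides both. Words are romanized.
SAMPLE = """\
<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="hi" its:dir="ltr">
  <sentence xml:id="s1">
    <w tag="N__NN" its:translate="no">kitaab</w>
  </sentence>
  <sentence xml:id="s2" xml:lang="ur" its:dir="rtl"
            its:locNote="Perso-Arabic script, written right to left.">
    <w tag="N__NN" its:translate="no">kitaab</w>
  </sentence>
</corpus>
"""


def describe_sentences(xml_text):
    """Return language, direction and localizer note for each sentence, by xml:id."""
    root = ET.fromstring(xml_text)
    default_lang = root.get(f"{{{XML_NS}}}lang")
    default_dir = root.get(f"{{{ITS_NS}}}dir")
    info = {}
    for sentence in root.iter("sentence"):
        info[sentence.get(f"{{{XML_NS}}}id")] = {
            # A sentence inherits the root's values unless it overrides them.
            "lang": sentence.get(f"{{{XML_NS}}}lang", default_lang),
            "dir": sentence.get(f"{{{ITS_NS}}}dir", default_dir),
            "note": sentence.get(f"{{{ITS_NS}}}locNote"),
        }
    return info


sentences = describe_sentences(SAMPLE)
```

Declaring xml:lang and its:dir once on the root and overriding them only where the language changes keeps the mark-up small, which matches the guidance above.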

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schemas provide a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag conveys nothing beyond its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching a particular branch of the tree structure "off". This is represented schematically under the next heading, POS Schema Block Diagram.
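The "Display (Desired Nodes)" and "Hide (remaining nodes)" steps can be sketched as follows. This is a minimal illustration, not the standard's implementation: it assumes a well-formed miniature of the common schema in which each node carries one "<code>-cat" attribute per language that uses the category, and it treats a missing attribute as a branch that is switched off. The element names pos, cat and type, and the label strings such as "hin:noun", are placeholders for the native-script labels.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the common schema. The Verbal subtype has no
# Malayalam label, so that branch is hidden when Malayalam is selected.
COMMON_SCHEMA = """\
<pos>
  <cat name="Noun" tag="N" hin-cat="hin:noun" mal-cat="mal:noun">
    <type name="Common" tag="NN" hin-cat="hin:common" mal-cat="mal:common"/>
    <type name="Verbal" tag="NNV" hin-cat="hin:verbal"/>
  </cat>
</pos>
"""


def select_nodes(schema_xml, lang):
    """Return (tag, label) pairs for the nodes that carry a label in `lang`."""
    key = f"{lang}-cat"
    root = ET.fromstring(schema_xml)
    return [
        (node.get("tag"), node.get(key))
        for node in root.iter()
        if node.get(key)  # nodes without a label for this language are hidden
    ]


hindi_nodes = select_nodes(COMMON_SCHEMA, "hin")
malayalam_nodes = select_nodes(COMMON_SCHEMA, "mal")
```

Filtering on the presence of a label keeps a single shared tree for all languages, so adding a language means adding attributes rather than maintaining a separate schema.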

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
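A mapping table of this kind can be held as a simple lookup structure. The sketch below is illustrative only: the dictionary LABELS, the functions label_for and missing_labels, and the label strings such as "hin:noun" are hypothetical placeholders for the native-script entries, and "NA" marks a category that the table lists as not applicable for a language (for example, Quotative in Gujarati).

```python
# Hypothetical excerpt: English label -> {language code: label in that language}.
LABELS = {
    "Noun": {"hin": "hin:noun", "guj": "guj:noun"},
    "Quotative": {"hin": "hin:quotative", "guj": "NA"},
}


def label_for(english, lang):
    """Return the label for `english` in `lang`, or None when it is NA or absent."""
    label = LABELS.get(english, {}).get(lang, "NA")
    return None if label == "NA" else label


def missing_labels(langs):
    """List (English label, language) pairs that have no usable label."""
    return sorted(
        (english, lang)
        for english in LABELS
        for lang in langs
        if label_for(english, lang) is None
    )


gaps = missing_labels(["hin", "guj"])
```

A check such as missing_labels makes it easy to see which branches would be hidden for a language before the schema is generated.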

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

...

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

...

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

...

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

...

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

...

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

...

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

...

End if
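
The selection logic of this section can be sketched in Python. This is only an illustration under stated assumptions: the table SCRIPT_LANGUAGES and the functions select_nodes and filter_node are hypothetical names, a schema node is modelled as a plain dictionary of its label attributes, and only the script and language pairs named in the algorithm above are listed.

```python
# Hypothetical lookup table: script -> {language name: ISO 639-3 code}.
# Only the pairs that appear in the algorithm of section 14 are included.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language):
    """Return the label attributes to display for a script/language pair."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English ("cat") and Hindi ("hin-cat") nodes are displayed in every branch.
    visible = ["cat", "hin-cat"]
    code = languages[language]
    if code != "hin":
        # Third column: the labels of the selected language, e.g. "brx-cat".
        visible.append(f"{code}-cat")
    return visible


def filter_node(node, visible):
    """Keep the tag and the visible labels of one node; hide the rest."""
    return {key: value for key, value in node.items()
            if key == "tag" or key in visible}


if __name__ == "__main__":
    # Placeholder label values stand in for the real Hindi/Bodo/Konkani labels.
    noun = {"cat": "noun", "hin-cat": "HIN", "brx-cat": "BRX",
            "kok-cat": "KOK", "tag": "N"}
    print(filter_node(noun, select_nodes("Devanagari", "Bodo")))
```

The sketch keeps the "Display" and "Hide" steps as a single filter over attributes, so adding a language only means adding one entry to the lookup table.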


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
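
The tagged sentences of this section attach a label such as N_NNP or V_VM_VNF to every word. A minimal Python sketch of reading such text is given here. It assumes that each token is written as the word, a separator character, and then the label; the separator does not survive in this text, so the backslash used below is an assumption, and the function names split_token and tag_counts are hypothetical.

```python
from collections import Counter


def split_token(token, sep="\\"):
    """Split one 'word<sep>LABEL' token into the word and its label parts.

    Labels may use single or double underscores (N_NN, N__NN), so empty
    pieces produced by the split are dropped.
    """
    word, _, label = token.rpartition(sep)
    parts = [part for part in label.split("_") if part]
    return word, parts


def tag_counts(sentence, sep="\\"):
    """Count the lowest-level tag of every tagged token in a sentence."""
    counts = Counter()
    for token in sentence.split():
        if sep not in token:
            continue  # skip anything that carries no label
        _, parts = split_token(token, sep)
        if parts:
            counts[parts[-1]] += 1
    return counts


if __name__ == "__main__":
    # ASCII placeholders stand in for the words of a tagged sentence.
    sample = "w1\\N_NNP w2\\PSP w3\\N_NN w4\\V_VM .\\RD_PUNC"
    print(tag_counts(sample))
```

Counting only the last part of each label follows the rule that annotation is done with the lowest-level tag of the hierarchy.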


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag according to ISO 639-3

1  Hindi  hin

2  Assamese  asm

3  Bangla  ben

4  Bodo  brx

5  Dogri  doi

6  Gujarati  guj

7  Kannada  kan

8  Kashmiri  kas

9  Konkani  kok

10  Maithili  mai

11  Malayalam  mal

12  Manipuri  mni

13  Marathi  mar

14  Nepali  nep

15  Oriya  ori

16  Punjabi  pan

17  Sanskrit  san

18  Santhali  sat

19  Sindhi  snd

20  Tamil  tam

21  Telugu  tel

22  Urdu  urd


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi

2 Prof Girish Nath Jha, JNU, New Delhi

3 Dr Somnath Chandra, Department of Information Technology, New Delhi

4 Dipti Misra Sharma, LTRC, IIIT-H

5 Somi Ram, CDAC NOIDA

6 Prof Uma Maheswara Rao G, University of Hyderabad

7 Dr Sobha L, AU-KBC, Chennai

8 Menak S

9 Kalika Bali, Microsoft, Bangalore

10 Prof Pushpak Bhattacharyya, IIT-Bombay

11 Prof Malhar Kulkarni, IIT-Bombay

12 Lata Popale, IIT-Bombay

13 Kirtida Shah, Gujarati University, Ahemadabad

14 Mona Parakh, LDCIL, Mysore

15 Jyoti Pawar, Goa University

16 Madhavi Sardesai, Goa University

17 Ramnath

18 Aadil Kak, University of Kashmir

19 Nazima, University of Kashmir

20 Dr Richa, LDCIL, Mysore

21 Mazhar Mehdi Hussain, JNU, New Delhi

22 Mr Prashant Verma, W3C India, New Delhi

23 Swati Arora, W3C India, New Delhi


4

42 Auxiliary VAUX V__VAUX आह (is) लागला (started)

5 Adjective JJ सदर(sundara-

beautiful)

चागला(chaang

alaa-good)

मोठा(moThaa-

big)

6 Adverb RB लवतर(lavakar

- fast )

हळहळ(haLuuh

aLuu-slowly)

7 Postposition PSP Not in Marathi

8 Conjunction CC CC आण(aaNi-

and)

तारण(kaaraN-

because)

81 Co-ordinator CCD CC__CCD आण(aaNi-

and)

पण(paNa-

but) पर (parantu-but)

82 Subordinator CCS CC__CCS तारण त (kaaraN-

because of)

ता त(kaaraN

kii-because

of) जर-र(jara-tara-

if-then)

82

1

Quotative UT CC__CCS__UT असा महणन

9 Particles RP RP र(tara)

91 Default RPD RP__RPD र(tara) (then)

92 Classifier CL RP__CL Not required

93 Interjection INJ RP__INJ अरर(arere)


ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text


112 Symbol SYM RD__SYM $ & ( ) For symbols such as $, & etc

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.

POS for Gujarati

Sl No  Category (Top level, Subtype level 1, Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1  Noun  N  N

1.1  Common  NN  N__NN  kalam, chashmA 'pen', 'spectacles'

1.2  Proper  NNP  N__NNP  mohan, ravI 'Mohan', 'Ravi'

1.3  Nloc  NST  N__NST  upar, nIche, ahIM 'up', 'down', 'in front'

2  Pronoun  PR  PR

2.1  Personal  PRP  PR__PRP  huM, tuM, te 'me', 'you', 'he/she'

2.2  Reflexive  PRF  PR__PRF  pote, jAte, svayam 'herself/himself'

2.3  Relative  PRL  PR__PRL  je, te, jyAM 'who', 'where'

2.4  Reciprocal  PRC  PR__PRC  aras-paras, paraspar 'mutually', 'each other'

2.5  Wh-word  PRQ  PR__PRQ  koN, kyAre, kyAM 'who', 'when', 'where'

2.6  Indefinite  koI, kaIMK, kashuM 'someone', 'something'

3  Demonstrative  DM  DM

3.1  Deictic  DMD  DM__DMD  A 'this'

3.2  Relative  DMR  DM__DMR  je, jeNe 'which/who', 'whom'

3.3  Wh-word  DMQ  DM__DMQ  koN, shuM, kem 'who', 'what', 'why'

3.4  Indefinite  koI, kaIMK, kashuM 'someone', 'something'

4  Verb  V  V

4.1  Main  VM  V__VM  khAshe, khAdhu 'will eat', 'ate'

4.2  Auxiliary  VAUX  V__VAUX  chhe, hatuM, karyuM 'is', 'was', 'did'

5  Adjective  JJ

6  Adverb  RB

7  Postposition  PSP

8  Conjunction  CC  CC

8.1  Co-ordinator  CCD  CC__CCD  ane, ke 'and', 'or'

8.2  Subordinator  CCS  CC__CCS  tethI, evuM, kAraNke 'so', 'like that', 'because'

9  Particles  RP  RP

9.1  Default  RPD  RP__RPD  paNa, ja, tO 'but', emph, topic

9.2  Interjection  INJ  RP__INJ  hE, arrrE, O

9.3  Intensifier  INTF  RP__INTF  bahu, ghaNuM 'very', 'much'

9.4  Negation  NEG  RP__NEG  nahi, na 'no'

10  Quantifiers  QT  QT

10.1  General  QTF  QT__QTF  thoduM, ghaNuM 'little', 'much'

10.2  Cardinals  QTC  QT__QTC  eka, be, traN 'one', 'two', 'three'

10.3  Ordinals  QTO  QT__QTO  paheluM, bIjI 'first' (neu), 'second' (fem)

11  Residuals  RD  RD

11.1  Foreign word  RDF  RD__RDF  tv, perasitemol

11.2  Symbol  SYM  RD__SYM  $ &

11.3  Punctuation  PUNC  RD__PUNC  ( )

11.4  Unknown  UNK  RD__UNK

11.5  Echowords  ECH  RD__ECH  kAm-bAm, pANi-bANi 'work and the like', 'water and the like'

POS for Konakani Sl

No Category Label Annotation

Convention Examples Remark

s

Top level Subtype

(level 1) Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा


23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili Sl

No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
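
The rule that higher level tags are stored automatically can be sketched in Python. This is a minimal illustration, not part of the standard: the PARENT table and the function full_annotation are hypothetical names, and the parent links are read off the Annotation Convention column of the tables (for example V__VM__VF). The finite and non-finite auxiliary subtypes listed for Konkani (V__VAUX__VF, V__VAUX__VNF) are left out, because this simple table allows only one parent per label.

```python
# Hypothetical parent table: label -> label of the next higher level.
# Top-level labels (N, PR, DM, V, JJ, RB, PSP, CC, RP, QT, RD) have no entry.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(tag):
    """Expand a lowest-level tag into its full annotation string."""
    chain = [tag]
    # Walk upwards until a top-level label is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # The annotation convention lists the highest level first.
    return "__".join(reversed(chain))


if __name__ == "__main__":
    for label in ("NN", "VF", "UT", "JJ"):
        print(label, "->", full_annotation(label))
```

With such a table the annotator only chooses the lowest level label, and the stored annotation is derived from it.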

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)
10.1 General (عام, ‘aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch)
10.2 Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin)
10.3 Ordinals (ترتيبی اعداد, tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)

11 Residuals (باقی مانده, baaqi maandaa) | RD | RD
11.1 Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | A word written in a script other than the script of the original text
11.2 Symbol (علامت, ‘alaamat) | SYM | RD__SYM | $ & ( ) | Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound), etc.
11.3 Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | Only for punctuation
11.4 Unknown (نامعلوم, naa-m‘aaloom) | UNK | RD__UNK
11.5 Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | (دل-) ول ((dil-) vil), (پيار-) ويار ((pyaar-) vyaar), (چائے-) وائے ((caa‘e-) vaa‘e)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
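
The rule above can be illustrated with a short Python sketch. It assumes the double-underscore annotation convention used in the tag tables (for example V__VM__VF). The lookup table holds only a few rows copied from this document, and the names LOWEST_TO_PATH and higher_level_tags are hypothetical, not part of the standard.

```python
# Illustrative sketch only: a few rows taken from the tag tables in this document.
# Keys are lowest-level labels; values are their full annotation-convention paths.
LOWEST_TO_PATH = {
    "VF": "V__VM__VF",
    "VAUX": "V__VAUX",
    "CCD": "CC__CCD",
    "RDF": "RD__RDF",
    "JJ": "JJ",
}


def higher_level_tags(lowest_tag: str) -> list[str]:
    """Return the ancestors of a lowest-level tag, from the top category down."""
    path = LOWEST_TO_PATH[lowest_tag]  # raises KeyError for a tag not in the table
    parts = path.split("__")
    return parts[:-1]  # everything above the tag the annotator selected


if __name__ == "__main__":
    for tag in ("VF", "VAUX", "JJ"):
        print(tag, "->", higher_level_tags(tag))
```

With this layout the annotator records only the lowest-level tag, and the higher levels are derived from the stored path rather than typed by hand.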

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.

Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.

Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.

Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp]
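
As a rough illustration of the first two attributes, the following Python sketch attaches xml:lang and its:dir to a single tagged word. The element name token, the attribute name pos and the function make_token are hypothetical and are not defined by this standard; the namespace URI is the one published for ITS 1.0. The example uses the Urdu postposition سے (se) with tag PSP from the tag set, and the language tag "ur" is an assumption (xml:lang takes a BCP 47 tag).

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id

# Ask ElementTree to serialize the ITS namespace with the usual "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_token(word: str, tag: str, lang: str, direction: str) -> ET.Element:
    """Build a hypothetical <token> element carrying xml:lang and its:dir."""
    if direction not in ("ltr", "rtl", "lro", "rlo"):
        raise ValueError(f"unsupported its:dir value: {direction!r}")
    token = ET.Element("token")
    token.set(f"{{{XML_NS}}}lang", lang)      # natural language labelling
    token.set(f"{{{ITS_NS}}}dir", direction)  # text direction
    token.set("pos", tag)                     # hypothetical attribute for the POS tag
    token.text = word
    return token


if __name__ == "__main__":
    element = make_token("سے", "PSP", "ur", "rtl")
    print(ET.tostring(element, encoding="unicode"))
```

Declaring direction per element matters here because the tag set covers both left-to-right scripts (Devanagari, Malayalam, Bangla) and right-to-left ones (Perso-Arabic).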

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the labels for each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by turning the other branches of the tree structure "off". This is represented schematically under the next heading, i.e. POS schema block diagram.
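
A minimal Python sketch of this selection step follows. It assumes a schema node is held as a plain dictionary of attribute names and values, which is a simplification and not the format defined by this standard. The script and language codes are the ones that appear in the node-selection algorithm of this document; the Hindi label is the standard Hindi term, supplied as an assumption because the source text is damaged; the names SCRIPT_LANGUAGES, NOUN_NODE and select_node are hypothetical.

```python
# Script -> language codes, as listed in the node-selection algorithm of this document.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Hypothetical in-memory form of one schema node (attribute name -> value).
NOUN_NODE = {
    "cat": "noun",
    "hin-cat": "संज्ञा",  # assumption: standard Hindi term for "noun"
    "mal-cat": "നാമം",
    "kok-cat": "नाम",
    "tag": "N",
}


def select_node(node: dict, script: str, language: str) -> dict:
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {language!r} is not listed under script {script!r}")
    keep = {"cat", "tag", "hin-cat", f"{language}-cat"}
    return {name: value for name, value in node.items() if name in keep}


if __name__ == "__main__":
    print(select_node(NOUN_NODE, "Malayalam", "mal"))
    print(select_node(NOUN_NODE, "Devanagari", "hin"))
```

Filtering a copy of the node, rather than deleting attributes from the common schema, keeps the full tree available for the next language selection.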

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
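
The one-to-one property of the mapping tables can be checked mechanically. The Python sketch below is only an illustration: it holds four rows (Noun, Pronoun, Verb, Adjective) with the Urdu labels from Table 1 and the Kashmiri labels from Table 2, keyed by the English tag. The language codes "urd" and "kas" are assumed ISO 639-3 codes, and the names LABELS, validate_mapping and tag_for_label are hypothetical.

```python
# A few rows copied from the mapping tables: language code -> {tag -> label}.
LABELS = {
    "urd": {"N": "اسم", "PR": "ضمير", "V": "فعل", "JJ": "صفت"},
    "kas": {"N": "ناوت", "PR": "پرناوت", "V": "کراوت", "JJ": "باوت"},
}


def validate_mapping(labels: dict) -> None:
    """Raise ValueError unless every language maps the same tags one-to-one."""
    reference = None
    for language, mapping in labels.items():
        tags = set(mapping)
        if reference is None:
            reference = tags
        elif tags != reference:
            raise ValueError(f"{language}: tag set differs from the other languages")
        if len(set(mapping.values())) != len(mapping):
            raise ValueError(f"{language}: two tags share the same label")


def tag_for_label(labels: dict, language: str, label: str) -> str:
    """Look up the tag for a label; well defined because the mapping is one-to-one."""
    inverse = {text: tag for tag, text in labels[language].items()}
    return inverse[label]


if __name__ == "__main__":
    validate_mapping(LABELS)
    print(tag_for_label(LABELS, "kas", LABELS["kas"]["V"]))
```

A check of this kind would flag rows where a label is missing for one language or where two categories were given the same label.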

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-

cat=rdquoजावाचत नामrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-

cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश

सवरनामrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-

cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-

cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-

cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

………………

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

………………

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

………………

End if
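
The selection logic in the algorithm can be sketched in Python. This is only an illustration under stated assumptions, not the standard's implementation: the dictionary layout of a node, the SCRIPT_LANGUAGES table and the function name select_nodes are hypothetical. Only the language codes, the tags and the rule "display English, Hindi and the selected language, hide the rest" come from the text.

```python
# Hypothetical in-memory form of one schema node. The labels are copied
# as they appear in the examples of this document.
NODES = [
    {
        "tag": "PR",
        "cat": "Pronoun",
        "hin-cat": "सवरनाम",
        "mal-cat": "സര വനാമം",
        "kas-cat": "پرناوت",
    },
]

# English ("cat") and Hindi ("hin-cat") labels are always displayed.
ALWAYS_SHOWN = ("tag", "cat", "hin-cat")

# Script -> {language name: ISO 639-3 code}; only the pairs named in the
# algorithm text above are listed, so the table is not exhaustive.
SCRIPT_LANGUAGES = {
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(nodes, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError for an unknown pair
    keep = set(ALWAYS_SHOWN) | {f"{code}-cat"}
    return [{k: v for k, v in node.items() if k in keep} for node in nodes]


malayalam_view = select_nodes(NODES, "Malayalam", "Malayalam")
```

Raising KeyError for an unlisted script and language pair is a design choice of this sketch; the text does not say how such a case should be handled.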

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
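
In the tagged sentences of this section, each word is followed directly by its tag. A reader who wants to process such text could separate the two with a small helper. The sketch below is an assumption-laden illustration: the pattern treats any trailing run of ASCII capitals joined by "_" or "-" as the tag, which is not a rule stated in this document, and the function names are hypothetical. Lines where the tag precedes the word would need a mirrored pattern.

```python
import re

# Assumption: a tag is a trailing run of ASCII capitals joined by "_" or "-"
# (for example N_NN, V_VM_VF, RD_PUNC, N__NN). Everything before it is the word.
# A word that itself ends in ASCII capitals would be split wrongly.
TAG_RE = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Return (word, tag); tag is None when no trailing tag is found."""
    match = TAG_RE.match(token)
    if match is None:
        return token, None
    return match.group("word"), match.group("tag")


def top_level(tag):
    """Return the top-level category of a hierarchical tag."""
    return re.split(r"[_-]+", tag)[0]


word, tag = split_token("दशरनN_NN")
```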

16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
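
The language tags can be carried into code as a simple lookup. The following is a minimal sketch: the names LANGUAGE_TAGS, language_tag and label_attribute are hypothetical, and the "<code>-cat" attribute pattern is taken from the schema examples in this document (hin-cat, mal-cat, kas-cat). The name and code pairs follow ISO 639-3.

```python
# Language name -> ISO 639-3 code for the 22 languages of the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal",
    "Manipuri": "mni", "Marathi": "mar", "Nepali": "nep", "Oriya": "ori",
    "Punjabi": "pan", "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd",
    "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name):
    """Return the ISO 639-3 code for a language name listed in the annexure."""
    return LANGUAGE_TAGS[name]


def label_attribute(name):
    """Return the schema attribute that carries the label in that language."""
    return f"{language_tag(name)}-cat"


malayalam_attribute = label_attribute("Malayalam")
```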

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

ओहो(oho-

oh)

94 Intensifier INTF RP__INTF खप(khoop-

lot very )

बराच(baraach-

too much)

अशय(atisha

ya- too much

very)

95 Negation NEG RP__NEG नतो(nako-

not) न(na-

Na)

10 Quantifiers QT QT थोड(thode-

few)

जास(jaasta-

lot)

ताह(kaahi-

few) एत(eka-

one)

पहला(pahilaa-

first)

101 General QTF QT__QTF थोड thoDe-

few)

जास(jaasta-

lot)

ताह(kaahi-

few)

102 Cardinals QTC QT__QTC एत(eka-one)

दोन(dona-two)

103 Ordinals QTO QT__QTO पहला(pahilaa-

first)

दसरा(dusaraa-

second)

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ amp ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जवणबवण(jev

anbivaNa-

mealdinner)

डोतबत(Doke

bike- head)

(Paanii-)

vaanii

(khaanaa-)

vaanaa

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N |
1.1 | Common | NN | N__NN | kalam, chashmA 'pen', 'spectacles'
1.2 | Proper | NNP | N__NNP | mohan, ravI 'Mohan', 'Ravi'
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM 'up', 'down', 'in front'
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te 'me', 'you', 'he/she'
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself'
2.3 | Relative | PRL | PR__PRL | je, te, jyAM 'who', 'where'
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other'
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where'
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something'
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD | A 'this'
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom'
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why'
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something'
4 | Verb | V | V |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate'
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did'
5 | Adjective | JJ | |
6 | Adverb | RB | |
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or'
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because'
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph, topic
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much'
9.4 | Negation | NEG | RP__NEG | nahi, na 'no'
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much'
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one', 'two', 'three'
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu), 'second' (fem)
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol
11.2 | Symbol | SYM | RD__SYM | $, &
11.3 | Punctuation | PUNC | RD__PUNC | ()
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like'

POS for Konkani

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM amp $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word

written in

script other

than the

script of the

original text

112 Symbol SYM RD__SYM $ ( ) For symbols

such as $ amp

etc

113 Punctuation PUNC RD__PUNC Only for

punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically
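
The rule that higher-level tags are stored automatically once the lowest-level tag is chosen can be sketched as a parent lookup. This is an illustrative assumption about how a tool might do it: the PARENT table is a partial list assembled from the tag sets in this document, and the function name full_annotation is hypothetical. The "__" separator is the annotation convention shown in the tables.

```python
# Partial parent table built from the tag sets listed in this document.
# VF, VNF, VINF and VNG are placed under VM only; the Konkani table also
# uses VF and VNF under VAUX, which this simple sketch does not cover.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(label):
    """Expand a lowest-level label into its full hierarchical annotation."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


finite_verb = full_annotation("VF")
```

With this table, an annotator only picks the most specific label and the stored annotation carries the whole path; labels without a parent, such as JJ, are returned unchanged.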

POS for Urdu

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar words like koi kisi kuch do not come under Wh-word they are used for indefinite person For them another category (subtype) ietankiir (indefinitive) is used Under this category

following words are also placed chand

blsquoaaz fulaan sab bahut Can we have a category

subtype like indefinitive demonstrative (DMI)

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype

WILL NOT

be used for

Hindi as

Hindi does

not have

enough

information at

the word

level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest level tag of the type hierarchy Once the lower level tag is selected the higher level tags should be stored automatically

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages interoperable, extensible and web-enabled, the framework above adopts the W3C XML Internationalization best-practice guidelines and the ISO metadata standard.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Text direction (its:dir): defined for the root element of the document and for any element that has text content.

Translatability (its:translateRule): elements that indicate which elements and attributes should be translated and which have non-translatable content.

Text segmentation (its:withinTextRule): elements that indicate which elements should be treated as part of their parent and which as a nested but independent run of text.

Unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp]
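
As an illustration of the attributes above, the following sketch builds one label element that carries xml:lang, its:dir and its:translate. It is a minimal example and not part of the draft schema: the element name "label" and the function make_label are hypothetical. The ITS namespace URI is the one defined by ITS 1.0, and the Kashmiri label is copied from the schema examples in this document.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Build a hypothetical <label> element carrying ITS-related attributes."""
    element = ET.Element("label")
    element.set(f"{{{XML_NS}}}lang", lang)            # natural language
    element.set(f"{{{ITS_NS}}}dir", direction)        # ltr, rtl, lro or rlo
    element.set(f"{{{ITS_NS}}}translate", translate)  # "yes" or "no"
    element.text = text
    return element


# "ks" is the BCP 47 tag for Kashmiri; the document itself labels
# languages with ISO 639-3 codes such as kas.
xml_text = ET.tostring(make_label("ناوت", "ks", direction="rtl"), encoding="unicode")
```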

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used for an XML document that conforms to a particular schema. A schema provides a means of defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and indicate, to a degree, what the data means. Only to a degree, because a tag supplies just a name, which is meaningful only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching the other branches of the tree structure 'off'. This is represented schematically under the next heading, the POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

61

CopyrightTDIL

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

...

End if

If language is Bodo then

Call (POS Schema)

Display (English, Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

...

End if

If language is Konkani then

Call (POS Schema)

Display (English, Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

...

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English, Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

...


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English, Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

...

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English, Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

...

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English, Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

...

End if
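
The selection steps above can be read as a filter over the attributes of each schema node. The following is a minimal Python sketch under stated assumptions, not part of the standard: the script-to-language grouping is copied from the pseudocode, a node is modelled as a plain dictionary of its attributes, the label values are placeholders, and the names `SCRIPT_LANGUAGES` and `select_node_labels` are hypothetical.

```python
# Illustrative sketch only. SCRIPT_LANGUAGES and select_node_labels are
# hypothetical names; a schema node is modelled as a plain dict.

# Script -> language codes, following the branches of the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_node_labels(node, script, language):
    """Keep English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = {"hin", language}
    shown = {}
    for key, value in node.items():
        if key.endswith("-cat"):
            # Language-specific label such as "hin-cat" or "brx-cat".
            if key[: -len("-cat")] in visible:
                shown[key] = value
        else:
            # name, tag and the English label ("POS cat" / "subcat") stay.
            shown[key] = value
    return shown


# Placeholder label values stand in for the native-script strings.
noun = {
    "name": "cat",
    "POS cat": "noun",
    "hin-cat": "HINDI-LABEL",
    "brx-cat": "BODO-LABEL",
    "kok-cat": "KONKANI-LABEL",
    "tag": "N",
}
bodo_view = select_node_labels(noun, "Devanagari", "brx")
print(sorted(bodo_view))
```

For Hindi the visible set collapses to the English and Hindi labels only, which matches the first branch of the pseudocode. The "Display (Metadata)" step is left out of this sketch.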


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No | Language Name | Language Tag (ISO 639-3)
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
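
The ISO 639-3 codes of this annexure are also the prefixes of the label attributes used in the draft schema (for example `hin-cat` or `mal-cat`). The sketch below, in Python, shows one possible way to resolve such an attribute name to its language. It is an illustration under assumptions: `LANGUAGE_CODES` and `label_attribute_language` are hypothetical names, and the name-to-code pairs follow ISO 639-3.

```python
# Illustrative sketch; LANGUAGE_CODES and label_attribute_language are
# hypothetical names, not part of the draft standard.
LANGUAGE_CODES = {
    "asm": "Assamese", "ben": "Bangla", "brx": "Bodo", "doi": "Dogri",
    "guj": "Gujarati", "hin": "Hindi", "kan": "Kannada", "kas": "Kashmiri",
    "kok": "Konkani", "mai": "Maithili", "mal": "Malayalam", "mni": "Manipuri",
    "mar": "Marathi", "nep": "Nepali", "ori": "Oriya", "pan": "Punjabi",
    "san": "Sanskrit", "sat": "Santhali", "snd": "Sindhi", "tam": "Tamil",
    "tel": "Telugu", "urd": "Urdu",
}


def label_attribute_language(attribute):
    """Return (code, language name) for an attribute such as 'mal-cat'."""
    code, separator, suffix = attribute.partition("-")
    if separator != "-" or suffix != "cat" or code not in LANGUAGE_CODES:
        raise ValueError(f"not a language label attribute: {attribute!r}")
    return code, LANGUAGE_CODES[code]


print(label_attribute_language("mal-cat"))
```

Attributes that carry no language prefix, such as `tag` or `subcat`, are rejected with a `ValueError`, so a caller can tell language labels apart from structural attributes.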


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


11.2 | Symbol | SYM | RD__SYM | $, &, (, ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जवणबवण (jevanbivaNa- meal/dinner), डोतबत (Dokebike- head), (Paanii-) vaanii, (khaanaa-) vaanaa |
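
The Annotation Convention column in these tables writes a tag as its path through the type hierarchy, joined by a double underscore (for example `RD__ECH` or `V__VM__VF`). A minimal Python sketch of deriving that full path from the lowest-level label is given below. The names `PARENT` and `full_annotation` are hypothetical, the table lists only labels that occur in this document, and placing `VF`, `VNF`, `VINF` and `VNG` under `VM`, and `UT` under `CCS`, is an assumption based on the schema subtypes and the tagged samples.

```python
# Illustrative sketch; PARENT and full_annotation are hypothetical names.
# Child label -> parent label, limited to labels that occur in this document.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    # Assumed placement of the verb subtypes under the main verb.
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC",
    "UT": "CCS",  # assumed from the CC_CCS_UT tag in the Tamil sample
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(lowest_tag, separator="__"):
    """Build the annotation string from the lowest-level label upwards."""
    path = [lowest_tag]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return separator.join(reversed(path))


print(full_annotation("VF"))
print(full_annotation("ECH"))
```

Labels without a parent entry, such as `JJ`, `RB` or `PSP`, are returned unchanged. The `separator` argument allows the single-underscore form seen in the tagged sample sentences.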

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Gujarati

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks

1 | Noun | N | N |
1.1 | Common | NN | N__NN | kalam, chashmA 'pen', 'spectacles'
1.2 | Proper | NNP | N__NNP | mohan, ravI 'Mohan', 'Ravi'
1.3 | Nloc | NST | N__NST | upar, nIche, ahIM 'up', 'down', 'in front'
2 | Pronoun | PR | PR |
2.1 | Personal | PRP | PR__PRP | huM, tuM, te 'me', 'you', 'he/she'
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself'
2.3 | Relative | PRL | PR__PRL | je, te, jyAM 'who', 'where'
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other'
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where'
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something'
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD | A 'this'
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom'
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why'
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something'
4 | Verb | V | V |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate'
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did'
5 | Adjective | JJ | |
6 | Adverb | RB | |
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or'
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because'
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph, topic
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much'
9.4 | Negation | NEG | RP__NEG | nahi, na 'no'
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much'
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one', 'two', 'three'
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu), 'second' (fem)
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol
11.2 | Symbol | SYM | RD__SYM | $, &
11.3 | Punctuation | PUNC | RD__PUNC | ()
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like'

POS for Konkani

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पसत रख आबो माड |
1.2 | Proper | NNP | N__NNP | रामायण बायबल तराण गय ततणी तपला |
1.3 | Nloc | NST | N__NST | भायर भीर वयर सतयल |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हाव ो तयो मच आमच ाच |
2.2 | Reflexive | PRF | PR__PRF | आपण सवा |
2.3 | Relative | PRL | PR__PRL | जा जो |
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा |
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो |
2.6 | Indefinite | | | तोणय त य खयचय |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ो हो |
3.2 | Relative | DMR | DM__DMR | जो |
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल |
3.4 | Indefinite | | | तोणाचय तसलय |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | यवप |
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो |
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच |
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार |
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी |
4.2 | Auxiliary | VAUX | V__VAUX | NA |
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस |
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी |
5 | Adjective | JJ | | सोबी सदर |
6 | Adverb | RB | | फालया सवतास अश |
7 | Postposition | PSP | | खाीर पास बगर तडन लागी |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा |
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन |
8.2.1 | Quotative | UT | CC__CCS__UT | अश त |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | बी आद इतयाद |
9.2 | Classifier | CL | RP__CL | (पाच) जाण |
9.3 | Interjection | INJ | RP__INJ | आर चप |
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर |
9.5 | Negation | NEG | RP__NEG | ना नयह |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | थोड चड ताय खब |
10.2 | Cardinals | QTC | QT__QTC | एत दोन |
10.3 | Ordinals | QTO | QT__QTO | पयल दसर |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | &, $ |
11.3 | Punctuation | PUNC | RD__PUNC | - |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण |


POS for Maithili

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
2.6 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
3.4 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
9.2 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $, (, ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख), मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
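The rule above can be illustrated with a minimal Python sketch. It assumes only what the tables show: that an annotation convention such as V__VM__VF encodes the hierarchy from the top level down, with levels joined by a double underscore. The function name expand_tag is hypothetical and is not part of the standard.

```python
def expand_tag(annotation: str) -> list[str]:
    """Return every hierarchy level implied by a lowest-level annotation.

    Assumption: levels are joined by a double underscore, as in the
    'Annotation Convention' column (for example 'V__VM__VF').
    """
    parts = annotation.split("__")
    # Rebuild the path one level at a time, top level first.
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    # The annotator picks only the lowest-level tag; the higher levels follow.
    print(expand_tag("V__VM__VF"))
    print(expand_tag("N__NN"))
```

With this approach the annotator records a single tag, and a tool can store the top-level and intermediate tags without further input.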

POS for Urdu

Sl No | Category (top level / subtype level 1 / subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (ism - اسم) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (nakeraa - نکره) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (m'aarefa - معرفہ) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (haasil-e-masdar - حاصل مصدر) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (zarf - ظرف) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (zamiir - ضمير) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (zamiir-e-shakhsii - ضمير شخصی) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 | Reflexive (zamiir-e-m'aakoosii - ضمير معکوسی) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (zamiir-e-mausoolaa - ضمير موصولہ) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (zamiir-e-raaje' - ضمير راجع) | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (zamiir-e-istafhaamiyaa - ضمير استفہاميہ) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (zamiir-e-ishaaraa - ضمير اشاره) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (ishaare - اشارے) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (zamiir-e-ishaaraa mausoolaa - ضمير اشاره موصولہ) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (zamiir-e-ishaaraa istafhaamiyaa - ضمير اشاره استفہاميہ) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (f'el - فعل) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (mahdood - محدود) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (ghair mahdood - غير محدود) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (masdar - مصدر) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (haasil-e-masdar - حاصل مصدر) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (f'el-e-imdaadi - فعل امدادی) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (sifat - صفت) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (mut'alliq-e-f'el - متعلق فعل) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (jaar-e-moakkhar - جارموخر) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (atf' - عطف) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (harf-e-vasl - حرف وصل) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (taab'e kunindaa - تابع کننده) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (iqtabaasii - اقتباسی) | UT | CC__CCS__UT | | Not required
9 | Particles (haaliyaa - حاليہ) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (Default - ڈيفالٹ) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (darja band - درجہ بند) | CL | RP__CL | | Not required
9.3 | Interjection (fajaa'iyaa - فجائيہ) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (harf-e-taakiid - حرف تاکيد) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (harf-e-nahii - حرف نہی) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (kamiiyat numaa - کميت نما) | QT | QT | چند (cand), متعدد (muta'addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (aam' - عام) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (a'adaad-e-mutlaq - اعداد مطلق) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (tartiibii a'adaad - ترتيبی اعداد) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (baaqi maandaa - باقی مانده) | RD | RD | |
11.1 | Foreign word (bidesii lafz - بديسی لفظ) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol ('alaamat - علامت) | SYM | RD__SYM | $, &, (, ) | Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound) etc.
11.3 | Punctuation (auqaaf - اوقاف) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (naa-m'aaloom - نامعلوم) | UNK | RD__UNK | |
11.5 | Echowords (goonjdar lafz - گونج دار الفاظ) | ECH | RD__ECH | (دل-) ول: (dil-) vil; (پيار-) ويار: (pyaar-) vyaar; (چائے-) وائے: (caa'e-) vaa'e |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

- Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
- Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
- Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
- Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
- Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
- Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
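As a rough illustration of the attributes listed above, the following Python sketch builds a small XML fragment carrying xml:lang, xml:id and the local ITS attributes its:dir, its:translate and its:locNote. The element names corpus, sentence and tag are hypothetical placeholders and are not taken from the POS schema. The local attribute its:translate is used here for brevity; the global its:translateRule element mentioned above is not shown.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def build_sample() -> ET.Element:
    """Build a tiny document that uses the attributes described above."""
    # Root element: default language and text direction for the document.
    root = ET.Element("corpus", {
        f"{{{XML_NS}}}lang": "hi",
        f"{{{ITS_NS}}}version": "1.0",
        f"{{{ITS_NS}}}dir": "ltr",
    })
    # A change of language and direction is declared on the element itself.
    sentence = ET.SubElement(root, "sentence", {
        f"{{{XML_NS}}}id": "s1",
        f"{{{XML_NS}}}lang": "ur",
        f"{{{ITS_NS}}}dir": "rtl",
        f"{{{ITS_NS}}}locNote": "Keep the POS tag unchanged",
    })
    sentence.text = "کتاب"
    # The POS tag itself is marked as non-translatable content.
    tag = ET.SubElement(sentence, "tag", {f"{{{ITS_NS}}}translate": "no"})
    tag.text = "N__NN"
    return root


sample = build_sample()
serialized = ET.tostring(sample, encoding="unicode")

if __name__ == "__main__":
    print(serialized)
```

Declaring language and direction on the root and overriding them only where they change keeps the per-element mark-up small.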

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document: every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag conveys just its name, which is meaningful only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
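A metadata record of this shape can be read with standard XML tooling. The sketch below is only an illustration: the root element name record is a hypothetical stand-in, and the sample string is a cut-down copy of the fields shown above (language, version, registrationStatus, origin, creationDate), not the full ISO 12620:1999 record.

```python
import xml.etree.ElementTree as ET

DCIF_NS = "http://www.isocat.org/ns/dcif"

# Cut-down sample; 'record' is a hypothetical root element name.
SAMPLE = """<record xmlns="http://www.isocat.org/ns/dcif">
  <languageSection>
    <language>en</language>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</record>"""


def read_metadata(xml_text: str) -> dict:
    """Collect the simple metadata fields of the first languageSection."""
    ns = {"d": DCIF_NS}
    root = ET.fromstring(xml_text)
    section = root.find("d:languageSection", ns)
    fields = ("language", "version", "registrationStatus", "origin")
    meta = {
        name: section.findtext(f"d:{name}", default="", namespaces=ns).strip()
        for name in fields
    }
    # creationDate is nested one level deeper, inside <creation>.
    meta["creationDate"] = section.findtext(
        "d:creation/d:creationDate", default="", namespaces=ns
    )
    return meta


if __name__ == "__main__":
    print(read_metadata(SAMPLE))
```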

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


[Figure: flowchart of the schema selection process, with the following steps]

1. Start (Raw Corpora)
2. Declare Metadata
3. Declare POS Schema
4. Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)
5. Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)
6. Display (Metadata)
7. Call (POS Schema)
8. Display (Desired Nodes)
9. Hide (remaining nodes)
10. End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by turning a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
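The idea of switching a branch 'off' for a given language can be sketched as follows. This is an illustrative sketch only, not the algorithm defined by the standard: the Node class, the off_for field and the function visible_paths are hypothetical names, and the language codes (urd, guj, kok) are assumed to be ISO 639-3 codes. The example data follows the tables above, where the Classifier subtype is marked "Not required" for Urdu and is absent from the Gujarati table.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str                          # label such as "RP" or "CL"
    name: str                         # English category name
    off_for: frozenset = frozenset()  # languages for which this branch is hidden
    children: list = field(default_factory=list)


# One branch of the common tree (Particles), used only as an example.
COMMON_SCHEMA = Node("RP", "Particles", children=[
    Node("RPD", "Default"),
    Node("CL", "Classifier", off_for=frozenset({"urd", "guj"})),
    Node("INJ", "Interjection"),
    Node("INTF", "Intensifier"),
    Node("NEG", "Negation"),
])


def visible_paths(node: Node, lang: str, prefix: str = "") -> list:
    """Return the annotation paths displayed for one language."""
    if lang in node.off_for:
        return []  # branch is 'off': hide it and everything below it
    path = f"{prefix}__{node.tag}" if prefix else node.tag
    paths = [path]
    for child in node.children:
        paths.extend(visible_paths(child, lang, path))
    return paths


if __name__ == "__main__":
    print(visible_paths(COMMON_SCHEMA, "urd"))
    print(visible_paths(COMMON_SCHEMA, "kok"))
```

Keeping one common tree and hiding branches per language means that the labels and annotation conventions stay identical across languages, which is what the one-to-one mapping requires.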

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……………</label language>
<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

-------------------- Noun Block --------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------- Pronoun Block --------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">


-------------------- Demonstrative Block --------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------- Verb Block --------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

-------------------- Adjective Block --------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

-------------------- Adverb Block --------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-------------------- Post Position Block --------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

-------------------- Conjunction Block --------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

-------------------- Particles Block --------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

-------------------- Quantifiers Block --------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി i" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

-------------------- Residuals Block --------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
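A mapping table of this kind can be checked mechanically for entries marked NA. The sketch below is a hedged illustration: the function names make_row and missing_labels are hypothetical, the native-language labels are replaced by placeholder strings, and only the NA pattern of three Table 1 rows (Quotative, Classifier, Participle Noun) is reproduced.

```python
LANGUAGES = ("Hindi", "Punjabi", "Urdu", "Gujarati", "Oriya", "Bengali")


def make_row(missing=()):
    """Build one table row; real labels are replaced by placeholders here."""
    return {
        lang: ("NA" if lang in missing else f"<{lang} label>")
        for lang in LANGUAGES
    }


# English category -> label per language, following the NA entries of Table 1.
MAPPING = {
    "Noun": make_row(),
    "Quotative": make_row(missing={"Gujarati"}),
    "Classifier": make_row(missing={"Gujarati"}),
    "Participle Noun": make_row(missing={"Punjabi", "Urdu", "Gujarati", "Oriya"}),
}


def missing_labels(mapping):
    """Return, per category, the languages that have no label (NA or absent)."""
    gaps = {}
    for category, labels in mapping.items():
        absent = [lang for lang in LANGUAGES if labels.get(lang, "NA") == "NA"]
        if absent:
            gaps[category] = absent
    return gaps


if __name__ == "__main__":
    for category, langs in missing_labels(MAPPING).items():
        print(category, "has no label for:", ", ".join(langs))
```

A report like this shows where the one-to-one mapping is incomplete, so that the corresponding branch can be hidden for that language or a label supplied.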

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

S.No. | English | Hindi | Telugu | Malayalam | Tamil | Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if

End if
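The selection logic of this section can be sketched in Python. This is a minimal illustration and not part of the draft standard: the names `SCRIPT_LANGUAGES`, `NODES`, `make_node` and `select_nodes` are hypothetical, the node layout (one dictionary per category carrying a label for every language) is an assumption, and the label strings are placeholders rather than the actual terms from the mapping tables.

```python
from typing import Dict, List

# Script -> ISO 639-3 language codes, limited to the cases walked through
# in this section (assumed subset, not the full n=12 / n=22 inventory).
SCRIPT_LANGUAGES: Dict[str, List[str]] = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

ALL_LANGUAGES = [code for codes in SCRIPT_LANGUAGES.values() for code in codes]


def make_node(cat: str, tag: str) -> Dict[str, str]:
    """Build a hypothetical schema node with a placeholder label per language."""
    node = {"cat": cat, "tag": tag}
    for code in ALL_LANGUAGES:
        node[f"{code}-cat"] = f"<{code} label for {cat}>"  # placeholder text
    return node


NODES = [make_node("noun", "N"), make_node("pronoun", "PR")]


def select_nodes(script: str, language: str,
                 nodes: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # For Hindi itself the set collapses to the English and Hindi labels only.
    visible = {"cat", "tag", "hin-cat", f"{language}-cat"}
    return [{k: v for k, v in node.items() if k in visible} for node in nodes]


if __name__ == "__main__":
    for node in select_nodes("Devanagari", "brx", NODES):
        print(node)
```

Keeping every language's label on one node and filtering at display time mirrors the "Display (desired nodes) / Hide (remaining nodes)" steps; a real implementation would presumably filter the XML tree instead of dictionaries.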


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No. | Language Name | Language Tag (ISO 639-3)
1 | Assamese | asm
2 | Bangla | ben
3 | Bodo | brx
4 | Dogri | doi
5 | Gujarati | guj
6 | Hindi | hin
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
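The language tags can be held in a simple lookup table. The sketch below assumes the standard ISO 639-3 codes for the 22 languages; the names `LANGUAGE_TAGS` and `label_attribute` are hypothetical, and the `<code>-cat` attribute pattern is taken from the schema examples in section 14 (for example `hin-cat`, `brx-cat`).

```python
# ISO 639-3 codes for the 22 languages of this annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name: str) -> str:
    """Return the schema attribute name carrying this language's label."""
    try:
        code = LANGUAGE_TAGS[language_name]
    except KeyError:
        raise ValueError(f"no language tag registered for {language_name!r}") from None
    return f"{code}-cat"


if __name__ == "__main__":
    print(label_attribute("Malayalam"))
```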


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahemadabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


'he/she'
2.2 | Reflexive | PRF | PR__PRF | pote, jAte, svayam 'herself/himself'
2.3 | Relative | PRL | PR__PRL | je, te, jyAM 'who', 'where'
2.4 | Reciprocal | PRC | PR__PRC | aras-paras, paraspar 'mutually', 'each other'
2.5 | Wh-word | PRQ | PR__PRQ | koN, kyAre, kyAM 'who', 'when', 'where'
2.6 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something'
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD | A 'this'
3.2 | Relative | DMR | DM__DMR | je, jeNe 'which/who', 'whom'
3.3 | Wh-word | DMQ | DM__DMQ | koN, shuM, kem 'who', 'what', 'why'
3.4 | Indefinite | | | koI, kaIMK, kashuM 'someone', 'something'
4 | Verb | V | V |
4.1 | Main | VM | V__VM | khAshe, khAdhu 'will eat', 'ate'
4.2 | Auxiliary | VAUX | V__VAUX | chhe, hatuM, karyuM 'is', 'was', 'did'
5 | Adjective | JJ | |
6 | Adverb | RB | |
7 | Postposition | PSP | |
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD | ane, ke 'and', 'or'
8.2 | Subordinator | CCS | CC__CCS | tethI, evuM, kAraNke 'so', 'like that', 'because'
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD | paNa, ja, tO 'but', emph., topic
9.2 | Interjection | INJ | RP__INJ | hE, arrrE, O
9.3 | Intensifier | INTF | RP__INTF | bahu, ghaNuM 'very', 'much'
9.4 | Negation | NEG | RP__NEG | nahi, na 'no'
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF | thoduM, ghaNuM 'little', 'much'
10.2 | Cardinals | QTC | QT__QTC | eka, be, traN 'one', 'two', 'three'
10.3 | Ordinals | QTO | QT__QTO | paheluM, bIjI 'first' (neu.), 'second' (fem.)
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF | tv, perasitemol
11.2 | Symbol | SYM | RD__SYM | $, &
11.3 | Punctuation | PUNC | RD__PUNC | ( )
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH | kAm-bAm, pANi-bANi 'work and the like', 'water and the like'

POS for Konkani

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पसत रख आबो माड |
1.2 | Proper | NNP | N__NNP | रामायण बायबल तराण गय ततणी तपला |
1.3 | Nloc | NST | N__NST | भायर भीर वयर सतयल |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हाव ो तयो मच आमच ाच |
2.2 | Reflexive | PRF | PR__PRF | आपण सवा |
2.3 | Relative | PRL | PR__PRL | जा जो |
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा |
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो |
2.6 | Indefinite | | | तोणय त य खयचय |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ो हो |
3.2 | Relative | DMR | DM__DMR | जो |
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल |
3.4 | Indefinite | | | तोणाचय तसलय |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | यवप |
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो |
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच |
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार |
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी |
4.2 | Auxiliary | VAUX | V__VAUX | NA |
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस |
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी |
5 | Adjective | JJ | | सोबी सदर |
6 | Adverb | RB | | फालया सवतास अश |
7 | Postposition | PSP | | खाीर पास बगर तडन लागी |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा |
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन |
8.2.1 | Quotative | UT | CC__CCS__UT | अश त |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | बी आद इतयाद |
9.2 | Classifier | CL | RP__CL | (पाच) जाण |
9.3 | Interjection | INJ | RP__INJ | आर चप |
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर |
9.5 | Negation | NEG | RP__NEG | ना नयह |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | थोड चड ताय खब |
10.2 | Cardinals | QTC | QT__QTC | एत दोन |
10.3 | Ordinals | QTO | QT__QTO | पयल दसर |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | |
11.2 | Symbol | SYM | RD__SYM | & $ |
11.3 | Punctuation | PUNC | RD__PUNC | - |
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण |


POS for Maithili

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
9.2 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख) मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
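Because the annotation convention column spells out the full path with double underscores (for example `V__VM__VF`), the higher-level tags can be derived from it mechanically. The sketch below is only an illustration of that idea; the function names `tag_levels` and `stored_tags` and the returned dictionary layout are assumptions, not part of the standard.

```python
from typing import Dict, List, Union


def tag_levels(annotation: str) -> List[str]:
    """Split an annotation such as 'V__VM__VF' into its hierarchy levels."""
    levels = annotation.split("__")
    if any(not level for level in levels):
        raise ValueError(f"malformed annotation: {annotation!r}")
    return levels


def stored_tags(annotation: str) -> Dict[str, Union[str, List[str]]]:
    """Return the lowest-level tag plus the higher-level tags it implies."""
    levels = tag_levels(annotation)
    return {"lowest": levels[-1], "higher": levels[:-1]}


if __name__ == "__main__":
    print(stored_tags("V__VM__VF"))
    print(stored_tags("JJ"))
```

An annotator then only chooses the lowest-level tag (here `VF`), and the tool records `V` and `VM` without further input.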

POS for Urdu

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (اسم, ism) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (نکره, nakeraa) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (معرفہ, m‘aarefa) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (حاصل مصدر, haasil-e-masdar) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (ظرف, zarf) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (ضمير, zamiir) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (ضمير شخصی, zamiir-e-shakhsii) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 | Reflexive (ضمير معکوسی, zamiir-e-m‘aakoosii) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (ضمير موصولہ, zamiir-e-mausoolaa) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (ضمير راجع, zamiir-e-raaje‘) | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (ضمير استفہاميہ, zamiir-e-istafhaamiyaa) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (ضمير اشاره, zamiir-e-ishaaraa) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (اشارے, ishaare) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (ضمير اشاره موصولہ, zamiir-e-ishaaraa mausoolaa) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (ضمير اشاره استفہاميہ, zamiir-e-ishaaraa istafhaamiyaa) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (فعل, f‘el) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (محدود, mahdood) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (غيرمحدود, ghair mahdood) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (مصدر, masdar) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (حاصل مصدر, haasil-e-masdar) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (فعل امدادی, f‘el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (صفت, sifat) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (متعلق فعل, mut‘alliq-e-f‘el) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (جارموخر, jaar-e-moakkhar) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (عطف, ‘atf) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (حرف وصل, harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (تابع کننده, taab‘e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (اقتباسی, iqtabaasii) | UT | CC__CCS__UT | | Not required
9 | Particles (حاليہ, haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (ڈيفالٹ, Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (درجہ بند, darja band) | CL | RP__CL | | Not required
9.3 | Interjection (فجائيہ, fajaa’iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (حرف تاکيد, harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (حرف نہی, harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (عام, ‘aam) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (ترتيبی اعداد, tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (باقی مانده, baaqi-maandaa) | RD | RD | |
11.1 | Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol (علامت, ‘alaamat) | SYM | RD__SYM | $, &, ( ) | Such symbols are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound), etc.
11.3 | Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (نامعلوم, naa-m‘aaloom) | UNK | RD__UNK | |
11.5 | Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | (دل-) ول ((dil-) vil), (پيار-) ويار ((pyaar-) vyaar), (چائے-) وائے ((caa‘e-) vaa‘e) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

- Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).
- Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).
- Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).
- Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).
- Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).
- Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
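As an illustration of the first attributes in this list, the following Python sketch builds one label element carrying `xml:lang`, `its:dir` and the local `its:translate` attribute with the standard library's `xml.etree.ElementTree`. The element name `label`, the helper `make_label`, the language tag `ur` and the sample text are assumptions chosen for the example; the draft schema does not prescribe them.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Ask ElementTree to serialize the ITS namespace with the "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_label(text: str, lang: str, direction: str,
               translate: str = "no") -> ET.Element:
    """Create a hypothetical <label> element with ITS/XML i18n attributes."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)            # natural-language labelling
    label.set(f"{{{ITS_NS}}}dir", direction)        # text direction: ltr / rtl
    label.set(f"{{{ITS_NS}}}translate", translate)  # translatable: yes / no
    label.text = text
    return label


if __name__ == "__main__":
    urdu_noun = make_label("اسم", "ur", "rtl")
    print(ET.tostring(urdu_noun, encoding="unicode"))
```

Marking a POS label as right-to-left and non-translatable in this way keeps a Perso-Arabic term intact when the schema is processed by localization tools.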

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, though only "sort of": a tag gives just the tag name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
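
As a rough illustration of how a tool might read a metadata record of this kind, the following Python sketch parses a small sample with the standard library. The sample is a simplified stand-in for the listing above (only a few of its fields, with the values shown there), and the function name read_metadata is an assumption made for this example, not part of the standard.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the metadata listing above (illustrative only).
SAMPLE = """<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</datasm-categorySelection>"""

# Namespace URI taken from the listing; the prefix "d" is local to this sketch.
NS = {"d": "http://www.isocat.org/ns/dcif"}


def read_metadata(xml_text):
    """Return selected fields of the first languageSection as a dict."""
    root = ET.fromstring(xml_text)
    section = root.find("d:languageSection", NS)
    fields = {}
    for name in ("language", "registrationStatus", "origin"):
        node = section.find("d:" + name, NS)
        # Missing or empty elements are reported as None.
        fields[name] = node.text.strip() if node is not None and node.text else None
    date = section.find("d:creation/d:creationDate", NS)
    fields["creationDate"] = date.text if date is not None else None
    return fields


metadata = read_metadata(SAMPLE)
print(metadata)
```

Because the record declares a default namespace, every lookup has to be namespace-qualified; that is why the paths carry the "d:" prefix.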

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English need a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
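
One way to picture switching branches 'off' is to strip the labels of all unselected languages from a node and its children. The Python sketch below does this for a single noun entry. The element names category and subcategory and the function switch_off are hypothetical; only the attribute naming pattern (hin-cat, brx-cat, mal-cat, tag) and the label values are taken from the draft schema.

```python
import xml.etree.ElementTree as ET

# One node with labels in several languages. "category"/"subcategory" are
# hypothetical element names; the attributes follow the draft schema's pattern.
NODE = ET.fromstring(
    '<category cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" tag="N">'
    '<subcategory subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" '
    'mal-cat="സാമാന നാമം" tag="NN"/>'
    '</category>'
)


def switch_off(node, keep):
    """Delete every '<code>-cat' attribute whose language code is not in keep.

    The tree is modified in place and also returned for convenience.
    """
    for element in node.iter():
        for name in list(element.attrib):
            if name.endswith("-cat") and name[:-4] not in keep:
                del element.attrib[name]
    return node


# Keep Hindi and Malayalam labels; the Bodo branch is switched off.
malayalam_view = switch_off(NODE, keep={"hin", "mal"})
print(ET.tostring(malayalam_view, encoding="unicode"))
```

The English label (cat, subcat) and the tag itself are never removed, since they carry no language-code prefix.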

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>……</label language>
<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
  kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം"
  kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം"
  kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
  kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം"
  kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം"
  kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം"
  kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം"
  kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം"
  kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം"
  kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം"
  kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം"
  kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം"
  kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം"
  kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം"
  kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
  kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ"
  kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ"
  kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ"
  kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം"
  kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
  kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ"
  kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം"
  kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം"
  kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം"
  kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം"
  kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം"
  kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം"
  kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ"
  kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം"
  kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം"
  kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം"
  kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം"
  kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം"
  kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം"
  kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി"
  kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി"
  kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി"
  kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി"
  kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം"
  kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം"
  kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം"
  kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം"
  kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം"
  kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക"
  kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element>
</xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No.   English   Hindi   Punjabi   Urdu   Gujarati   Oriya   Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

S.No.   English   Hindi   Assamese   Bodo   Kashmiri   Kashmiri (Hindi)   Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

S.No.   English   Hindi   Telugu   Malayalam   Tamil   Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …
    End if

End if
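
The nested If / Else If structure above amounts to a lookup from script to language followed by a choice of which language nodes to show. A minimal Python sketch of that reading follows. The table lists only the script and language pairs named in the algorithm, and the names SCRIPT_LANGUAGES and nodes_to_display are assumptions made for this illustration.

```python
# Script -> languages, restricted to the pairs named in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}


def nodes_to_display(script, language):
    """Return the language nodes to display; every other node is hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English and Hindi nodes are shown in every branch of the algorithm.
    shown = ["English", "Hindi"]
    if language not in shown:
        shown.append(language)
    return shown


print(nodes_to_display("Devanagari", "Hindi"))
print(nodes_to_display("Bangla", "Assamese"))
```

Keeping the script and language pairs in a table rather than in nested conditionals means that adding one of the remaining scripts or languages only requires a new table entry.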

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tag (ISO 639-3)

1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
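The language tags above are what the schema uses as prefixes for its per-language label attributes (for example hin-cat, mal-cat, kok-cat). The following is a minimal Python sketch of that lookup, not part of the standard itself; the names LANGUAGE_CODES, CODE_TO_LANGUAGE and attribute_name are hypothetical and chosen only for illustration.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
LANGUAGE_CODES = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Reverse index: language code -> language name.
CODE_TO_LANGUAGE = {code: name for name, code in LANGUAGE_CODES.items()}


def attribute_name(language: str) -> str:
    """Return the '<code>-cat' attribute name that carries a language's label."""
    return f"{LANGUAGE_CODES[language]}-cat"


if __name__ == "__main__":
    print(attribute_name("Malayalam"))
    print(CODE_TO_LANGUAGE["kok"])
```

Keeping the table in one dictionary means that a misspelt language name fails loudly with a KeyError instead of silently producing a wrong attribute name.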


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


'ate'

4.2 Auxiliary VAUX V__VAUX chhe, hatuM, karyuM 'is', 'was', 'did'
5 Adjective JJ
6 Adverb RB
7 Postposition PSP
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD ane, ke 'and', 'or'
8.2 Subordinator CCS CC__CCS tethI, evuM, kAraNke 'so', 'like that', 'because'
9 Particles RP RP
9.1 Default RPD RP__RPD paNa, ja, tO 'but', emph, topic
9.2 Interjection INJ RP__INJ hE, arrrE, O
9.3 Intensifier INTF RP__INTF bahu, ghaNuM 'very', 'much'
9.4 Negation NEG RP__NEG nahi, na 'no'
10 Quantifiers QT QT
10.1 General QTF QT__QTF thoduM, ghaNuM 'little', 'much'
10.2 Cardinals QTC QT__QTC eka, be, traN 'one', 'two', 'three'
10.3 Ordinals QTO QT__QTO paheluM, bIjI 'first' (neu), 'second' (fem)
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF tv, perasitemol
11.2 Symbol SYM RD__SYM $ &
11.3 Punctuation PUNC RD__PUNC ( )
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH kAm-bAm, pANi-bANi 'work and the like', 'water and the like'

POS for Konkani

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N
1.1 Common NN N__NN पसत रख आबो माड
1.2 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला
1.3 Nloc NST N__NST भायर भीर वयर सतयल
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हाव ो तयो मच आमच ाच
2.2 Reflexive PRF PR__PRF आपण सवा
2.3 Relative PRL PR__PRL जा जो
2.4 Reciprocal PRC PR__PRC एतामतात आपसा
2.5 Wh-word PRQ PR__PRQ तोण त खयचो
2.6 Indefinite तोणय त य खयचय
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ो हो
3.2 Relative DMR DM__DMR जो
3.3 Wh-word DMQ DM__DMQ तोण तसल
3.4 Indefinite तोणाचय तसलय
4 Verb V V
4.1 Main VM V__VM यवप
4.1.1 Finite VF V__VM__VF आयलो आयला आयललो
4.1.2 Non-Finite VNF V__VM__VNF यतच यवन आयललयान यवत यवपात यवपाच यवच
4.1.3 Infinitive VINF V__VM__VINF आस वहर तलयार
4.1.4 Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी
4.2 Auxiliary VAUX V__VAUX NA
4.2.1 Finite V__VAUX__VF तलल आस आयला आस
4.2.2 Non-Finite V__VAUX__VNF तरा जाय तरा आसलो यी
5 Adjective JJ सोबी सदर
6 Adverb RB फालया सवतास अश
7 Postposition PSP खाीर पास बगर तडन लागी
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आनी वा
8.2 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन
8.2.1 Quotative UT CC__CCS__UT अश त
9 Particles RP RP
9.1 Default RPD RP__RPD बी आद इतयाद
9.2 Classifier CL RP__CL (पाच) जाण
9.3 Interjection INJ RP__INJ आर चप
9.4 Intensifier INTF RP__INTF उपाट भरपर
9.5 Negation NEG RP__NEG ना नयह
10 Quantifiers QT QT
10.1 General QTF QT__QTF थोड चड ताय खब
10.2 Cardinals QTC QT__QTC एत दोन
10.3 Ordinals QTO QT__QTO पयल दसर
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF
11.2 Symbol SYM RD__SYM & $
11.3 Punctuation PUNC RD__PUNC -
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun N N
1.1 Common NN N__NN पोथी तलम पड खवास
1.2 Proper NNP N__NNP अरण दनश अल
1.3 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
2.1 Personal PRP PR__PRP हम ई ओ अहा
2.2 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
2.3 Relative PRL PR__PRL ज िजनता िजनतर जतरा
2.4 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
2.5 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
3.1 Deictic DMD DM__DMD ओ ई ऊ
3.2 Relative DMR DM__DMR ज जाह
3.3 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
4.1 Main VM V__VM चलब रौप पढइ खाइ स हस
4.2 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
8.1 Co-ordinator CCD CC__CCD आओर परच मदा वा
8.2 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
9.1 Default RPD RP__RPD भर यौ हौ रौ
9.2 Classifier CL RP__CL टा गोट गो
9.3 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
9.4 Intensifier INTF RP__INTF बह बसी खब नान
9.5 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
10.1 General QTF QT__QTF तनत बह तछ
10.2 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
10.3 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
11.1 Foreign word RDF RD__RDF (Remark: A word written in script other than the script of the original text)
11.2 Symbol SYM RD__SYM $ ( ) (Remark: For symbols such as $ & etc)
11.3 Punctuation PUNC RD__PUNC (Remark: Only for punctuations)
11.4 Unknown UNK RD__UNK
11.5 Echowords ECH RD__ECH जलख (लख) मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
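The rule of storing the higher-level tags automatically can be sketched in Python as follows. This is an illustrative sketch only: the PARENT table is a small subset of the tag set, the function names hierarchy and annotation are hypothetical, and it assumes that VF, VNF, VINF and VNG sit under the main verb (VM) even though the Konkani table also lists finite and non-finite subtypes under VAUX.

```python
# Parent of each label in the type hierarchy (subset, for illustration only).
# Assumption: verb subtypes are attached to VM, not VAUX.
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "PUNC": "RD", "SYM": "RD",
}


def hierarchy(label: str) -> list[str]:
    """Return the labels from the top level down to the given lowest-level label."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]


def annotation(label: str) -> str:
    """Build the annotation convention string, e.g. 'V__VM__VF' for 'VF'."""
    return "__".join(hierarchy(label))


if __name__ == "__main__":
    for lowest in ("VF", "UT", "NN", "JJ"):
        print(lowest, "->", annotation(lowest))
```

An annotator then only picks the lowest-level label, and the tool derives the full path, which keeps the stored annotation consistent with the Annotation Convention column of the tables.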

POS for Urdu

Sl No  Category (Top level / Subtype level 1 / Subtype level 2)  Label  Annotation Convention  Examples  Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

11.1 Foreign word (بديسی لفظ - bidesii lafz) RDF RD__RDF A word written in script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
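As a hedged illustration of how such mark-up looks in practice, the Python sketch below parses a small sample document and reads the language, direction, identifier and localization note from it. The sample document, its element names (labels, tag, label) and the helper its_info are assumptions made for this example and are not taken from the POS schema; the sketch also uses the local attributes its:translate, its:dir and its:locNote rather than the global rule elements, and it reads only attributes set directly on an element, without resolving inheritance.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang and xml:id

# Hypothetical sample document: one tag that must not be translated and
# one right-to-left Urdu label carrying a note for localizers.
SAMPLE = """\
<labels xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <tag xml:id="tag-noun" its:translate="no">N</tag>
  <label xml:id="label-noun-ur" xml:lang="ur" its:dir="rtl"
         its:locNote="Urdu label for the Noun category">اسم</label>
</labels>
"""


def its_info(element: ET.Element) -> dict:
    """Collect the internationalization attributes set directly on an element."""
    return {
        "lang": element.get(f"{{{XML_NS}}}lang"),
        "id": element.get(f"{{{XML_NS}}}id"),
        "dir": element.get(f"{{{ITS_NS}}}dir"),
        "translate": element.get(f"{{{ITS_NS}}}translate"),
        "locNote": element.get(f"{{{ITS_NS}}}locNote"),
    }


root = ET.fromstring(SAMPLE)
tag_info = its_info(root.find("tag"))
label_info = its_info(root.find("label"))

if __name__ == "__main__":
    print(its_info(root))
    print(tag_info)
    print(label_info)
```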

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to some extent, what the data means. Only "to some extent", because a tag conveys just its name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 126201999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS schema for English must have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that particular language. The language-specific POS schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.
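The selection step (display the desired nodes, hide the remaining ones) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the procedure defined by the standard: the POS_NODES records, the function select_language and the three sample categories are hypothetical, and the Hindi and Urdu labels are supplied here only as illustrative values.

```python
# Hypothetical common tree: each node carries labels for several languages,
# keyed by ISO 639-3 code.
POS_NODES = [
    {"tag": "N", "cat": "Noun", "labels": {"hin": "संज्ञा", "urd": "اسم"}},
    {"tag": "PR", "cat": "Pronoun", "labels": {"hin": "सर्वनाम", "urd": "ضمير"}},
    {"tag": "V", "cat": "Verb", "labels": {"hin": "क्रिया", "urd": "فعل"}},
]


def select_language(nodes: list[dict], lang: str) -> list[dict]:
    """Keep the selected language's label on every node and hide all others."""
    selected = []
    for node in nodes:
        if lang not in node["labels"]:
            raise KeyError(f"no label for language {lang!r} on node {node['tag']!r}")
        selected.append({
            "tag": node["tag"],
            "cat": node["cat"],
            f"{lang}-cat": node["labels"][lang],  # e.g. 'urd-cat'
        })
    return selected


if __name__ == "__main__":
    for row in select_language(POS_NODES, "urd"):
        print(row)
```

Raising an error for a missing label, rather than skipping the node, reflects the one-to-one mapping requirement: every category is expected to have a label in every supported language.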

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
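The selection logic above can be sketched in Python. This is only an illustration under stated assumptions, not part of the standard: the `SCRIPT_LANGUAGES` table, the `select_nodes` function and the dictionary representation of a schema node are hypothetical names introduced here, and the label values are placeholders rather than the real labels.

```python
# Hypothetical sketch of the node-selection algorithm in Section 14.
# Script -> languages covered by the pseudocode above (ISO 639-3 codes).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Attributes that are not language-specific labels; they are always shown.
COMMON_ATTRIBUTES = {"name", "cat", "subcat", "tag"}


def select_nodes(node: dict, script: str, language: str) -> dict:
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {language!r} is not listed for script {script!r}")
    visible = COMMON_ATTRIBUTES | {"hin-cat", f"{language}-cat"}
    # Anything outside `visible` is hidden, i.e. dropped from the result.
    return {key: value for key, value in node.items() if key in visible}


# Placeholder node: real label strings would come from the POS schema.
noun = {
    "name": "cat", "cat": "noun", "tag": "N",
    "hin-cat": "<Hindi label>", "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>", "guj-cat": "<Gujarati label>",
}
print(select_nodes(noun, "Devanagari", "brx"))
```

Keeping the script-to-language table as data means a further language only needs a new table entry, not a new branch.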

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
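The tagged sentences above attach hierarchical labels such as N_NN, V_VM_VNF and CC_CCS_UT to each word, and some examples use a double underscore (N__NN) or a hyphen (N-NNP) between levels. A tolerant reader for such labels might look like the following sketch; the function names `split_label` and `top_level_counts` are hypothetical and are not defined by the standard.

```python
import re
from collections import Counter


def split_label(label: str) -> list[str]:
    """Split a label such as 'V_VM_VNF' or 'N__NN' into its hierarchy levels."""
    # Accept one or more underscores or hyphens between levels, since the
    # reference examples are not consistent about the separator.
    levels = [part for part in re.split(r"[_-]+", label.strip()) if part]
    if not levels:
        raise ValueError("empty label")
    return levels


def top_level_counts(labels) -> Counter:
    """Count labels by their top-level category (N, V, RD, ...)."""
    return Counter(split_label(label)[0] for label in labels)


labels = ["N_NNP", "PSP", "N__NN", "V_VM_VNF", "V_VAUX", "RD_PUNC", "N-NNP"]
print([split_label(label) for label in labels])
print(top_level_counts(labels))
```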

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag (ISO 639-3)

1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd

CONTRIBUTORS

1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G, University of Hyderabad
7. Dr. Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

‘second’ (fem)

11 Residuals RD RD

111 Foreign word RDF RD__RDF tv, perasitemol

112 Symbol SYM RD__SYM $ &

113 Punctuation PUNC RD__PUNC ( )

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH kAm-bAm, pANi-bANi (‘work and the like’, ‘water and the like’)

POS for Konkani

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पसत रख आबो

माड

12 Proper NNP N__NNP रामायण बायबल तराण गय ततणी तपला

13 Nloc NST N__NST भायर भीर वयर सतयल

2 Pronoun PR PR

21 Personal PRP PR__PRP हाव ो तयो मच आमच ाच

22 Reflexive PRF PR__PRF आपण सवा

23 Relative PRL PR__PRL जा जो

24 Reciprocal PRC PR__PRC एतामतात आपसा

25 Wh-word PRQ PR__PRQ तोण त खयचो

26 Indefinite तोणय त य खयचय

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ो हो

32 Relative DMR DM__DMR जो

33 Wh-word DMQ DM__DMQ तोण तसल

34 Indefinite तोणाचय तसलय

4 Verb V V

41 Main VM V__VM यवप

411

Finite VF V__VM__VF आयलो आयला आयललो

412

Non-

Finite VNF V__VM__VNF यतच यवन

आयललयान यवत यवपात यवपाच यवच

413

Infinitive VINF V__VM__VINF आस वहर तलयार

414

Gerund VNG V__VM__VNG खावप वचप खावपी जवपी समजपी

42 Auxiliary VAUX V__VAUX NA

42

1 Finite V__VAUX__VF तलल आस आयला

आस

42

2 Non-

Finite V__VAUX__VN

F तरा जाय तरा आसलो यी

5 Adjective JJ सोबी सदर

6 Adverb RB फालया सवतास

अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

82

1 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM & $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण

POS for Maithili

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun N N

11 Common NN N__NN पोथी तलम

पड खवास

12 Proper NNP N__NNP अरण दनश

अल

13 Nloc NST N__NST आग पीछ

ऊपर नीचा एखन आब

बीच तह

2 Pronoun PR PR

21 Personal PRP PR__PRP हम ई ओ

अहा

22 Reflexive PRF PR__PRF अपना अपन

सवय सवयमव

23 Relative PRL PR__PRL ज िजनता िजनतर जतरा

24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर

25 Wh-word PRQ PR__PRQ त त तथी ततर

Indefinite तओ तछ

तउछ तोनो

3 Demonstrative DM DM

31 Deictic DMD DM__DMD ओ ई ऊ

32 Relative DMR DM__DMR ज जाह

33 Wh-word DMQ DM__DMQ त त तोन

Indefinite तओ तछ

तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट

ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, &, etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
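The rule above, where the annotator picks the lowest-level tag and the higher levels are stored automatically, can be sketched as a parent lookup. The `PARENT` table below is transcribed from the label and annotation-convention columns of the tables in this document; the function name `annotation_for` is hypothetical, and the sketch assumes the main-verb hierarchy for VF and VNF (the Konkani table also lists them under VAUX).

```python
# Child label -> parent label, following the Annotation Convention column
# (e.g. V__VM__VNF means VNF is under VM, which is under V).
PARENT = {
    "NN": "N", "NNP": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",  # assumption: main verb
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def annotation_for(tag: str) -> str:
    """Expand a lowest-level tag into the full annotation string."""
    chain = [tag]
    # Walk upwards until a top-level category (one with no parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


for tag in ("NN", "VNF", "UT", "JJ"):
    print(tag, "->", annotation_for(tag))
```

Tags without a parent entry, such as JJ, RB and PSP, are returned unchanged, which matches the tables where these categories have no subtypes.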

POS for Urdu

Columns: Sl No; Category (Top level, Subtype level 1, Subtype level 2); Label; Annotation Convention; Examples; Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی

mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس

25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the

following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level

412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(

81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word (بديسی لفظ, bidesii lafz) RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for punctuation

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
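The automatic storage of higher-level tags can be sketched in a few lines. This is only an illustration, assuming the double-underscore annotation convention used in the tag tables (for example V__VM__VF); the function names `expand_hierarchy` and `lowest_level_tag` are hypothetical and not part of the standard.

```python
from typing import List


def expand_hierarchy(annotation: str) -> List[str]:
    """Return every level of a hierarchical annotation, top level first.

    Assumes levels are joined with a double underscore, as in the
    annotation-convention column of the tag tables (e.g. V__VM__VF).
    """
    parts = annotation.split("__")
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


def lowest_level_tag(annotation: str) -> str:
    """Return the label the annotator actually selects (the last level)."""
    return annotation.split("__")[-1]


if __name__ == "__main__":
    # The annotator picks VF; V and V__VM are then stored automatically.
    print(lowest_level_tag("V__VM__VF"))
    print(expand_hierarchy("V__VM__VF"))
```

Deriving the upper levels from the stored string means the annotator only ever chooses one label, and the hierarchy cannot drift out of step with it.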


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible, and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in a new schema (also called the host vocabulary). Using these names makes the concepts they represent easier for both schema users and processors to recognize. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
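As a rough illustration of three of these attributes, the sketch below reads xml:lang, its:dir and xml:id from a small sample document using Python's standard library. The element names `labels` and `label`, the sample content, and the use of three-letter language codes as xml:lang values are assumptions made for the example; they are not taken from the POS schema itself.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS namespace

# Hypothetical sample: language and direction are set on the root and
# overridden only on elements where they change.
SAMPLE = """<labels xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="noun-en">noun</label>
  <label xml:id="noun-mal" xml:lang="mal">നാമം</label>
  <label xml:id="noun-kas" xml:lang="kas" its:dir="rtl">ناوت</label>
</labels>"""


def describe(xml_text):
    """Return (id, language, direction) for each label, inheriting root defaults."""
    root = ET.fromstring(xml_text)
    default_lang = root.get("{%s}lang" % XML_NS)
    default_dir = root.get("{%s}dir" % ITS_NS)
    rows = []
    for label in root.findall("label"):
        rows.append((
            label.get("{%s}id" % XML_NS),
            label.get("{%s}lang" % XML_NS, default_lang),
            label.get("{%s}dir" % ITS_NS, default_dir),
        ))
    return rows


if __name__ == "__main__":
    for row in describe(SAMPLE):
        print(row)
```

The sketch only handles inheritance from the root element, which is enough for a flat list of labels; a nested document would need to walk up through each ancestor.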

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content, and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when, and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag conveys just its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.
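One way to picture switching branches "off" is to filter the language-specific attributes of a node. The sketch below is a minimal illustration under stated assumptions: the element name `node` and the function `select_labels` are hypothetical, the sample is a simplified, well-formed stand-in for a node of the draft schema, and the labels are copied from the Noun row for Malayalam, Konkani and Kashmiri.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one multilingual node.
# Attribute names follow the <language code>-cat pattern of the draft schema.
NODE = '<node cat="noun" mal-cat="നാമം" kok-cat="नाम" kas-cat="ناوت" tag="N"/>'


def select_labels(node_xml, lang_codes):
    """Split a node's attributes into displayed and hidden ones.

    The English label (cat) and the tag are always displayed; a
    language-specific label is displayed only if its code was requested.
    """
    node = ET.fromstring(node_xml)
    keep = {"cat", "tag"} | {"%s-cat" % code for code in lang_codes}
    shown = {name: value for name, value in node.attrib.items() if name in keep}
    hidden = sorted(name for name in node.attrib if name not in keep)
    return shown, hidden


if __name__ == "__main__":
    shown, hidden = select_labels(NODE, ["mal"])
    print(shown)   # English label, tag, and the Malayalam label
    print(hidden)  # attribute names that are switched off
```

Passing more than one code, for example `["hin", "mal"]`, keeps every requested language that is present on the node, so the same function covers displays that show a reference language alongside the target one.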

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">


----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി i" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2, and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
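The mapping tables can be read as a lookup from a tag and a language code to a label. The sketch below is only an illustration of that idea: the dictionary holds two rows with labels copied from the Malayalam and Konkani columns, and the key "eng" for English and the function names `label_for` and `missing_labels` are assumptions made for the example.

```python
# Two sample rows; labels are copied from the Malayalam and Konkani columns.
# A real table would carry one row per tag and one key per language.
LABELS = {
    "N": {"eng": "Noun", "mal": "നാമം", "kok": "नाम"},
    "UNK": {"eng": "Unknown", "mal": "ഇതരദം", "kok": "अनवळखी"},
}


def label_for(tag, lang_code):
    """Return the label of a tag in the given language (KeyError if absent)."""
    return LABELS[tag][lang_code]


def missing_labels(lang_codes):
    """List (tag, language) pairs that lack a label.

    An empty result means every tag has a label in every requested
    language, which is what a one-to-one mapping requires.
    """
    return [
        (tag, code)
        for tag, row in LABELS.items()
        for code in lang_codes
        if not row.get(code)
    ]


if __name__ == "__main__":
    print(label_for("N", "mal"))
    print(missing_labels(["mal", "kok"]))
    print(missing_labels(["guj"]))
```

Cells marked NA in the tables would simply be left out of a row, so `missing_labels` reports them as gaps in the mapping.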


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

  If language is Hindi then

    Display (Metadata)

    Call (POS Schema)

    Display (English and Hindi Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

    ……

  End if

  If language is Bodo then

    Call (POS Schema)

    Display (English, Hindi and Bodo Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

    ……

  End if

  If language is Konkani then

    Call (POS Schema)

    Display (English, Hindi and Konkani Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

    ……

  End if

Else If script is Malayalam (Orthographic variation) then

  If language is Malayalam then

    Call (POS Schema)

    Display (English, Hindi and Malayalam Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

    ……

  End if

Else If script is Perso-Arabic then

  If language is Kashmiri then

    Call (POS Schema)

    Display (English, Hindi and Kashmiri Nodes)

    Hide (remaining nodes)

    E.g.

    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

    ……

  End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English, Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

…

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English, Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

…

End if
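The script and language branches of this algorithm can be read as a small lookup followed by a choice of which labels stay visible. The following is only a sketch under stated assumptions: the table `SCRIPT_LANGUAGES` and the function `nodes_to_display` are hypothetical names, and the table lists only the four branches visible in this part of the algorithm (other branches, such as Devanagari, are left out rather than guessed).

```python
# Hypothetical lookup: script -> {language name: ISO 639-3 code}.
# Only the branches shown in this part of the algorithm are listed.
SCRIPT_LANGUAGES = {
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def nodes_to_display(script, language):
    """Return the attribute names kept visible for a script/language pair.

    English labels (cat/subcat), the Hindi label and the tag are always
    shown; every other language label is hidden.
    """
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None or language not in languages:
        raise ValueError(f"no branch for script={script!r}, language={language!r}")
    code = languages[language]
    return ["cat", "subcat", "hin-cat", f"{code}-cat", "tag"]


if __name__ == "__main__":
    print(nodes_to_display("Perso-Arabic", "Kashmiri"))
```

A table-driven lookup keeps each new script or language a one-line addition instead of another nested If/Else branch.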


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No. | Language Name | Language Tag (ISO 639-3)
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd


CONTRIBUTORS

1 Ms. Swaran Lata, Department of Information Technology, New Delhi
2 Prof. Girish Nath Jha, JNU, New Delhi
3 Dr. Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof. Uma Maheswara Rao G, University of Hyderabad
7 Dr. Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof. Pushpak Bhattacharyya, IIT-Bombay
11 Prof. Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr. Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr. Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


2.3 | Relative | PRL | PR__PRL | जा जो
2.4 | Reciprocal | PRC | PR__PRC | एतामतात आपसा
2.5 | Wh-word | PRQ | PR__PRQ | तोण त खयचो
2.6 | Indefinite | | | तोणय त य खयचय
3 | Demonstrative | DM | DM |
3.1 | Deictic | DMD | DM__DMD | ो हो
3.2 | Relative | DMR | DM__DMR | जो
3.3 | Wh-word | DMQ | DM__DMQ | तोण तसल
3.4 | Indefinite | | | तोणाचय तसलय
4 | Verb | V | V |
4.1 | Main | VM | V__VM | यवप
4.1.1 | Finite | VF | V__VM__VF | आयलो आयला आयललो
4.1.2 | Non-Finite | VNF | V__VM__VNF | यतच यवन आयललयान यवत यवपात यवपाच यवच
4.1.3 | Infinitive | VINF | V__VM__VINF | आस वहर तलयार
4.1.4 | Gerund | VNG | V__VM__VNG | खावप वचप खावपी जवपी समजपी
4.2 | Auxiliary | VAUX | V__VAUX | NA
4.2.1 | Finite | | V__VAUX__VF | तलल आस आयला आस
4.2.2 | Non-Finite | | V__VAUX__VNF | तरा जाय तरा आसलो यी
5 | Adjective | JJ | | सोबी सदर
6 | Adverb | RB | | फालया सवतास अश
7 | Postposition | PSP | | खाीर पास बगर तडन लागी
8 | Conjunction | CC | CC |
8.1 | Co-ordinator | CCD | CC__CCD | आनी वा
8.2 | Subordinator | CCS | CC__CCS | जालयार जर-र दखन महणलयार पणन
8.2.1 | Quotative | UT | CC__CCS__UT | अश त
9 | Particles | RP | RP |
9.1 | Default | RPD | RP__RPD | बी आद इतयाद
9.2 | Classifier | CL | RP__CL | (पाच) जाण
9.3 | Interjection | INJ | RP__INJ | आर चप
9.4 | Intensifier | INTF | RP__INTF | उपाट भरपर
9.5 | Negation | NEG | RP__NEG | ना नयह
10 | Quantifiers | QT | QT |
10.1 | General | QTF | QT__QTF | थोड चड ताय खब
10.2 | Cardinals | QTC | QT__QTC | एत दोन
10.3 | Ordinals | QTO | QT__QTO | पयल दसर
11 | Residuals | RD | RD |
11.1 | Foreign word | RDF | RD__RDF |
11.2 | Symbol | SYM | RD__SYM | & $
11.3 | Punctuation | PUNC | RD__PUNC | -
11.4 | Unknown | UNK | RD__UNK |
11.5 | Echowords | ECH | RD__ECH | जोवण-बवण


POS for Maithili

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
9.2 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख) मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
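Because each annotation convention spells out its own ancestry (for example V__VM__VF), the higher-level tags can be derived from the lowest-level tag by splitting on the separator. The sketch below illustrates that idea; the function name `expand_tag` is hypothetical, and the separator is a parameter because the tables use a double underscore while some sample sentences show a single one.

```python
def expand_tag(tag, sep="__"):
    """Return every level of the hierarchy implied by a lowest-level tag.

    The result runs from the top-level category down to the tag itself,
    so the higher levels can be stored without annotating them by hand.
    """
    parts = tag.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


if __name__ == "__main__":
    print(expand_tag("V__VM__VF"))        # ['V', 'V__VM', 'V__VM__VF']
    print(expand_tag("N_NNP", sep="_"))   # ['N', 'N_NNP']
```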

POS for Urdu

Sl No | Category (Top level / Subtype level 1 / Subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (اسم, ism) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (نکره, nakeraa) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (معرفہ, m‘aarefa) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (حاصل مصدر, haasil-e-masdar) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (ظرف, zarf) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |

2 | Pronoun (ضمير, zamiir) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (ضمير شخصی, zamiir-e-shakhsii) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 | Reflexive (ضمير معکوسی, zamiir-e-m‘aakoosii) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (ضمير موصولہ, zamiir-e-mausoolaa) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (ضمير راجع, zamiir-e-raaje‘) | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (ضمير استفہاميہ, zamiir-e-istafhaamiyaa) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (ضمير اشاره, zamiir-e-ishaaraa) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (اشارے, ishaare) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (ضمير اشاره موصولہ, zamiir-e-ishaaraa mausoolaa) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (ضمير اشاره استفہاميہ, zamiir-e-ishaaraa istafhaamiyaa) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 | Verb (فعل, f‘el) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (محدود, mahdood) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (غيرمحدود, ghair mahdood) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (مصدر, masdar) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (حاصل مصدر, haasil-e-masdar) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (فعل امدادی, f‘el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (صفت, sifat) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (متعلق فعل, mut‘alliq-e-f‘el) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (جارموخر, jaar-e-moakkhar) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |

8 | Conjunction (عطف, atf‘) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (حرف وصل, harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (تابع کننده, taab‘e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (اقتباسی, iqtabaasii) | UT | CC__CCS__UT | | Not required
9 | Particles (حاليہ, haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (ڈيفالٹ, Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (درجہ بند, darja band) | CL | RP__CL | | Not required
9.3 | Interjection (فجائيہ, fajaa’iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (حرف تاکيد, harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (حرف نہی, harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (عام, aam‘) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (ترتيبی اعداد, tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (باقی مانده, baaqi-maandaa) | RD | RD | |
11.1 | Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol (علامت, ‘alaamat) | SYM | RD__SYM | $ & ( ) | Symbols such as & and $ are not used in Urdu. They are written out: ڈالر (dollar), پاونڈ (pound), etc.
11.3 | Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (نامعلوم, naa-m‘aaloom) | UNK | RD__UNK | |
11.5 | Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | (دل-) ول [(dil-) vil], (پيار-) ويار [(pyaar-) vyaar], (چائے-) وائے [(caa‘e-) vaa‘e] |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize, both for schema users and for processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur)

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content)

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content)

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier)

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text)

[For more details: http://www.w3.org/TR/xml-i18n-bp]

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to some extent, what the data means. Only to some extent, because a tag gives just its name, which has meaning only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>

<languageSection>

<language>en</language>

<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> registered as a standard
<origin>ISO 12620:1999

<author></author>

<domain></domain>

</origin>

<creation>
<creationDate>1999-01-01</creationDate>

</creation>

<descriptionSection>

<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>

</definitionClass>

</descriptionSection>

</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by turning particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
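One way to picture switching a branch 'off' is to drop the language labels that were not selected from a parsed copy of the common schema. The sketch below does this with Python's xml.etree.ElementTree; the element names, the placeholder label values and the function `prune_language_labels` are assumptions made for illustration, since the draft schema's own notation is not well-formed XML as printed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for one block of the common schema.
# Label values are placeholders, not the real category names.
SAMPLE = (
    '<category cat="noun" hin-cat="HIN-LABEL" mal-cat="MAL-LABEL" '
    'kas-cat="KAS-LABEL" tag="N">'
    '<subtype subcat="common" hin-cat="HIN-LABEL" mal-cat="MAL-LABEL" '
    'kas-cat="KAS-LABEL" tag="NN"/>'
    '</category>'
)


def prune_language_labels(element, keep_codes):
    """Remove every xxx-cat attribute whose language code is not kept."""
    for name in list(element.attrib):
        if name.endswith("-cat") and name[:-4] not in keep_codes:
            del element.attrib[name]
    for child in element:
        prune_language_labels(child, keep_codes)
    return element


if __name__ == "__main__":
    root = prune_language_labels(ET.fromstring(SAMPLE), {"hin", "mal"})
    print(ET.tostring(root, encoding="unicode"))
```

Pruning a copy rather than the master schema keeps the complete tree available for the next language selection.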

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">


<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">


<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

…

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

…

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

…

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

…


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

…

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

…

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

…

End if
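
The selection steps above can be sketched in Python. This is only an illustration under stated assumptions: a schema node is modelled as a plain dictionary of attribute names and labels, and the names SCRIPT_LANGUAGES and select_nodes are hypothetical, not part of the standard. The script-to-language grouping covers only the branches listed in the algorithm.

```python
# Assumed grouping of ISO 639-3 language codes by script, following the
# branches of the algorithm above; it is not an exhaustive list.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(node, script, lang):
    """Keep the English, Hindi and `lang` labels of a node; hide the rest.

    `node` is a dict of attribute name -> value. Language labels are the
    attributes whose names end in "-cat" (for example "brx-cat"); the
    English label and the tag use other attribute names and are always kept.
    """
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{lang!r} is not listed under script {script!r}")
    shown = {"hin-cat", f"{lang}-cat"}
    return {
        name: value
        for name, value in node.items()
        if not name.endswith("-cat") or name in shown
    }


# Usage: the Bodo branch keeps the English, Hindi and Bodo labels only.
noun = {"POS cat": "noun", "hin-cat": "सा", "brx-cat": "ममा",
        "mal-cat": "നാമം", "tag": "N"}
print(select_nodes(noun, "Devanagari", "brx"))
```

Filtering on the "-cat" suffix keeps the sketch independent of how many language attributes a node carries, so adding a language needs only a new entry in the script table.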


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
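
The tagged sentences above can be read back into (word, tag) pairs with a short Python sketch. It rests on assumptions: whatever separator originally stood between a word and its tag is not visible in this text, so the sketch treats a trailing run of upper-case ASCII letters joined by underscores or hyphens as the tag. The names TAGGED, split_token and parse_sentence are hypothetical.

```python
import re

# Assumption: a tag is a trailing run of upper-case ASCII letters whose
# parts are joined by "_" or "-" (e.g. N_NN, V_VM_VNF, N-NNP); everything
# before it is the word.
TAGGED = re.compile(r"(?P<word>.*?)(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)")


def split_token(token):
    """Split one whitespace-delimited token into (word, tag).

    Returns (token, None) when the token carries no tag suffix.
    """
    match = TAGGED.fullmatch(token)
    if match is None:
        return token, None
    return match.group("word"), match.group("tag")


def parse_sentence(line):
    """Split a tagged sentence into a list of (word, tag) pairs."""
    return [split_token(token) for token in line.split()]


# Usage with two tokens taken from the Punjabi sample.
print(parse_sentence("ਹਰQT_QTF ਤੀਰਥN_NN"))
```

Where the word and its tag are separate tokens, as in the Oriya sample, the sketch returns the word with no tag and the tag with an empty word, so a caller would need to pair them up. Words written in upper-case Latin letters would also be ambiguous under this assumption.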


16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tag according to ISO 639-3

1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


अश

7 Postposition PSP खाीर पास बगर तडन लागी

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आनी वा

82 Subordinator CCS CC__CCS जालयार जर-र दखन महणलयार पणन

821 Quotative UT CC__CCS__UT अश त

9 Particles RP RP

91 Default RPD RP__RPD बी आद इतयाद

92 Classifier CL RP__CL (पाच) जाण

93 Interjection INJ RP__INJ आर चप

94 Intensifier INTF RP__INTF उपाट भरपर

95 Negation NEG RP__NEG ना नयह

10 Quantifiers QT QT

101 General QTF QT__QTF थोड चड ताय खब

102 Cardinals QTC QT__QTC एत दोन

103 Ordinals QTO QT__QTO पयल दसर

11 Residuals RD RD

111 Foreign word RDF RD__RDF

112 Symbol SYM RD__SYM & $

113 Punctuation PUNC RD__PUNC -

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जोवण-बवण


POS for Maithili

Columns: Sl No, Category (Top level / Subtype level 1 / Subtype level 2), Label, Annotation Convention, Examples, Remarks

1 Noun N N
11 Common NN N__NN पोथी तलम पड खवास
12 Proper NNP N__NNP अरण दनश अल
13 Nloc NST N__NST आग पीछ ऊपर नीचा एखन आब बीच तह
2 Pronoun PR PR
21 Personal PRP PR__PRP हम ई ओ अहा
22 Reflexive PRF PR__PRF अपना अपन सवय सवयमव
23 Relative PRL PR__PRL ज िजनता िजनतर जतरा
24 Reciprocal PRC PR__PRC एत-दोसरत आपस परसपर
25 Wh-word PRQ PR__PRQ त त तथी ततर
Indefinite तओ तछ तउछ तोनो
3 Demonstrative DM DM
31 Deictic DMD DM__DMD ओ ई ऊ
32 Relative DMR DM__DMR ज जाह
33 Wh-word DMQ DM__DMQ त त तोन
Indefinite तओ तछ तउछ तोनो
4 Verb V V
41 Main VM V__VM चलब रौप पढइ खाइ स हस
42 Auxiliary VAUX V__VAUX अछ छल होएब थत
5 Adjective JJ नीत मोटता ललत
6 Adverb RB भन अनायास कमश एताएत अवशय पनत फर
7 Postposition PSP स त लल
8 Conjunction CC CC
81 Co-ordinator CCD CC__CCD आओर परच मदा वा
82 Subordinator CCS CC__CCS ज त यद
9 Particles RP RP
91 Default RPD RP__RPD भर यौ हौ रौ
92 Classifier CL RP__CL टा गोट गो
93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा
94 Intensifier INTF RP__INTF बह बसी खब नान
95 Negation NEG RP__NEG न नह जन
10 Quantifiers QT QT
101 General QTF QT__QTF तनत बह तछ
102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट ीन चार
103 Ordinals QTO QT__QTO पहल दोसर सर चारम
11 Residuals RD RD
111 Foreign word RDF RD__RDF (Remarks: A word written in script other than the script of the original text)
112 Symbol SYM RD__SYM $ ( ) (Remarks: For symbols such as $, & etc.)
113 Punctuation PUNC RD__PUNC (Remarks: Only for punctuations)
114 Unknown UNK RD__UNK
115 Echowords ECH RD__ECH जलख (लख) मट (सट)

मट (सट)

The annotation is to be done using the lowest level tag of the type hierarchy. Once the lower level tag is selected, the higher level tags should be stored automatically.
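
A minimal Python sketch of this rule follows. The parent-child pairs are transcribed from the Label and Annotation Convention columns of the POS tables in this document (the finite-verb subtypes, the verbal noun and the quotative appear in the tables for other languages, not in the Maithili table). The names PARENT and full_label are hypothetical and not part of the standard.

```python
# Parent of each tag in the type hierarchy, read off the Annotation
# Convention column (e.g. V__VM__VF means VF is under VM, which is under V).
# Tags without an entry (N, V, JJ, RB, PSP, ...) are top-level tags.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_label(tag, sep="__"):
    """Expand the lowest-level tag into its full hierarchical label."""
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])  # climb one level up
    return sep.join(reversed(chain))


print(full_label("VF"))   # V__VM__VF
print(full_label("NN"))   # N__NN
print(full_label("JJ"))   # JJ
```

With such a table an annotator only records the lowest-level tag, and the stored label is derived from it, so the two cannot drift apart.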

POS for Urdu

Columns: Sl No, Category (Top level / Subtype level 1 / Subtype level 2), Label, Annotation Convention, Examples, Remarks

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural.

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word

-بديسی لفظ(bidesii lafz(

RDF RD__RDF A word written in script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( )

& $

Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
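The rule above can be sketched in a few lines of Python. This is only an illustration, assuming that the double-underscore paths used in the tag tables (for example V__VM__VF) encode the hierarchy; the function name expand_tag_path is hypothetical and not part of the standard.

```python
def expand_tag_path(tag_path: str) -> list[str]:
    """Return every level of a hierarchical tag path, top level first.

    Assumes the levels are joined with a double underscore, as in the
    tag tables (e.g. "V__VM__VF").
    """
    parts = tag_path.split("__")
    # Rebuild the path one level at a time: V, V__VM, V__VM__VF
    return ["__".join(parts[: i + 1]) for i in range(len(parts))]


# The annotator selects only the lowest-level tag; the higher levels
# are derived from it and can be stored alongside it.
lowest = "V__VM__VF"
print(expand_tag_path(lowest))
```

A top-level tag such as JJ has no parent levels, so the function returns a single-item list for it.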

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of the document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of the document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
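As a rough illustration of how these attributes look on a tagged token, the following Python sketch builds a small XML fragment with the standard library. The element names posSample, word and tag are hypothetical and are not defined by this standard; the sketch uses the local ITS attribute its:translate rather than the global its:translateRule element, and the ITS namespace URI is the one published in the ITS 1.0 Recommendation.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def build_sample() -> ET.Element:
    """Build a tiny, hypothetical annotated fragment using ITS mark-up."""
    # Root: default language and text direction for the document.
    root = ET.Element(
        "posSample",
        {f"{{{XML_NS}}}lang": "en", f"{{{ITS_NS}}}dir": "ltr"},
    )
    # An Urdu token: the language changes, so xml:lang and its:dir are
    # overridden; xml:id gives the token a unique identifier.
    word = ET.SubElement(
        root,
        "word",
        {
            f"{{{XML_NS}}}id": "w1",
            f"{{{XML_NS}}}lang": "ur",
            f"{{{ITS_NS}}}dir": "rtl",
            f"{{{ITS_NS}}}locNote": "Co-ordinating conjunction",
        },
    )
    word.text = "اور"
    # The POS tag itself is a code and should not be translated.
    tag = ET.SubElement(root, "tag", {f"{{{ITS_NS}}}translate": "no"})
    tag.text = "CC__CCD"
    return root


print(ET.tostring(build_sample(), encoding="unicode"))
```

The printed fragment carries xml:lang="ur" and its:dir="rtl" on the token and its:translate="no" on the tag code.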

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to some extent, what the data means. Only "to some extent", because a tag supplies just a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
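A metadata record of this shape can be read back with any XML parser. The Python sketch below is a minimal illustration only: it embeds a shortened copy of the record, keeps the root element name exactly as it appears in this text, and uses a hypothetical helper name, read_metadata.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the record's root element.
DCIF_NS = {"d": "http://www.isocat.org/ns/dcif"}

# Shortened copy of the record, for illustration only.
RECORD = """<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</datasm-categorySelection>"""


def read_metadata(xml_text: str) -> dict:
    """Extract a few metadata fields from the language section."""
    root = ET.fromstring(xml_text)
    section = root.find("d:languageSection", DCIF_NS)
    return {
        "dcif-version": root.get("dcif-version"),
        "language": section.findtext("d:language", namespaces=DCIF_NS),
        "registrationStatus": section.findtext(
            "d:registrationStatus", namespaces=DCIF_NS
        ),
        "creationDate": section.findtext(
            "d:creation/d:creationDate", namespaces=DCIF_NS
        ),
    }


print(read_metadata(RECORD))
```

Because the record declares a default namespace, every element lookup has to be namespace-qualified, which is why the prefix map is passed to each call.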

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, it is necessary that the labels defined in the POS Schema for English have a one-to-one mapping for Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
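The flow described above (select script, select language, display the desired nodes, hide the rest) can be sketched in Python as follows. This is a simplified illustration and not the reference implementation: the dictionary layout, the function name select_nodes and the ASCII placeholder labels are assumptions, while the language codes (hin, brx, kok, mal, kas, asm) and the script groupings follow the draft schema and the selection algorithm in this document.

```python
# Script -> language codes, limited to the languages used in the draft schema.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
}

# One node per POS category; the label values are placeholders,
# not the real labels from the mapping tables.
NODES = [
    {"tag": "N", "cat": "noun", "hin-cat": "hin:noun",
     "brx-cat": "brx:noun", "mal-cat": "mal:noun"},
    {"tag": "PR", "cat": "Pronoun", "hin-cat": "hin:pronoun",
     "brx-cat": "brx:pronoun", "mal-cat": "mal:pronoun"},
]


def select_nodes(nodes, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    shown = {"tag", "cat", "hin-cat", f"{language}-cat"}
    # Switching a branch 'off' is modelled here by dropping its key.
    return [{k: v for k, v in node.items() if k in shown} for node in nodes]


for node in select_nodes(NODES, "Devanagari", "brx"):
    print(node)
```

Selecting Hindi itself leaves only the English and Hindi labels, since the Hindi label is always shown.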

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

ltxsattributegt

ltxselementgt ltxsschemagt

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

…………………………………

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

…………………………………

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

…………………………………

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

…………………………………


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

…………

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

…………

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

…………

End if
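The Display/Hide step of the algorithm above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the placeholder label values and the function name select_nodes are hypothetical and are not part of the draft standard. Only the attribute names (cat, hin-cat, kok-cat, tag) follow the schema examples.

```python
from typing import Dict

# Hypothetical in-memory form of one schema node (attribute name -> value).
# The label values are placeholders, not the real labels of the standard.
NOUN_NODE: Dict[str, str] = {
    "cat": "noun",
    "hin-cat": "hin:noun",
    "brx-cat": "brx:noun",
    "mal-cat": "mal:noun",
    "kok-cat": "kok:noun",
    "tag": "N",
}

# English label, Hindi label and tag are displayed for every language.
ALWAYS_SHOWN = ("cat", "hin-cat", "tag")


def select_nodes(node: Dict[str, str], lang_code: str) -> Dict[str, str]:
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    wanted = set(ALWAYS_SHOWN) | {f"{lang_code}-cat"}
    return {name: value for name, value in node.items() if name in wanted}


if __name__ == "__main__":
    # Display (English Hindi and Konkani Nodes), Hide (remaining nodes)
    print(select_nodes(NOUN_NODE, "kok"))
```

Selecting Hindi itself needs no special case, because hin-cat is already in the always-shown set.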


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
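In the tagged sentences above, each word is written immediately before its tag (word followed by a tag such as N_NN, N-NNP or N__NN). A reader who wants to process these samples may find the following rough Python sketch useful. It assumes that a tag is the trailing run of uppercase Latin letters, underscores and hyphens in each whitespace-separated chunk. The function name split_tagged is hypothetical, and the rule is a heuristic for these samples, not part of the standard.

```python
import re
from typing import List, Tuple

# A tag is assumed to be the trailing run of A-Z, "_" and "-" in a chunk.
TAG_AT_END = re.compile(r"^(.*?)([A-Z][A-Z_\-]*)$")


def split_tagged(sentence: str) -> List[Tuple[str, str]]:
    """Split 'wordTAG wordTAG ...' into (word, tag) pairs.

    A chunk with no recognisable tag is returned with an empty tag;
    a chunk that is only a tag is returned with an empty word.
    """
    pairs = []
    for chunk in sentence.split():
        match = TAG_AT_END.match(chunk)
        if match:
            pairs.append((match.group(1), match.group(2)))
        else:
            pairs.append((chunk, ""))
    return pairs


if __name__ == "__main__":
    # Fragment of the first Punjabi sample sentence.
    for word, tag in split_tagged("ਦਰਸ਼ਨN_NN ਨਾਲPSP |RD_PUNC"):
        print(repr(word), tag)
```

Where a sample separates word and tag with a space (as in parts of the Oriya sample), the word and the tag come back as two separate pairs and would need to be re-joined by the caller.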


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
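Because the schema derives its per-language attribute names from these codes (for example mal-cat for Malayalam), the table can be kept as a small lookup. The sketch below is illustrative only: the dictionary name ISO_639_3 and the helper language_attribute are hypothetical, while the codes themselves are the ISO 639-3 identifiers listed in this annexure.

```python
# Language name -> ISO 639-3 code, as listed in Annexure-1.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_attribute(language_name: str) -> str:
    """Return the schema attribute name for a language, e.g. 'mal-cat'."""
    return f"{ISO_639_3[language_name]}-cat"


if __name__ == "__main__":
    print(language_attribute("Malayalam"))
```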


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


POS for Maithili

Sl No | Category (top level, subtype level 1, subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun | N | N | |
1.1 | Common | NN | N__NN | पोथी तलम पड खवास |
1.2 | Proper | NNP | N__NNP | अरण दनश अल |
1.3 | Nloc | NST | N__NST | आग पीछ ऊपर नीचा एखन आब बीच तह |
2 | Pronoun | PR | PR | |
2.1 | Personal | PRP | PR__PRP | हम ई ओ अहा |
2.2 | Reflexive | PRF | PR__PRF | अपना अपन सवय सवयमव |
2.3 | Relative | PRL | PR__PRL | ज िजनता िजनतर जतरा |
2.4 | Reciprocal | PRC | PR__PRC | एत-दोसरत आपस परसपर |
2.5 | Wh-word | PRQ | PR__PRQ | त त तथी ततर |
 | Indefinite | | | तओ तछ तउछ तोनो |
3 | Demonstrative | DM | DM | |
3.1 | Deictic | DMD | DM__DMD | ओ ई ऊ |
3.2 | Relative | DMR | DM__DMR | ज जाह |
3.3 | Wh-word | DMQ | DM__DMQ | त त तोन |
 | Indefinite | | | तओ तछ तउछ तोनो |
4 | Verb | V | V | |
4.1 | Main | VM | V__VM | चलब रौप पढइ खाइ स हस |
4.2 | Auxiliary | VAUX | V__VAUX | अछ छल होएब थत |
5 | Adjective | JJ | | नीत मोटता ललत |
6 | Adverb | RB | | भन अनायास कमश एताएत अवशय पनत फर |
7 | Postposition | PSP | | स त लल |
8 | Conjunction | CC | CC | |
8.1 | Co-ordinator | CCD | CC__CCD | आओर परच मदा वा |
8.2 | Subordinator | CCS | CC__CCS | ज त यद |
9 | Particles | RP | RP | |
9.1 | Default | RPD | RP__RPD | भर यौ हौ रौ |
9.2 | Classifier | CL | RP__CL | टा गोट गो |
9.3 | Interjection | INJ | RP__INJ | ओह-ओ अहा वाह हा |
9.4 | Intensifier | INTF | RP__INTF | बह बसी खब नान |
9.5 | Negation | NEG | RP__NEG | न नह जन |
10 | Quantifiers | QT | QT | |
10.1 | General | QTF | QT__QTF | तनत बह तछ |
10.2 | Cardinals | QTC | QT__QTC | एत एतटा दई बीसगोट ीन चार |
10.3 | Ordinals | QTO | QT__QTO | पहल दोसर सर चारम |
11 | Residuals | RD | RD | |
11.1 | Foreign word | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol | SYM | RD__SYM | $ ( ) | For symbols such as $, & etc.
11.3 | Punctuation | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown | UNK | RD__UNK | |
11.5 | Echowords | ECH | RD__ECH | जलख (लख) मट (सट) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
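The rule that higher-level tags are stored automatically can be illustrated with a short Python sketch. It assumes a child-to-parent map read off the Label and Annotation Convention columns of the tables in this document, including the finer verb and conjunction subtypes that appear in the other language tables. The names PARENT and full_annotation are hypothetical and only serve the illustration.

```python
# Child label -> parent label, read from the annotation conventions in the
# POS tables (an illustrative map, not an official data structure).
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_annotation(lowest_tag: str) -> str:
    """Expand a lowest-level tag to the full convention, e.g. VF -> V__VM__VF."""
    chain = [lowest_tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    # The chain was built child-first; the convention is written parent-first.
    return "__".join(reversed(chain))


if __name__ == "__main__":
    for tag in ("NN", "VF", "JJ"):
        print(tag, full_annotation(tag))
```

With this arrangement the annotator picks only the lowest-level label, and top-level categories without subtypes (such as JJ) are returned unchanged.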

POS for Urdu

Sl No | Category (top level, subtype level 1, subtype level 2) | Label | Annotation Convention | Examples | Remarks
1 | Noun (اسم, ism) | N | N | لڑکا (laRkaa), راجا (raajaa), کتاب (kitaab) |
1.1 | Common (نکره, nakeraa) | NN | N__NN | کتاب (kitaab), قلم (qalam), چشمہ (cashma) |
1.2 | Proper (معرفہ, m‘aarefa) | NNP | N__NNP | موہن (Mohan), رشمی (Rashmi), روی (Ravi) |
1.3 | Verbal (حاصل مصدر, haasil-e-masdar) | NNV | N__NNV | جلن (jalan), چلن (calan), بہاؤ (bahaao), بناوٹ (banaavat) | May be considered for Urdu-Hindi too
1.4 | Nloc (ظرف, zarf) | NST | N__NST | اوپر (upar), نيچے (niice), آگے (aage), پيچهے (piiche) |
2 | Pronoun (ضمير, zamiir) | PR | PR | يہ (yih), وه (voh), جو (jo) |
2.1 | Personal (ضمير شخصی, zamiir-e-shakhsii) | PRP | PR__PRP | وه (voh), تم (tum), ميں (maim) | In Urdu, unlike Hindi, voh is used both for singular and plural
2.2 | Reflexive (ضمير معکوسی, zamiir-e-m‘aakoosii) | PRF | PR__PRF | اپنا (apnaa), خود (khud), اپنے آپ (apne aap) |
2.3 | Relative (ضمير موصولہ, zamiir-e-mausoolaa) | PRL | PR__PRL | جو (jo), جب (jab), جس (jis), جہاں (jahaM) |
2.4 | Reciprocal (ضمير راجع, zamiir-e-raaje‘) | PRC | PR__PRC | باہم (baaham), درميان (darmiyaan), آپس (aapas) |
2.5 | Wh-word (ضمير استفہاميہ, zamiir-e-istafhaamiyaa) | PRQ | PR__PRQ | کون (kaun), کب (kab), کہاں (kahaaM) |
3 | Demonstrative (ضمير اشاره, zamiir-e-ishaaraa) | DM | DM | يہ (yih), وه (voh), ان (inn), ان (unn) |
3.1 | Deictic (اشارے, ishaare) | DMD | DM__DMD | يہ (yih), وه (voh) |
3.2 | Relative (ضمير اشاره موصولہ, zamiir-e-ishaaraa mausoolaa) | DMR | DM__DMR | جو (jo), جس (jis) |
3.3 | Wh-word (ضمير اشاره استفہاميہ, zamiir-e-ishaaraa istafhaamiyaa) | DMQ | DM__DMQ | کون (kaun), کس (kis), کتنا (kitnaa) | According to Urdu grammar, words like koi, kisi, kuch do not come under Wh-word; they are used for indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. Under this category the following words are also placed: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?
4 | Verb (فعل, f‘el) | V | V | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1 | Main | VM | V__VM | گرا (giraa), گيا (gayaa), سونا (sonaa), ہنستا (haMstaa) |
4.1.1 | Finite (محدود, mahdood) | VF | V__VM__VF | | This subtype WILL NOT be used for Hindi as Hindi does not have enough information at the word level
4.1.2 | Nonfinite (غيرمحدود, ghair mahdood) | VNF | V__VM__VNF | | -- do --
4.1.3 | Infinitive (مصدر, masdar) | VINF | V__VM__VINF | | -- do --
4.1.4 | Gerund (حاصل مصدر, haasil-e-masdar) | VNG | V__VM__VNG | | -- do --
4.2 | Auxiliary (فعل امدادی, f‘el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa) |
5 | Adjective (صفت, sifat) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa) |
6 | Adverb (متعلق فعل, mut‘alliq-e-f‘el) | RB | | تيز (tez), جلد (jald) |
7 | Postposition (جار موخر, jaar-e-moakkhar) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM) |
8 | Conjunction (عطف, atf‘) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki) |
8.1 | Co-ordinator (حرف وصل, harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki) |
8.2 | Subordinator (تابع کننده, taab‘e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to) |
8.2.1 | Quotative (اقتباسی, iqtabaasii) | UT | CC__CCS__UT | | Not required
9 | Particles (حاليہ, haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii) |
9.1 | Default (ڈيفالٹ, Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii) |
9.2 | Classifier (درجہ بند, darja band) | CL | RP__CL | | Not required
9.3 | Interjection (فجائيہ, fajaa’iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah) |
9.4 | Intensifier (حرف تاکيد, harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar) |
9.5 | Negation (حرف نہی, harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM) |
10 | Quantifiers (کميت نما, kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir) |
10.1 | General (عام, aam‘) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch) |
10.2 | Cardinals (اعداد مطلق, a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin) |
10.3 | Ordinals (ترتيبی اعداد, tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa) |
11 | Residuals (باقی مانده, baaqi-maandaa) | RD | RD | |
11.1 | Foreign word (بديسی لفظ, bidesii lafz) | RDF | RD__RDF | | A word written in script other than the script of the original text
11.2 | Symbol (علامت, ‘alaamat) | SYM | RD__SYM | $ & ( ) | Such symbols are not used in Urdu. They are written ڈالر (dollar), پاونڈ (pound) etc.
11.3 | Punctuation (اوقاف, auqaaf) | PUNC | RD__PUNC | | Only for punctuations
11.4 | Unknown (نامعلوم, naa-m‘aaloom) | UNK | RD__UNK | |
11.5 | Echowords (گونج دار الفاظ, goonjdar lafz) | ECH | RD__ECH | دل-ول (dil-vil), پيار-ويار (pyaar-vyaar), چائے-وائے (caa‘e-vaa‘e) |

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
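As a small illustration of the language and direction mark-up listed above, the following Python sketch builds one element carrying xml:lang together with the local ITS attributes its:dir and its:translate, using the standard library xml.etree.ElementTree module. The element name label and the helper build_label are assumptions made for this example and are not part of the draft schema. The ITS namespace URI is the one defined by ITS 1.0.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # reserved xml: namespace

# Serialize the ITS namespace with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def build_label(text: str, lang: str, direction: str = "ltr",
                translate: str = "yes") -> ET.Element:
    """Build a hypothetical <label> element with language, direction and
    translatability information attached."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)          # written out as xml:lang
    label.set(f"{{{ITS_NS}}}dir", direction)      # written out as its:dir
    label.set(f"{{{ITS_NS}}}translate", translate)
    label.text = text
    return label


if __name__ == "__main__":
    urdu_noun = build_label("اسم", "urd", direction="rtl", translate="no")
    print(ET.tostring(urdu_noun, encoding="unicode"))
```

Attaching the direction to the element rather than relying on the script alone keeps right-to-left labels (Urdu, Kashmiri) unambiguous for processors that do not inspect the text.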

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag supplies just a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>

<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">

<globalInformation></globalInformation>

<languageSection>

<language>en</language>

<identifier> </identifier>

<version>100</version>

<registrationStatus>standard</registrationStatus> registered as a standard

<origin>ISO 12620:1999

<author></author>

<domain></domain>

</origin>

<creation>

<creationDate>1999-01-01</creationDate>

</creation>

<descriptionSection>

<definitionClass>

<definition xml:lang="en"></definition>

<source>ISO 12620:1999</source>

</definitionClass>

</descriptionSection>

</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching the other branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
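The one-to-one mapping requirement of this section can be checked mechanically before a language branch is switched on. The following Python sketch reports, for each tag, the languages whose label is missing or empty. The data layout (tag mapped to a dictionary of attributes), the placeholder label values and the function name missing_mappings are assumptions for illustration; only the attribute naming pattern (hin-cat, brx-cat and so on) is taken from the draft schema.

```python
from typing import Dict, List

# Languages covered by the draft schema in this document.
LANGS = ["hin", "brx", "mal", "kas", "asm", "kok", "guj"]


def missing_mappings(nodes: Dict[str, Dict[str, str]],
                     langs: List[str]) -> Dict[str, List[str]]:
    """Return {tag: [language codes with no label]} for incomplete nodes."""
    gaps: Dict[str, List[str]] = {}
    for tag, attrs in nodes.items():
        absent = [lang for lang in langs
                  if not attrs.get(f"{lang}-cat", "").strip()]
        if absent:
            gaps[tag] = absent
    return gaps


if __name__ == "__main__":
    # Placeholder labels; a real check would load them from the schema file.
    nodes = {
        "N": {f"{lang}-cat": f"{lang}:noun" for lang in LANGS},
        "DMD": {"brx-cat": "brx:deictic", "mal-cat": "mal:deictic",
                "hin-cat": "", "kok-cat": ""},
    }
    print(missing_mappings(nodes, LANGS))
```

A non-empty result means the mapping is not yet one-to-one for the listed languages, so displaying that branch would show a blank label.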

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">


<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    …
  End if

  If language is Bodo then
    Call (POS Schema)
    Display (English, Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    …
  End if

  If language is Konkani then
    Call (POS Schema)
    Display (English, Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    …
  End if

Else If script is Malayalam (Orthographic variation) then

  If language is Malayalam then
    Call (POS Schema)
    Display (English, Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    …
  End if

Else If script is Perso-Arabic then

  If language is Kashmiri then
    Call (POS Schema)
    Display (English, Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
    …
  End if


Else If script is Bangla then

  If language is Assamese then
    Call (POS Schema)
    Display (English, Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    …
  End if

Else If script is Gujarati then

  If language is Gujarati then
    Call (POS Schema)
    Display (English, Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    …
  End if
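The node selection described in this section can be sketched in Python. This is only an illustration under stated assumptions, not part of the standard: the dictionary layout of a node, the SCRIPT_LANGUAGES table and the select_labels function are hypothetical names, and the label strings in the sample node are placeholders.

```python
# Scripts and the ISO 639-3 language codes handled under each one,
# following the branches of the algorithm in this section.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_labels(node, script, lang):
    """Return the attributes to display for one schema node.

    The English category, the tag and the Hindi label are always kept;
    the label of the requested language is added, and every other
    "<code>-cat" label is hidden.
    """
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"language {lang!r} is not listed under script {script!r}")
    keep = {"cat", "subcat", "tag", "hin-cat", f"{lang}-cat"}
    return {key: value for key, value in node.items() if key in keep}


# Hypothetical node; the label strings are placeholders for illustration.
noun = {
    "cat": "noun",
    "hin-cat": "संज्ञा",
    "kok-cat": "नाम",
    "mal-cat": "നാമം",
    "tag": "N",
}

print(select_labels(noun, "Devanagari", "kok"))
```

Filtering a copy of the node, rather than deleting keys in place, leaves the full multilingual schema available for the next request.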


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo   Language Name   Language Tag (ISO 639-3)

1     Hindi           hin
2     Assamese        asm
3     Bangla          ben
4     Bodo            brx
5     Dogri           doi
6     Gujarati        guj
7     Kannada         kan
8     Kashmiri        kas
9     Konkani         kok
10    Maithili        mai
11    Malayalam       mal
12    Manipuri        mni
13    Marathi         mar
14    Nepali          nep
15    Oriya           ori
16    Punjabi         pan
17    Sanskrit        san
18    Santhali        sat
19    Sindhi          snd
20    Tamil           tam
21    Telugu          tel
22    Urdu            urd


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


तउछ तोनो

4 Verb V V

41 Main VM V__VM चलब रौप

पढइ खाइ

स हस

42 Auxiliary VAUX V__VAUX अछ छल

होएब थत

5 Adjective JJ नीत मोटता ललत

6 Adverb RB भन अनायास

कमश

एताएत

अवशय पनत फर

7 Postposition PSP स त लल

8 Conjunction CC CC

81 Co-ordinator CCD CC__CCD आओर परच

मदा वा

82 Subordinator CCS CC__CCS ज त यद

9 Particles RP RP

91 Default RPD RP__RPD भर यौ हौ रौ

Classifier CL RP_CL टा गोट गो

93 Interjection INJ RP__INJ ओह-ओ अहा वाह हा

94 Intensifier INTF RP__INTF बह बसी खब नान

95 Negation NEG RP__NEG न नह जन

10 Quantifiers QT QT

101 General QTF QT__QTF तनत बह

तछ

102 Cardinals QTC QT__QTC एत एतटा दई बीसगोट


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuation

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( )

Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
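The rule above can be illustrated with a minimal Python sketch. It assumes that hierarchical labels join their levels with a double underscore, as in the annotation conventions listed in the tag table (for example CC__CCS__UT); the function names `expand_tag` and `annotate` are hypothetical and not part of the standard.

```python
def expand_tag(lowest_level_label):
    """Split a hierarchical label such as 'CC__CCS__UT' into its levels.

    Assumption: levels are joined by a double underscore, top level first.
    """
    return [level for level in lowest_level_label.split("__") if level]


def annotate(token, lowest_level_label):
    """Store the lowest-level tag and derive the higher-level tags from it."""
    levels = expand_tag(lowest_level_label)
    return {
        "token": token,
        "tag": levels[-1],         # the tag the annotator selected
        "ancestors": levels[:-1],  # higher-level tags, stored automatically
    }


# 'agar' is listed in the table as a subordinator (CC__CCS).
record = annotate("agar", "CC__CCS")
print(record)
```

The annotator only supplies the lowest-level label; everything above it is recovered from the label itself, so the higher levels cannot drift out of sync with the selected tag.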


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of the document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of the document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements that indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
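As an illustration of the language, direction and identifier attributes described above, the following Python sketch reads them from a small XML fragment with the standard library. The element names `corpus` and `w`, the language tag `ur` and the sample words are assumptions made for this example only; the namespace URIs are those of the XML namespace and of ITS 1.0.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical instance fragment: an English document containing one Urdu word.
SAMPLE = """<corpus xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <w xml:id="w1" xml:lang="ur" its:dir="rtl">اور</w>
  <w xml:id="w2">and</w>
</corpus>"""


def walk(elem, lang=None, direction=None):
    """Yield (id, language, direction) for each <w>, inheriting from ancestors."""
    lang = elem.get("{%s}lang" % XML_NS, lang)
    direction = elem.get("{%s}dir" % ITS_NS, direction)
    if elem.tag == "w":
        yield elem.get("{%s}id" % XML_NS), lang, direction
    for child in elem:
        yield from walk(child, lang, direction)


rows = list(walk(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```

The second word carries no attributes of its own and therefore takes the language and direction declared on the root element, which is why the guidelines ask for both to be set on the root.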

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document: every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only approximately, because a tag gives just a name, which has meaning only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<dataCategorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</dataCategorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

S.No English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" tag="PRF">
        …
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
        …
    End if


Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
        …
    End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

………………

End if
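
The selection steps above can be sketched in Python. This is only an illustration under stated assumptions: the table SCRIPT_LANGUAGES, the function select_nodes and the code "eng" for the English labels are hypothetical names, and only the script and language pairs that appear in the pseudocode are listed, not the full set of 12 scripts and 22 languages.

```python
# Hypothetical script-to-language table. Only pairs named in the pseudocode
# are listed; the codes are the ISO 639-3 tags from the annexure.
SCRIPT_LANGUAGES = {
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language):
    """Return (displayed, hidden) language codes for one script/language pair.

    English ("eng", an assumed code) and Hindi nodes are always displayed
    next to the selected language; every other listed language is hidden.
    """
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    target = languages[language]
    all_codes = {
        code for langs in SCRIPT_LANGUAGES.values() for code in langs.values()
    }
    displayed = ["eng", "hin", target]
    hidden = sorted(all_codes - {target})
    return displayed, hidden


displayed, hidden = select_nodes("Bangla", "Assamese")
```

Keeping the table as data rather than as nested if/else branches means a further script or language can be added without touching the selection logic.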


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tag (ISO 639-3)

1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
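
The language tags in this annexure are the prefixes of the per-language attributes used in the schema (hin-cat, mal-cat, kas-cat and so on). A minimal Python sketch of that lookup follows; the names LANGUAGE_TAGS and attribute_name are hypothetical and not part of the standard, and each language is paired with its ISO 639-3 code.

```python
# Language name -> ISO 639-3 tag, for the 22 languages of the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def attribute_name(language):
    """Build the schema attribute name for a language, e.g. 'mal-cat'."""
    return f"{LANGUAGE_TAGS[language]}-cat"


malayalam_attribute = attribute_name("Malayalam")
```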


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


ीन चार

103 Ordinals QTO QT__QTO पहल दोसर सर चारम

11 Residuals RD RD

111 Foreign word RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol SYM RD__SYM $ ( ) For symbols such as $, & etc.

113 Punctuation PUNC RD__PUNC Only for punctuations

114 Unknown UNK RD__UNK

115 Echowords ECH RD__ECH जलख (लख)

मट (सट)

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
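
Because the annotation convention spells out the full path (for example V__VM__VF or QT__QTO), the higher-level tags can be recovered from the lowest-level annotation alone. The sketch below assumes the double-underscore separator used in the tables; the function name expand_tag is hypothetical.

```python
def expand_tag(annotation):
    """Split an annotation such as 'V__VM__VF' into its hierarchy levels.

    Returns the levels from the top-level category down to the lowest tag,
    so the higher-level tags can be stored alongside the selected one.
    """
    levels = [part for part in annotation.split("__") if part]
    if not levels:
        raise ValueError("empty annotation")
    return {"top": levels[0], "levels": levels, "lowest": levels[-1]}


finite_verb = expand_tag("V__VM__VF")
ordinal = expand_tag("QT__QTO")
```

A top-level category with no subtype, such as RD, simply yields a single level.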

POS for Urdu Sl No

Category Label Annotation

Convention

Examples Remarks

Top level Subtype

(level 1)

Subtype

(level 2)

1 Noun

)ism-اسم(

N N لڑکا)laRkaa(

))raajaaراجا

)kitaab(کتاب

11 Common

-نکره(nakeraa(

NN N__NN کتاب)kitaab(

)qalam(قلم

)cashma(چشمہ

12 Proper

-معرفہ(

NNP N__NNP موہن))Mohan

رشمی


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu unlike Hindi voh is used both for singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-words; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinitive), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinitive demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word

(بديسی لفظ - bidesii lafz)

RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( ) Such symbols are not used in Urdu. They are written out, e.g. ڈالر (dollar), پاونڈ (pound) etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
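
As an illustration of the first attributes listed above, the following Python sketch builds one element carrying xml:lang, its:dir and its:translate, using the standard library's ElementTree. The element name label and the helper make_label are hypothetical; the ITS namespace URI is the one defined by ITS 1.0.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Serialize ITS attributes with the conventional "its" prefix.
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Create a <label> element (hypothetical name) with language,
    text-direction and translatability information attached."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)
    label.set(f"{{{ITS_NS}}}dir", direction)
    label.set(f"{{{ITS_NS}}}translate", translate)
    label.text = text
    return label


# An Urdu label is written right to left.
urdu_noun = make_label("اسم", "ur", direction="rtl")
serialized = ET.tostring(urdu_noun, encoding="unicode")
```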

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only in part: a tag conveys nothing beyond its name, so it is meaningful only to someone who already understands what the element or attribute stands for.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> registered as a standard
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema is used to select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be obtained by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.
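
Switching branches 'off' can be pictured as dropping the per-language attributes that are not wanted for the selected language. The Python sketch below is only an illustration: the function prune_labels, the attribute name pos-cat for the English label, and the placeholder values H, M and K are all hypothetical, since the draft schema itself is not reproduced here in well-formed XML.

```python
import xml.etree.ElementTree as ET


def prune_labels(element, keep_codes):
    """Remove every '<code>-cat' attribute whose code is not in keep_codes.

    Attributes that do not end in '-cat' (such as name and tag) are kept.
    """
    for name in list(element.attrib):
        if name.endswith("-cat") and name[: -len("-cat")] not in keep_codes:
            del element.attrib[name]
    return element


# Placeholder values stand in for the Hindi, Malayalam and Kashmiri labels.
noun = ET.fromstring(
    '<element name="cat" pos-cat="noun" hin-cat="H" mal-cat="M" kas-cat="K" tag="N"/>'
)
prune_labels(noun, {"pos", "hin", "mal"})
remaining = sorted(noun.attrib)
```

Here the Kashmiri branch is switched off, leaving the English, Hindi and Malayalam labels together with the tag.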

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>………………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element>

</xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

Columns: SNo, English, Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Table 2: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

Columns: SNo, English, Hindi, Assamese, Bodo, Kashmiri, Kashmiri (Hindi), Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Table 3: Telugu, Malayalam, Tamil, Konkani

Columns: SNo, English, Hindi, Telugu, Malayalam, Tamil, Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

  If language is Hindi then
    Display (Metadata)
    Call (POS Schema)
    Display (English and Hindi Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
    …
  End if

  If language is Bodo then
    Call (POS Schema)
    Display (English Hindi and Bodo Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
    …
  End if

  If language is Konkani then
    Call (POS Schema)
    Display (English Hindi and Konkani Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
    …
  End if

Else If script is Malayalam (Orthographic variation) then
  If language is Malayalam then
    Call (POS Schema)
    Display (English Hindi and Malayalam Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
    …
  End if

Else If script is Perso-Arabic then

  If language is Kashmiri then
    Call (POS Schema)
    Display (English Hindi and Kashmiri Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat=" ناوت " tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat=" عام " tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat=" خاص " tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat=" پرناوت " tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat=" شخصيٲتی " tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat=" ماکوسی " tag="PRF">
    …
  End if


Else If script is Bangla then

  If language is Assamese then
    Call (POS Schema)
    Display (English Hindi and Assamese Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
    …
  End if

Else If script is Gujarati then

  If language is Gujarati then
    Call (POS Schema)
    Display (English Hindi and Gujarati Nodes)
    Hide (remaining nodes)
    E.g.
    <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
    <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
    <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
    <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
    <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
    <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
    …
  End if
End if
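
The node-selection algorithm above can be sketched in Python. This is only an illustrative sketch, not part of the draft standard: the dictionary layout, the names NOUN_NODE, SCRIPT_LANGUAGES and select_nodes are assumptions, and the label strings are copied exactly as they appear in this document's schema excerpt.

```python
# Hypothetical data model: one POS node as a mapping of attribute name to label.
# Label strings are copied as they appear in this document's schema excerpt.
NOUN_NODE = {
    "cat": "noun",
    "hin-cat": "सा",
    "brx-cat": "ममा",
    "mal-cat": "നാമം",
    "kok-cat": "नाम",
    "tag": "N",
}

# Script -> {language name: language code}, following the branches of the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language, node):
    """Display the English, Hindi and selected-language labels; hide the rest."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported script/language: {script}/{language}")
    # "cat" and "tag" carry the English label and the tag; Hindi is always shown.
    keep = {"cat", "tag", "hin-cat", f"{code}-cat"}
    return {key: value for key, value in node.items() if key in keep}


if __name__ == "__main__":
    print(select_nodes("Malayalam", "Malayalam", NOUN_NODE))
```

For Hindi the selected-language label and the Hindi label coincide, so only the English and Hindi entries remain, matching the "Display (English and Hindi Nodes)" step.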


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag according to ISO 639-3

1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
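
As a rough illustration of how these language tags connect to the schema, the sketch below maps each of the 22 language names to its ISO 639-3 code and derives the per-language attribute name (such as mal-cat) used in the POS schema. The names ISO_639_3 and attribute_name are hypothetical, and the "<code>-cat" pattern is inferred from the schema excerpts in this document.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def attribute_name(language):
    """Return the schema attribute that holds this language's label, e.g. 'mal-cat'."""
    return f"{ISO_639_3[language]}-cat"


if __name__ == "__main__":
    for name in ("Hindi", "Malayalam", "Kashmiri"):
        print(name, "->", attribute_name(name))
```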


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahemadabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


mlsquoaarefa(( )Rashmi(

)Ravi(روی

13 Verbal

حاصل ( ndashمصدر

haasil-e-masdar(

NNV N__NNV جلن)jalan(

)calan(چلن

)bahaao(بہاؤ

بناوٹ )banaavat(

May be considered for Urdu- Hindi too

14 Nloc

) zarf-ظرف(

NST N__NST اوپر)upar(

)niice(نيچے

)aage(آگے

)piiche(پيچهے

2 Pronoun

)zamiir-ضمير(

PR PR يہ)yih(

)voh(وه

)jo(جو

21 Personal

ضمير (-شخصی

zamiir-e-shakhsii(

PRP PR__PRP وه)voh(

)tum(تم

)maim(ميں

In Urdu, unlike Hindi, voh is used for both singular and plural

22 Reflexive

ضمير )-معکوسیzamiir-e-

mlsquoaakoosii)

PRF PR__PRF اپنا)apnaa(

)khud(خود

اپنے آپ

)apne aap(

23 Relative

ضمير )-موصولہzamiir-e-mausoolaa(

PRL PR__PRL جو)jo(

)jab(جب )jis(جس

)jahaM(جہاں

24 Reciprocal

-ضمير راجع)zamiir-e-raajelsquo)

PRC PR__PRC باہم)baaham( درميان

)darmiyaan(

)aapas(آپس


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b‘aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word

-بديسی لفظ(bidesii lafz(

RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lower-level tag is selected, the higher-level tags should be stored automatically.
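
The rule that higher-level tags are stored automatically once the lowest-level tag is chosen can be sketched as a parent lookup. This is a minimal sketch, not a prescribed implementation: the PARENT table is assembled from the tag hierarchy shown in this document (for example V__VM__VF and CC__CCS__UT), while the names PARENT and full_tag are hypothetical.

```python
# Child tag -> parent tag, assembled from the tag hierarchy in this document.
PARENT = {
    "NN": "N", "NNP": "N", "NNV": "N", "NST": "N",
    "PRP": "PR", "PRF": "PR", "PRL": "PR", "PRC": "PR", "PRQ": "PR",
    "DMD": "DM", "DMR": "DM", "DMQ": "DM",
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_tag(tag, sep="__"):
    """Expand a lowest-level tag into its full hierarchy, top level first.

    Tags without a parent entry (e.g. JJ, RB, PSP) are treated as top level.
    """
    chain = [tag]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


if __name__ == "__main__":
    for tag in ("VF", "UT", "PUNC", "JJ"):
        print(tag, "->", full_tag(tag))
```

An annotator would then record only VF, and a tool could store V__VM__VF on their behalf.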


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively.

ITS for Schema developers

This type of user will find proposals for attribute and element names to be included in their new schema (also called host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang - defined for the root element of your document and for any element where a change of language may occur)

Defining mark-up to specify text direction (its:dir - defined for the root element of your document and for any element that has text content)

Indicating which elements and attributes should be translated (its:translateRule - elements to indicate which elements have non-translatable content)

Providing information related to text segmentation (its:withinTextRule - elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)

Defining mark-up for unique identifiers (xml:id - elements with translatable content can be associated with a unique identifier)

Defining mark-up for notes to localizers (its:locNote - allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)

[For more details: http://www.w3.org/TR/xml-i18n-bp]
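
To make the attributes listed above concrete, the following sketch builds a tiny XML fragment carrying xml:lang, xml:id, its:dir and its:translate using Python's standard xml.etree.ElementTree module. The element names text and p and the function build_sample are hypothetical; the ITS namespace URI is the one defined by ITS 1.0. It is only an illustration, not a fragment of the POS schema itself.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # predefined 'xml' prefix
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def build_sample():
    """Build a small document that marks language, direction and translatability."""
    root = ET.Element(
        "text",
        {f"{{{XML_NS}}}lang": "hi", f"{{{ITS_NS}}}dir": "ltr"},
    )
    # A paragraph with a unique identifier, inheriting language and direction.
    first = ET.SubElement(root, "p", {f"{{{XML_NS}}}id": "p1"})
    first.text = "Left-to-right paragraph"
    # A change of language and direction, marked as not to be translated.
    second = ET.SubElement(
        root,
        "p",
        {
            f"{{{XML_NS}}}lang": "ur",
            f"{{{ITS_NS}}}dir": "rtl",
            f"{{{ITS_NS}}}translate": "no",
        },
    )
    second.text = "Right-to-left paragraph"
    return root


if __name__ == "__main__":
    print(ET.tostring(build_sample(), encoding="unicode"))
```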

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. An XML Schema defines a class of XML documents, so the term instance document is often used to describe an XML document that conforms to a particular schema. It provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only partially: a tag provides just its name, which has meaning only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
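
As a hedged illustration of how a tool might read such a metadata record before displaying it, the sketch below parses a simplified, well-formed variant of the block above with Python's standard library. The function read_metadata and the trimmed METADATA string are assumptions made for the example; the element names and the namespace are taken from the excerpt as printed.

```python
import xml.etree.ElementTree as ET

DCIF_NS = "http://www.isocat.org/ns/dcif"

# Simplified, well-formed variant of the metadata excerpt (illustrative only).
METADATA = f"""<?xml version="1.0"?>
<datasm-categorySelection xmlns="{DCIF_NS}" dcif-version="1.0">
  <languageSection>
    <language>en</language>
    <registrationStatus>standard</registrationStatus>
    <origin>ISO 12620:1999</origin>
    <creation><creationDate>1999-01-01</creationDate></creation>
  </languageSection>
</datasm-categorySelection>"""


def read_metadata(xml_text):
    """Extract a few descriptive fields from a metadata record."""
    root = ET.fromstring(xml_text)
    ns = {"d": DCIF_NS}

    def text_of(path):
        node = root.find(path, ns)
        return node.text.strip() if node is not None and node.text else None

    return {
        "language": text_of(".//d:language"),
        "status": text_of(".//d:registrationStatus"),
        "origin": text_of(".//d:origin"),
        "created": text_of(".//d:creationDate"),
    }


if __name__ == "__main__":
    print(read_metadata(METADATA))
```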

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for the XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure ‘off’. This is schematically represented under the next heading, i.e. the POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...</label language>
<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">
<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">
<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">
<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">
<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">
<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">
<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">
<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">
<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">
</xs:attribute>
</xs:element>
</xs:schema>
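The tag inventory of the draft schema can be summarised as a small lookup structure. The Python below is a sketch under stated assumptions: the tag names are taken from the listing above, but the underscore-joined form (for example `V_VM_VNF`) follows the annotated samples in section 15, and attaching the verb subtypes to `VM` and the quotative `UT` to `CCS` is a reading of those samples rather than something the schema states explicitly. The names `POS_TYPES`, `SUBTYPES` and `is_known_tag` are hypothetical.

```python
# Top-level category tag -> type tags listed under it in the draft schema.
POS_TYPES = {
    "N": {"NN", "NNP", "NNV", "NST"},
    "PR": {"PRP", "PRF", "PRC", "PRL", "PRQ"},
    "DM": {"DMD", "DMR", "DMQ"},
    "V": {"VAUX", "VM"},
    "JJ": set(),
    "RB": set(),
    "PSP": set(),
    "CC": {"CCD", "CCS"},
    "RP": {"RPD", "CL", "INJ", "NEG", "INTF"},
    "QT": {"QTF", "QTC", "QTO"},
    "RD": {"RDF", "SYM", "UNK", "PUNC", "ECH"},
}

# Assumed attachment of the "subtype" tags (inferred from the tagged samples).
SUBTYPES = {
    "VM": {"VF", "VINF", "VNG", "VNF"},
    "CCS": {"UT"},
}


def is_known_tag(tag):
    """Check a tag such as 'N_NN' or 'V_VM_VNF' level by level."""
    parts = tag.split("_")
    if len(parts) > 3 or parts[0] not in POS_TYPES:
        return False
    if len(parts) >= 2 and parts[1] not in POS_TYPES[parts[0]]:
        return False
    if len(parts) == 3 and parts[2] not in SUBTYPES.get(parts[1], set()):
        return False
    return True


for sample in ["N_NN", "V_VM_VNF", "CC_CCS_UT", "JJ", "N_VM"]:
    print(sample, is_known_tag(sample))
```

A check of this kind is useful mainly for catching typing errors in hand-annotated corpora; it says nothing about whether a tag is linguistically appropriate for a given word.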

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
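A mapping table of this kind lends itself to a simple completeness check: every English label should carry an entry for every language, with "NA" recorded where a language does not use the category. The Python below is a minimal sketch of such a check. The table excerpt, the placeholder label strings, and the function name `missing_labels` are assumptions for illustration only and do not reproduce the tables above.

```python
# Hypothetical excerpt: English label -> {language code: label}.
# "NA" marks a category that a language does not use, as in the tables.
MAPPING = {
    "Noun": {"hin": "<Hindi label>", "mal": "<Malayalam label>"},
    "Participle Noun": {"hin": "<Hindi label>", "mal": "NA"},
    "Adverb": {"hin": "<Hindi label>"},  # Malayalam entry left out on purpose
}


def missing_labels(mapping, language_codes):
    """Return (english_label, language_code) pairs that have no entry at all."""
    return [
        (label, code)
        for label, row in mapping.items()
        for code in language_codes
        if code not in row
    ]


gaps = missing_labels(MAPPING, ["hin", "mal"])
print(gaps)
```

An explicit "NA" is treated as a valid entry, so only genuinely absent cells are reported.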

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
...

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
...

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.
<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
...

End if
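The selection logic of this section can be sketched in Python as a lookup from script to languages, followed by a decision about which label columns to display. The script and language pairings are the ones that appear in the pseudocode above; the names `SCRIPT_LANGUAGES` and `nodes_to_display`, and the choice to raise an error for an unlisted pair, are assumptions for illustration.

```python
# Script -> languages, as paired in the pseudocode of this section.
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}


def nodes_to_display(script, language):
    """Return the label columns to show; every other column is hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        # Assumption: an unlisted script/language pair is treated as an error.
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    shown = ["English", "Hindi"]
    if language != "Hindi":
        shown.append(language)
    return shown


print(nodes_to_display("Devanagari", "Hindi"))
print(nodes_to_display("Bangla", "Assamese"))
```

English and Hindi nodes are always displayed, which mirrors the "Display (English Hindi and ... Nodes)" steps; only the third column changes with the selected language.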

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag (ISO 639-3)

1  Hindi  hin
2  Assamese  asm
3  Bangla  ben
4  Bodo  brx
5  Dogri  doi
6  Gujarati  guj
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


25 Wh-word

ضمير )-استفہاميہzamiir-e-istafhaamiyaa)

PRQ PR__PRQ کون)kaun(

)kab(کب

)kahaaM(کہاں

3 Demonstrative

-ضمير اشاره)zamiir-e-ishaaraa)

DM DM يہ)yih(

)voh(وه

)inn(ان

)unn(ان

31 Deictic

-اشارے(ishaare(

DMD DM__DMD يہ)yih(

)voh(وه

32 Relative

ضمير اشاره )ہموصول -

zamiir-e-ishaaraa

mausoolaa)

DMR DM__DMR جو)jo(

) jis(جس

33 Wh-word

ضمير اشاره (-استفہاميہ

zamiir-e-ishaaraa

istafhaamiyaa(

DMQ DM__DMQ کون)kaun(

)kis(کس

)kitnaa(کتنا

According to Urdu grammar, words like koi, kisi and kuch do not come under Wh-word; they are used for an indefinite person. For them another category (subtype), i.e. tankiir (indefinite), is used. The following words are also placed under this category: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite

-محدود(mahdoo

d(

VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ & ( )

& $

Such symbols are not used in Urdu. They are written out as words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for punctuation

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
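The rule above can be sketched in a few lines of Python. This is only an illustration, assuming that a full tag spells out its hierarchy with a double underscore as separator (as in V__VM__VF); the function names `expand_tag` and `annotate` are hypothetical and not part of the standard.

```python
def expand_tag(full_tag: str) -> list[str]:
    """Return every level of a hierarchical tag, top level first."""
    parts = full_tag.split("__")
    # Rebuild the prefix for each depth: V, V__VM, V__VM__VF
    return ["__".join(parts[: depth + 1]) for depth in range(len(parts))]


def annotate(word: str, full_tag: str) -> dict:
    """Store the lowest-level tag chosen by the annotator plus its ancestors."""
    levels = expand_tag(full_tag)
    return {"word": word, "tag": levels[-1], "higher_levels": levels[:-1]}


record = annotate("giraa", "V__VM__VF")
print(record["higher_levels"])
```

The annotator supplies only the lowest-level tag; the higher levels are derived from it, so they never have to be typed by hand.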


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO Metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). This leads to easier recognition of the concepts represented, by both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.

Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.

Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.

Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp]
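As an illustration of how these attributes might look on tagged text, the sketch below parses a small made-up document with Python's standard library and resolves the language and text direction that apply to each element. The element names (corpus, sentence, w), the tag attribute and the sample content are assumptions made for this example, not part of the draft schema; the language values follow the ISO 639-3 tags listed in Annexure-1.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical instance document using xml:lang, xml:id, its:dir and its:locNote.
SAMPLE = """\
<corpus xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0"
        xml:lang="hin" its:dir="ltr">
  <sentence xml:id="s1" its:locNote="Keep the POS tags untranslated.">
    <w tag="V__VM">giraa</w>
  </sentence>
  <sentence xml:id="s2" xml:lang="urd" its:dir="rtl">
    <w tag="V__VAUX">hai</w>
  </sentence>
</corpus>"""


def walk(elem, lang=None, direction=None):
    """Yield (element name, xml:id, language, direction) with inherited values."""
    lang = elem.get(f"{{{XML_NS}}}lang", lang)
    direction = elem.get(f"{{{ITS_NS}}}dir", direction)
    yield elem.tag, elem.get(f"{{{XML_NS}}}id"), lang, direction
    for child in elem:
        yield from walk(child, lang, direction)


rows = list(walk(ET.fromstring(SAMPLE)))
for row in rows:
    print(row)
```

The second sentence overrides both the language and the direction set on the root, which is the situation the xml:lang and its:dir guidelines above describe.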

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

XML metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to some extent, what the data means. Only to some extent, because a tag gives just the name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>

<languageSection>

<language>en</language>

<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999

<author></author>

<domain></domain>

</origin>

<creation> <creationDate>1999-01-01</creationDate>

</creation>

<descriptionSection>

<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>

</definitionClass>

</descriptionSection>

</languageSection>

</datasm-categorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian Languages, the labels defined in the POS Schema for English need to have a one-to-one mapping for the Indian Languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would be used to select a particular Indian Language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.
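A minimal Python sketch of the "display desired nodes, hide remaining nodes" step is given here, under stated assumptions. The tag list is a small excerpt of the common tagset, and the switched-off branches for Urdu (Quotative and Classifier, both marked "Not required" in the Urdu tagset) are taken from this document; the names `ALL_TAGS`, `SWITCHED_OFF` and `select_nodes` are hypothetical.

```python
# Excerpt of the common tag tree, written as full hierarchical tags.
ALL_TAGS = [
    "N", "N__NN", "N__NNP",
    "CC", "CC__CCD", "CC__CCS", "CC__CCS__UT",
    "RP", "RP__RPD", "RP__CL", "RP__INJ",
]

# Branches switched 'off' per language (ISO 639-3 tag). A language with no
# entry has nothing switched off in this sketch.
SWITCHED_OFF = {
    "urd": {"CC__CCS__UT", "RP__CL"},
}


def select_nodes(language: str) -> tuple[list[str], list[str]]:
    """Split the common tag list into (displayed, hidden) nodes for a language."""
    off = SWITCHED_OFF.get(language, set())

    def is_hidden(tag: str) -> bool:
        # A node is hidden if it, or any branch above it, is switched off.
        return any(tag == node or tag.startswith(node + "__") for node in off)

    displayed = [tag for tag in ALL_TAGS if not is_hidden(tag)]
    hidden = [tag for tag in ALL_TAGS if is_hidden(tag)]
    return displayed, hidden


displayed, hidden = select_nodes("urd")
print("display:", displayed)
print("hide:", hidden)
```

Keeping one common tag list and a per-language set of switched-off branches means the language-specific schema is always derived from the common one rather than maintained separately.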

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken Hindi Bodo Malayalam Kashmiri Assamese Konkani Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.
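One way to hold such a mapping in software is a lookup keyed by tag and then by ISO 639-3 language tag. The Python sketch below is an illustration only: it uses a handful of entries from the tables (the Noun labels for Malayalam, Konkani, Marathi and Urdu, and the Gujarati "NA" entry for Quotative), and the names `LABELS` and `label_for` are hypothetical.

```python
from typing import Optional

# tag -> {ISO 639-3 language tag -> label}; None stands for "NA" in the tables.
LABELS = {
    "N": {
        "eng": "Noun",
        "mal": "നാമം",
        "kok": "नाम",
        "mar": "नाम",
        "urd": "اسم",
    },
    "CC__CCS__UT": {
        "eng": "Quotative",
        "guj": None,  # marked NA for Gujarati
    },
}


def label_for(tag: str, language: str) -> Optional[str]:
    """Return the label of a tag in the given language, or None if marked NA."""
    try:
        return LABELS[tag][language]
    except KeyError:
        raise KeyError(f"no mapping recorded for tag {tag!r} in {language!r}") from None


print(label_for("N", "mal"))
print(label_for("CC__CCS__UT", "guj"))
```

Returning None for an "NA" cell while raising an error for a missing cell keeps "this category does not apply" distinct from "this entry has not been filled in yet".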

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

...

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

...

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

...

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

...


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

...

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

...

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

...

End if
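
The node-selection algorithm above can be sketched in Python. This is only an illustration under stated assumptions: the names SCRIPT_LANGUAGES, visible_attributes and select_nodes are hypothetical and not part of the standard, the script-to-language table covers only the seven languages used in the examples, and the label values in the sample node are ASCII placeholders rather than the real labels.

```python
# Hypothetical sketch of the node-selection algorithm; all names are illustrative.

# Script -> {language name: ISO 639-3 code}, limited to the languages in the examples.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def visible_attributes(script, language):
    """Return the attribute names to display for a script/language pair."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError for an unknown pair
    # English labels, the Hindi label and the tag are always displayed.
    keep = ["cat", "subcat", "hin-cat", "tag"]
    label = f"{code}-cat"
    if label not in keep:  # Hindi is already in the list
        keep.append(label)
    return keep


def select_nodes(node, script, language):
    """Display the desired labels of a schema node and hide the remaining ones."""
    keep = visible_attributes(script, language)
    return {name: value for name, value in node.items() if name in keep}


# Placeholder labels stand in for the real per-language category names.
NOUN = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "kas-cat": "<Kashmiri label>",
    "tag": "N",
}

print(select_nodes(NOUN, "Devanagari", "Bodo"))
```

Keeping the script-to-language table as data means a new language adds one entry rather than another If branch.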


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1. ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag according to ISO 639-3

1  Assamese  asm
2  Bangla  ben
3  Bodo  brx
4  Dogri  doi
5  Gujarati  guj
6  Hindi  hin
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


following words are also placed: chand, b'aaz, fulaan, sab, bahut. Can we have a category subtype like indefinite demonstrative (DMI)?

4 Verb

)flsquoel-فعل(

V V گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

41 Main VM V__VM گرا)giraa(

)gayaa(گيا

)sonaa(سونا

)haMstaa(ہنستا

411 Finite (محدود, mahdood) VF V__VM__VF This subtype WILL NOT be used for Hindi, as Hindi does not have enough information at the word level


412 Nonfinite

غيرمحدو(air gh-د

mahdood(

VNF V__VM__VNF -- do--

413 Infinitive

-مصدر(masdar(

VINF V__VM__VINF -- do--

414 Gerund

حاصل (-مصدر

haasil-e- masdar(

VNG V__VM__VNG -- do--

42 Auxiliary

-فعل امدادی(flsquoel-e-imdaadi(

VAUX V__VAUX ہے)hai(

)rahaa(رہا

)huaa(ہوا

5 Adjective

)sifat-صفت(

JJ دلکش)dilkash( )safed(سفيد

)siyaah(سياه

)cauRaa(چوڑا

)uuMcaa(اونچا

6 Adverb

-متعلق فعل(mutlsquoalliq-e-

flsquoel(

RB تيز)tez(

jald((جلد

7 Postposition

-jaar-جارموخر(e-moakkhar(

PSP سے)se( نے )ne( کو )ko(

)meiM(ميں

8 Conjunction

)atflsquo-عطف(

CC CC اور)aur(

)agar(اگر

کيوں کہ )kyoMki(


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign word (بديسی لفظ, bidesii lafz) RDF RD__RDF A word written in a script other than the script of the original text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
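
As a minimal sketch of how the higher-level tags could be stored automatically, the Python below expands a lowest-level tag into its full hierarchical label. The PARENT map is built from the Verb, Conjunction, Particle, Quantifier and Residual rows of the table above. The names PARENT and full_tag are hypothetical, and the double-underscore separator follows the labels shown in the table.

```python
# Hypothetical helper; the parent links mirror the tag table above.
PARENT = {
    "VM": "V", "VAUX": "V",
    "VF": "VM", "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_tag(tag, sep="__"):
    """Expand a lowest-level tag into its full label, top level first."""
    chain = [tag]
    # Walk upwards until a top-level category (no parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


print(full_tag("VF"))
print(full_tag("UT"))
print(full_tag("JJ"))
```

With this arrangement an annotator records only the leaf tag (for example VF), and the stored label is derived from the hierarchy rather than typed by hand.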


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes it easier for both schema users and processors to recognise the concepts represented. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
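
To illustrate the first two attributes, the following Python sketch reads xml:lang from a small sample document and resolves it for each element, inheriting from the parent where an element declares no language of its own. The sample document, its element names and the function name languages are assumptions made for this example. The two namespace URIs are the standard XML and ITS 1.0 namespaces.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang and xml:id
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace

# Illustrative document: language and direction set on the root,
# then overridden where the language changes.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:lang="hi">संज्ञा</label>
  <label xml:lang="ur" its:dir="rtl">اسم</label>
  <label>noun</label>
</doc>"""


def languages(element, inherited=None):
    """Yield (tag, language) pairs; xml:lang is inherited from the parent."""
    lang = element.get(f"{{{XML_NS}}}lang", inherited)
    yield element.tag, lang
    for child in element:
        yield from languages(child, lang)


root = ET.fromstring(SAMPLE)
for tag, lang in languages(root):
    print(tag, lang)

# Text direction declared on the root element.
print(root.get(f"{{{ITS_NS}}}dir"))
```

The last label carries no xml:lang of its own, so it resolves to the root value, which is why the guideline asks for xml:lang on the root element and wherever the language changes.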

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. Schemas provide a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

Metadata is built into the document. Every element has a tag to tell you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of): "sort of" because a tag only gives its name, so it has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()
<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>100</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation>
<creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework for an XML-based POS schema in all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>...</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To support this in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
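
The selection steps above can be sketched in Python. This is only an illustrative sketch, not part of the standard: the names `SCRIPT_LANGUAGES`, `NOUN` and `select_labels` are hypothetical, the script table covers only the script and language pairs named in the algorithm, and the Hindi label in the sample node is a placeholder string rather than a real label.

```python
# Hypothetical table of the script/language pairs named in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Sample node; "HINDI-LABEL" is a placeholder, the other labels follow the schema.
NOUN = {
    "cat": "noun",
    "hin-cat": "HINDI-LABEL",
    "mal-cat": "നാമം",
    "kok-cat": "नाम",
    "tag": "N",
}


def select_labels(node, script, language):
    """Return the labels to display for one node; every other label is hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English and Hindi labels are always shown, plus the selected language.
    keep = ["cat", "hin-cat", "tag"]
    if language != "hin":
        keep.insert(2, f"{language}-cat")
    return {key: node[key] for key in keep if key in node}


if __name__ == "__main__":
    print(select_labels(NOUN, "Malayalam", "mal"))
```

Choosing Hindi keeps only the English and Hindi labels, which matches the "Display (English and Hindi Nodes)" step; any other language adds exactly one more label.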


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag according to ISO 639-3

1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd
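
As a small illustration of how these codes tie into the schema, the sketch below maps each language name to its ISO 639-3 code and derives the per-language label attribute name (such as `mal-cat`) used in the schema listings. The names `LANGUAGE_CODES` and `label_attribute` are hypothetical and chosen only for this example.

```python
# ISO 639-3 codes for the 22 languages of the annexure.
LANGUAGE_CODES = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Build the label attribute name for a language, e.g. 'mal-cat'."""
    return f"{LANGUAGE_CODES[language_name]}-cat"


if __name__ == "__main__":
    print(label_attribute("Malayalam"))
```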


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


4.1.2 Nonfinite | غيرمحدود (ghair-mahdood) | VNF | V__VM__VNF | -- do --

4.1.3 Infinitive | مصدر (masdar) | VINF | V__VM__VINF | -- do --

4.1.4 Gerund | حاصل مصدر (haasil-e-masdar) | VNG | V__VM__VNG | -- do --

4.2 Auxiliary | فعل امدادی (f‘el-e-imdaadi) | VAUX | V__VAUX | ہے (hai), رہا (rahaa), ہوا (huaa)

5 Adjective | صفت (sifat) | JJ | | دلکش (dilkash), سفيد (safed), سياه (siyaah), چوڑا (cauRaa), اونچا (uuMcaa)

6 Adverb | متعلق فعل (mut‘alliq-e-f‘el) | RB | | تيز (tez), جلد (jald)

7 Postposition | جارموخر (jaar-e-moakkhar) | PSP | | سے (se), نے (ne), کو (ko), ميں (meiM)

8 Conjunction | عطف (atf‘) | CC | CC | اور (aur), اگر (agar), کيوں کہ (kyoMki)

8.1 Co-ordinator | حرف وصل (harf-e-vasl) | CCD | CC__CCD | اور (aur), وه (voh), يا (yaa), کہ (ki), بلکہ (balki)

8.2 Subordinator | تابع کننده (taab‘e kunindaa) | CCS | CC__CCS | اگر (agar), کيوں کہ (kyoMki), تو (to)

8.2.1 Quotative | اقتباسی (iqtabaasii) | UT | CC__CCS__UT | Not required

9 Particles | حاليہ (haaliyaa) | RP | RP | تو (to), ہی (hii), بهی (bhii)

9.1 Default | ڈيفالٹ (Default) | RPD | RP__RPD | تو (to), ہی (hii), بهی (bhii)

9.2 Classifier | درجہ بند (darja band) | CL | RP__CL | Not required

9.3 Interjection | فجائيہ (fajaa’iyaa) | INJ | RP__INJ | اے (e), او (o), ارے (are), جی (jii), اہا (ahaa), واه (vaah)

9.4 Intensifier | حرف تاکيد (harf-e-taakiid) | INTF | RP__INTF | بہت (bahut), بے حد (behad), البتہ (albattaa), ضرور (zaroor), خبردار (khabardaar)

9.5 Negation | حرف نہی (harf-e-nahii) | NEG | RP__NEG | نہ (na), نہيں (nahiiM)

10 Quantifiers | کميت نما (kamiiyat numaa) | QT | QT | چند (cand), متعدد (muta’addad), قليل (qaliil), کثير (kasiir)

10.1 General | عام (aam‘) | QTF | QT__QTF | تهوڑا (thoRaa), بہت (bahut), کچه (kuch)

10.2 Cardinals | اعداد مطلق (a‘adaad-e-mutlaq) | QTC | QT__QTC | ايک (Ek), دو (do), تين (tiin)

10.3 Ordinals | ترتيبی اعداد (tartiibii a‘adaad) | QTO | QT__QTO | اول (avval), دوم (doam), پہلا (pahalaa), دوسرا (duusaraa)

11 Residuals | باقی مانده (baaqi-maandaa) | RD | RD |

11.1 Foreign word | بديسی لفظ (bidesii lafz) | RDF | RD__RDF | A word written in a script other than the script of the original text

11.2 Symbol | علامت (‘alaamat) | SYM | RD__SYM | $ & ( ). Such symbols are not used in Urdu; they are written out, e.g. ڈالر (dollar), پاونڈ (pound), etc.

11.3 Punctuation | اوقاف (auqaaf) | PUNC | RD__PUNC | Only for punctuation

11.4 Unknown | نامعلوم (naa-m‘aaloom) | UNK | RD__UNK |

11.5 Echowords | گونج دار الفاظ (goonjdar lafz) | ECH | RD__ECH | (دل-) ول (dil-) vil; (پيار-) ويار (pyaar-) vyaar; (چائے-) وائے (caa‘e-) vaa‘e

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
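
A minimal sketch of that automatic step is given below, assuming the parent relations implied by the annotation conventions in the table (for example V__VM__VNF and CC__CCS__UT). The names `PARENT` and `full_tag` are hypothetical; the standard does not prescribe an implementation.

```python
# Parent of each subtype label, read off the annotation conventions
# such as V__VM__VNF, CC__CCS__UT and RD__PUNC.
PARENT = {
    "VM": "V", "VAUX": "V",
    "VNF": "VM", "VINF": "VM", "VNG": "VM",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "CL": "RP", "INJ": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_tag(lowest):
    """Expand a lowest-level label into the full hierarchical tag."""
    chain = [lowest]
    # Walk upwards until a top-level category (no parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return "__".join(reversed(chain))


if __name__ == "__main__":
    for label in ("VNF", "UT", "PUNC", "JJ"):
        print(label, "->", full_tag(label))
```

With this table the annotator only records the lowest label; a top-level label such as JJ has no parent and is returned unchanged.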


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

71 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology to easily create XML which is internationalized and can be localized effectively

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp]
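
To illustrate how a processor might read two of these markers, the sketch below walks a small XML sample and reports, for each element, the language in force (xml:lang) and whether its content is translatable (the local its:translate attribute of ITS 1.0). The sample document and the function name `describe` are assumptions made for this example only; they are not taken from the POS schema.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample: an English document with one Malayalam label
# and one tag value marked as not translatable.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en">
  <label xml:lang="ml">നാമം</label>
  <tag its:translate="no">N</tag>
</doc>"""


def describe(root):
    """Return (element name, language, translate flag) for every element."""
    rows = []

    def walk(elem, lang, translate):
        # Both values are inherited from the parent unless overridden locally.
        lang = elem.get(f"{{{XML_NS}}}lang", lang)
        translate = elem.get(f"{{{ITS_NS}}}translate", translate)
        rows.append((elem.tag, lang, translate))
        for child in elem:
            walk(child, lang, translate)

    walk(root, None, "yes")
    return rows


if __name__ == "__main__":
    for row in describe(ET.fromstring(SAMPLE)):
        print(row)
```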

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag gives only a name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>
<languageSection>
<language>en</language>
<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999
<author></author>
<domain></domain>
</origin>
<creation> <creationDate>1999-01-01</creationDate>
</creation>
<descriptionSection>
<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>
</definitionClass>
</descriptionSection>
</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework of XML-based POS schema for all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ... n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ... n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by turning particular branches of the tree structure 'off'. This is schematically represented under the next heading, i.e. POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा
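A one-to-one mapping table of this kind can be held as a nested dictionary. The sketch below is a minimal illustration, not part of the standard: the names MAPPING, lookup and missing_labels are hypothetical, the language codes follow the ISO 639-3 prefixes used in the draft schema, and only two rows (Noun and Participle Noun, Malayalam and Konkani columns) are copied from the table above. It shows how entries marked NA can be detected before a language-specific schema is generated.

```python
# Two-row excerpt of the mapping table: English label -> language code -> label.
# "NA" marks a label that the table leaves undefined for that language.
MAPPING = {
    "Noun": {"mal": "നാമം", "kok": "नाम"},
    "Participle Noun": {"mal": "NA", "kok": "NA"},
}


def lookup(english_label, lang_code):
    """Return the mapped label, or None when the table has no entry (NA)."""
    label = MAPPING.get(english_label, {}).get(lang_code, "NA")
    return None if label == "NA" else label


def missing_labels(lang_code):
    """List the English labels that have no counterpart in the given language."""
    return [name for name in MAPPING if lookup(name, lang_code) is None]


print(lookup("Noun", "kok"))
print(missing_labels("mal"))
```

A check like missing_labels makes gaps in the one-to-one mapping explicit, so that a missing label is reported rather than silently written into the schema as the string "NA".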

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………

    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………

    End if

Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………………

    End if
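The selection algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions, not the reference implementation: the script-to-language pairs are the ones listed in the algorithm, the language codes are the ISO 639-3 prefixes used in the draft schema, the names visible_codes and select_nodes are hypothetical, and the Hindi and Bodo label values are placeholders rather than real labels.

```python
# Script -> languages, as enumerated in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}

# ISO 639-3 codes, matching the attribute prefixes of the draft schema.
LANGUAGE_CODES = {
    "Hindi": "hin", "Bodo": "brx", "Konkani": "kok", "Malayalam": "mal",
    "Kashmiri": "kas", "Assamese": "asm", "Gujarati": "guj",
}


def visible_codes(script, language):
    """Return the language codes whose nodes stay displayed."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language} is not listed under script {script}")
    codes = ["hin"]  # Hindi nodes are displayed for every selection
    code = LANGUAGE_CODES[language]
    if code not in codes:
        codes.append(code)
    return codes


def select_nodes(attributes, script, language):
    """Keep English fields plus the selected labels; hide the remaining nodes."""
    keep = {f"{code}-cat" for code in visible_codes(script, language)}
    return {name: value for name, value in attributes.items()
            if not name.endswith("-cat") or name in keep}


# Noun node; "<...>" values are placeholders, the others come from the schema.
noun = {"cat": "noun", "hin-cat": "<Hindi label>", "brx-cat": "<Bodo label>",
        "mal-cat": "നാമം", "kok-cat": "नाम", "tag": "N"}
print(select_nodes(noun, "Devanagari", "Konkani"))
```

Keeping one common node and filtering it at display time mirrors the idea of switching branches of the tree 'off', so the schema itself never has to be duplicated per language.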

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
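Tagged sentences like the ones above can be read back into word and tag pairs with a few lines of Python. This is a rough sketch, not the reference implementation: it assumes each token is written as word, separator, tag with a backslash as the separator (the separator is not visible in this copy of the text), and the romanized sample sentence and the names `parse_tagged` and `top_level_counts` are made up for illustration.

```python
from collections import Counter


def parse_tagged(line, sep="\\"):
    """Split a tagged sentence into (word, tag) pairs; untagged tokens get tag None."""
    pairs = []
    for token in line.split():
        word, found, tag = token.rpartition(sep)
        if found:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


def top_level_counts(pairs):
    """Count tokens per top-level category, e.g. 'N' for 'N_NNP' or 'N__NN'."""
    return Counter(tag.split("_")[0] for _, tag in pairs if tag)


# Hypothetical romanized sample; the separator and words are assumptions.
sample = r"ram\N_NNP ghar\N_NN gaya\V_VM .\RD_PUNC"
pairs = parse_tagged(sample)
print(pairs)
print(top_level_counts(pairs))
```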


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo   Language Name   Language Tag (ISO 639-3)

1     Hindi           hin
2     Assamese        asm
3     Bangla          ben
4     Bodo            brx
5     Dogri           doi
6     Gujarati        guj
7     Kannada         kan
8     Kashmiri        kas
9     Konkani         kok
10    Maithili        mai
11    Malayalam       mal
12    Manipuri        mni
13    Marathi         mar
14    Nepali          nep
15    Oriya           ori
16    Punjabi         pan
17    Sanskrit        san
18    Santhali        sat
19    Sindhi          snd
20    Tamil           tam
21    Telugu          tel
22    Urdu            urd


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


81 Co-ordinator

-حرف وصل(harf-e-vasl(

CCD CC__CCD اور)aur(

)voh(وه

)yaa(يا

)ki(کہ

)balki(بلکہ

82 Subordinator

-تابع کننده(taablsquoe

kunindaa(

CCS CC__CCS اگر)agar(

کيوں کہ )kyoMki(

)to(تو

821 Quotative

-اقتباسی(iqtabaas

ii(

UT CC__CCS__UT Not required

9 Particles

)haaliyaa-حاليہ(

RP RP تو)to(

)hii(ہی

)bhii(بهی

91 Default

-ڈيفالٹ)Default)

RPD RP__RPD تو)to(

)hii(ہی

)bhii(بهی

92 Classifier

-درجہ بند(darja band(

CL RP__CL Not required

93 Interjection

-فجائيہ(fajaarsquoiyaa(

INJ RP__INJ اے))e

)o(او

)are(ارے

)jii(جی

)ahaa(اہا

)vaah(واه

94 Intensifier INTF RP__INTF بہت)bahut(


-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
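Storing the higher-level tags automatically can be sketched as a walk up a parent table. The sketch below is an assumption-laden illustration: the `PARENT` table covers only some of the tags shown in this document, and the names `PARENT` and `full_tag` are hypothetical. The separator is a parameter because the tables write `CC__CCS__UT` while the tagged sample sentences write `V_VM_VNF`.

```python
# Parent of each lower-level tag, taken from tags that appear in this document.
PARENT = {
    "NN": "N", "NNP": "N",
    "CCD": "CC", "CCS": "CC", "UT": "CCS",
    "RPD": "RP", "INTF": "RP", "NEG": "RP",
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "VM": "V", "VAUX": "V", "VF": "VM", "VNF": "VM",
}


def full_tag(lowest, sep="__"):
    """Build the stored tag from the lowest-level tag chosen by the annotator."""
    chain = [lowest]
    while chain[-1] in PARENT:  # climb until a top-level category is reached
        chain.append(PARENT[chain[-1]])
    return sep.join(reversed(chain))


print(full_tag("UT"))             # CC__CCS__UT
print(full_tag("VNF", sep="_"))   # V_VM_VNF
print(full_tag("JJ"))             # JJ (already a top-level tag)
```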


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages fully interoperable, extensible and web-enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.

Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.

Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.

Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp]
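As a small illustration of the first two items, the Python sketch below builds one label element carrying xml:lang and its:dir, plus the local its:translate attribute. It rests on assumptions: the element name `label` and the helper `make_label` are hypothetical, and the value given to xml:lang is a BCP 47 language tag (`ur`) rather than the ISO 639-3 code used elsewhere in this document.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def make_label(text, lang, direction="ltr", translate="yes"):
    """Create a hypothetical <label> element with language and direction mark-up."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)            # serialized as xml:lang
    label.set(f"{{{ITS_NS}}}dir", direction)        # serialized as its:dir
    label.set(f"{{{ITS_NS}}}translate", translate)  # serialized as its:translate
    label.text = text
    return label


# Urdu is written right to left, so its:dir is set to "rtl".
urdu = make_label("اسم", "ur", direction="rtl")
serialized = ET.tostring(urdu, encoding="unicode")
print(serialized)
```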

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. A schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag conveys just its name, which is meaningful only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
<globalInformation></globalInformation>

<languageSection>

<language>en</language>

<identifier> </identifier>
<version>1.0.0</version>
<registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
<origin>ISO 12620:1999

<author></author>

<domain></domain>

</origin>

<creation>
<creationDate>1999-01-01</creationDate>

</creation>

<descriptionSection>

<definitionClass>
<definition xml:lang="en"></definition>
<source>ISO 12620:1999</source>

</definitionClass>

</descriptionSection>

</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English need a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema selects a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching the other branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
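One way to picture switching branches off is to drop the label attributes of every language that is not wanted. The following Python sketch assumes a simplified, well-formed stand-in for the schema: the element names `category` and `type`, the placeholder label values and the function `switch_off_branches` are all hypothetical; only the `xxx-cat` attribute naming follows the draft schema in this document.

```python
import xml.etree.ElementTree as ET


def switch_off_branches(node, keep):
    """Remove every '<code>-cat' label attribute whose language code is not in keep."""
    for elem in node.iter():
        for name in list(elem.attrib):
            # 'hin-cat' -> 'hin'; plain 'cat' and 'subcat' do not end in '-cat'
            if name.endswith("-cat") and name[:-4] not in keep:
                del elem.attrib[name]
    return node


# Placeholder labels stand in for the native-script labels of the real schema.
common = ET.fromstring(
    '<category cat="Noun" tag="N" hin-cat="noun-hin" guj-cat="noun-guj" mal-cat="noun-mal">'
    '<type subcat="Common" tag="NN" hin-cat="common-hin" guj-cat="common-guj" mal-cat="common-mal"/>'
    '</category>'
)

gujarati = switch_off_branches(common, keep={"hin", "guj"})
print(ET.tostring(gujarati, encoding="unicode"))
```

The English label (`cat`, `subcat`) and the tag are never removed, which matches the algorithm's rule that English and Hindi nodes stay visible alongside the selected language.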

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>…</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
...

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
...

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
...

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
...

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
...

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
...

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
<xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
...

End if
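The selection logic above can be sketched in Python. This is only an illustration under stated assumptions: the function name `select_nodes`, the `SCRIPT_LANGUAGES` table, and the placeholder labels in `NODES` are hypothetical and not part of the draft standard; only the script and language pairings and the ISO 639-3 codes come from this document.

```python
# Script -> language codes, limited to the cases spelled out in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(script, language, nodes):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = ["eng", "hin"]
    if language not in visible:
        visible.append(language)
    return [
        {"tag": node["tag"], **{code: node[code] for code in visible}}
        for node in nodes
    ]


# Placeholder labels (hypothetical); a real table would hold the native terms.
NODES = [
    {"tag": "N", "eng": "Noun", "hin": "hin-noun", "brx": "brx-noun",
     "mal": "mal-noun", "guj": "guj-noun"},
    {"tag": "PR", "eng": "Pronoun", "hin": "hin-pronoun", "brx": "brx-pronoun",
     "mal": "mal-pronoun", "guj": "guj-pronoun"},
]

for row in select_nodes("Malayalam", "mal", NODES):
    print(row)
```

Selecting Hindi returns only the English and Hindi labels, which matches the first branch of the algorithm.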

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)

1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

-حرف تاکيد(harf-e-taakiid(

)behad(بے حد

)albattaa(البتہ )zaroor(ضرور

خبردار )khabardaar(

95 Negation

-حرف نہی(harf-e-

nahii(

NEG RP__NEG نہ)na(

)nahiiM(نہيں

10 Quantifiers

-کميت نما(kamiiyat

numaa(

QT QT چند)cand(

متعدد

)mutarsquoaddad(

)qaliil(قليل

)kasiir(کثير

101 General

)aamlsquo -عام(

QTF QT__QTF تهوڑا)thoRaa(

)bahut(بہت )kuch(کچه

102 Cardinals

-اعداد مطلق(alsquoadaad -

e-mutlaq(

QTC QT__QTC ايک)Ek(

)do(دو

)tiin(تين

103 Ordinals

-ترتيبی اعداد(tartiibii

alsquoadaad(

QTO QT__QTO اول)avval(

)doam(دوم

)pahalaa(پہال دوسرا

)duusaraa(

11 Residuals

baaqi-باقی مانده(maandaa(

RD RD

111 Foreign RDF RD__RDF A word

word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu. They are written out in words, e.g. ڈالر (dollar), پاونڈ (pound), etc.

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
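One way to store the higher-level tags automatically is to keep a parent table and walk it upward from the lowest-level tag. The sketch below is illustrative only: the names `PARENT`, `expand_tag` and `full_label` are hypothetical, and the table lists just the tags that appear in this document (for example QT__QTC, RD__PUNC, V_VM_VF).

```python
# Parent of each tag in the type hierarchy (illustrative subset only).
PARENT = {
    "QTF": "QT", "QTC": "QT", "QTO": "QT",
    "NEG": "RP",
    "RDF": "RD", "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
    "VM": "V", "VAUX": "V", "VF": "VM", "VNF": "VM",
}


def expand_tag(lowest, parent=PARENT):
    """Return the chain of tags from the top-level category down to `lowest`."""
    chain = [lowest]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    chain.reverse()
    return chain


def full_label(lowest, sep="__"):
    """Join the hierarchy into one label, e.g. 'QT__QTC'."""
    return sep.join(expand_tag(lowest))


print(full_label("QTC"))
print(full_label("VF", sep="_"))
```

With this table the annotator only picks `QTC` or `VF`; the stored labels come out as `QT__QTC` and `V_VM_VF`. The separator is a parameter because both `__` and `_` occur in the tagged samples.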

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible, and web-enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang): defined for the root element of the document and for any element where a change of language may occur.

Defining mark-up to specify text direction (its:dir): defined for the root element of the document and for any element that has text content.

Indicating which elements and attributes should be translated (its:translateRule): elements that indicate which elements have non-translatable content.

Providing information related to text segmentation (its:withinTextRule): elements that indicate which elements should be treated either as part of their parents or as a nested but independent run of text.

Defining mark-up for unique identifiers (xml:id): elements with translatable content can be associated with a unique identifier.

Defining mark-up for notes to localizers (its:locNote): allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text.

[For more details: http://www.w3.org/TR/xml-i18n-bp]
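As a rough illustration of the local attributes listed above, the following Python sketch builds a small label document carrying `xml:lang`, `its:dir` and `its:translate`. The element names `labels` and `label`, the helper `build_label`, and the sample texts are hypothetical; the two-letter language values are also an assumption, since this document itself uses ISO 639-3 codes.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:lang
ITS_NS = "http://www.w3.org/2005/11/its"         # ITS 1.0 namespace
ET.register_namespace("its", ITS_NS)


def build_label(lang, text, direction="ltr", translate="yes"):
    """Create one <label> element with language, direction and translate flags."""
    label = ET.Element("label")
    label.set(f"{{{XML_NS}}}lang", lang)
    label.set(f"{{{ITS_NS}}}dir", direction)
    label.set(f"{{{ITS_NS}}}translate", translate)
    label.text = text
    return label


root = ET.Element("labels")
root.set(f"{{{XML_NS}}}lang", "en")
root.append(build_label("hi", "Hindi label text"))
root.append(build_label("ur", "Urdu label text", direction="rtl"))
root.append(build_label("en", "PRP", translate="no"))  # tag codes stay untranslated

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Marking the tag code as `its:translate="no"` reflects the idea that POS tag identifiers stay fixed while the human-readable labels are localized.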

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schemas provide a means for defining the structure, content, and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when, and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you, to a degree, what the data means. Only to a degree, because a tag gives just the element name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in each Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema selects a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
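Switching branches 'off' could, for instance, mean dropping the per-language label attributes that were not selected. The sketch below assumes labels are stored as `<code>-cat` attributes, as in the draft schema of this document; the function `keep_language`, the well-formed sample element, and its placeholder label values are hypothetical.

```python
import xml.etree.ElementTree as ET

ALWAYS_SHOWN = {"hin"}  # Hindi labels stay visible alongside English


def keep_language(node, lang):
    """Drop every '<code>-cat' attribute except Hindi and the selected language."""
    shown = ALWAYS_SHOWN | {lang}
    for elem in node.iter():
        for name in list(elem.attrib):
            if name.endswith("-cat") and name[:-4] not in shown:
                del elem.attrib[name]
    return node


# Well-formed stand-in for one schema node, with placeholder label values.
common = ET.fromstring(
    '<cat name="noun" hin-cat="HIN-NOUN" brx-cat="BRX-NOUN" '
    'mal-cat="MAL-NOUN" kok-cat="KOK-NOUN" tag="N">'
    '<type name="common" hin-cat="HIN-COMMON" brx-cat="BRX-COMMON" '
    'mal-cat="MAL-COMMON" kok-cat="KOK-COMMON" tag="NN"/>'
    '</cat>'
)

malayalam_view = keep_language(common, "mal")
print(ET.tostring(malayalam_view, encoding="unicode"))
```

The common tree stays the single source of labels; each language-specific view is derived from it instead of being maintained separately.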

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<file Desc>
<titleStmt>
<title>POS tag in multilingual language</title>
<script> </script>
<language>multilingual</language>
<label language>...</label language>
<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg

<xs:element name=cat POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

...

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt


ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-

cat=rdquoजावाचत नामrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-

cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo

tag=rdquoPRrdquogt


ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश

सवरनामrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-

cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-

cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-

cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo

tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo

tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo

lttag=rdquoPRP rdquoشخصيٲتی

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo

lttag=rdquoPRF rdquoماکوسی

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-

cat=rdquoজািতবাচকrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-

cat=rdquoআতবাচকrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg


ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if
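The selection steps above can be sketched in Python. This is only an illustration under stated assumptions: the dictionary form of a schema node, the name `select_nodes` and the placeholder label values are hypothetical and do not appear in the draft; the script-to-language grouping is taken from the algorithm itself.

```python
# Script -> language codes, as enumerated in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": ("hin", "brx", "kok"),
    "Malayalam": ("mal",),
    "Perso-Arabic": ("kas",),
    "Bangla": ("asm",),
    "Gujarati": ("guj",),
}


def select_nodes(node, script, language):
    """Return only the labels to display for one schema node.

    `node` is a hypothetical dict form of one schema entry, for example
    {"cat": "noun", "hin-cat": "...", "brx-cat": "...", "tag": "N"}.
    """
    if language not in SCRIPT_LANGUAGES.get(script, ()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # The English label, the Hindi label, the selected language's label and
    # the tag stay visible; every other language label is hidden.
    visible = {"cat", "subcat", "hin-cat", f"{language}-cat", "tag"}
    return {key: value for key, value in node.items() if key in visible}


if __name__ == "__main__":
    # Placeholder label values, not the real translations.
    noun = {"cat": "noun", "hin-cat": "HIN", "brx-cat": "BRX",
            "mal-cat": "MAL", "kas-cat": "KAS", "tag": "N"}
    print(select_nodes(noun, "Devanagari", "brx"))
```

Filtering a copy of the node, rather than deleting keys, keeps the common schema intact so that the same node can be displayed again for another language.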


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo   Language Name   Language Tag according to ISO 639-3

1   Assamese   asm
2   Bangla   ben
3   Bodo   brx
4   Dogri   doi
5   Gujarati   guj
6   Hindi   hin
7   Kannada   kan
8   Kashmiri   kas
9   Konkani   kok
10   Maithili   mai
11   Malayalam   mal
12   Manipuri   mni
13   Marathi   mar
14   Nepali   nep
15   Oriya   ori
16   Punjabi   pan
17   Sanskrit   san
18   Santhali   sat
19   Sindhi   snd
20   Tamil   tam
21   Telugu   tel
22   Urdu   urd
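As a rough sketch, the language tag table can be held as a lookup from language name to ISO 639-3 code. The names `LANGUAGE_TAGS` and `label_attribute` are hypothetical; the `<code>-cat` attribute pattern follows the label attributes used in the draft schema (for example hin-cat, mal-cat), and each name is paired with its ISO 639-3 code.

```python
# Language name -> ISO 639-3 code for the 22 languages in the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Build the schema attribute name that holds a language's label."""
    return f"{LANGUAGE_TAGS[language_name]}-cat"


if __name__ == "__main__":
    print(label_attribute("Malayalam"))
```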


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

Page 44: Tdil Mal Tags


word

-بديسی لفظ(bidesii

lafz(

written in

script other

than the script

of the original

text

112 Symbol

-عالمت(lsquoalaamat(

SYM RD__SYM $ amp ( )

amp $

Such symbols are not used in Urdu They are written

(dollar) ڈالر (pound)پاونڈetc

113 Punctuation

-اوقاف(auqaaf(

PUNC RD__PUNC Only for

Punctuations

114 Unknown

naa-نامعلوم(mlsquoaaloom(

UNK RD__UNK

115 Echowords

گونج دار (-الفاظ

goonjdar lafz(

ECH RD__ECH )ول) -دل

)dil-) vil

ويار) -پيار(

)pyaar-) vyaar

وائے)-چائے(

)caalsquoe-) vaalsquoe

The annotation is to be done using the lowest-level tag of the type hierarchy. Once the lowest-level tag is selected, the higher-level tags should be stored automatically.
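The rule above can be illustrated with a small Python sketch that expands a lowest-level tag into its full hierarchical form. The `PARENT` table and the name `full_tag` are hypothetical; the parent links shown are inferred from tags that appear in this document (for example RD__SYM and V_VM_VNF), and the separator is left as a parameter because both one and two underscores occur.

```python
# Child tag -> parent tag. Only a few illustrative links are listed;
# a complete table would cover the whole tag set.
PARENT = {
    "NN": "N", "NNP": "N",
    "VM": "V", "VAUX": "V", "VF": "VM", "VNF": "VM",
    "SYM": "RD", "PUNC": "RD", "UNK": "RD", "ECH": "RD",
}


def full_tag(lowest, separator="_"):
    """Return the stored tag: every ancestor followed by the chosen tag."""
    chain = [lowest]
    # Walk upwards until a top-level category (one without a parent) is reached.
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return separator.join(reversed(chain))


if __name__ == "__main__":
    print(full_tag("VNF"))
    print(full_tag("SYM", separator="__"))
```

The annotator only ever chooses the leaf tag; the ancestors are derived, which avoids inconsistent combinations such as a noun subtype stored under a verb category.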


7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian Languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best practices guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to be included in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur)

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content)

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content)

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text)

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier)

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values or to point to the location of the relevant note text)

[For more details: http://www.w3.org/TR/xml-i18n-bp]
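The following Python sketch shows how some of these attributes might be read from a document. It is an illustration only: the sample document, its element names and the function `describe` are hypothetical, and it uses the local its:translate attribute of ITS 1.0 rather than the global rule elements. Only the XML and ITS namespace URIs are standard.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample; the element names are not part of the draft schema.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="n1">Noun</label>
  <label xml:id="n2" xml:lang="hi">संज्ञा</label>
  <tag its:translate="no">NN</tag>
</doc>"""


def describe(root):
    """List (element name, language, translate flag) for every element."""
    rows = []

    def walk(elem, lang, translate):
        # xml:lang and its:translate are inherited unless overridden locally.
        lang = elem.get(f"{{{XML_NS}}}lang", lang)
        translate = elem.get(f"{{{ITS_NS}}}translate", translate)
        rows.append((elem.tag, lang, translate))
        for child in elem:
            walk(child, lang, translate)

    walk(root, None, "yes")
    return rows


if __name__ == "__main__":
    for row in describe(ET.fromstring(SAMPLE)):
        print(row)
```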

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]


9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag only gives the element name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

In order to develop a common framework of XML-based POS schema in all 22 Indian languages, it is necessary that the labels defined in the POS Schema for English have a one-to-one mapping for the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the Schema then needs to be transformed into the POS Schema for that particular language. The language-specific POS Schema could be enabled by switching a particular branch of the tree structure 'off'. This is represented schematically under the next heading, i.e. POS schema block diagram.

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>
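Some nodes in the draft schema carry an empty label or omit a language altogether (for example, the Deictic node has empty hin-cat and kok-cat values, and the Verbal node has no mal-cat). A minimal Python sketch of how such gaps in the one-to-one mapping could be listed is given below. The element name `node`, the wrapper element `schema`, the function name `missing_labels` and the "(label)" placeholders are assumptions made for this example; they stand in for the real labels and are not part of the draft schema.

```python
import xml.etree.ElementTree as ET

# Language prefixes used by the *-cat attributes of the draft schema.
LANGUAGE_CODES = ["hin", "brx", "mal", "kas", "asm", "kok", "guj"]

# Hypothetical, well-formed stand-in for three schema nodes.
# "(label)" marks a label that is present in the draft schema;
# empty or absent attributes mirror the gaps found there.
SAMPLE = """\
<schema>
  <node tag="N" hin-cat="(label)" brx-cat="(label)" mal-cat="(label)"
        kas-cat="(label)" asm-cat="(label)" kok-cat="(label)" guj-cat="(label)"/>
  <node tag="DMD" hin-cat="" brx-cat="(label)" mal-cat="(label)"
        kas-cat="(label)" asm-cat="(label)" kok-cat="" guj-cat="(label)"/>
  <node tag="NNV" hin-cat="(label)" brx-cat="(label)"
        kas-cat="(label)" asm-cat="(label)" kok-cat="(label)" guj-cat="(label)"/>
</schema>
"""


def missing_labels(xml_text):
    """Map each tag to the language codes whose label is empty or absent."""
    gaps = {}
    for node in ET.fromstring(xml_text).iter("node"):
        missing = [
            code
            for code in LANGUAGE_CODES
            if not node.get(f"{code}-cat", "").strip()
        ]
        if missing:
            gaps[node.get("tag")] = missing
    return gaps


print(missing_labels(SAMPLE))
```

Nodes with a complete set of labels do not appear in the result, so an empty dictionary would indicate a complete mapping for the listed languages.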


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

………………

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

………………

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

………………

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

………………


End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

………………

End if


Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

………………

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

………………

End if
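The selection algorithm above can be sketched in Python as follows. This is only an illustration under stated assumptions: the script-to-language table is limited to the pairs named in the algorithm, the language codes are the attribute prefixes used in the draft schema, and the element name `node`, the function name `select_nodes` and the sample node (with the Hindi label for "noun" written out in full) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Script -> language -> attribute prefix, as named in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Hypothetical, well-formed stand-in for one node of the POS schema.
NODE_XML = (
    '<node cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" '
    'kok-cat="नाम" tag="N"/>'
)


def select_nodes(script, language, node_xml):
    """Display the English, Hindi and selected-language labels; hide the rest."""
    languages = SCRIPT_LANGUAGES.get(script)
    if languages is None or language not in languages:
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    code = languages[language]
    # English label (cat), tag, Hindi label and the selected language label.
    visible = {"cat", "tag", "hin-cat", f"{code}-cat"}
    node = ET.fromstring(node_xml)
    return {name: value for name, value in node.attrib.items() if name in visible}


print(select_nodes("Malayalam", "Malayalam", NODE_XML))
```

For Hindi the visible set collapses to the English and Hindi labels, which matches the "Display (English and Hindi Nodes)" step.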


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC
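The tags in the sample above join the levels of the schema hierarchy with underscores, for example N_NNP, V_VM_VNF and RD_PUNC. A minimal Python sketch of splitting such a tag and checking it against the tag values of the draft schema is given below. The function name `split_tag` and the two lookup tables are assumptions made for this example; the tag values themselves are copied from the schema blocks.

```python
# Category -> type tags, copied from the draft schema blocks.
TYPES = {
    "N": {"NN", "NNP", "NNV", "NST"},
    "PR": {"PRP", "PRF", "PRC", "PRL", "PRQ"},
    "DM": {"DMD", "DMR", "DMQ"},
    "V": {"VAUX", "VM"},
    "JJ": set(),
    "RB": set(),
    "PSP": set(),
    "CC": {"CCD", "CCS"},
    "RP": {"RPD", "CL", "INJ", "NEG", "INTF"},
    "QT": {"QTF", "QTC", "QTO"},
    "RD": {"RDF", "SYM", "UNK", "PUNC", "ECH"},
}

# Category -> subtype tags (the "subtype" attributes of the schema).
SUBTYPES = {"V": {"VF", "VINF", "VNG", "VNF"}, "CC": {"UT"}}


def split_tag(tag):
    """Split a tag such as 'V_VM_VNF' into (category, type, subtype)."""
    parts = tag.split("_")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected number of levels in {tag!r}")
    category = parts[0]
    if category not in TYPES:
        raise ValueError(f"unknown category in {tag!r}")
    pos_type = parts[1] if len(parts) > 1 else None
    subtype = parts[2] if len(parts) > 2 else None
    if pos_type is not None and pos_type not in TYPES[category]:
        raise ValueError(f"unknown type in {tag!r}")
    if subtype is not None and subtype not in SUBTYPES.get(category, set()):
        raise ValueError(f"unknown subtype in {tag!r}")
    return category, pos_type, subtype


print(split_tag("V_VM_VNF"))
```

A check of this kind could flag tags that fall outside the schema before a tagged corpus is accepted.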

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag according to ISO 639-3

1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
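
The language tags above lend themselves to a simple lookup table. The following is a minimal Python sketch, not part of the draft standard; the names `LANGUAGE_CODES`, `code_for` and `language_for` are hypothetical and chosen only for illustration.

```python
# ISO 639-3 codes for the 22 languages listed in Annexure-1.
# The dictionary and helper names are illustrative assumptions.
LANGUAGE_CODES = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def code_for(language: str) -> str:
    """Return the ISO 639-3 code for a language name (case-insensitive)."""
    wanted = language.strip().lower()
    for name, code in LANGUAGE_CODES.items():
        if name.lower() == wanted:
            return code
    raise KeyError(f"unknown language: {language!r}")


def language_for(code: str) -> str:
    """Return the language name for an ISO 639-3 code."""
    wanted = code.strip().lower()
    for name, known in LANGUAGE_CODES.items():
        if known == wanted:
            return name
    raise KeyError(f"unknown code: {code!r}")


if __name__ == "__main__":
    print(code_for("Malayalam"))   # code used as the prefix of mal-cat
    print(language_for("brx"))
```

The three-letter code is the same string that prefixes the language-specific attributes in the POS schema (for example `hin-cat` or `mal-cat`), so a table like this can drive the choice of which attributes to display.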

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

7 XML INTERNATIONALIZATION BEST PRACTICES

To make the common POS Schema for Indian languages completely interoperable, extensible and web enabled, the W3C XML Internationalization best-practice guidelines and the ISO metadata standard are adopted in the above framework.

7.1 WHAT IS INTERNATIONALIZATION TAG SET (ITS)

ITS is a technology for easily creating XML that is internationalized and can be localized effectively.

ITS for Schema developers

Schema developers will find proposals for attribute and element names to include in their new schema (also called the host vocabulary). Using these names makes the concepts they represent easier to recognize for both schema users and processors. [For more details: http://www.w3.org/TR/2007/REC-its-20070403/]

Main Attributes

Defining mark-up for natural language labelling (xml:lang: defined for the root element of your document and for any element where a change of language may occur).

Defining mark-up to specify text direction (its:dir: defined for the root element of your document and for any element that has text content).

Indicating which elements and attributes should be translated (its:translateRule: elements to indicate which elements have non-translatable content).

Providing information related to text segmentation (its:withinTextRule: elements to indicate which elements should be treated as either part of their parents or as a nested but independent run of text).

Defining mark-up for unique identifiers (xml:id: elements with translatable content can be associated with a unique identifier).

Defining mark-up for notes to localizers (its:locNote: allows content authors to provide localization-related notes as attribute values, or to point to the location of the relevant note text).

[For more details: http://www.w3.org/TR/xml-i18n-bp/]
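
As an illustration of how a processor might read this mark-up, the sketch below parses a small hypothetical document with Python's standard `xml.etree.ElementTree` module and reports the effective language, text direction and translatability of each element. The sample document, its element names (`doc`, `label`, `tag`) and the function `describe` are assumptions made for this example. It uses the local attributes `its:dir` and `its:translate` rather than the global rule elements named above, and it models inheritance in a simplified way.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang, xml:id
ITS_NS = "http://www.w3.org/2005/11/its"          # ITS 1.0 namespace

# Hypothetical sample: a Hindi and an Urdu label for "noun", plus a tag code
# that should not be translated.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" xml:lang="en" its:dir="ltr">
  <label xml:id="noun-hi" xml:lang="hi">संज्ञा</label>
  <label xml:id="noun-ur" xml:lang="ur" its:dir="rtl">اسم</label>
  <tag its:translate="no">N_NN</tag>
</doc>"""


def describe(xml_text):
    """Return (element, language, direction, translate) for every element.

    Values not set on an element are inherited from its parent, which is a
    simplification of the inheritance rules in the ITS specification.
    """
    rows = []

    def walk(elem, lang, direction, translate):
        lang = elem.get(f"{{{XML_NS}}}lang", lang)
        direction = elem.get(f"{{{ITS_NS}}}dir", direction)
        translate = elem.get(f"{{{ITS_NS}}}translate", translate)
        rows.append((elem.tag, lang, direction, translate))
        for child in elem:
            walk(child, lang, direction, translate)

    walk(ET.fromstring(xml_text), None, "ltr", "yes")
    return rows


if __name__ == "__main__":
    for row in describe(SAMPLE):
        print(row)
```

Marking the tag code as non-translatable while leaving the labels translatable mirrors the intent of the POS schema: tag identifiers stay fixed across languages, and only the human-readable labels vary.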

8 XML SCHEMA

XML Schemas express shared vocabularies and allow machines to carry out rules made by people. A schema defines a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. XML Schema provides a means for defining the structure, content and semantics of XML documents. [For more details: http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215]

9 METADATA ON POS

Metadata

Metadata describes how, when and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means, but only "sort of": a tag supplies nothing more than a name, which has meaning only to someone who already understands what the element or attribute represents.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>1.0.0</version>
    <registrationStatus>standard</registrationStatus> <!-- registered as a standard -->
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
</datasm-categorySelection>
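
A metadata record of this kind can be read back with any XML parser. The sketch below is a minimal illustration in Python using the standard `xml.etree.ElementTree` module. The record it parses is a simplified, namespace-free fragment written for this example, reusing element names from the listing above; the version string and the function name `metadata_summary` are assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on the metadata listing; not the full record.
METADATA = """<languageSection>
  <language>en</language>
  <version>1.0.0</version>
  <registrationStatus>standard</registrationStatus>
  <origin>ISO 12620:1999</origin>
  <creation><creationDate>1999-01-01</creationDate></creation>
</languageSection>"""


def metadata_summary(xml_text):
    """Map each element name to its text, skipping elements with no text."""
    summary = {}
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if text:
            summary[elem.tag] = text
    return summary


if __name__ == "__main__":
    for name, value in metadata_summary(METADATA).items():
        print(f"{name}: {value}")
```

The output pairs each tag name with its value, which shows both the strength and the limit noted earlier: the tag names structure the data, but a reader still has to know what `origin` or `registrationStatus` means.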

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema covering all 22 Indian languages, the labels defined in the POS Schema for English must have a one-to-one mapping to labels in the Indian languages. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema selects a particular Indian language, and the schema then needs to be transformed into the POS Schema for that language. The language-specific POS Schema can be enabled by turning particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS schema block diagram.
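
One way to picture turning branches 'off' is to drop the language-specific attributes that do not belong to the selected language. The Python sketch below illustrates that idea on a small hypothetical master document. The element names `posSchema`, `category` and `type`, and the function `select_nodes`, are assumptions made for this example; the attribute names (`cat`, `subcat`, `tag`, `hin-cat`, `mal-cat`, `kok-cat`) follow the draft schema. Hindi labels are kept alongside the selected language by default, which is an assumption based on the node-selection algorithm later in this document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, much reduced master schema with labels for three languages.
MASTER = """<posSchema>
  <category cat="Noun" tag="N" hin-cat="संज्ञा" mal-cat="നാമം" kok-cat="नाम">
    <type subcat="Common" tag="NN"/>
    <type subcat="Proper" tag="NNP"/>
  </category>
  <category cat="Pronoun" tag="PR" hin-cat="सर्वनाम" kok-cat="सर्वनाम">
    <type subcat="Personal" tag="PRP"/>
  </category>
</posSchema>"""


def select_nodes(master_xml, lang_code, always=("hin",)):
    """Return the tree with only English, `always` and `lang_code` labels.

    English labels live in `cat` and `subcat`, so they are never removed;
    every other `<code>-cat` attribute is dropped unless its code is kept.
    """
    root = ET.fromstring(master_xml)
    keep = {f"{code}-cat" for code in (lang_code, *always)}
    for elem in root.iter():
        for name in list(elem.attrib):
            if name.endswith("-cat") and name not in keep:
                del elem.attrib[name]
    return root


if __name__ == "__main__":
    konkani = select_nodes(MASTER, "kok")
    print(ET.tostring(konkani, encoding="unicode"))
```

Filtering attributes rather than deleting elements keeps the tag hierarchy identical for every language, so the one-to-one mapping between labels is preserved whichever language is displayed.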

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt

ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt

----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt

ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt

ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt

ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

ltxsattributegt

ltxselementgt ltxsschemagt

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)

        Call (POS Schema)

        Display (English and Hindi Nodes)

        Hide (remaining nodes)

        E.g.

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …
    End if
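
The selection logic above can be sketched in Python. This is only an illustration under stated assumptions: the script-to-language table covers just the seven languages used in the examples, the names SCRIPT_LANGUAGES, LANGUAGE_CODES and select_nodes are hypothetical, and the returned list simply names the label attributes that would stay visible (the English cat, the Hindi hin-cat, the selected language's own label, and the tag).

```python
# Script -> languages, as listed in the pseudocode above (illustrative subset only).
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}

# ISO 639-3 codes used as attribute prefixes in the schema examples.
LANGUAGE_CODES = {
    "Hindi": "hin", "Bodo": "brx", "Konkani": "kok", "Malayalam": "mal",
    "Kashmiri": "kas", "Assamese": "asm", "Gujarati": "guj",
}


def select_nodes(script, language):
    """Return the attribute names to display; everything else is hidden."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    shown = ["cat", "hin-cat"]          # English and Hindi labels are always shown
    code = LANGUAGE_CODES[language]
    if code != "hin":                   # add the selected language's own label
        shown.append(f"{code}-cat")
    shown.append("tag")
    return shown


print(select_nodes("Devanagari", "Bodo"))
```

Keeping the script and language tables as plain data means a further language can be added without touching the selection function.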


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
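
The tagged sentences above attach a hierarchical tag such as N_NN, V_VM_VF or CC_CCS_UT to each word. A minimal Python sketch of reading such tokens follows. It assumes a backslash between word and tag, which is an assumption because the separator is not visible in this copy of the text; the function names and the transliterated sample words are hypothetical. The tag levels are read as category, type and subtype, following the schema.

```python
def parse_token(token, sep="\\"):
    """Split 'word<sep>TAG' into the word and its list of tag levels."""
    word, _, tag = token.rpartition(sep)
    if not word or not tag:
        raise ValueError(f"no word/tag separator found in {token!r}")
    # Levels are joined by underscores; empty parts are dropped so that a
    # doubled underscore (as in the Oriya sample) is handled the same way.
    levels = [part for part in tag.split("_") if part]
    return word, levels


def parse_sentence(line, sep="\\"):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(token, sep) for token in line.split()]


# Transliterated placeholder words, not taken from the samples above.
print(parse_sentence("darshan\\N_NN milta\\V_VM_VF hai\\V_VAUX"))
```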


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)

1  Assamese  asm
2  Bangla  ben
3  Bodo  brx
4  Dogri  doi
5  Gujarati  guj
6  Hindi  hin
7  Kannada  kan
8  Kashmiri  kas
9  Konkani  kok
10  Maithili  mai
11  Malayalam  mal
12  Manipuri  mni
13  Marathi  mar
14  Nepali  nep
15  Oriya  ori
16  Punjabi  pan
17  Sanskrit  san
18  Santhali  sat
19  Sindhi  snd
20  Tamil  tam
21  Telugu  tel
22  Urdu  urd


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


9 METADATA ON POS

Metadata

Metadata describes how, when, and by whom a particular set of data was collected, and how the data is formatted. It is essential for understanding information stored in data warehouses and has become increasingly important in XML-based Web applications.

XML Metadata

In XML, metadata is built into the document. Every element has a tag that tells you where the data is stored in the document. Descriptive tags give structure to the document and tell you what the data means (sort of). "Sort of" because a tag conveys only its name, which has meaning only to someone who already understands what the element or attribute means.

METADATA AS PER ISO 12620:1999

Metadata ()

<?xml version="1.0"?>
<datasm-categorySelection xmlns="http://www.isocat.org/ns/dcif" dcif-version="1.0">
  <globalInformation></globalInformation>
  <languageSection>
    <language>en</language>
    <identifier> </identifier>
    <version>100</version>
    <registrationStatus>standard</registrationStatus> registered as a standard
    <origin>ISO 12620:1999
      <author></author>
      <domain></domain>
    </origin>
    <creation>
      <creationDate>1999-01-01</creationDate>
    </creation>
    <descriptionSection>
      <definitionClass>
        <definition xml:lang="en"></definition>
        <source>ISO 12620:1999</source>
      </definitionClass>
    </descriptionSection>
  </languageSection>
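
As an illustration of how such a metadata record might be produced programmatically, the following Python sketch builds a languageSection element with the standard library. It reuses only element names that appear in the record above; the function name build_language_section and its parameters are hypothetical, and the sketch does not claim to produce a complete or valid data category record.

```python
import xml.etree.ElementTree as ET


def build_language_section(language, status, origin, created):
    """Assemble a languageSection element from a few metadata fields."""
    section = ET.Element("languageSection")
    ET.SubElement(section, "language").text = language
    ET.SubElement(section, "registrationStatus").text = status
    ET.SubElement(section, "origin").text = origin
    creation = ET.SubElement(section, "creation")
    ET.SubElement(creation, "creationDate").text = created
    return section


section = build_language_section("en", "standard", "ISO 12620:1999", "1999-01-01")
print(ET.tostring(section, encoding="unicode"))
```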

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian languages, each label defined in the POS schema for English must have a one-to-one mapping in every Indian language. The XML schema needs to have a complete tree structure, as depicted in the figure below.

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n=12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n=22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, the POS schema block diagram.
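
One way to picture switching branches 'off' is to drop the label attributes of every language that was not selected. The Python sketch below does this on a single node. It is a simplified illustration, not the draft schema itself: the SAMPLE element is a hypothetical, well-formed stand-in, the function name keep_languages is an assumption, and language labels are recognised purely by the "-cat" suffix used in the schema examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed node carrying labels for several languages.
SAMPLE = '<cat cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" kok-cat="नाम" tag="N"/>'


def keep_languages(element, codes):
    """Return a copy of element keeping only the labels for the given codes."""
    kept = ET.Element(element.tag)
    for name, value in element.attrib.items():
        if name.endswith("-cat"):
            # Language-specific label: keep it only if its code was selected.
            if name[: -len("-cat")] in codes:
                kept.set(name, value)
        else:
            # Language-neutral attributes such as cat and tag always remain.
            kept.set(name, value)
    return kept


node = ET.fromstring(SAMPLE)
print(keep_languages(node, {"hin", "mal"}).attrib)
```

Returning a filtered copy leaves the common schema untouched, so the same source can be reused for every language.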

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>…</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo mal-cat=rdquoനാമംrdquo

kas-cat=rdquo ناوت rdquo asm-cat=rdquoিবেশষযrdquo kok-cat=rdquoनामrdquo guj-catrdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo mal-cat=rdquoസാമാന നാമംrdquo kas-cat=rdquo عام rdquo asm-cat=rdquoজািতবাচকrdquo kok-

cat=rdquoजावाचत नामrdquo guj-catrdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo mal-cat=rdquoസംജാ നാമംrdquo kas-cat=rdquo خاص rdquo asm-cat=rdquoবযিিবাচকrdquo kok-

cat=rdquoवयवाचत नामrdquo guj-catrdquoવયતવચકrdquo tag=rdquoNNPgt

ltxsattribute name=type subcat =Verbalrdquo hin-cat=rdquoकयामलतrdquo brx-cat=rdquoहाबा

दिनथथाrdquo kas-cat=rdquo کراوتٲوۍ rdquo asm-cat=rdquoিয়াবাচকrdquo kok-cat=rdquoकयामळत नामrdquo guj-

catrdquoકવચકrdquo tag=rdquoNNVgt


ltxsattribute name=type subcat =Nlocrdquo hin-cat=rdquoदश-ताल सापrdquo brx-cat=rdquoथावन

दिनथथा ममाrdquo mal-cat=rdquoആധാരിക നാമംrdquo kas-cat=rdquo ناوتہ جايہ ہاو rdquo asm-cat=rdquoানবাচকrdquo

kok-cat=rdquoथळ -ताळ-साप नामrdquo guj-catrdquoસાવચકrdquo tag=rdquoNSTgt

-------------------------------------Pronoun Block-----------------------------------

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo mal-

cat=rdquoസര വനാമംrdquo kas-cat=rdquo پرناوت rdquo asm-cat=rdquoসবরনাাrdquo kok-cat=rdquoसवरनामrdquo guj-

catrdquoસવરાાrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo mal-cat=rdquoരഷ സര വനാമംrdquo kas-cat=rdquo شخصيٲتی rdquo asm-cat=rdquoবযিিবাচকrdquo

kok-cat=rdquoपरश सवरनामrdquo guj-catrdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo mal-cat=rdquoനിചവാചി സര വനാമംrdquo kas-cat=rdquo ماکوسی rdquo asm-cat=rdquoআতবাচকrdquo

kok-cat=rdquoआतमवाचत सवरनामrdquo guj-catrdquoપિતિતિતતrdquo tag=rdquoPRFgt

ltxsattribute name=type subcat =Reciprocalrdquo hin-cat=rdquoपारसपरतrdquo brx-

cat=rdquoगावज गाव सोमोनदोrdquo mal-cat=rdquoസംബനവാചി സര വനാമംrdquo kas-cat=rdquo باہمی rdquo

asm-cat=rdquoপাৰিৰকrdquo kok-cat=rdquoसबद सवरनामrdquo guj-catrdquoપરસપરવચચrdquo tag=rdquoPRCgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoാരസിക സര വനാമംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo asm-

cat=rdquoসবাচকrdquo kok-cat=rdquoएतमत सवरनामrdquo guj-catrdquoસપકrdquo tag=rdquoPRLgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoसथ

दिनथथाrdquo mal-cat=rdquoേചാദവാചി സര വനാമംrdquo kas-cat=rdquo ک لفظ rdquo asm-cat=rdquoেবাধক

সবরনাাrdquo kok-cat=rdquoपसनाथन सवरनामrdquo guj-catrdquoપ રવચકrdquo tag=rdquoPRQgt


----------------------------------Demonstrative Block------------------------------

ltxselement name=cat POS cat=rdquoDemonstrativerdquo hin-cat=rdquoनषचयवाचतrdquo brx-cat=rdquoथावन

दिनथथाrdquo mal-cat=rdquoനിര േദശകംrdquo kas-cat=rdquo ہاون پرناوتۍ rdquo asm-cat=rdquoিনেদরশেবাধকrdquo kok-

cat=rdquoदशरतrdquo guj-catrdquoદશરકકrdquo tag=rdquoDMrdquogt

ltxsattribute name=type subcat = Deicticrdquo hin-cat=rdquordquo brx-cat=rdquoथ दिनथथाrdquo mal-

cat=rdquoതക സചകംrdquo kas-cat=rdquo وٲنيٲوۍ rdquo asm-cat=rdquoতয িনেদরশকrdquo kok-cat=rdquordquo guj-

catrdquoઉલદશરકrdquo tag=rdquoDMDgt

ltxsattribute name=type subcat =Relativerdquo hin-cat=rdquoसमबनन वाचतrdquo brx-

cat=rdquoसोमोनदो दिनथथाrdquo mal-cat=rdquoസംബനവാചി നിര േദശകംrdquo kas-cat=rdquo رٲبتٲوۍ rdquo

asm-cat=rdquoসবাচকrdquo kok-cat =rdquoसबद दशरतrdquo guj-catrdquoસપકrdquo tag=rdquoDMRgt

ltxsattribute name=type subcat =Wh-wordsrdquo hin-cat=rdquoपवाचतrdquo brx-cat=rdquoम

सथ दिनथथाrdquo mal-cat=rdquoേചാദവാചി നിര േദശകംrdquo kas-cat=rdquo لفظک rdquo asm-

cat=rdquoেবাধক অবযয়rdquo kok-cat=rdquoपसनाथन दशरतrdquo guj-catrdquoપવચચrdquo tag=rdquoDMQgt

-------------------------------------Verb Block---------------------------------------

ltxselement name=cat POS cat=rdquoVerbrdquo hin-cat=rdquoकयाrdquo brx-cat=rdquoथाइजाrdquo mal-cat=rdquoകിയrdquo

kas-cat=rdquo کراوت rdquo asm-cat=rdquoিয়াrdquo kok-cat=rdquoकयापदrdquo guj-catrdquoઆખતrdquo tag=rdquoVrdquogt

ltxsattribute name=type subcat =Auxiliary Verbrdquo hin-cat=rdquoसहायत कयाrdquo brx-

cat=rdquoलङाइ थाइजाrdquo mal-cat=rdquoസഹായക കിയrdquo kas-cat=rdquo ڈکهہ کراوت rdquo asm-

cat=rdquoসহায়কাৰী িয়াrdquo kok-cat=rdquoपालवी कयापदrdquo guj-catrdquordquo tag=rdquoVAUXgt

ltxsattribute name=type subcat =Main Verbrdquo hin-cat=rdquoमखय कयाrdquo brx-cat=rdquoगब

थाइजाrdquo mal-cat=rdquoധാന കിയrdquo kas-cat=rdquo راے کراوت rdquo asm-cat=rdquoাখয িয়াrdquo kok-

cat=rdquoमखल कयापदrdquo guj-catrdquoખrdquo tag=rdquoVMgt

ltxsattribute name=subtype subcat =Finiterdquo hin-cat=rdquoपरमrdquo brx-

cat=rdquoजाफजा थाइजाrdquo mal-cat=rdquoര ണ കിയrdquo kas-cat=rdquo ہشر ہاو rdquo asm-cat=rdquoসাািপকাrdquo

kok-cat=rdquoनी कयापदrdquo guj-catrdquoણરrdquo tag=rdquoVFgt


ltxsattribute name=subtype subcat =Infinitiverdquo hin-cat=rdquoअनrdquo brx-

cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoകിയാരംrdquo kas-cat=rdquo ہشر کهاو rdquo asm-cat=rdquoঅসাািপকাrdquo

kok-cat=rdquoसादारण रपrdquo guj-catrdquoહતવ રrdquo tag=rdquoVINFgt

ltxsattribute name=subtype subcat =Gerundrdquo hin-cat=rdquoकयावाचतrdquo brx-

cat=rdquoजाफबाय थानाय दिनथथाrdquo kas-cat=rdquo کراوتہ ناوت rdquo asm-cat=rdquoিনিাতাতরক সক াrdquo kok-

cat=rdquoकयावाचत नामrdquo guj-catrdquoવતરાાનદદતrdquo tag=rdquoVNGgt

ltxsattribute name=subtype subcat =Non-Finiterdquo hin-cat=rdquoगर परमrdquo

brx-cat=rdquoजाफङ थाइजाrdquo mal-cat=rdquoഅര ണ കിയrdquo kas-cat=rdquo نا ہشر ہاو rdquo asm-

cat=rdquoঅসাািপকাrdquo kok-cat=rdquoअनी कयापदrdquo guj-catrdquoઅણરrdquo tag=rdquoVNFgt

------------------------------------Adjective Block----------------------------------

ltxselement name=cat POS cat=rdquoAdjectiverdquo hin-cat=rdquoवशणrdquo brx-cat=rdquoथाइलालrdquo mal-

cat=rdquoനാമ വിേശഷണംrdquo kas-cat=rdquo باوت rdquo asm-cat=rdquoিবেশষণrdquo kok-cat=rdquoवशशणrdquo guj-

catrdquoિવશષણrdquo tag=rdquoJJrdquogt

---------------------------------------Adverb Block----------------------------------

ltxselement name=cat POS cat=rdquoAdverbrdquo hin-cat=rdquoकया वशणrdquo brx-cat=rdquoथाइजान

थाइलालrdquo mal-cat=rdquoകിയാ വിേശഷണംrdquo kas-cat=rdquo بٲشلگہ rdquo asm-cat=rdquoিয়া িবেশষণrdquo

kok-cat=rdquoकयावशशणrdquo guj-catrdquoકિવશષણrdquo tag=rdquoRBrdquogt

-----------------------------------Post Position Block-------------------------------

ltxselement name=cat POS cat=rdquoPost Positionrdquo hin-cat=rdquoपरसगरrdquo brx-cat=rdquoसोदोब उन

महरथrdquo mal-cat=rdquoഅനേയാഗംrdquo kas-cat=rdquo پوت جاے rdquo asm-cat=rdquoঅনসগরrdquo kok-

cat=rdquoसबद अवययrdquo guj-catrdquoઅગકrdquo tag=rdquoPSPrdquogt

------------------------------------Conjunction Block-------------------------------

ltxselement name=cat POS cat=rdquoConjunctionrdquo hin-cat=rdquoयोजतrdquo brx-cat=rdquoदाजाब महरथrdquo

mal-cat=rdquoസമചയംrdquo kas-cat= rdquo واڻون rdquo asm-cat=rdquoসকেযাজকrdquo kok-cat=rdquoजोड अवययrdquo guj-

catrdquoસ કજકકrdquo tag=rdquoCCrdquogt


ltxsattribute name=type subcat =Co-ordinatorrdquo hin-cat=rdquoसमनवयतrdquo brx-

cat=rdquoलोगो महरrdquo mal-cat=rdquoഏേകാിത സമചയംrdquo kas-cat=rdquo واڻت rdquo asm-

cat=rdquoসায়কrdquo kok-cat=rdquoसमानानीतरण जोड अवययrdquo guj-catrdquoસહકદશરકrdquo tag=rdquoCCDgt

ltxsattribute name=type subcat =Subordinatorrdquo hin-cat=rdquordquo brx-cat=rdquoलङाइ लोगो

महरrdquo mal-cat=rdquoആശരസചക സമചയംrdquo kas-cat=rdquo تحتون rdquo asm-cat=rdquordquo kok-

cat=rdquoआशी जोड अवययrdquo guj-catrdquoગૌણકદશરકrdquo tag=rdquoCCSgt

ltxsattribute name=subtype subcat =Quotativerdquo hin-cat=rdquoउ-वाचतrdquo mal-

cat=rdquoഉദാരണവാചി സമചയംrdquo brx-cat=rdquoमखrsquoथrdquo kas-cat= rdquo دپن نشانہ rdquo asm-cat=rdquordquo

kok-cat=rdquoअवरण -अथन उरrdquo guj-catrdquordquo tag=rdquoUTgt

------------------------------------Particles Block------------------------------------

ltxselement name=cat POS cat=rdquoParticlesrdquo hin-cat=rdquoअवययrdquo brx-cat=rdquoमहरथrdquo mal-

cat=rdquoനിാദംrdquo kas-cat=rdquo ڻوڻہ ونتۍ rdquo asm-cat=rdquoআনষকিগক অবযয়rdquo kok-cat=rdquoअवययrdquo guj-

catrdquoિાપતrdquo tag=rdquoRPrdquogt

ltxsattribute name=type subcat =Defaultrdquo hin-cat=rdquoवयकमrdquo brx-cat=rdquoगोरोिनथrdquo

mal-cat=rdquoസാമാനംrdquo kas-cat=rdquo ڈفالٹ rdquo asm-cat=rdquordquo kok-cat=rdquoसरभरस अवययrdquo guj-

catrdquoસવ rdquo tag=rdquoRPDgt

ltxsattribute name=type subcat =Classifierrdquo hin-cat=rdquoवगनतारतrdquo brx-cat=rdquoथ

दिनथथा दाजाबदाrdquo mal-cat=rdquoവര ഗകംrdquo kas-cat=rdquo ورگہا rdquo asm-cat=rdquoিনিদরতাবাচক সগরrdquo kok-

cat=rdquoवगरत अवययrdquo guj-catrdquordquo tag=rdquoCLgt

ltxsattribute name=type subcat =Interjectionrdquo hin-cat=rdquoवसमयादबोनतrdquo brx-

cat=rdquoसोमोनानाय दिनथथाrdquo mal-cat=rdquoവാേകകംrdquo kas-cat=rdquo ژهڻت rdquo asm-

cat=rdquoিবয়েবাধকrdquo kok-cat=rdquoउमाळी अवययrdquo guj-catrdquordquo tag=rdquoINJgt

ltxsattribute name=type subcat =Negationrdquo hin-cat=rdquoनतारातमतrdquo brx-cat=rdquoनङ

दिनथथाrdquo mal-cat=rdquoനിേഷദംrdquo kas-cat=rdquo نہ کٲرۍ rdquo asm-cat=rdquoনঞাতরকrdquo kok-cat=rdquoनहयतार

अवययrdquo guj-catrdquoાકરદશરકrdquo tag=rdquoNEGgt


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

ltxselement name=cat POS cat=rdquoQuantifiersrdquo hin-cat=rdquoसखयावाचीrdquo brx-cat=rdquoबबा

दिनथथाrdquo mal-cat=rdquoസംഖാവാചി irdquo kas-cat=rdquo گريند rdquo asm-cat=rdquoপিৰাাণবাচকrdquo kok-

cat=rdquoसखयादशरतrdquo guj-catrdquoપરાણરચકકrdquo tag=rdquoQTrdquogt

ltxsattribute name=type subcat =Generalrdquo hin-cat=rdquoसामानयrdquo brx-cat=rdquoसरासनसाrdquo

mal-cat=rdquoൊതസംഖാവാചിrdquo kas-cat=rdquo عمومی rdquo asm-cat=rdquoসাধাৰণrdquo kok-

cat=rdquoसामानयrdquo guj-catrdquoસાદrdquo tag=rdquoQTFgt

ltxsattribute name=type subcat =Cardinalsrdquo hin-cat=rdquoगणनासचतrdquo brx-cat=rdquoगब

बसानrdquo mal-cat=rdquoഅടിസാന സംഖാവാചിrdquo kas-cat=rdquo آنکونہ گريند rdquo asm-

cat=rdquoসকখযাবাচকrdquo kok-cat=rdquoसखयावाचतrdquo guj-catrdquoસખવચકrdquo tag=rdquoQTCgt

ltxsattribute name=type subcat =Ordinalsrdquo hin-cat=rdquoकमसचतrdquo brx-cat=rdquoफार

बसानrdquo mal-cat=rdquoകര മവാചിrdquo kas-cat=rdquo نۍ گريند وٴ rdquo asm-cat=rdquoাবাচক সকখযাবাচক

শrdquo kok-cat=rdquoकमवाचतrdquo guj-catrdquoકાવચકrdquo tag=rdquoQTOgt

------------------------------------Residuals Block----------------------------------

ltxselement name=cat POS cat=rdquoResidualsrdquo hin-cat=rdquoअवशषrdquo brx-cat=rdquoआदाrdquo mal-

cat=rdquoഅവശിഷദംrdquo kas-cat=rdquo باقيٲتی rdquo asm-cat=rdquordquo kok-cat=rdquoहरrdquo guj-catrdquoશષrdquo tag=rdquoRDrdquogt

ltxsattribute name=type subcat =Foreign wordrdquo hin-cat=rdquoवदशी शबदrdquo brx-

cat=rdquoगबन हादरार सोदोबrdquo mal-cat=rdquoഅനഭാഷാദംrdquo kas-cat=rdquo غٲر ملکی لفظ rdquo asm-

cat=rdquoিবেদশী শrdquo kok-cat=rdquoवदशीrdquo guj-catrdquoપરદશચ શબદકrdquo tag=rdquoRDFgt

ltxsattribute name=type subcat =Symbolrdquo hin-cat=rdquoपीतrdquo brx-cat=rdquoनसनrdquo mal-

cat=rdquoചിഹംrdquo kas-cat=rdquo عالمت rdquo asm-cat=rdquoতীকrdquo ki=rdquoतरrdquo guj-catrdquoસકતrdquo tag=rdquoSYMgt


ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

Columns: S.No., English, Hindi, Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

Columns: S.No., English, Hindi, Telugu, Malayalam, Tamil, Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat POS" cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
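
The branching in this algorithm can be sketched in Python as a table lookup. This is only an illustration under stated assumptions: the function name `select_nodes` and the table `SCRIPT_LANGUAGES` are hypothetical, the script and language groupings are copied from the branches shown in this section, and treating `tag` as always visible is an assumption based on the examples.

```python
# Script -> {language name -> ISO 639-3 code}. The grouping follows the
# branches of the algorithm in this section; the table itself is hypothetical.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language):
    """Return the label attributes to display; every other label is hidden."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # English (cat/subcat) and Hindi labels are displayed in every branch.
    visible = ["cat", "subcat", "hin-cat"]
    label = f"{languages[language]}-cat"
    if label not in visible:  # Hindi is already present
        visible.append(label)
    visible.append("tag")  # assumption: the tag itself is always shown
    return visible


print(select_nodes("Devanagari", "Hindi"))
print(select_nodes("Devanagari", "Bodo"))
```

For Hindi the sketch returns only the English and Hindi labels plus the tag. For any other listed language it adds that language's own `xxx-cat` label, matching the "Display (English, Hindi and ... Nodes)" steps.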


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
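
The tags in the samples above are hierarchical: a top-level category, then a type, then an optional subtype, as in `V_VM_VF`. A minimal Python sketch of splitting such a tag into its levels follows. The helper names `split_tag` and `top_category` are hypothetical, the category table is taken from the top-level tags of the draft schema, and accepting `-` and `__` as separators is an assumption made only because the samples are not consistent on this point.

```python
import re

# Top-level tags and their English category names, as used in the draft schema.
TOP_LEVEL = {
    "N": "Noun", "PR": "Pronoun", "DM": "Demonstrative", "V": "Verb",
    "JJ": "Adjective", "RB": "Adverb", "PSP": "Post Position",
    "CC": "Conjunction", "RP": "Particles", "QT": "Quantifiers",
    "RD": "Residuals",
}


def split_tag(tag):
    """Split a hierarchical tag such as 'V_VM_VF' into its levels."""
    # Assumption: '_' is the intended separator; '-' and '__' also occur in
    # the samples and are treated as the same separator here.
    return [part for part in re.split(r"[_-]+", tag) if part]


def top_category(tag):
    """Return the English name of the tag's top-level category."""
    levels = split_tag(tag)
    if not levels:
        return "Unknown"
    return TOP_LEVEL.get(levels[0], "Unknown")


print(split_tag("V_VM_VF"))
print(split_tag("N__NN"))
print(top_category("RD_PUNC"))
```

A tag that carries no top-level prefix (for example a bare `PUNC`) falls through to "Unknown" in this sketch rather than being guessed.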


16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)

1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd


CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi


<definitionClass> <definition xml:lang="en"></definition> <source>ISO 12620:1999</source>

</definitionClass>

</descriptionSection>

</languageSection>

10 ONE TO ONE MAPPING LABELS IN POS SCHEMA

To develop a common framework for an XML-based POS schema across all 22 Indian languages, every label defined in the POS schema for English must have a one-to-one mapping to a label in each Indian language. The XML schema needs a complete tree structure, as depicted in the figure below.


Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ..., n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ..., n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML schema selects a particular Indian language, and the schema is then transformed into the POS schema for that language. The language-specific POS schema can be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, POS Schema Block Diagram.
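
One way to picture switching branches off is to drop the label attributes of every language that was not selected. The sketch below uses Python's standard `xml.etree.ElementTree`. It is an assumption-laden illustration, not the draft's implementation: the `<element>` fragment, its label values, and the helper `switch_off_other_languages` are hypothetical, and only the `xxx-cat` attribute naming pattern comes from the draft schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified node carrying labels for several languages.
SAMPLE = (
    '<element cat="noun" hin-cat="संज्ञा" kok-cat="नाम" '
    'mal-cat="നാമം" tag="N"/>'
)


def switch_off_other_languages(element, keep_codes):
    """Remove every 'xxx-cat' label whose language code is not in keep_codes."""
    for node in element.iter():
        for name in list(node.attrib):
            # 'cat' and 'subcat' do not end in '-cat', so they are kept.
            if name.endswith("-cat") and name[:-4] not in keep_codes:
                del node.attrib[name]
    return element


node = switch_off_other_languages(ET.fromstring(SAMPLE), {"hin", "mal"})
print(sorted(node.attrib))  # ['cat', 'hin-cat', 'mal-cat', 'tag']
```

Here the Konkani label is removed while the English, Hindi and Malayalam labels and the tag survive, which corresponds to the Malayalam branch of the selection algorithm.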

11 POS SCHEMA BLOCK DIAGRAM


12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

Pos schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<file Desc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>...</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat POS" cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം"
    kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा"
    mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम"
    guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा"
    mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम"
    guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा"
    kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">


<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा"
    mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক"
    kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat POS" cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ"
    mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम"
    guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा"
    mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক"
    kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा"
    mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক"
    kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो"
    mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক"
    kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
    mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
    kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा"
    mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা"
    kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">


----------------------------------Demonstrative Block------------------------------

<xs:element name="cat POS" cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा"
    mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत"
    guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा"
    mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat=""
    guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा"
    mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক"
    kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा"
    mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়"
    kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat POS" cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ"
    kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा"
    mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া"
    kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा"
    mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া"
    kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा"
    mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা"
    kok-cat="नी कयापद" guj-cat="ણર" tag="VF">


<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा"
    mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা"
    kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा"
    kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम"
    guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा"
    mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা"
    kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat POS" cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
    mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण"
    guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat POS" cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
    mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
    kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat POS" cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
    mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय"
    guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat POS" cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
    mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय"
    guj-cat="સ કજકક" tag="CC">


<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर"
    mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক"
    kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
    mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
    kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
    mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
    kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat POS" cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
    mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
    guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
    mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
    guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा"
    mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
    kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा"
    mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক"
    kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
    mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক"
    kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">


ltxsattribute name=type subcat =Intensifierrdquo hin-cat=rdquoीवतrdquo brx-cat=rdquoगन

दिनथथाrdquo mal-cat=rdquoതീവ നിാദംrdquo kas-cat=rdquo شدت ہار rdquo asm-cat=rdquordquo kok-cat=rdquoीवतार

अवययrdquo guj-catrdquoાતરચકrdquo tag=rdquoINTFgt

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">
<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">
<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">
<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">
<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">
<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">
<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">
<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">
<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>
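
The category and sub-category tags listed in the schema can be held in a small lookup table. The following is only a sketch in Python, not part of the standard: the names `TAGSET` and `full_tag` are hypothetical, the `RP` parent tag for particles and the `CCS` tag are taken from the tagged sentences in section 15, and the `TOP_SUB` joined form (for example `N_NNP`) is assumed from those same examples.

```python
# Hypothetical lookup table built from the tag="..." values in the schema.
# Top-level category tag -> list of sub-category tags.
TAGSET = {
    "N": ["NN", "NNP", "NNV", "NST"],
    "PR": ["PRP", "PRF", "PRC", "PRL", "PRQ"],
    "DM": ["DMD", "DMR", "DMQ"],
    "V": ["VAUX", "VM"],
    "JJ": [],
    "RB": [],
    "PSP": [],
    "CC": ["CCD", "CCS"],                     # CCS assumed from the tagged examples
    "RP": ["RPD", "CL", "INJ", "NEG", "INTF"],  # RP assumed from the tagged examples
    "QT": ["QTF", "QTC", "QTO"],
    "RD": ["RDF", "SYM", "UNK", "PUNC", "ECH"],
}


def full_tag(tag):
    """Return the joined tag (e.g. 'N_NNP') for a sub-category tag.

    Top-level tags such as 'JJ' are returned unchanged.
    """
    if tag in TAGSET:
        return tag
    for top, subs in TAGSET.items():
        if tag in subs:
            return f"{top}_{tag}"
    raise KeyError(tag)
```

Such a table makes it easy to check that a tag found in annotated text belongs to the tag set before it is accepted.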

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………………………………………
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………………………………………
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………………………………………
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………………………………………
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………………………………………
    End if

Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………………………………………
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………………………………………………
    End if
End if
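
The selection steps above can be sketched in Python. This is an illustration under stated assumptions, not the standard's implementation: `SCRIPT_LANGUAGES`, `select_nodes` and the placeholder label strings are hypothetical, and each schema node is modelled as a plain dictionary of its attributes. Only the script and language pairs named in the algorithm are included.

```python
# Hypothetical mapping of script -> language -> ISO 639-3 code,
# limited to the pairs that appear in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language, node):
    """Return the attributes of one schema node that stay displayed.

    English labels, the Hindi label and the tag are always kept;
    the label of the selected language is added; the rest is hidden.
    """
    code = SCRIPT_LANGUAGES[script][language]
    keep = {"cat", "subcat", "hin-cat", "tag", f"{code}-cat"}
    return {name: value for name, value in node.items() if name in keep}


# Placeholder labels stand in for the real strings of each language.
noun = {"cat": "noun", "hin-cat": "HIN", "brx-cat": "BRX",
        "mal-cat": "MAL", "tag": "N"}
print(select_nodes("Devanagari", "Bodo", noun))
```

For Hindi the selected-language label coincides with the Hindi label, so only the English and Hindi nodes remain, as in the first branch of the algorithm.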

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
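
The tagged sentences in this section attach a tag such as `N_NN` or `V_VM_VF` to every word. A minimal Python sketch for splitting such tokens follows. It assumes a backslash between word and tag, because the separator has not survived in this copy of the text; `parse_token` and the romanised sample words are hypothetical.

```python
def parse_token(token, sep="\\"):
    """Split 'word<sep>TAG' into (word, top-level category, sub-tags).

    The separator is an assumption; pass a different one if the
    annotated corpus uses another character.
    """
    word, _, tag = token.rpartition(sep)
    parts = tag.split("_")
    return word, parts[0], parts[1:]


# Hypothetical romanised tokens using tags that occur in this section.
for token in ["ghar\\N_NN", "hai\\V_VM_VF", "bada\\JJ"]:
    print(parse_token(token))
```

Splitting on the last separator keeps any separator characters inside the word itself intact.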

16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No. Language Name Language Tag (ISO 639-3)

1 Assamese asm
2 Bangla ben
3 Bodo brx
4 Dogri doi
5 Gujarati guj
6 Hindi hin
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
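
The language tags double as the prefix of the per-language label attributes used in the schema (for example `hin-cat`, `mal-cat`). A small Python sketch of that lookup follows; the names `ISO_639_3` and `label_attribute` are hypothetical and serve only as illustration.

```python
# ISO 639-3 tags for the 22 languages listed in this annexure.
ISO_639_3 = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal",
    "Manipuri": "mni", "Marathi": "mar", "Nepali": "nep", "Oriya": "ori",
    "Punjabi": "pan", "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd",
    "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute name holding this language's label."""
    return f"{ISO_639_3[language]}-cat"


print(label_attribute("Malayalam"))
```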

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

Page 48: Tdil Mal Tags

Start (Raw Corpora)

Declare Metadata

Declare POS Schema

Select Script (Devanagari, Malayalam, Bangla, Perso-Arabic, ...; n = 12)

Select Language (Hindi, Malayalam, Bodo, Kashmiri, ...; n = 22)

Display (Metadata)

Call (POS Schema)

Display (Desired Nodes)

Hide (remaining nodes)

End

The common XML Schema would select a particular Indian language, and the schema then needs to be transformed into the POS schema for that language. The language-specific POS schema could be enabled by switching particular branches of the tree structure 'off'. This is represented schematically under the next heading, i.e. the POS schema block diagram.
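
Switching branches 'off' can be pictured as dropping the label attributes of every language other than the selected one. The Python sketch below uses the standard `xml.etree.ElementTree` module on a deliberately simplified, well-formed fragment. The element names `pos`, `cat` and `type`, the placeholder labels and the function `language_view` are assumptions for illustration only; keeping the Hindi labels follows the selection algorithm in section 14.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for the common schema; labels are placeholders.
COMMON = """<pos>
  <cat name="noun" hin-cat="HIN-N" brx-cat="BRX-N" mal-cat="MAL-N" tag="N">
    <type name="common" hin-cat="HIN-NN" brx-cat="BRX-NN" mal-cat="MAL-NN" tag="NN"/>
  </cat>
</pos>"""


def language_view(xml_text, lang_code):
    """Return a tree keeping only the Hindi and selected-language labels."""
    root = ET.fromstring(xml_text)
    keep = {"hin-cat", f"{lang_code}-cat"}
    for node in root.iter():
        # Copy the attribute names first: the dict is modified in the loop.
        for attr in list(node.attrib):
            if attr.endswith("-cat") and attr not in keep:
                del node.attrib[attr]
    return root


view = language_view(COMMON, "mal")
print(ET.tostring(view, encoding="unicode"))
```

Filtering a copy of the common tree leaves the shared schema untouched, so one source can serve every language.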

11 POS SCHEMA BLOCK DIAGRAM

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">
<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">
<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">
<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">
<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">
<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">
<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">
<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">
<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">
<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">
<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">
<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">
<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">
<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">
<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">
<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

…………………………………………………

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

…………………………………………………

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

…………………………………………………

End if

Else If script is Malayalam (Orthographic variation) then

If language is Malayalam then

Call (POS Schema)

Display (English Hindi and Malayalam Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

…………………………………………………

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

…………………………………………………

End if

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

…………………………………………………

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

…………………………………………………

End if
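
The selection procedure in this section can be sketched in Python. This is only an illustration under stated assumptions, not part of the standard: the node dictionary, the `SCRIPT_LANGUAGES` table and the `select_nodes` function are hypothetical names, and the placeholder labels stand in for the real language labels. Only the `xxx-cat` key pattern and the script-to-language grouping are taken from the draft schema and the algorithm above.

```python
# Hypothetical in-memory form of one schema node: the English label is stored
# under "cat" and each language label under "<ISO 639-3 code>-cat".
NOUN_NODE = {
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}

# Script-to-language grouping as listed in the algorithm of this section.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_nodes(node, script, language):
    """Keep the English, Hindi and target-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    visible = {"cat", "tag", "hin-cat", f"{language}-cat"}
    return {key: value for key, value in node.items() if key in visible}


if __name__ == "__main__":
    # For Bodo, the Malayalam label is hidden and the Bodo label is kept.
    print(select_nodes(NOUN_NODE, "Devanagari", "brx"))
```

Filtering by key name mirrors the "Display ... Hide (remaining nodes)" steps: nothing is deleted from the schema itself, only the view for one language is narrowed.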

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
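
The tags in the samples above join their levels with underscores (for example `N_NN` or `V_VM_VF`), which appears to follow the element, type and subtype layering of the draft schema. A minimal Python sketch of splitting such a tag is given here. `split_tag` is a hypothetical helper, and reading the levels as category, type and subtype is an assumption based on the schema's `type` and `subtype` attributes.

```python
def split_tag(tag):
    """Split a hierarchical POS tag such as 'V_VM_VF' into named levels."""
    # Dropping empty parts tolerates the doubled underscore seen in 'N__NN'.
    parts = [part for part in tag.split("_") if part]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected tag shape: {tag!r}")
    levels = ("category", "type", "subtype")
    return dict(zip(levels, parts))


if __name__ == "__main__":
    for sample in ("PSP", "N_NN", "V_VM_VF", "N__NN"):
        print(sample, split_tag(sample))
```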

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)

1   Assamese    asm
2   Bangla      ben
3   Bodo        brx
4   Dogri       doi
5   Gujarati    guj
6   Hindi       hin
7   Kannada     kan
8   Kashmiri    kas
9   Konkani     kok
10  Maithili    mai
11  Malayalam   mal
12  Manipuri    mni
13  Marathi     mar
14  Nepali      nep
15  Oriya       ori
16  Punjabi     pan
17  Sanskrit    san
18  Santhali    sat
19  Sindhi      snd
20  Tamil       tam
21  Telugu      tel
22  Urdu        urd
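
The per-language attribute names used in the draft schema (such as `hin-cat` or `mal-cat`) are built from these ISO 639-3 codes. A small Python sketch of that lookup follows. The `LANGUAGE_TAGS` dictionary and the `label_attribute` helper are illustrative names, not part of the standard.

```python
# Language name to ISO 639-3 code, as listed in this annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute name that holds this language's label."""
    return f"{LANGUAGE_TAGS[language_name]}-cat"


if __name__ == "__main__":
    print(label_attribute("Malayalam"))
```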

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

12 DRAFT POS SCHEMA FOR INDIAN LANGUAGES USING XML

POS schema ()

<?xml version="1.0" encoding="UTF-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<fileDesc>

<titleStmt>

<title>POS tag in multilingual language</title>

<script> </script>

<language>multilingual</language>

<label language>……………</label language>

<type>multimodal</type>

[Languages taken: Hindi, Bodo, Malayalam, Kashmiri, Assamese, Konkani, Gujarati]

--------------------------------------Noun Block--------------------------------------

<xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" mal-cat="നാമം" kas-cat="ناوت" asm-cat="িবেশষয" kok-cat="नाम" guj-cat="સજ" tag="N">

<xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" mal-cat="സാമാന നാമം" kas-cat="عام" asm-cat="জািতবাচক" kok-cat="जावाचत नाम" guj-cat="િતવચક" tag="NN">

<xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" mal-cat="സംജാ നാമം" kas-cat="خاص" asm-cat="বযিিবাচক" kok-cat="वयवाचत नाम" guj-cat="વયતવચક" tag="NNP">

<xs:attribute name="type" subcat="Verbal" hin-cat="कयामलत" brx-cat="हाबा दिनथथा" kas-cat="کراوتٲوۍ" asm-cat="িয়াবাচক" kok-cat="कयामळत नाम" guj-cat="કવચક" tag="NNV">

<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>
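
The category, type and subtype tags declared in the blocks above form a small hierarchy. As a rough illustration only, that hierarchy could be held in a lookup table as sketched below. The names `TAGSET` and `category_of` are hypothetical and do not appear in the draft standard; the tag values are copied from the `tag` attributes in the schema, and subtype tags are listed flat under their category for simplicity.

```python
# Top-level category tags mapped to the type and subtype tags declared under
# them in the draft schema. Subtype tags (for example VF, declared after VM)
# are listed flat under their category; this flattening is an assumption.
TAGSET = {
    "N": ["NN", "NNP", "NNV", "NST"],
    "PR": ["PRP", "PRF", "PRC", "PRL", "PRQ"],
    "DM": ["DMD", "DMR", "DMQ"],
    "V": ["VAUX", "VM", "VF", "VINF", "VNG", "VNF"],
    "JJ": [],
    "RB": [],
    "PSP": [],
    "CC": ["CCD", "CCS", "UT"],
    "RP": ["RPD", "CL", "INJ", "NEG", "INTF"],
    "QT": ["QTF", "QTC", "QTO"],
    "RD": ["RDF", "SYM", "UNK", "PUNC", "ECH"],
}


def category_of(tag):
    """Return the top-level category tag for a tag, or None if it is unknown."""
    if tag in TAGSET:
        return tag
    for category, children in TAGSET.items():
        if tag in children:
            return category
    return None
```

A lookup of this kind could be used to check that a tag found in an annotated corpus belongs to the tag set before it is processed further.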

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No.  English  Hindi  Punjabi  Urdu  Gujarati  Oriya  Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No.  English  Hindi  Assamese  Bodo  Kashmiri  Kashmiri (Hindi)  Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

S.No.  English  Hindi  Telugu  Malayalam  Tamil  Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ………………
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ………………
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ………………
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ………………
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ………………
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ………………
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ………………
    End if

End if
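
The selection steps above can be sketched in Python as a filter over the attributes of one schema node. This is a minimal sketch under stated assumptions, not the reference implementation: the names `SCRIPT_LANGUAGES`, `ALWAYS_SHOWN` and `select_labels` are hypothetical, a node is assumed to be a plain dictionary of its attributes, and the label strings in the usage example are placeholders rather than the real labels.

```python
# Languages handled under each script branch of the algorithm, as ISO 639-3
# codes. The grouping follows the pseudocode; the dictionary is illustrative.
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# Attributes that stay visible for every language: the English labels,
# the Hindi label and the tag itself.
ALWAYS_SHOWN = ("name", "cat", "subcat", "hin-cat", "tag")


def select_labels(node, script, language):
    """Keep the English, Hindi and target-language attributes of a node.

    `node` is a dict of attribute names to values; all other language
    labels are hidden (dropped from the returned dict).
    """
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"language {language!r} is not listed for script {script!r}")
    keep = set(ALWAYS_SHOWN) | {language + "-cat"}
    return {key: value for key, value in node.items() if key in keep}


# Usage example with placeholder labels (not the real label text).
noun_node = {
    "name": "cat",
    "cat": "noun",
    "hin-cat": "<Hindi label>",
    "brx-cat": "<Bodo label>",
    "mal-cat": "<Malayalam label>",
    "tag": "N",
}
bodo_view = select_labels(noun_node, "Devanagari", "brx")
```

For Hindi the target-language label coincides with the always-shown Hindi label, so only the English and Hindi nodes remain, which matches the first branch of the algorithm.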

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
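In the tagged sentences above, each POS tag is attached directly to the end of its word (for example दशरनN_NN). As a minimal sketch, assuming tags consist only of upper-case ASCII letters joined by underscores or hyphens and always follow the word, the pairs could be separated as shown here. The names TAG_RE, split_token and split_sentence are illustrative and not part of the standard, and the Urdu samples, where the tag precedes the word, are not handled.

```python
import re

# Assumed tag shape, inferred from the samples: upper-case ASCII segments
# joined by "_" or "-", e.g. N_NN, V_VM_VF, RD_PUNC, JJ, PSP, N__NN.
TAG_RE = re.compile(r"([A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Split a fused token such as 'दशरनN_NN' into (word, tag).

    Returns (token, None) when no trailing tag is found.
    """
    match = TAG_RE.search(token)
    if match is None:
        return token, None
    return token[:match.start()], match.group(1)


def split_sentence(line):
    """Split a whitespace-separated sample sentence into (word, tag) pairs."""
    return [split_token(tok) for tok in line.split()]


if __name__ == "__main__":
    for word, tag in split_sentence("हदN_NN नमरN_NN मPSP बड़ाJJ RD_PUNC"):
        print(repr(word), tag)
```

A bare tag such as RD_PUNC yields an empty word, and a token without a tag (a sentence number or the danda) yields None as its tag, so a caller can decide how to treat both cases.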


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No. Language Name Language Tag according to ISO 639-3

1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
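The language tags of this annexure are the prefixes used in the schema attributes (hin-cat, brx-cat, mal-cat and so on). As a minimal sketch, the table could be held as a lookup; the names LANGUAGE_TAGS and cat_attribute are illustrative assumptions, and the codes are the ISO 639-3 identifiers for the 22 languages listed.

```python
# Language name -> ISO 639-3 tag, for the 22 languages of Annexure-1.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def cat_attribute(language_name):
    """Return the schema attribute name for a language, e.g. 'mal-cat'."""
    return LANGUAGE_TAGS[language_name] + "-cat"


if __name__ == "__main__":
    print(cat_attribute("Malayalam"))
```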


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


<xs:attribute name="type" subcat="Nloc" hin-cat="दश-ताल साप" brx-cat="थावन दिनथथा ममा" mal-cat="ആധാരിക നാമം" kas-cat="ناوتہ جايہ ہاو" asm-cat="ানবাচক" kok-cat="थळ -ताळ-साप नाम" guj-cat="સાવચક" tag="NST">

-------------------------------------Pronoun Block-----------------------------------

<xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" mal-cat="സര വനാമം" kas-cat="پرناوت" asm-cat="সবরনাা" kok-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" mal-cat="രഷ സര വനാമം" kas-cat="شخصيٲتی" asm-cat="বযিিবাচক" kok-cat="परश सवरनाम" guj-cat="ષવચક" tag="PRP">

<xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" mal-cat="നിചവാചി സര വനാമം" kas-cat="ماکوسی" asm-cat="আতবাচক" kok-cat="आतमवाचत सवरनाम" guj-cat="પિતિતિતત" tag="PRF">

<xs:attribute name="type" subcat="Reciprocal" hin-cat="पारसपरत" brx-cat="गावज गाव सोमोनदो" mal-cat="സംബനവാചി സര വനാമം" kas-cat="باہمی" asm-cat="পাৰিৰক" kok-cat="सबद सवरनाम" guj-cat="પરસપરવચચ" tag="PRC">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="ാരസിക സര വനാമം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="एतमत सवरनाम" guj-cat="સપક" tag="PRL">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="सथ दिनथथा" mal-cat="േചാദവാചി സര വനാമം" kas-cat="ک لفظ" asm-cat="েবাধক সবরনাা" kok-cat="पसनाथन सवरनाम" guj-cat="પ રવચક" tag="PRQ">


----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">


<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">


<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">


<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">


<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া


Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ


Table 2: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Table 3: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)

        Call (POS Schema)

        Display (English and Hindi Nodes)

        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">

        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">

        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">

        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">

        ...

    End if

    If language is Bodo then

        Call (POS Schema)

        Display (English, Hindi and Bodo Nodes)

        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">


        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">

        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">

        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">

        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">

        ...

    End if

    If language is Konkani then

        Call (POS Schema)

        Display (English, Hindi and Konkani Nodes)

        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">

        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">

        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">


        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">

        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">

        ...

    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)

        Display (English, Hindi and Malayalam Nodes)

        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">

        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">

        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">

        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">

        ...


    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)

        Display (English, Hindi and Kashmiri Nodes)

        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">

        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">

        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">

        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">

        ...

    End if


Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)

        Display (English, Hindi and Assamese Nodes)

        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">

        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">

        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">

        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">

        ...

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)

        Display (English, Hindi and Gujarati Nodes)

        Hide (remaining nodes)

        E.g.


        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

        ...

    End if
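The selection steps above can be sketched in code. The following is a minimal illustration, not the reference implementation: it assumes each schema node has already been read into a dictionary of its attributes, and the names SCRIPT_LANGUAGES and select_labels are hypothetical. The script-to-language grouping is taken from the branches of the algorithm; the English label (cat or subcat), the Hindi label and the tag are always displayed, and the labels of all other languages are hidden.

```python
# Script -> language tags handled in that branch of the algorithm (ISO 639-3).
SCRIPT_LANGUAGES = {
    "Devanagari": ["hin", "brx", "kok"],
    "Malayalam": ["mal"],
    "Perso-Arabic": ["kas"],
    "Bangla": ["asm"],
    "Gujarati": ["guj"],
}

# Attributes that are displayed for every language.
ALWAYS_SHOWN = {"cat", "subcat", "hin-cat", "tag"}


def select_labels(node, script, lang):
    """Return only the attributes of `node` to display for `lang`.

    `node` is a dict of attribute name -> value for one schema node.
    Raises ValueError if `lang` is not listed under `script`.
    """
    if lang not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"language {lang!r} is not handled for script {script!r}")
    shown = ALWAYS_SHOWN | {f"{lang}-cat"}
    # Everything outside `shown` corresponds to Hide (remaining nodes).
    return {name: value for name, value in node.items() if name in shown}


if __name__ == "__main__":
    noun = {"cat": "noun", "hin-cat": "सा", "brx-cat": "ममा",
            "mal-cat": "നാമം", "tag": "N"}
    print(select_labels(noun, "Devanagari", "brx"))
```

Keeping the grouping in a table rather than in nested conditionals means a further script or language needs only a new entry, which mirrors how the algorithm repeats the same three steps in every branch.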


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
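The tags in the samples above appear to be hierarchical, joining category, type and subtype with underscores (for example V_VM_VF or RD_PUNC). The following is a minimal sketch of splitting such a tag into its levels. It assumes the underscore is the level separator, which is inferred from the samples rather than stated by the standard, and the function name `split_tag` is hypothetical.

```python
def split_tag(tag: str) -> dict:
    """Split a hierarchical POS tag such as 'V_VM_VF' into its levels.

    Assumption: levels are joined by a single underscore, in the order
    category, type, subtype. Malformed shapes are rejected, not guessed.
    """
    levels = ("category", "type", "subtype")
    parts = tag.split("_")
    if not 1 <= len(parts) <= len(levels) or not all(parts):
        raise ValueError(f"unexpected tag shape: {tag!r}")
    return dict(zip(levels, parts))


if __name__ == "__main__":
    for tag in ["N_NN", "V_VM_VF", "CC_CCS_UT", "RD_PUNC", "PSP"]:
        print(tag, "->", split_tag(tag))
```

Rejecting shapes such as a doubled underscore keeps annotation slips visible instead of silently mapping them to a level.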

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)

1   Hindi      hin
2   Assamese   asm
3   Bangla     ben
4   Bodo       brx
5   Dogri      doi
6   Gujarati   guj
7   Kannada    kan
8   Kashmiri   kas
9   Konkani    kok
10  Maithili   mai
11  Malayalam  mal
12  Manipuri   mni
13  Marathi    mar
14  Nepali     nep
15  Oriya      ori
16  Punjabi    pan
17  Sanskrit   san
18  Santhali   sat
19  Sindhi     snd
20  Tamil      tam
21  Telugu     tel
22  Urdu       urd
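The language tags in this annexure are the ISO 639-3 codes that also prefix the per-language label attributes in the schema (hin-cat, brx-cat and so on). A small lookup along the following lines could derive the attribute name from a language name. This is only a sketch: the names `ISO_639_3`, `language_tag` and `label_attribute` are hypothetical, and the `-cat` suffix is taken from the attribute names used in the schema listings.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def language_tag(name: str) -> str:
    """Return the ISO 639-3 code for a language name from the annexure."""
    try:
        return ISO_639_3[name]
    except KeyError:
        raise ValueError(f"language not in annexure: {name!r}") from None


def label_attribute(name: str) -> str:
    """Build the schema attribute name that carries this language's label."""
    return f"{language_tag(name)}-cat"


if __name__ == "__main__":
    print(label_attribute("Malayalam"))
```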

CONTRIBUTORS

1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G, University of Hyderabad
7. Dr. Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

----------------------------------Demonstrative Block------------------------------

<xs:element name="cat" POS cat="Demonstrative" hin-cat="नषचयवाचत" brx-cat="थावन दिनथथा" mal-cat="നിര േദശകം" kas-cat="ہاون پرناوتۍ" asm-cat="িনেদরশেবাধক" kok-cat="दशरत" guj-cat="દશરકક" tag="DM">

<xs:attribute name="type" subcat="Deictic" hin-cat="" brx-cat="थ दिनथथा" mal-cat="തക സചകം" kas-cat="وٲنيٲوۍ" asm-cat="তয িনেদরশক" kok-cat="" guj-cat="ઉલદશરક" tag="DMD">

<xs:attribute name="type" subcat="Relative" hin-cat="समबनन वाचत" brx-cat="सोमोनदो दिनथथा" mal-cat="സംബനവാചി നിര േദശകം" kas-cat="رٲبتٲوۍ" asm-cat="সবাচক" kok-cat="सबद दशरत" guj-cat="સપક" tag="DMR">

<xs:attribute name="type" subcat="Wh-words" hin-cat="पवाचत" brx-cat="म सथ दिनथथा" mal-cat="േചാദവാചി നിര േദശകം" kas-cat="لفظک" asm-cat="েবাধক অবযয়" kok-cat="पसनाथन दशरत" guj-cat="પવચચ" tag="DMQ">

-------------------------------------Verb Block---------------------------------------

<xs:element name="cat" POS cat="Verb" hin-cat="कया" brx-cat="थाइजा" mal-cat="കിയ" kas-cat="کراوت" asm-cat="িয়া" kok-cat="कयापद" guj-cat="આખત" tag="V">

<xs:attribute name="type" subcat="Auxiliary Verb" hin-cat="सहायत कया" brx-cat="लङाइ थाइजा" mal-cat="സഹായക കിയ" kas-cat="ڈکهہ کراوت" asm-cat="সহায়কাৰী িয়া" kok-cat="पालवी कयापद" guj-cat="" tag="VAUX">

<xs:attribute name="type" subcat="Main Verb" hin-cat="मखय कया" brx-cat="गब थाइजा" mal-cat="ധാന കിയ" kas-cat="راے کراوت" asm-cat="াখয িয়া" kok-cat="मखल कयापद" guj-cat="ખ" tag="VM">

<xs:attribute name="subtype" subcat="Finite" hin-cat="परम" brx-cat="जाफजा थाइजा" mal-cat="ര ണ കിയ" kas-cat="ہشر ہاو" asm-cat="সাািপকা" kok-cat="नी कयापद" guj-cat="ણર" tag="VF">

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन" brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو" asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत" brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت" asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम" brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو" asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल" mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ" kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल" mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ" kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ" mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর" kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ" mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক" kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत" mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ" mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ" mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत" brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت" asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा" mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा" mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान" mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान" mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ" kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा" mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद" brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ" asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन" mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय" mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च" brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद" brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ" asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate such a facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

S.No. English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu script), Kashmiri (Hindi script), Marathi

S.No. English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

S.No. English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">

        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">

        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …
    End if
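The selection steps in this section can be sketched in Python as follows. This is an illustrative reading, not the standard's implementation: a schema node is modelled as a plain dictionary of its attributes, the label values are placeholders, and the names `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical. The script-to-language pairs are the ones named in the pseudocode, and the "Display (Metadata)" step of the Hindi branch is left out.

```python
# Script -> {language name: ISO 639-3 code}, as paired in the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes that are not language-specific labels and are always shown.
COMMON_ATTRIBUTES = {"name", "cat", "subcat", "tag"}


def select_nodes(node: dict, script: str, language: str) -> dict:
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    try:
        code = SCRIPT_LANGUAGES[script][language]
    except KeyError:
        raise ValueError(f"unsupported pair: {script!r}, {language!r}") from None
    visible = COMMON_ATTRIBUTES | {"hin-cat", f"{code}-cat"}
    return {key: value for key, value in node.items() if key in visible}


if __name__ == "__main__":
    # Placeholder labels stand in for the per-language strings of the schema.
    noun = {
        "name": "cat", "cat": "noun",
        "hin-cat": "HINDI-LABEL", "brx-cat": "BODO-LABEL",
        "mal-cat": "MALAYALAM-LABEL", "kas-cat": "KASHMIRI-LABEL",
        "tag": "N",
    }
    print(select_nodes(noun, "Devanagari", "Bodo"))
```

Filtering one attribute dictionary keeps the schema itself unchanged: hiding is a view over the full node, so the same schema serves every language.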

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP

নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP

દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
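The tagged sentences above attach each POS label directly to the end of its word, with the levels of the label joined by an underscore. The following Python sketch shows one way such tokens could be split back into a word and its label levels. It assumes that the label is always the trailing run of upper-case ASCII letters, and it treats `-` and a doubled `_` as the same separator; the helper names `split_token` and `parse_sentence` are illustrative and are not part of the standard. Tokens whose label precedes the word (as in the Urdu lines) are not handled.

```python
import re

# Assumption: the POS label is the trailing run of upper-case ASCII segments,
# joined by "_" (occasionally "-" or "__" in the examples).
_TAG_RE = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Split 'wordTAG' into (word, [label levels]); return None if no label is found."""
    match = _TAG_RE.match(token.strip())
    if match is None:
        return None
    levels = re.split(r"[_-]+", match.group("tag"))
    return match.group("word"), levels


def parse_sentence(line):
    """Parse a whitespace-separated tagged line, skipping tokens without a label."""
    parsed = (split_token(token) for token in line.split())
    return [item for item in parsed if item is not None]


if __name__ == "__main__":
    for word, levels in parse_sentence("इनDM_DMD दशरनN_NN हV_VAUX RD_PUNC"):
        print(repr(word), levels)
```

A bare label such as `RD_PUNC` yields an empty word, which reflects punctuation that was lost from the example text rather than a property of the tag set.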

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources: Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No  Language Name  Language Tag according to ISO 639-3

1     Hindi          hin
2     Assamese       asm
3     Bangla         ben
4     Bodo           brx
5     Dogri          doi
6     Gujarati       guj
7     Kannada        kan
8     Kashmiri       kas
9     Konkani        kok
10    Maithili       mai
11    Malayalam      mal
12    Manipuri       mni
13    Marathi        mar
14    Nepali         nep
15    Oriya          ori
16    Punjabi        pan
17    Sanskrit       san
18    Santhali       sat
19    Sindhi         snd
20    Tamil          tam
21    Telugu         tel
22    Urdu           urd
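The language tags in this annexure are the same three-letter codes that prefix the per-language label attributes in the POS schema (for example `hin-cat` and `mal-cat`). The Python sketch below records the 22 ISO 639-3 codes as a lookup table and derives the attribute name for a language. The names `ISO_639_3` and `label_attribute` are hypothetical helpers introduced for illustration only.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute that carries this language's label, e.g. 'mal-cat'."""
    try:
        code = ISO_639_3[language_name]
    except KeyError:
        raise ValueError(f"language not listed in the annexure: {language_name!r}")
    return f"{code}-cat"


if __name__ == "__main__":
    print(label_attribute("Malayalam"))
```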

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

<xs:attribute name="subtype" subcat="Infinitive" hin-cat="अन"
    brx-cat="जाफङ थाइजा" mal-cat="കിയാരം" kas-cat="ہشر کهاو"
    asm-cat="অসাািপকা" kok-cat="सादारण रप" guj-cat="હતવ ર" tag="VINF">

<xs:attribute name="subtype" subcat="Gerund" hin-cat="कयावाचत"
    brx-cat="जाफबाय थानाय दिनथथा" kas-cat="کراوتہ ناوت"
    asm-cat="িনিাতাতরক সক া" kok-cat="कयावाचत नाम" guj-cat="વતરાાનદદત" tag="VNG">

<xs:attribute name="subtype" subcat="Non-Finite" hin-cat="गर परम"
    brx-cat="जाफङ थाइजा" mal-cat="അര ണ കിയ" kas-cat="نا ہشر ہاو"
    asm-cat="অসাািপকা" kok-cat="अनी कयापद" guj-cat="અણર" tag="VNF">

------------------------------------Adjective Block----------------------------------

<xs:element name="cat" POS cat="Adjective" hin-cat="वशण" brx-cat="थाइलाल"
    mal-cat="നാമ വിേശഷണം" kas-cat="باوت" asm-cat="িবেশষণ"
    kok-cat="वशशण" guj-cat="િવશષણ" tag="JJ">

---------------------------------------Adverb Block----------------------------------

<xs:element name="cat" POS cat="Adverb" hin-cat="कया वशण" brx-cat="थाइजान थाइलाल"
    mal-cat="കിയാ വിേശഷണം" kas-cat="بٲشلگہ" asm-cat="িয়া িবেশষণ"
    kok-cat="कयावशशण" guj-cat="કિવશષણ" tag="RB">

-----------------------------------Post Position Block-------------------------------

<xs:element name="cat" POS cat="Post Position" hin-cat="परसगर" brx-cat="सोदोब उन महरथ"
    mal-cat="അനേയാഗം" kas-cat="پوت جاے" asm-cat="অনসগর"
    kok-cat="सबद अवयय" guj-cat="અગક" tag="PSP">

------------------------------------Conjunction Block-------------------------------

<xs:element name="cat" POS cat="Conjunction" hin-cat="योजत" brx-cat="दाजाब महरथ"
    mal-cat="സമചയം" kas-cat="واڻون" asm-cat="সকেযাজক"
    kok-cat="जोड अवयय" guj-cat="સ કજકક" tag="CC">

<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत"
    brx-cat="लोगो महर" mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت"
    asm-cat="সায়ক" kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat=""
    brx-cat="लङाइ लोगो महर" mal-cat="ആശരസചക സമചയം" kas-cat="تحتون"
    asm-cat="" kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
    mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ"
    asm-cat="" kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
    mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়"
    kok-cat="अवयय" guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
    mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat=""
    kok-cat="सरभरस अवयय" guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत"
    brx-cat="थ दिनथथा दाजाबदा" mal-cat="വര ഗകം" kas-cat="ورگہا"
    asm-cat="িনিদরতাবাচক সগর" kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत"
    brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت"
    asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत"
    brx-cat="नङ दिनथथा" mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ"
    asm-cat="নঞাতরক" kok-cat="नहयतार अवयय" guj-cat="ાકરદશરક" tag="NEG">

<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत"
    brx-cat="गन दिनथथा" mal-cat="തീവ നിാദം" kas-cat="شدت ہار"
    asm-cat="" kok-cat="ीवतार अवयय" guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
    mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
    kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
    mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
    kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
    mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক"
    kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
    mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
    kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
    mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat=""
    kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद"
    brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ"
    asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
    mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক"
    kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
    mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত"
    kok-cat="अनवळखी" guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च"
    brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون"
    asm-cat="যিত িচন" kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद"
    brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ"
    asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>
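The schema nests subcategory tags under category tags, and the tagged examples in this draft join those levels with an underscore (for instance `V_VM_VNF` or `CC_CCS_UT`). The Python sketch below checks a label against such a hierarchy. It is only an illustration: `TAG_TREE` and `is_valid_tag` are hypothetical names, the tree covers only the tags visible in this part of the draft plus the noun and pronoun tags used in its examples, and the placement of the verb subtypes under `VM` is inferred from the tagged examples rather than stated by the schema.

```python
# Partial tag hierarchy (assumption: incomplete, assembled from the visible
# schema blocks and the tagged examples in the draft).
TAG_TREE = {
    "N": {"NN": {}, "NNP": {}},
    "PR": {"PRP": {}, "PRF": {}},
    "V": {"VM": {"VF": {}, "VNF": {}, "VINF": {}, "VNG": {}}, "VAUX": {}},
    "JJ": {},
    "RB": {},
    "PSP": {},
    "CC": {"CCD": {}, "CCS": {"UT": {}}},
    "RP": {"RPD": {}, "CL": {}, "INJ": {}, "NEG": {}, "INTF": {}},
    "QT": {"QTF": {}, "QTC": {}, "QTO": {}},
    "RD": {"RDF": {}, "SYM": {}, "UNK": {}, "PUNC": {}, "ECH": {}},
}


def is_valid_tag(label):
    """Return True if every level of 'A_B_C' is a child of the level before it."""
    node = TAG_TREE
    for level in label.split("_"):
        if level not in node:
            return False
        node = node[level]
    return True


if __name__ == "__main__":
    for label in ("V_VM_VNF", "CC_CCS_UT", "RD_PUNC", "N_VM"):
        print(label, is_valid_tag(label))
```

A check of this kind would flag labels in the reference examples that skip a level or use a different separator, such as `N-NNP`.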

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To support per-language labels in the XML Schema, a common one-to-one mapping table for the labels has been developed. It is presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
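The selection steps above (identify the script, then the language, then show the English, Hindi and target-language labels and hide the rest) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `<node>` elements in `FRAGMENT` are a simplified, well-formed stand-in for the draft's `xs:element` lines, and `SCRIPT_LANGUAGES` and `select_nodes` are hypothetical names. "Hiding" a node label is modelled simply as dropping the attributes of the other languages.

```python
import xml.etree.ElementTree as ET

# Script -> language codes, following the branches of the pseudocode.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Hypothetical, simplified fragment; labels are copied from the examples above.
FRAGMENT = """
<schema>
  <node cat="noun" hin-cat="सा" brx-cat="ममा" kok-cat="नाम" mal-cat="നാമം" tag="N"/>
  <node cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" kok-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR"/>
</schema>
"""


def select_nodes(xml_text, script, language):
    """Keep the English, Hindi and target-language labels of every node."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"no branch for script {script!r} and language {language!r}")
    keep = {"cat", "subcat", "tag", "hin-cat", f"{language}-cat"}
    rows = []
    for node in ET.fromstring(xml_text).iter("node"):
        rows.append({name: value for name, value in node.attrib.items() if name in keep})
    return rows


if __name__ == "__main__":
    for row in select_nodes(FRAGMENT, "Devanagari", "brx"):
        print(row)
```

For Hindi the target-language attribute coincides with `hin-cat`, so only the English and Hindi labels remain, which matches the first branch of the pseudocode.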

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP

নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP

દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

73

CopyrightTDIL

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

74

CopyrightTDIL

16 REFERENCE 1 ISO 126201999 Terminology and other language and content resources mdash

Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements httpwwww3orgTR1999NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization httpwwww3orgTRxml-i18n-bp 4 Internationalization Tag Set (ITS) Version 10 httpwwww3orgTR2007REC-its-

20070403

5 ISO 639-3 Language Codes httpwwwsilorgiso639-3codesasp

75

CopyrightTDIL

ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tags according to ISO 639-3

1 Hindi asm 2 Assamese ben 3 Bangla brx 4 Bodo doi 5 Dogri guj 6 Gujarati hin 7 Kannada kan 8 Kashmiri kas 9 Konkani kok 10 Maithili mai 11 Malayalam mal 12 Manupuri mni 13 Marathi mar 14 Nepali nep 15 Oriya ori 16 Punjabi pan 17 Sanskrit san 18 Santhali sat 19 Sindhi snd 20 Tamil tam 21 Telugu tel 22 Urdu urd

76

CopyrightTDIL

CONTRIBUTERS

1 Ms Swaran Lata Department of Information Technology New Delhi 2 Prof Girish Nath Jha JNU New Delhi 3 Dr Somnath Chandra Department of Information Technology New Delhi 4 Dipti Misra Sharma LTRC IIIT-H 5 Somi Ram CDAC NOIDA 6 Prof Uma Maheswara Rao G University of Hyderabad 7 Dr Sobha L AU-KBC Chennai 8 Menak S 9 Kalika Bali Microsoft Bangalore 10 Prof Pushpak Bhattacharyya IIT-Bombay 11 Prof Malhar Kulkarni IIT-Bombay 12 Lata Popale IIT-Bombay 13 Kirtida Shah Gujarati University Ahemadabad 14 Mona Parakh LDCIL Mysore 15 Jyoti Pawar Goa University 16 Madhavi Sardesai Goa University 17 Ramnath 18 Aadil Kak University of Kashmir 19 Nazima University of Kashmir 20 Dr Richa LDCIL Mysore 21 Mazhar Mehdi Hussain JNU New Delhi 22 Mr Prashant Verma W3C India New Delhi 23 Swati Arora W3C India New Delhi


<xs:attribute name="type" subcat="Co-ordinator" hin-cat="समनवयत" brx-cat="लोगो महर"
    mal-cat="ഏേകാിത സമചയം" kas-cat="واڻت" asm-cat="সায়ক"
    kok-cat="समानानीतरण जोड अवयय" guj-cat="સહકદશરક" tag="CCD">

<xs:attribute name="type" subcat="Subordinator" hin-cat="" brx-cat="लङाइ लोगो महर"
    mal-cat="ആശരസചക സമചയം" kas-cat="تحتون" asm-cat=""
    kok-cat="आशी जोड अवयय" guj-cat="ગૌણકદશરક" tag="CCS">

<xs:attribute name="subtype" subcat="Quotative" hin-cat="उ-वाचत"
    mal-cat="ഉദാരണവാചി സമചയം" brx-cat="मख’थ" kas-cat="دپن نشانہ" asm-cat=""
    kok-cat="अवरण -अथन उर" guj-cat="" tag="UT">

------------------------------------Particles Block------------------------------------

<xs:element name="cat" POS cat="Particles" hin-cat="अवयय" brx-cat="महरथ"
    mal-cat="നിാദം" kas-cat="ڻوڻہ ونتۍ" asm-cat="আনষকিগক অবযয়" kok-cat="अवयय"
    guj-cat="િાપત" tag="RP">

<xs:attribute name="type" subcat="Default" hin-cat="वयकम" brx-cat="गोरोिनथ"
    mal-cat="സാമാനം" kas-cat="ڈفالٹ" asm-cat="" kok-cat="सरभरस अवयय"
    guj-cat="સવ" tag="RPD">

<xs:attribute name="type" subcat="Classifier" hin-cat="वगनतारत" brx-cat="थ दिनथथा दाजाबदा"
    mal-cat="വര ഗകം" kas-cat="ورگہا" asm-cat="িনিদরতাবাচক সগর"
    kok-cat="वगरत अवयय" guj-cat="" tag="CL">

<xs:attribute name="type" subcat="Interjection" hin-cat="वसमयादबोनत"
    brx-cat="सोमोनानाय दिनथथा" mal-cat="വാേകകം" kas-cat="ژهڻت"
    asm-cat="িবয়েবাধক" kok-cat="उमाळी अवयय" guj-cat="" tag="INJ">

<xs:attribute name="type" subcat="Negation" hin-cat="नतारातमत" brx-cat="नङ दिनथथा"
    mal-cat="നിേഷദം" kas-cat="نہ کٲرۍ" asm-cat="নঞাতরক" kok-cat="नहयतार अवयय"
    guj-cat="ાકરદશરક" tag="NEG">


<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
    mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय"
    guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
    mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক"
    kok-cat="सखयादशरत" guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
    mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ"
    kok-cat="सामानय" guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
    mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند"
    asm-cat="সকখযাবাচক" kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
    mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
    kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
    mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद"
    brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ"
    asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
    mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">


<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
    mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
    guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च"
    brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
    kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद"
    brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ"
    asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>
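
The category and subtype tags in the draft schema form a two-level hierarchy (for example QT with QTF, QTC and QTO). The following is a minimal Python sketch of how such a hierarchy could be read, assuming a simplified, well-formed rendering of two of the blocks. The element names `pos`, `category` and `type` and the function `read_hierarchy` are illustrative assumptions and are not part of the draft schema; only the English labels and tag values come from it.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed rendering of the Quantifiers and Residuals blocks.
# Element names are assumptions; labels and tags follow the draft schema.
SCHEMA_FRAGMENT = """
<pos>
  <category cat="Quantifiers" tag="QT">
    <type subcat="General" tag="QTF"/>
    <type subcat="Cardinals" tag="QTC"/>
    <type subcat="Ordinals" tag="QTO"/>
  </category>
  <category cat="Residuals" tag="RD">
    <type subcat="Foreign word" tag="RDF"/>
    <type subcat="Symbol" tag="SYM"/>
    <type subcat="Unknown" tag="UNK"/>
    <type subcat="Punctuation" tag="PUNC"/>
    <type subcat="Echowords" tag="ECH"/>
  </category>
</pos>
"""


def read_hierarchy(xml_text):
    """Map each top-level category tag to the list of its subtype tags."""
    root = ET.fromstring(xml_text)
    return {
        category.get("tag"): [t.get("tag") for t in category.findall("type")]
        for category in root.findall("category")
    }


hierarchy = read_hierarchy(SCHEMA_FRAGMENT)
print(hierarchy["QT"])  # ['QTF', 'QTC', 'QTO']
```

Per-language labels such as `hin-cat` or `mal-cat` could be carried as further attributes on the same elements, since hyphenated attribute names are valid XML.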


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo | English | Hindi | Punjabi | Urdu | Gujarati | Oriya | Bengali
1 | Noun | सा | ਨਵ | اسم | સજ | ସଂଞା | িবেশষয
  | Common | जावाचत | ਆਮ | نکره | િતવચક | ଜାତବାଚକ | জািতবাচক
  | Proper | वयवाचत | ਖਾਸ | معرفہ | વયતવચક | ବୟକତବାଚକ | বযিিবাচক
  | Verbal | कयामलत तद | ਿਕਿਰਆਮਲਕ | حاصل مصدر | કવચક | କରୟାବାଚକ | িয়াালক
  | Nloc | दश-ताल साप | ਸਿਥਤੀ ਸਚਕ | ظرف | સાવચક | ଦେଶ-କାଳ ସାପେକଷ | ানবাচক
2 | Pronoun | सवरनाम | ਪੜਨਵ | ضمير | સવરાા | ସରବନାମ | সবরনাা
  | Personal | वयवाचत | ਪਰਖਵਾਚੀ | ضمير شخصی | ષવચક | ବୟକତବାଚକ | বযিিবাচক
  | Reflexive | नजवाचत | ਿਨਜਵਾਚੀ | ضمير معکوسی | પિતિતિતત | ଆତମବାଚକ | আতবাচক
  | Reciprocal | पारसपरत | ਪਰਸਪਰੀ | ضمير راجع | પરસપરવચચ | ପାରସପାରକ | বযিতহাা
  | Relative | सबन- वाचत | ਸਬਧਵਾਚੀ | ضمير موصولہ | સપક | ସଂବନଧବାଚକ | সবাচক
  | Wh-words | पवाचत | ਪਸ਼ਨਵਾਚੀ | ضمير استفہاميہ | પ રવચક | ପରଶନବାଚକ | বাচক
  | Indefinite | अनयवाचत | NA | NA | અિાિત સવરાા | NA | অিনেদরশয
3 | Demonstrative | नयवाचत सतवाचत | ਸਕਤਵਾਚੀ | ےاشار | દશરકક | ନଶଚୟବାଚକ ସଂକେତବାଚକ | িনেদরশক
  | Deictic | नदशी | ਪਤਖ ਪਮਾਣਵਾਚੀ | هاشار | ઉલદશરક |  | তয িনেদরশক
  | Relative | सबनवाचत | ਸਬਧਵਾਚੀ | هاشار موصول | સપક | ସଂବନଧବାଚକ | সবাচক
  | Wh-words | पवाचत | ਪਸ਼ਨਵਾਚੀ | هاشار استفہاميہ | પવચચ | ପରଶନବାଚକ | বাচক
  | Indefinite | अनयवाचत | NA | NA | અિાિત સવરાા | NA | অিনেদরশয
4 | Verb | कया | ਿਕਿਰਆ | فعل | આખત | କରୟା | িয়া
  | Auxiliary Verb | सहायत कया | ਸਹਾਇਕ ਿਕਿਰਆ | امدادی فعل | સહકર | ସହାୟକ କରୟା | েগৗণ িয়া
  | Main Verb | मखय कया | ਮਖ ਿਕਿਰਆ | فعل الزم | ખ | ମଖୟ କରୟା | াখয িয়াপদ
  | Finite | परम | ਕਾਲਕੀ | لفع محدود | ણર | ପରମତ | সাািপকা
  | Infinitive | कयाथरत सा | ਅਿਮਤ | مصدر | હતવ ર | ଅନନତ | অপণর িয়া


  | Gerund | कयावाचत | ਿਕਿਰਆਵਾਚੀ | حاصل مصدر | વતરાાનદદત | କରୟାବାଚକ | েযাজক িয়া
  | Non-Finite | गर-परम | ਅਕਾਲਕੀ | فعل غير محدود | અણર | ଅପରମତ | অসাািপকা
  | Participle Noun | तद परत नाम | NA | NA | NA | NA | িয়াজাত িবেশষয
5 | Adjective | वशषण | ਿਵਸ਼ਸ਼ਣ | صفت | િવશષણ | ବଶେଷଣ | িবেশষণ
6 | Adverb | कया-वशषण | ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ | متعلق فعل | કિવશષણ | କରୟା-ବଶେଷଣ | িয়া-িবেশষণ
7 | Post Position | परसगर | ਸਬਧਕ | جار موخر | અગક | ପରସରଗ | পাসগর
8 | Conjunction | योजत | ਯਜਕ | حرف عطف | સ કજકક | ସଂଯୋଜକ | সকেযাগালক
  | Co-ordinator | समनवयत | ਸਮਾਨ ਯਜਕ | حرف وصل | સહકદશરક | ସମନ ୟକ | সায়ক
  | Subordinator | अनीनसथ | ਅਧੀਨ ਯਜਕ | حرف تابع کننده | ગૌણકદશરક |  | শতর সকেযাজক
  | Quotative | उ-वाचत | ਕਥਨਵਾਚੀ | حرف اقتباسی | NA | ଉକତବାଚକ | উিিবাচক
9 | Particles | अवयय | ਿਨਪਾਤ | حرف حاليہپابند | િાપત | ଅବୟୟ ନପାତ | অবযয়
  | Default | वयकम | ਤਰਟੀਵਾਚਕ | حرف ڈيفالٹ | સવ | ବୟତକରମ | সাধাাণ অবযয়
  | Classifier | वगनतारत | ਵਰਗੀਿਕਤ | حرف درجہ بند | NA | ବରଗୀକାରକ | বগরবাচক
  | Interjection | वसमयादबोनत | ਿਵਸਮਕ | حرف فجائيہ | િવસાઆદ તકધક | ବସମୟ ବୋଧକ | িবয়ািদেবাধক
  | Negation | नतारातमत | ਨਹਵਾਚੀ | حرف نہی | ાકરદશરક | ନଷେଧାତମକ | নঞতরক
  | Intensifier | ीवत | ਤੀਬਰਤਾਵਾਚੀ | تاکيدحرف | ાતરચક | ତୀବରତାବାଚକ | তীতােবাধক
10 | Quantifiers | सखयावाची | ਸਿਖਆਵਾਚੀ | کميت نما | પરાણરચકક | ସଂଖୟାବାଚୀ | পিাাাণবাচক
  | General | सामानय | ਸਧਾਰਨ | عمومی عام | સાદ | ସାମାନୟ | সাধাাণ
  | Cardinals | गणनासचत | ਿਗਣਤੀਸਚਕ | اعداد مطلق | સખવચક | ଗଣନାସଚକ | সকখযাবাচক
  | Ordinals | कमसचत | ਕਮਸਚਕ | ترتيبی اعداد | કાવચક | କରମସଚକ | াবাচক
11 | Residuals | अवशष | ਬਾਕੀ | باقی مانده | શષ | ଅବଶେଷ | অবিশ পদ
  | Foreign word | वदशी शबद | ਿਵਦਸ਼ੀ ਸ਼ਬਦ | بيرونی لفظ | પરદશચ શબદક | ବଦେଶୀ ଶବଦ | িবেদশী শ
  | Symbol | पीत | ਸਕਤ | عالمت | સકત | ପରତୀକ | তীক
  | Unknown | अा | ਅਿਗਆਤ | نامعلوم | અણ શબદક | ଅଞାତ | অ াত
  | Punctuation | वरामाद-च | ਿਵਸ਼ਰਾਮ ਿਚਨ | اوقاف | િવરાિચહક | ବରାମ ଚହନ | যিতিচ
  | Echowords | पवन-शबद | ਪਿਤਧਨੀ ਸ਼ਬਦ | گونج دار الفاظ | અરણાતાક | ପରତଧ ନୀ | অনকাা শ


Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

SNo English Hindi Assamese Bodo Kashmiri Kashmiri (Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी


कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">


        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">


        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...

    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...


    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...

    End if


Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)

        E.g.


        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...

    End if
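
The selection steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: schema nodes are modelled as plain dictionaries keyed by attribute name, and the names `SCRIPT_LANGUAGES` and `select_labels` are hypothetical and do not appear in the draft. The script-to-language grouping is the one listed in the algorithm, and the label strings are copied from the examples as they appear in this text.

```python
# Script-to-language grouping as listed in the algorithm. The dictionary
# layout is an assumption made for this sketch.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}


def select_labels(node, script, language):
    """Keep the English, Hindi and selected-language labels; hide the rest."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    # "cat"/"subcat" hold the English label, "tag" the POS tag itself.
    visible = {"cat", "subcat", "hin-cat", f"{language}-cat", "tag"}
    return {name: value for name, value in node.items() if name in visible}


# One node with several language labels, as in the noun example.
noun = {
    "cat": "noun",
    "hin-cat": "सा",
    "brx-cat": "ममा",
    "kok-cat": "नाम",
    "mal-cat": "നാമം",
    "tag": "N",
}

print(sorted(select_labels(noun, "Devanagari", "brx")))
# ['brx-cat', 'cat', 'hin-cat', 'tag']
```

For Hindi the selected-language label coincides with the Hindi label, so only the English and Hindi nodes remain, which matches the first branch of the algorithm.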


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
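
The tags attached to the tokens above are hierarchical, with levels joined by underscores (for example QT_QTC or V_VM_VF). The following Python sketch shows one way such a tag could be expanded into readable labels. The function `describe_tag` and the dictionary `TAG_NAMES` are hypothetical names introduced for this illustration; the dictionary holds only tag values that appear with an explicit `tag=` in the schema excerpts of this document, and any other level is left as its raw code.

```python
# English labels for tag values that appear with an explicit tag= in the
# schema excerpts of this document. Other levels are left as raw codes.
TAG_NAMES = {
    "N": "Noun", "NN": "Common", "NNP": "Proper",
    "PR": "Pronoun", "PRP": "Personal", "PRF": "Reflexive",
    "CCD": "Co-ordinator", "CCS": "Subordinator", "UT": "Quotative",
    "RP": "Particles", "RPD": "Default", "CL": "Classifier",
    "INJ": "Interjection", "NEG": "Negation", "INTF": "Intensifier",
    "QT": "Quantifiers", "QTF": "General", "QTC": "Cardinals", "QTO": "Ordinals",
    "RD": "Residuals", "RDF": "Foreign word", "SYM": "Symbol",
    "UNK": "Unknown", "PUNC": "Punctuation", "ECH": "Echowords",
}


def describe_tag(tag):
    """Expand a hierarchical tag such as 'QT_QTC' level by level."""
    return " > ".join(TAG_NAMES.get(level, level) for level in tag.split("_"))


print(describe_tag("QT_QTC"))   # Quantifiers > Cardinals
print(describe_tag("RD_PUNC"))  # Residuals > Punctuation
print(describe_tag("V_VM_VF"))  # V > VM > VF
```

Falling back to the raw code keeps the function usable for levels whose labels are not listed here.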


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tag according to ISO 639-3

1 Hindi hin
2 Assamese asm
3 Bangla ben
4 Bodo brx
5 Dogri doi
6 Gujarati guj
7 Kannada kan
8 Kashmiri kas
9 Konkani kok
10 Maithili mai
11 Malayalam mal
12 Manipuri mni
13 Marathi mar
14 Nepali nep
15 Oriya ori
16 Punjabi pan
17 Sanskrit san
18 Santhali sat
19 Sindhi snd
20 Tamil tam
21 Telugu tel
22 Urdu urd
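
The schema labels use these language tags as attribute prefixes (for example `hin-cat`, `brx-cat`, `mal-cat`). The following Python sketch shows how a label attribute name could be derived from a language name. The names `ISO_639_3` and `label_attribute` are hypothetical, and the name-to-code pairing is the standard ISO 639-3 assignment for the 22 languages listed in the annexure.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language_name):
    """Return the schema attribute that holds labels for the given language."""
    return f"{ISO_639_3[language_name]}-cat"


print(label_attribute("Bodo"))       # brx-cat
print(label_attribute("Malayalam"))  # mal-cat
```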


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


<xs:attribute name="type" subcat="Intensifier" hin-cat="ीवत" brx-cat="गन दिनथथा"
    mal-cat="തീവ നിാദം" kas-cat="شدت ہار" asm-cat="" kok-cat="ीवतार अवयय"
    guj-cat="ાતરચક" tag="INTF">

------------------------------------Quantifiers Block--------------------------------

<xs:element name="cat" POS cat="Quantifiers" hin-cat="सखयावाची" brx-cat="बबा दिनथथा"
    mal-cat="സംഖാവാചി" kas-cat="گريند" asm-cat="পিৰাাণবাচক" kok-cat="सखयादशरत"
    guj-cat="પરાણરચકક" tag="QT">

<xs:attribute name="type" subcat="General" hin-cat="सामानय" brx-cat="सरासनसा"
    mal-cat="ൊതസംഖാവാചി" kas-cat="عمومی" asm-cat="সাধাৰণ" kok-cat="सामानय"
    guj-cat="સાદ" tag="QTF">

<xs:attribute name="type" subcat="Cardinals" hin-cat="गणनासचत" brx-cat="गब बसान"
    mal-cat="അടിസാന സംഖാവാചി" kas-cat="آنکونہ گريند" asm-cat="সকখযাবাচক"
    kok-cat="सखयावाचत" guj-cat="સખવચક" tag="QTC">

<xs:attribute name="type" subcat="Ordinals" hin-cat="कमसचत" brx-cat="फार बसान"
    mal-cat="കര മവാചി" kas-cat="نۍ گريند وٴ" asm-cat="াবাচক সকখযাবাচক শ"
    kok-cat="कमवाचत" guj-cat="કાવચક" tag="QTO">

------------------------------------Residuals Block----------------------------------

<xs:element name="cat" POS cat="Residuals" hin-cat="अवशष" brx-cat="आदा"
    mal-cat="അവശിഷദം" kas-cat="باقيٲتی" asm-cat="" kok-cat="हर" guj-cat="શષ" tag="RD">

<xs:attribute name="type" subcat="Foreign word" hin-cat="वदशी शबद"
    brx-cat="गबन हादरार सोदोब" mal-cat="അനഭാഷാദം" kas-cat="غٲر ملکی لفظ"
    asm-cat="িবেদশী শ" kok-cat="वदशी" guj-cat="પરદશચ શબદક" tag="RDF">

<xs:attribute name="type" subcat="Symbol" hin-cat="पीत" brx-cat="नसन"
    mal-cat="ചിഹം" kas-cat="عالمت" asm-cat="তীক" kok-cat="तर" guj-cat="સકત" tag="SYM">

<xs:attribute name="type" subcat="Unknown" hin-cat="अा" brx-cat="मथय"
    mal-cat="ഇതരദം" kas-cat="ازون" asm-cat="অ াত" kok-cat="अनवळखी"
    guj-cat="અણ શબદક" tag="UNK">

<xs:attribute name="type" subcat="Punctuation" hin-cat="वरामाद-च"
    brx-cat="थाद ’सन खािनथ" mal-cat="വിരാമ ചിഹം" kas-cat="لہجون" asm-cat="যিত িচন"
    kok-cat="वरामतर" guj-cat="િવરાિચહક" tag="PUNC">

<xs:attribute name="type" subcat="Echowords" hin-cat="पवन-शबद"
    brx-cat="रखा सोदोब" mal-cat="മാെറാലിവാക" kas-cat="پوت دنۍ لفظ"
    asm-cat="নযাতক শ" kok-cat="पडसाद उरा" guj-cat="અરણાતાક" tag="ECH">

</xs:attribute>

</xs:element> </xs:schema>


13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES

To incorporate this facility in the XML Schema, a common one-to-one mapping table for the labels has been developed, as presented in Table 1, Table 2 and Table 3.

Table 1. Languages: Hindi, Punjabi, Urdu, Gujarati, Oriya, Bengali

SNo English | Hindi | Punjabi | Urdu | Gujarati | Oriya | Bengali
1 Noun | सा | ਨਵ | اسم | સજ | ସଂଞା | িবেশষয
common | जावाचत | ਆਮ | نکره | િતવચક | ଜାତବାଚକ | জািতবাচক
Proper | वयवाचत | ਖਾਸ | معرفہ | વયતવચક | ବୟକତବାଚକ | বযিিবাচক
Verbal | कयामलत तद | ਿਕਿਰਆਮਲਕ | حاصل مصدر | કવચક | କରୟାବାଚକ | িয়াালক
Nloc | दश-ताल साप | ਸਿਥਤੀ ਸਚਕ | ظرف | સાવચક | ଦେଶ-କାଳ ସାପେକଷ | ানবাচক
2 Pronoun | सवरनाम | ਪੜਨਵ | ضمير | સવરાા | ସରବନାମ | সবরনাা
Personal | वयवाचत | ਪਰਖਵਾਚੀ | ضمير شخصی | ષવચક | ବୟକତବାଚକ | বযিিবাচক
Reflexive | नजवाचत | ਿਨਜਵਾਚੀ | ضمير معکوسی | પિતિતિતત | ଆତମବାଚକ | আতবাচক
Reciprocal | पारसपरत | ਪਰਸਪਰੀ | ضمير راجع | પરસપરવચચ | ପାରସପାରକ | বযিতহাা
Relative | सबन- वाचत | ਸਬਧਵਾਚੀ | ضمير موصولہ | સપક | ସଂବନଧବାଚକ | সবাচক
Wh-words | पवाचत | ਪਸ਼ਨਵਾਚੀ | ضمير استفہاميہ | પ રવચક | ପରଶନବାଚକ | বাচক
Indefinite | अनयवाचत | NA | NA | અિાિત સવરાા | NA | অিনেদরশয
3 Demonstrative | नयवाचत सतवाचत | ਸਕਤਵਾਚੀ | ےاشار | દશરકક | ନଶଚୟବାଚକସଂକେତବାଚକ | িনেদরশক
Deictic | नदशी | ਪਤਖ ਪਮਾਣਵਾਚੀ | هاشار | ઉલદશરક |  | তয িনেদরশক
Relative | सबनवाचत | ਸਬਧਵਾਚੀ | هاشار موصول | સપક | ସଂବନଧବାଚକ | সবাচক
Wh-words | पवाचत | ਪਸ਼ਨਵਾਚੀ | هاشار استفہاميہ | પવચચ | ପରଶନବାଚକ | বাচক
Indefinite | अनयवाचत | NA | NA | અિાિત સવરાા | NA | অিনেদরশয
4 Verb | कया | ਿਕਿਰਆ | فعل | આખત | କରୟା | িয়া
Auxiliary Verb | सहायत कया | ਸਹਾਇਕ ਿਕਿਰਆ | امدادی فعل | સહકર | ସହାୟକ କରୟା | েগৗণ িয়া
Main Verb | मखय कया | ਮਖ ਿਕਿਰਆ | فعل الزم | ખ | ମଖୟ କରୟା | াখয িয়াপদ
Finite | परम | ਕਾਲਕੀ | لفع محدود | ણર | ପରମତ | সাািপকা
Infinitive | कयाथरत सा | ਅਿਮਤ | مصدر | હતવ ર | ଅନନତ | অপণর িয়া
Gerund | कयावाचत | ਿਕਿਰਆਵਾਚੀ | حاصل مصدر | વતરાાનદદત | କରୟାବାଚକ | েযাজক িয়া
Non-Finite | गर-परम | ਅਕਾਲਕੀ | فعل غير محدود | અણર | ଅପରମତ | অসাািপকা
Participle Noun | तद परत नाम | NA | NA | NA | NA | িয়াজাত িবেশষয
5 Adjective | वशषण | ਿਵਸ਼ਸ਼ਣ | صفت | િવશષણ | ବଶେଷଣ | িবেশষণ
6 Adverb | कया-वशषण | ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ | متعلق فعل | કિવશષણ | କରୟା-ବଶେଷଣ | িয়া-িবেশষণ
7 Post Position | परसगर | ਸਬਧਕ | جار موخر | અગક | ପରସରଗ | পাসগর
8 Conjunction | योजत | ਯਜਕ | حرف عطف | સ કજકક | ସଂଯୋଜକ | সকেযাগালক
Co-ordinator | समनवयत | ਸਮਾਨ ਯਜਕ | حرف وصل | સહકદશરક | ସମନ ୟକ | সায়ক
Subordinator | अनीनसथ | ਅਧੀਨ ਯਜਕ | حرف تابع کننده | ગૌણકદશરક |  | শতর সকেযাজক
Quotative | उ-वाचत | ਕਥਨਵਾਚੀ | حرف اقتباسی | NA | ଉକତବାଚକ | উিিবাচক
9 Particles | अवयय | ਿਨਪਾਤ | حرف حاليہپابند | િાપત | ଅବୟୟ ନପାତ | অবযয়
Default | वयकम | ਤਰਟੀਵਾਚਕ | حرف ڈيفالٹ | સવ | ବୟତକରମ | সাধাাণ অবযয়
Classifier | वगनतारत | ਵਰਗੀਿਕਤ | حرف درجہ بند | NA | ବରଗୀକାରକ | বগরবাচক
Interjection | वसमयादबोनत | ਿਵਸਮਕ | حرف فجائيہ | િવસાઆદ તકધક | ବସମୟ ବୋଧକ | িবয়ািদেবাধক
Negation | नतारातमत | ਨਹਵਾਚੀ | حرف نہی | ાકરદશરક | ନଷେଧାତମକ | নঞতরক
Intensifier | ीवत | ਤੀਬਰਤਾਵਾਚੀ | تاکيدحرف | ાતરચક | ତୀବରତାବାଚକ | তীতােবাধক
10 Quantifiers | सखयावाची | ਸਿਖਆਵਾਚੀ | کميت نما | પરાણરચકક | ସଂଖୟାବାଚୀ | পিাাাণবাচক
General | सामानय | ਸਧਾਰਨ | عمومی عام | સાદ | ସାମାନୟ | সাধাাণ
Cardinals | गणनासचत | ਿਗਣਤੀਸਚਕ | اعداد مطلق | સખવચક | ଗଣନାସଚକ | সকখযাবাচক
Ordinals | कमसचत | ਕਮਸਚਕ | ترتيبی اعداد | કાવચક | କରମସଚକ | াবাচক
11 Residuals | अवशष | ਬਾਕੀ | باقی مانده | શષ | ଅବଶେଷ | অবিশ পদ
Foreign word | वदशी शबद | ਿਵਦਸ਼ੀ ਸ਼ਬਦ | بيرونی لفظ | પરદશચ શબદક | ବଦେଶୀ ଶବଦ | িবেদশী শ
Symbol | पीत | ਸਕਤ | عالمت | સકત | ପରତୀକ | তীক
Unknown | अा | ਅਿਗਆਤ | نامعلوم | અણ શબદક | ଅଞାତ | অ াত
Punctuation | वरामाद-च | ਿਵਸ਼ਰਾਮ ਿਚਨ | اوقاف | િવરાિચહક | ବରାମ ଚହନ | যিতিচ
Echowords | पवन-शबद | ਪਿਤਧਨੀ ਸ਼ਬਦ | گونج دار الفاظ | અરણાતાક | ପରତଧ ନୀ | অনকাা শ


Table 2. Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

SNo English | Hindi | Assamese | Bodo | Kashmiri | Kashmiri (Hindi) | Marathi
1 Noun | सा | িবেশষয | ममा | ناوت | नाव | नाम
common | जावाचत | জািতবাচক | फोलर दिनथथा | عام | आम | सामानय नाम
Proper | वयवाचत | বযিিবাচক | म दिनथथा | خاص | ख़ास | विशष नाम
Verbal | कयामलत तद | িয়াবাচক | हाबा दिनथथा | کراوتٲوۍ | कावावय | धातसाधित नाम
Nloc | दश-ताल साप | ানবাচক | थावन दिनथथा ममा | ناوتہ جايہ ہاو | नाव जाय हाव | दश कालवाचक नाम
2 Pronoun | सवरनाम | সবরনাা | मराइ | پرناوت | पर नाव | सरवनाम
Personal | वयवाचत | বযিিবাচক | सब दिनथथा | شخصيٲتی | शिखसयाी | परषवाचक
Reflexive | नजवाचत | আতবাচক | गाव दिनथथा | ماکوسی | मातसी | आतमवाचक
Reciprocal | पारसपरत | পাৰিৰক | गावज गाव सोमोनदो | باہمی | बाहमी बोहमी | पारसपारिक
Relative | सबन- वाचत | সবাচক | सोमोनदो दिनथथा | رٲبتٲوۍ | रोबावय | सबधवाची
Wh-words | पवाचत | েবাধক সবরনাা | सथ दिनथथा | ک لفظ | त-लफ़ | परशनारथक
Indefinite | अनयवाचत |  |  |  |  | 
3 Demonstrative | नयवाच सतवाचत | িনেদরশেবাধক | थावन दिनथथा | ہاون پرناوتۍ | हावन परनावतय | दरशक
Deictic | नदशी | তয িনেদরশক | थ दिनथथा | وٲنيٲوۍ | वोनयोवय | 
Relative | समबनन वाचत | সবাচক | सोमोनदो दिनथथा | رٲبتٲوۍ | रोबातय | सबधवाच सबधदरशक
Wh-words | पवाचत | েবাধক অবযয় | म सथ दिनथथा | ک لفظ | त-लफ़ | परशनारथक
Indefinite | अनयवाचत | NA | NA | NA | NA | NA
4 Verb | कया | িয়া | थाइजा | کراوت | काव | करियापद
Auxiliary Verb | सहायत कया | সহায়কাৰী িয়া | लङाइ थाइजा | کراوتڈکهہ | डख काव | सहायकारी करियापद
Main Verb | मखय कया | াখয িয়া | गब थाइजा | راے کراوت | राय काव | मखय करियापद
Finite | परम | সাািপকা | जाफजा थाइजा | ہشر ہاو | हशर हाव | आखयात करियारप
Infinitive | अन | অসাািপকা | जाफङ थाइजा | ہشر کهاو | हशर खाव | भाववाचक कदत
Gerund | कयावाचत | িনিাতাতরক সক া | जाफबाय थानाय दिनथथा | کراوتہ ناوت | काव नाव | विभकतिकषम कदतरप
Non-Finite | गर-परम | অসাািপকা | जाफङ थाइजा | نا ہشر ہاو | ना हशर हाव | आखयाततर करियारप
Participle Noun | तद परत नाम | NA | NA | NA | NA | NA
5 Adjective | वशषण | িবেশষণ | थाइलाल | باوت | बाव | विशषण
6 Adverb | कया-वशषण | িয়া িবেশষণ | थाइजान थाइलाल | بٲشلگہ | लग बाश | करियाविशषण
7 Post Position | परसगर | অনসগর | सोदोब उन महरथ | پوت جاے | पो जाय | अतयसथान
8 Conjunction | योजत | সকেযাজক | दाजाब महरथ | واڻون | राटवन | उभयानवयी अवयय
Co-ordinator | समनवयत | সায়ক | लोगो महर | واڻت | वाट वाटथ | NA
Subordinator | अनीनसथ | NA | लङाइ लोगो महर | تحتون | हन | NA
Quotative | उ-वाचत | NA | मख’थ | دپن نشانہ | दपन नशान | उदगारवाचक
9 Particles | अवयय | আনষকিগক অবযয় | महरथ | ڻوڻہ ونتۍ | टोट वनतय | अवयय निपात
Default | वयकम |  | गोरोिनथ | ڈفالٹ | डफालट | सामानय
Classifier | वगनतारत | িনিদরতাবাচক সগর | थ दिनथथा दाजाबदा | ورگہا | वरगहा | NA
Interjection | वसमयादबोनत | িবয়েবাধক | सोमोनानाय दिनथथा | ژهڻت | छट छटथ | विसमयवाचक
Negation | नतारातमत | নঞাতরক | नङ दिनथथा | نہ کٲرۍ | नतारय | निषधातमक
Intensifier | ीवत |  | गन दिनथथा | شدت ہار | शद हाव | तीवरतावाचक
10 Quantifiers | सखयावाची | পিৰাাণবাচক | बबा दिनथथा | گريند | थनद | सखयावाचक
General | सामानय | সাধাৰণ | सरासनसा | عمومی | अममी | सामनय
Cardinals | गणनासचत | সকখযাবাচক | गब बसान | آنکونہ گريند | ओतवन थनद | गणनावाचक
Ordinals | कमसचत | াবাচক সকখযাবাচক শ | फार बसान | نۍ گريند وٴ | वनय थनद | करमवाचक
11 Residuals | अवशष | NA | आदा | باقيٲتی | बाक़याी | शष
Foreign word | वदशी शबद | িবেদশী শ | गबन हादरार सोदोब | غٲر ملکی لفظ | गोर मलत लफ़ | विदशी शबद
Symbol | पीत | তীক | नसन | عالمت | अलाम | चिनह
Unknown | अा | অ াত | मथय | ازون | अोन | अजञात
Punctuation | वरामाद-च | যিত িচন | थाद ’सन खािनथ | لہجون | लहिजवन | विरामचिनह
Echowords | पवन-शबद | নযাতক শ | रखा सोदोब | پوت دنۍ لفظ | पॊ दनय लफ़ | नादानकारी अभयसत


Table 3. Languages: Telugu, Malayalam, Tamil, Konkani

SNo English | Hindi | Telugu | Malayalam | Tamil | Konkani
1 Noun | सा | సంజఞ | നാമം | பெயர | नाम
common | जावाचत | జతవచకం | സാമാന നാമം | பொதுப பெயர | जावाचत नाम
Proper | वयवाचत | వయకతవచకం | സംജാ നാമം | சிறபபுப பெயர | वयवाचत नाम
Verbal | कयामलत तद | కరయమలకం | NA | தொழில பெயர | कयामळत नाम
Nloc | दश-ताल साप | దశ-కల సపకషకం | ആധാരിക നാമം | இடப பெயர | थळ -ताळ-साप नाम
2 Pronoun | सवरनाम | సరవనమం | സര വനാമം | பதிலடுப பெயர | सवरनाम
Personal | वयवाचत | వయకతవచకం | രഷ സര വനാമം | மூவிடபபெய | परश सवरनाम
Reflexive | नजवाचत | ఆతమరథకం | നിചവാചി സര വനാമം | தறசுடடுப பதிலடுப பெயர | आतमवाचत सवरनाम
Reciprocal | पारसपरत | పరసపరకం | സംബനവാചി സര വനാമം | பரஸபர பதிலடுப பெயர | सबद सवरनाम
Relative | सबन- वाचत | సంబంధ-వచకం | ാരസിക സര വനാമം | இணைபபு பதிலடுப பெயர | एतमत सवरनाम
Wh-words | पवाचत | పశర నవచకం | േചാദവാചി സര വനാമം | வினாச சொல | पसनाथन सवरनाम
Indefinite | अनयवाचत | NA |  | சுடடு | अनि सवरनाम
3 Demonstrative | नयवाचत सतवाचत | నరదశకవచకం | നിര േദശകം | நேரசசுடடு | दशरत
Deictic | नदशी | నరదషట | തക സചകം | சுடடு பதிலடுப பெயர | दशरत उर
Relative | सबनवाचत | సంబంధ-వచకం | സംബനവാചി നിര േദശകം | வினாச சொல | सबद दशरत
Wh-words | पवाचत | పశర నవచకం | േചാദവാചി നിര േദശകം | வினை | पसनाथन दशरत
Indefinite | अनयवाचत | NA | NA | துணை வினை | अनि सवरनाम
4 Verb | कया | కరయ | കിയ | முதனமை வினை | कयापद
Auxiliary Verb | सहायत कया | సహయక కరయ | സഹായക കിയ | முறறு வினை | पालवी कयापद
Auxiliary Finite |  |  |  |  | (पणर पालवी कयापद)
Auxiliary Non Finite |  |  |  |  | (अपणर पालवी कयापद)
Main Verb | मखय कया | మఖయ కరయ | ധാന കിയ | குறை எசசம | मखल कयापद
Finite | परम | సమపక | ര ണ കിയ | வினைப பெயர | नी कयापद
Infinitive | कयाथरत सा | తమననరథకం | കിയാരം | வினை எசசம | सादारण रप
Gerund | कयावाचत | కరయవచకం | NA | பெயரடை | कयावाचत नाम
Non-Finite | गर-परम | అసమపక | അര ണ കിയ | வினையடை | अनी कयापद
Participle Noun | तद परत नाम | NA | NA | பினனுருபு | NA
5 Adjective | वशषण | వశషణం | നാമ വിേശഷണം | இணைபபுச சொல | वशशण
6 Adverb | कया-वशषण | కరయవశషణం | കിയാ വിേശഷണം | இணை இணைபபுச சொல | कयावशशण
7 Post Position | परसगर | పరసరగ | അനേയാഗം | சாரபு இணைபபுச சொல | सबद अवयय
8 Conjunction | योजत | సమచఛయం | സമചയം | நிரபபு இடைசசொல | जोड अवयय
Co-ordinator | समनवयत | సమనధకరణం | ഏേകാിത സമചയം | இடைசசொல | समानानीतरण जोड अवयय
Subordinator | अनीनसथ | వయధకరణం | ആശരസചക സമചയം | முனனிருபபு | आशी जोड अवयय
Quotative | उ-वाचत | అనుకరకం | ഉദാരണവാചി സമചയം | இனபபிரிபபு ஒடடு | अवरण -अथन उर
9 Particles | अवयय | అవయయం | നിാദം | வியபபிடைச சொல | अवयय
Default | वयकम | వయతకరమం | സാമാനം | எதிரமறை | सरभरस अवयय
Classifier | वगनतारत | వరగకరకం | വര ഗകം | மிகுவிபபான | वगरत अवयय
Interjection | वसमयादबोनत | వసమయదబ ధకం | വാേകകം | அளவையடை | उमाळी अवयय
Negation | नतारातमत | నకరతమకం | നിേഷദം | பொது | नहयतार अवयय
Intensifier | ीवत | అతశయరథకం | തീവ നിാദം | எணணுப பெயர | ीवतार अवयय
10 Quantifiers | सखयावाची | సంఖయవచకం | സംഖാവാചി | எணணு முறைப பெயர | सखयादशरत
General | सामानय | సమనయం | ൊതസംഖാവാചി | எஞசியவை | सामानय
Cardinals | गणनासचत | గణనసూచకం | അടിസാന സംഖാവാചി | அயல சொல | सखयावाचत
Ordinals | कमसचत | కరమసూచకం | കര മവാചി | குறியடு | कमवाचत
11 Residuals | अवशष | అవశషం | അവശിഷദം | தெரியாதது | हर
Foreign word | वदशी शबद | వదశ శబదం | അനഭാഷാദം | நிறுததறகுறியடு | वदशी
Symbol | पीत | సంకతం | ചിഹം | இரடடைககிளவி | तर
Unknown | अा | అజఞత | ഇതരദം | NA | अनवळखी
Punctuation | वरामाद-च | వరమం | വിരാമ ചിഹം | NA | वरामतर
Echo-words | पवन-शबद | పరతధవన-శబంద | മാെറാലിവാക | NA | पडसाद उरा
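The one-to-one mapping in the tables above can be pictured as a lookup keyed by category and ISO 639-3 language code. The Python sketch below is illustrative only and is not part of the draft standard: the names `LABELS` and `label_for` are hypothetical, only two rows are included, and `None` is an assumed way of representing an "NA" cell.

```python
# Illustrative subset of the mapping tables; None stands for an "NA" cell.
# Labels are copied from the Noun and Verbal rows (Malayalam and Konkani columns).
LABELS = {
    "Noun": {"eng": "Noun", "mal": "നാമം", "kok": "नाम"},
    "Verbal": {"eng": "Verbal", "mal": None},
}


def label_for(category, lang_code):
    """Return the label of a category in one language, or None for an NA cell.

    A category or language that is absent from the mapping raises KeyError,
    so a gap in the data is not confused with an explicit NA entry.
    """
    return LABELS[category][lang_code]


print(label_for("Noun", "mal"))
```

Keeping NA as an explicit value, rather than omitting the key, lets a caller tell "this language has no such category" apart from "this row has not been entered yet".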


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if

End if
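The selection logic above can be sketched in Python. This is a minimal illustration under stated assumptions, not the standard's reference implementation: schema nodes are modelled as plain dictionaries, the names `SCRIPT_LANGUAGES`, `ALWAYS_SHOWN` and `select_nodes` are hypothetical, and the labels are placeholders rather than real native-script text. The script and language pairs and the `xxx-cat` attribute names follow the algorithm and its examples.

```python
# Script -> language -> ISO 639-3 code, following the branches of the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# Attributes displayed for every language: English labels, Hindi label, tag.
ALWAYS_SHOWN = ("cat", "subcat", "hin-cat", "tag")


def select_nodes(nodes, script, language):
    """Keep the English, Hindi and selected-language attributes; hide the rest."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError if the pair is not covered
    shown = set(ALWAYS_SHOWN) | {f"{code}-cat"}
    return [{k: v for k, v in node.items() if k in shown} for node in nodes]


# Hypothetical node with placeholder labels.
nodes = [{"cat": "noun", "hin-cat": "<hin>", "brx-cat": "<brx>",
          "mal-cat": "<mal>", "tag": "N"}]
print(select_nodes(nodes, "Devanagari", "Bodo"))
```

Selecting Hindi keeps only the English and Hindi attributes, because the Hindi label is already in the always-shown set.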


15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
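The tags in the sentences above are hierarchical: a tag such as V_VM_VNF names a top-level category followed by progressively finer subtypes. The Python sketch below shows one way to split such a tag into its levels. It is an illustration under assumptions: the function names and the `TOP_LEVEL` table are hypothetical, the category names are read off the mapping tables and the tagged examples, and the word is assumed to have been separated from its tag already.

```python
# Top-level tags as they appear in the tagged examples, with the category
# names used in the mapping tables (an assumed pairing, for illustration).
TOP_LEVEL = {
    "N": "Noun", "PR": "Pronoun", "DM": "Demonstrative", "V": "Verb",
    "JJ": "Adjective", "RB": "Adverb", "PSP": "Post Position",
    "CC": "Conjunction", "RP": "Particles", "QT": "Quantifiers",
    "RD": "Residuals",
}


def split_tag(tag):
    """Split a hierarchical tag such as 'V_VM_VNF' into its levels.

    Empty parts are dropped so that the doubled underscore seen in the
    Oriya examples ('N__NN') yields the same levels as 'N_NN'.
    """
    return [part for part in tag.split("_") if part]


def top_category(tag):
    """Return the English name of the top-level category, or None if unknown."""
    levels = split_tag(tag)
    return TOP_LEVEL.get(levels[0]) if levels else None


print(split_tag("V_VM_VNF"), top_category("RD_PUNC"))
```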


Page 55: Tdil Mal Tags

55

CopyrightTDIL

ltxsattribute name=type subcat =Unknownrdquo hin-cat=rdquoअाrdquo brx-cat=rdquoमथयrdquo

mal-cat=rdquoഇതരദംrdquo kas-cat=rdquo ازون rdquo asm-cat=rdquoঅ াতrdquo kok-cat=rdquoअनवळखीrdquo guj-

catrdquoઅણ શબદકrdquo tag=rdquoUNKgt

ltxsattribute name=type subcat =Punctuationrdquo hin-cat=rdquoवरामाद-चrdquo brx-

cat=rdquoथाद rsquoसन खािनथrdquo mal-cat=rdquoവിരാമ ചിഹംrdquo kas-cat=rdquo لہجون rdquo asm-cat=rdquoযিত

িচনrdquo kok-cat=rdquoवरामतरrdquo guj-catrdquoિવરાિચહકrdquo tag=rdquoPUNCgt

ltxsattribute name=type subcat =Echowordsrdquo hin-cat=rdquoपवन-शबदrdquo brx-

cat=rdquoरखा सोदोबrdquo mal-cat=rdquoമാെറാലിവാകrdquo kas-cat=rdquo پوت دنۍ لفظ rdquo asm-

cat=rdquoনযাতক শrdquo kok-cat=rdquoपडसाद उराrdquo guj-catrdquoઅરણાતાકrdquo tag=rdquoECHgt

ltxsattributegt

ltxselementgt ltxsschemagt

56

CopyrightTDIL

13 ONE TO ONE MAPPING LABELS FOR INDIAN LANGUAGES To incorporate such facility in the xml Schema the common one to one mapping table for the labels has been developed as presented in the Table 1 Table 2 and Table 3

Languages Hindi Punjabi Urdu Gujarati Oriya Bengali SNo English Hindi Punjabi Urdu Gujarati Oriya Bengali

1 Noun सा ਨਵ اسم સજ ସଂଞା িবেশষয common जावाचत ਆਮ نکره િતવચક ଜାତବାଚକ জািতবাচক

Proper वयवाचत ਖਾਸ معرفہ વયતવચક ବୟକତବାଚକ বযিিবাচক

Verbal कयामलत तद

ਿਕਿਰਆਮਲਕ حاصل مصدر

કવચક କରୟାବାଚକ

িয়াালক

Nloc दश-ताल साप

ਸਿਥਤੀ ਸਚਕ ظرف સાવચક ଦେଶ-କାଳ ସାପେକଷ

ানবাচক

2 Pronoun सवरनाम ਪੜਨਵ ضمير સવરાા ସରବନାମ সবরনাা

Personal वयवाचत ਪਰਖਵਾਚੀ ضمير شخصی

ષવચક ବୟକତବାଚକ বযিিবাচক

Reflexive नजवाचत ਿਨਜਵਾਚੀ ضمير معکوسی

પિતિતિતત ଆତମବାଚକ আতবাচক

Reciprocal पारसपरत ਪਰਸਪਰੀ ضمير راجع

પરસપરવચચ ପାରସପାରକ বযিতহাা

Relative सबन- वाचत ਸਬਧਵਾਚੀ ضمير موصولہ

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ ضمير استفہاميہ

પ રવચક ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 3 Demonstrative नयवाचत

सतवाचत ਸਕਤਵਾਚੀ ےاشار દશરકક ନଶଚୟବାଚକସ

ଂକେତବାଚକ িনেদরশক

Deictic नदशी ਪਤਖ ਪਮਾਣਵਾਚੀ هاشار ઉલદશરક তয িনেদরশক

Relative सबनवाचत ਸਬਧਵਾਚੀ هاشار موصول

સપક ସଂବନଧବାଚକ সবাচক

Wh-words पवाचत ਪਸ਼ਨਵਾਚੀ هاشار استفہاميہ

પવચચ ପରଶନବାଚକ বাচক

Indefinite अनयवाचत NA NA અિાિત સવરાા NA অিনেদরশয 4 Verb कया ਿਕਿਰਆ فعل આખત କରୟା িয়া

Auxiliary Verb

सहायत कया

ਸਹਾਇਕ

ਿਕਿਰਆ

امدادی فعل સહકર

ସହାୟକ କରୟା

েগৗণ িয়া

Main Verb

मखय कया ਮਖ ਿਕਿਰਆ فعل الزم

ખ ମଖୟ କରୟା াখয িয়াপদ

Finite परम ਕਾਲਕੀ لفع محدود

ણર ପରମତ সাািপকা

Infinitive कयाथरत सा ਅਿਮਤ مصدر હતવ ર ଅନନତ অপণর িয়া

57

CopyrightTDIL

Gerund कयावाचत ਿਕਿਰਆਵਾਚੀ حاصل مصدر

વતરાાનદદત କରୟାବାଚକ েযাজক িয়া

Non-Finite गर-परम ਅਕਾਲਕੀ فعل غير محدود

અણર ଅପରମତ অসাািপকা

Participle Noun

तद परत नाम NA NA NA NA িয়াজাত িবেশষয

5 Adjective वशषण ਿਵਸ਼ਸ਼ਣ صفت િવશષણ ବଶେଷଣ িবেশষণ

6 Adverb कया-वशषण ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ متعلق فعل કિવશષણ କରୟା-ବଶେଷଣ িয়া-িবেশষণ

7 Post Position

परसगर ਸਬਧਕ جار موخر અગક ପରସରଗ পাসগর

8 Conjunction योजत ਯਜਕ حرف عطف સ કજકક ସଂଯୋଜକ সকেযাগালক

Co-ordinator समनवयत ਸਮਾਨ ਯਜਕ حرف وصل સહકદશરક ସମନ ୟକ

সায়ক

Subordinator अनीनसथ ਅਧੀਨ ਯਜਕ حرف تابع کننده

ગૌણકદશરક শতর সকেযাজক

Quotative उ-वाचत ਕਥਨਵਾਚੀ حرف اقتباسی

NA ଉକତବାଚକ উিিবাচক

9 Particles अवयय ਿਨਪਾਤ حرف حاليہپابند

િાપત ଅବୟୟ ନପାତ

অবযয়

Default वयकम ਤਰਟੀਵਾਚਕ حرف ڈيفالٹ

સવ ବୟତକରମ সাধাাণ অবযয়

Classifier वगनतारत ਵਰਗੀਿਕਤ حرف درجہ بند

NA ବରଗୀକାରକ বগরবাচক

Interjection वसमयादबोनत ਿਵਸਮਕ حرف فجائيہ િવસાઆદ

તકધક

ବସମୟ ବୋଧକ িবয়ািদেবাধক

Negation नतारातमत ਨਹਵਾਚੀ حرف نہی ાકરદશરક ନଷେଧାତମକ নঞতরক

Intensifier ीवत ਤੀਬਰਤਾਵਾਚੀ تاکيدحرف ાતરચક ତୀବରତାବାଚକ তীতােবাধক 10 Quantifiers सखयावाची ਸਿਖਆਵਾਚੀ کميت نما પરાણરચકક ସଂଖୟାବାଚୀ পিাাাণবাচক

General सामानय ਸਧਾਰਨ عمومی عام સાદ ସାମାନୟ সাধাাণ

Cardinals गणनासचत ਿਗਣਤੀਸਚਕ اعداد مطلق સખવચક ଗଣନାସଚକ সকখযাবাচক

Ordinals कमसचत ਕਮਸਚਕ ترتيبی اعداد કાવચક କରମସଚକ াবাচক

11 Residuals अवशष ਬਾਕੀ باقی مانده શષ ଅବଶେଷ অবিশ পদ

Foreign word

वदशी शबद ਿਵਦਸ਼ੀ ਸ਼ਬਦ بيرونی لفظ પરદશચ શબદક ବଦେଶୀ ଶବଦ িবেদশী শ

Symbol पीत ਸਕਤ عالمت સકત ପରତୀକ তীক

Unknown अा ਅਿਗਆਤ نامعلوم અણ શબદક ଅଞାତ অ াত

Punctuation वरामाद-च ਿਵਸ਼ਰਾਮ ਿਚਨ િવરાિચહક ବରାମ ଚହନ যিতিচ اوقاف

Echowords पवन-शबद ਪਿਤਧਨੀ ਸ਼ਬਦ گونج دار الفاظ

અરણાતાક ପରତଧ ନୀ অনকাা শ

58

CopyrightTDIL

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

59

CopyrightTDIL

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मख’थ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद ’सन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

Columns: S.No., English, Hindi, Telugu, Malayalam, Tamil, Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
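The node-selection logic of this section can be sketched in Python. This is only an illustration under stated assumptions: the script-to-language table lists just the branches shown in this section, the names `SCRIPT_LANGUAGES`, `select_nodes` and `hide_remaining` are hypothetical, and the `<code>-cat` attribute naming is inferred from the examples (`hin-cat`, `brx-cat`, `kok-cat` and so on).

```python
# Script -> {language name: ISO 639-3 code}, limited to the branches shown
# in this section (an assumption; the full standard may cover more pairs).
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}


def select_nodes(script, language):
    """Return the label attributes to display for a script/language pair."""
    languages = SCRIPT_LANGUAGES.get(script, {})
    if language not in languages:
        raise ValueError(f"no branch for {language!r} in script {script!r}")
    code = languages[language]
    # English (cat/subcat) and Hindi labels are displayed in every branch.
    shown = ["cat", "subcat", "hin-cat"]
    if code != "hin":
        shown.append(f"{code}-cat")
    return shown


def hide_remaining(attributes, shown):
    """Keep the tag plus the displayed labels; drop every other label."""
    return {name: value for name, value in attributes.items()
            if name in shown or name == "tag"}
```

For a Bodo document, `select_nodes("Devanagari", "Bodo")` yields the English, Hindi and Bodo label attributes, and `hide_remaining` then drops labels such as `kok-cat` from a node while keeping its `tag`.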

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
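The tags in the samples above are hierarchical: `N_NN` is subtype `NN` of category `N`, and `V_VM_VNF` has three levels. A minimal Python sketch for splitting such a tag into its levels follows. The function names are hypothetical, and treating the hyphen (`N-NNP`) and the double underscore (`N__NN`) seen in some samples as equivalent to a single underscore is an assumption, since they may simply be typing variants.

```python
import re


def split_tag(tag):
    """Split a hierarchical POS tag into its levels, top category first."""
    # Accept "_", "__" and "-" as level separators (assumed equivalent).
    levels = [part for part in re.split(r"[_-]+", tag) if part]
    if not levels:
        raise ValueError("empty tag")
    return levels


def top_category(tag):
    """Return the top-level category of a tag, e.g. 'V' for 'V_VM_VNF'."""
    return split_tag(tag)[0]
```

A flat tag such as `PSP` comes back as a single level, so callers can handle flat and hierarchical tags the same way.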

16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
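As an illustration, the language tags can be held in a lookup table and used to derive the per-language label attribute names that appear in the schema examples (for instance `brx-cat`). This is a sketch under assumptions: the pairing follows the standard ISO 639-3 assignments (Hindi is `hin`, Assamese is `asm`, and so on), and the names `ISO_639_3`, `label_attribute` and `language_for_attribute` are hypothetical.

```python
# Language name -> ISO 639-3 tag, for the 22 languages of this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal",
    "Manipuri": "mni", "Marathi": "mar", "Nepali": "nep", "Oriya": "ori",
    "Punjabi": "pan", "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd",
    "Tamil": "tam", "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute name that carries a language's labels."""
    return f"{ISO_639_3[language]}-cat"


def language_for_attribute(attribute):
    """Map an attribute such as 'kok-cat' back to its language name."""
    code = attribute.removesuffix("-cat")  # requires Python 3.9+
    for name, tag in ISO_639_3.items():
        if tag == code:
            return name
    raise KeyError(attribute)
```

Keeping the table in one place means an attribute name is always derived from the language tag instead of being typed by hand in each schema fragment.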

CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

71

CopyrightTDIL

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

72

CopyrightTDIL

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

73

CopyrightTDIL

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

74

CopyrightTDIL

16 REFERENCE 1 ISO 126201999 Terminology and other language and content resources mdash

Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements httpwwww3orgTR1999NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization httpwwww3orgTRxml-i18n-bp 4 Internationalization Tag Set (ITS) Version 10 httpwwww3orgTR2007REC-its-

20070403

5 ISO 639-3 Language Codes httpwwwsilorgiso639-3codesasp

75

CopyrightTDIL

ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tags according to ISO 639-3

1 Hindi asm 2 Assamese ben 3 Bangla brx 4 Bodo doi 5 Dogri guj 6 Gujarati hin 7 Kannada kan 8 Kashmiri kas 9 Konkani kok 10 Maithili mai 11 Malayalam mal 12 Manupuri mni 13 Marathi mar 14 Nepali nep 15 Oriya ori 16 Punjabi pan 17 Sanskrit san 18 Santhali sat 19 Sindhi snd 20 Tamil tam 21 Telugu tel 22 Urdu urd

76

CopyrightTDIL

CONTRIBUTERS

1 Ms Swaran Lata Department of Information Technology New Delhi 2 Prof Girish Nath Jha JNU New Delhi 3 Dr Somnath Chandra Department of Information Technology New Delhi 4 Dipti Misra Sharma LTRC IIIT-H 5 Somi Ram CDAC NOIDA 6 Prof Uma Maheswara Rao G University of Hyderabad 7 Dr Sobha L AU-KBC Chennai 8 Menak S 9 Kalika Bali Microsoft Bangalore 10 Prof Pushpak Bhattacharyya IIT-Bombay 11 Prof Malhar Kulkarni IIT-Bombay 12 Lata Popale IIT-Bombay 13 Kirtida Shah Gujarati University Ahemadabad 14 Mona Parakh LDCIL Mysore 15 Jyoti Pawar Goa University 16 Madhavi Sardesai Goa University 17 Ramnath 18 Aadil Kak University of Kashmir 19 Nazima University of Kashmir 20 Dr Richa LDCIL Mysore 21 Mazhar Mehdi Hussain JNU New Delhi 22 Mr Prashant Verma W3C India New Delhi 23 Swati Arora W3C India New Delhi


Gerund | कयावाचत | ਿਕਿਰਆਵਾਚੀ | حاصل مصدر | વતરાાનદદત | କରୟାବାଚକ | েযাজক িয়া
Non-Finite | गर-परम | ਅਕਾਲਕੀ | فعل غير محدود | અણર | ଅପରମତ | অসাািপকা
Participle Noun | तद परत नाम | NA | NA | NA | NA | িয়াজাত িবেশষয
5 Adjective | वशषण | ਿਵਸ਼ਸ਼ਣ | صفت | િવશષણ | ବଶେଷଣ | িবেশষণ
6 Adverb | कया-वशषण | ਿਕਿਰਆ ਿਵਸ਼ਸ਼ਣ | متعلق فعل | કિવશષણ | କରୟା-ବଶେଷଣ | িয়া-িবেশষণ
7 Post Position | परसगर | ਸਬਧਕ | جار موخر | અગક | ପରସରଗ | পাসগর
8 Conjunction | योजत | ਯਜਕ | حرف عطف | સ કજકક | ସଂଯୋଜକ | সকেযাগালক
Co-ordinator | समनवयत | ਸਮਾਨ ਯਜਕ | حرف وصل | સહકદશરક | ସମନ ୟକ | সায়ক
Subordinator | अनीनसथ | ਅਧੀਨ ਯਜਕ | حرف تابع کننده | ગૌણકદશરક | | শতর সকেযাজক
Quotative | उ-वाचत | ਕਥਨਵਾਚੀ | حرف اقتباسی | NA | ଉକତବାଚକ | উিিবাচক
9 Particles | अवयय | ਿਨਪਾਤ | حرف حاليہپابند | િાપત | ଅବୟୟ ନପାତ | অবযয়
Default | वयकम | ਤਰਟੀਵਾਚਕ | حرف ڈيفالٹ | સવ | ବୟତକରମ | সাধাাণ অবযয়
Classifier | वगनतारत | ਵਰਗੀਿਕਤ | حرف درجہ بند | NA | ବରଗୀକାରକ | বগরবাচক
Interjection | वसमयादबोनत | ਿਵਸਮਕ | حرف فجائيہ | િવસાઆદ તકધક | ବସମୟ ବୋଧକ | িবয়ািদেবাধক
Negation | नतारातमत | ਨਹਵਾਚੀ | حرف نہی | ાકરદશરક | ନଷେଧାତମକ | নঞতরক
Intensifier | ीवत | ਤੀਬਰਤਾਵਾਚੀ | تاکيدحرف | ાતરચક | ତୀବରତାବାଚକ | তীতােবাধক
10 Quantifiers | सखयावाची | ਸਿਖਆਵਾਚੀ | کميت نما | પરાણરચકક | ସଂଖୟାବାଚୀ | পিাাাণবাচক
General | सामानय | ਸਧਾਰਨ | عمومی عام | સાદ | ସାମାନୟ | সাধাাণ
Cardinals | गणनासचत | ਿਗਣਤੀਸਚਕ | اعداد مطلق | સખવચક | ଗଣନାସଚକ | সকখযাবাচক
Ordinals | कमसचत | ਕਮਸਚਕ | ترتيبی اعداد | કાવચક | କରମସଚକ | াবাচক
11 Residuals | अवशष | ਬਾਕੀ | باقی مانده | શષ | ଅବଶେଷ | অবিশ পদ
Foreign word | वदशी शबद | ਿਵਦਸ਼ੀ ਸ਼ਬਦ | بيرونی لفظ | પરદશચ શબદક | ବଦେଶୀ ଶବଦ | িবেদশী শ
Symbol | पीत | ਸਕਤ | عالمت | સકત | ପରତୀକ | তীক
Unknown | अा | ਅਿਗਆਤ | نامعلوم | અણ શબદક | ଅଞାତ | অ াত
Punctuation | वरामाद-च | ਿਵਸ਼ਰਾਮ ਿਚਨ | اوقاف | િવરાિચહક | ବରାମ ଚହନ | যিতিচ
Echowords | पवन-शबद | ਪਿਤਧਨੀ ਸ਼ਬਦ | گونج دار الفاظ | અરણાતાક | ପରତଧ ନୀ | অনকাা শ


Languages: Assamese, Bodo, Kashmiri (Urdu Script), Kashmiri (Hindi Script), Marathi

SNo | English | Hindi | Assamese | Bodo | Kashmiri | Kashmiri (Hindi) | Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप


Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष


Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत


Languages: Telugu, Malayalam, Tamil, Konkani

SNo | English | Hindi | Telugu | Malayalam | Tamil | Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत


General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" tag="PRF">
        …
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" brx-cat="गाव दिनथथा" tag="PRF">
        …
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
        …
    End if


Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
        …
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
        …
    End if
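The selection steps above can be sketched in code. The following Python fragment is only an illustration and not part of the standard: the table names, the function names and the flat dictionary standing in for a schema node are assumptions made for this example, and the English label is stored under the simplified key "cat".

```python
# Hypothetical sketch of the node-selection step; names and data layout are
# illustrative assumptions, not definitions taken from the standard.

# Script -> languages handled under that script, as listed in the algorithm.
SCRIPT_LANGUAGES = {
    "Devanagari": ["Hindi", "Bodo", "Konkani"],
    "Malayalam": ["Malayalam"],
    "Perso-Arabic": ["Kashmiri"],
    "Bangla": ["Assamese"],
    "Gujarati": ["Gujarati"],
}

# Language -> ISO 639-3 code used as the attribute prefix in the examples.
LANGUAGE_CODES = {
    "Hindi": "hin", "Bodo": "brx", "Konkani": "kok", "Malayalam": "mal",
    "Kashmiri": "kas", "Assamese": "asm", "Gujarati": "guj",
}


def visible_attributes(script, language):
    """Return the label attributes to display for a script/language pair."""
    if language not in SCRIPT_LANGUAGES.get(script, []):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    shown = ["cat", "hin-cat"]  # English and Hindi labels are always shown
    code = LANGUAGE_CODES[language]
    if code != "hin":
        shown.append(f"{code}-cat")  # add the selected language's label
    return shown


def select_nodes(node, script, language):
    """Keep the tag and the visible labels; hide every other attribute."""
    keep = set(visible_attributes(script, language)) | {"tag"}
    return {name: value for name, value in node.items() if name in keep}


# A noun node carrying labels for several languages.
noun = {"cat": "noun", "hin-cat": "संज्ञा", "mal-cat": "നാമം",
        "kok-cat": "नाम", "tag": "N"}
print(select_nodes(noun, "Malayalam", "Malayalam"))
```

Keeping the script-to-language table separate from the filtering function means a further language could be supported by adding a table entry, without changing the selection logic.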


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
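The tags in the sentences above are hierarchical: each tag is a sequence of levels such as V_VM_VF (category, type, sub-type). As a hedged illustration, the Python sketch below splits a token into word and tag and a tag into its levels. It assumes that the tag is the trailing run of upper-case Latin letters in the token, as in the samples as they appear here, and it accepts "-" and "__" as level separators because both occur in the samples. The function names are hypothetical, and tokens whose tag precedes the word (as in the Urdu sample) are not handled.

```python
import re

# Trailing tag: upper-case levels joined by "_" (the samples also show "-" and "__").
TOKEN_RE = re.compile(r"^(.*?)([A-Z]+(?:[_-]+[A-Z]+)*)$")


def split_token(token):
    """Split a fused word+tag token into (word, tag)."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"no trailing tag in {token!r}")
    return match.group(1), match.group(2)


def tag_levels(tag):
    """Split a hierarchical tag into its levels (category, type, sub-type)."""
    return re.split(r"[_-]+", tag)


# Tokens taken from the Hindi sample; a bare tag yields an empty word.
for token in ["औरCC_CCD", "हरQT_QTF", "RD_PUNC"]:
    word, tag = split_token(token)
    print(repr(word), tag_levels(tag))
```

Splitting on the script boundary rather than on a fixed delimiter is a design choice made for this sketch; a corpus that stores an explicit separator between word and tag would be simpler to parse.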


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
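The language tags also serve as prefixes of the label attributes in the schema examples (hin-cat, mal-cat, kas-cat and so on). The Python sketch below, offered only as an illustration, pairs each language name in the annexure with its ISO 639-3 code and maps a label attribute back to its language. The names LANGUAGE_TAGS, LANGUAGE_NAMES and language_of_attribute are hypothetical and not defined by the standard.

```python
# ISO 639-3 language tags for the 22 languages listed in the annexure.
LANGUAGE_TAGS = {
    "Assamese": "asm", "Bangla": "ben", "Bodo": "brx", "Dogri": "doi",
    "Gujarati": "guj", "Hindi": "hin", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}

# Reverse lookup: tag -> language name.
LANGUAGE_NAMES = {tag: name for name, tag in LANGUAGE_TAGS.items()}


def language_of_attribute(attribute):
    """Map a label attribute such as 'mal-cat' back to its language name."""
    tag, _, suffix = attribute.partition("-")
    if suffix != "cat" or tag not in LANGUAGE_NAMES:
        raise ValueError(f"not a language label attribute: {attribute!r}")
    return LANGUAGE_NAMES[tag]


print(language_of_attribute("mal-cat"))
```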


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

Page 58: Tdil Mal Tags

58

CopyrightTDIL

Languages Assamese Bodo Kashmiri (Urdu Script) Kashmiri (Hindi Script) Marathi SNo English Hindi Assamese Bodo Kashmiri Kashmiri

(Hindi) Marathi

1 Noun सा িবেশষয ममा ناوت नाव नाम common जावाचत জািতবাচক फोलर दिनथथा عام आम सामानय

नाम Proper वयवाचत বযিিবাচক म दिनथथा خاص ख़ास विशष नाम Verbal कयामलत

तद িয়াবাচক

हाबा दिनथथा کراوتٲوۍ कावावय धातसाधित

नाम

Nloc दश-ताल साप

ানবাচক

थावन दिनथथा ममा ناوتہ جايہ ہاو नाव जाय हाव

दश कालवाचक

नाम 2 Pronoun सवरनाम সবরনাা मराइ پرناوت पर नाव सरवनाम Personal वयवाचत বযিিবাচক सब दिनथथा شخصيٲتی शिखसयाी परषवाचक

Reflexive नजवाचत আতবাচক गाव दिनथथा ماکوسی मातसी आतमवाचक

Reciprocal पारसपरत পাৰিৰক

गावज गाव सोमोनदो

बाहमी باہمیबोहमी

पारसपारिक

Relative सबन- वाचत সবাচক सोमोनदो दिनथथा

रोबावय सबधवाची رٲبتٲوۍ

Wh-words पवाचत েবাধক সবরনাা सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत

3 Demonstrative नयवाच सतवाचत

িনেদরশেবাধক थावन दिनथथा

हावन ہاون پرناوتۍपरनावतय

दरशक

Deictic नदशी তয িনেদরশক

थ दिनथथा وٲنيٲوۍ वोनयोवय

Relative समबनन

वाचत সবাচক सोमोनदो दिनथथा رٲبتٲوۍ रोबातय सबधवाच

सबधदरशक

Wh-words पवाचत েবাধক অবযয়

म सथ दिनथथा ک لفظ त-लफ़ परशनारथक

Indefinite अनयवाचत NA NA NA NA NA 4 Verb कया িয়া थाइजा کراوت काव करियापद Auxiliary

Verb सहायत कया

সহায়কাৰী িয়া

लङाइ थाइजा کراوتڈکهہ डख काव सहायकारी करियापद

Main Verb मखय कया

াখয িয়া गब थाइजा راے کراوت राय काव मखय करियापद

Finite परम সাািপকা

जाफजा थाइजा ہشر ہاو हशर हाव

आखयात करियारप

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मख’थ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद ’सन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

Columns: SNo, English, Hindi, Telugu, Malayalam, Tamil, Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then

        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        …

    End if

    If language is Bodo then

        Call (POS Schema)
        Display (English, Hindi and Bodo nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        …

    End if

    If language is Konkani then

        Call (POS Schema)
        Display (English, Hindi and Konkani nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        …

    End if

Else If script is Malayalam (orthographic variation) then

    If language is Malayalam then

        Call (POS Schema)
        Display (English, Hindi and Malayalam nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        …

    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then

        Call (POS Schema)
        Display (English, Hindi and Kashmiri nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        …

    End if

Else If script is Bangla then

    If language is Assamese then

        Call (POS Schema)
        Display (English, Hindi and Assamese nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        …

    End if

Else If script is Gujarati then

    If language is Gujarati then

        Call (POS Schema)
        Display (English, Hindi and Gujarati nodes)
        Hide (remaining nodes)

        E.g.

        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        …

    End if

End if
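
The selection logic above can be sketched in Python. This is only an illustration under stated assumptions: the `Node` class, the `SCRIPT_LANGUAGES` table, the `SCHEMA` sample and the `select_nodes` function are hypothetical names that do not appear in the standard, and the native-language labels are sample values rather than the full schema. Only the script and language branches are taken from the algorithm itself.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cat: str                                     # English category or subcategory
    tag: str                                     # POS tag such as "N" or "NN"
    labels: dict = field(default_factory=dict)   # ISO 639-3 code -> native label


# Script-to-language branches as listed in the algorithm above.
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Illustrative sample node; the labels are assumed values, not the full schema.
SCHEMA = [Node("noun", "N", {"hin": "संज्ञा", "kok": "नाम", "mal": "നാമം"})]


def select_nodes(script, lang, schema):
    """Keep the English and Hindi labels plus the selected language; hide the rest."""
    if lang not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{lang!r} is not listed under script {script!r}")
    visible = ["hin"] if lang == "hin" else ["hin", lang]
    rows = []
    for node in schema:
        row = {"cat": node.cat, "tag": node.tag}
        for code in visible:
            # "NA" mirrors the mapping table's marker for a missing label.
            row[f"{code}-cat"] = node.labels.get(code, "NA")
        rows.append(row)
    return rows


print(select_nodes("Malayalam", "mal", SCHEMA))
```

Keying the labels by ISO 639-3 code keeps the attribute names (`hin-cat`, `mal-cat` and so on) derivable from the language tag, so adding a language would only need a new entry in the lookup table.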

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
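
The tags in the samples above are hierarchical: `N_NN` is category `N` with type `NN`, and `V_VM_VNF` adds a third level. A minimal Python sketch of how such tokens might be split is given below. It assumes each token is a word immediately followed by its tag, as the samples appear here, and that a tag consists of upper-case ASCII groups joined by one or more underscores. The names `split_token` and `parse_sentence` are hypothetical, and tags written with a hyphen (such as `N-NNP`) are not handled.

```python
import re

# A tag is one or more upper-case ASCII groups joined by underscores,
# e.g. PSP, N_NN, V_VM_VNF (or N__NN with a doubled underscore).
TOKEN_RE = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z]+(?:_+[A-Z]+)*)$")


def split_token(token):
    """Split a fused word+tag token into (word, [tag levels])."""
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError(f"no POS tag found in {token!r}")
    levels = re.split(r"_+", match.group("tag"))
    return match.group("word"), levels


def parse_sentence(line):
    """Split a whitespace-separated tagged sentence into (word, levels) pairs."""
    return [split_token(token) for token in line.split()]


print(parse_sentence("दर्शनN_NN सेPSP RD_PUNC"))
```

A bare tag such as `RD_PUNC` comes back with an empty word, which is how the punctuation entries in the samples would surface under this assumption.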

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo  Language Name  Language Tag (ISO 639-3)

1    Hindi          hin
2    Assamese       asm
3    Bangla         ben
4    Bodo           brx
5    Dogri          doi
6    Gujarati       guj
7    Kannada        kan
8    Kashmiri       kas
9    Konkani        kok
10   Maithili       mai
11   Malayalam      mal
12   Manipuri       mni
13   Marathi        mar
14   Nepali         nep
15   Oriya          ori
16   Punjabi        pan
17   Sanskrit       san
18   Santhali       sat
19   Sindhi         snd
20   Tamil          tam
21   Telugu         tel
22   Urdu           urd

CONTRIBUTORS

1. Ms Swaran Lata, Department of Information Technology, New Delhi
2. Prof Girish Nath Jha, JNU, New Delhi
3. Dr Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof Uma Maheswara Rao G, University of Hyderabad
7. Dr Sobha L, AU-KBC, Chennai
8. Menak S
9. Kalika Bali, Microsoft, Bangalore
10. Prof Pushpak Bhattacharyya, IIT-Bombay
11. Prof Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

Page 59: Tdil Mal Tags

59

CopyrightTDIL

Infinitive अन অসাািপকা जाफङ थाइजा ہشر کهاو हशर खाव भाववाचक कदत

Gerund कयावाचत িনিাতাতরক সক া

जाफबाय थानाय दिनथथा

काव کراوتہ ناوتनाव

विभकतिकषम कदतरप

Non-Finite गर-परम অসাািপকা

जाफङ थाइजा نا ہشر ہاو ना हशर हाव

आखयाततर करियारप

Participle Noun

तद परत नाम

NA NA NA NA NA

5 Adjective वशषण িবেশষণ थाइलाल باوت बाव विशषण 6 Adverb कया-वशषण িয়া

িবেশষণ थाइजान थाइलाल بٲشلگہ लग बाश करियाविशषण

7 Post Position

परसगर অনসগর

सोदोब उन महरथ پوت جاے पो जाय

अतयसथान

8 Conjunction योजत সকেযাজক

दाजाब महरथ واڻون राटवन उभयानवयी अवयय

Co-ordinator समनवयत সায়ক लोगो महर واڻت वाट वाटथ

NA

Subordinator अनीनसथ NA लङाइ लोगो महर تحتون हन NA

Quotative उ-वाचत NA मखrsquoथ دپن نشانہ दपन नशान

उदगारवाचक

9 Particles अवयय আনষকিগক অবযয় महरथ

टोट वनतय अवयय ڻوڻہ ونتۍनिपात

Default वयकम गोरोिनथ ڈفالٹ डफालट सामानय Classifier वगनतारत িনিদরতাবাচক

সগর थ दिनथथा दाजाबदा

वरगहा NA ورگہا

Interjection वसमयादबोनत

িবয়েবাধক सोमोनानाय

दिनथथा

छट ژهڻت

छटथ

विसमयवाचक

Negation नतारातमत নঞাতরক नङ दिनथथा نہ کٲرۍ नतारय निषधातमक

Intensifier ीवत गन दिनथथा شدت ہار शद हाव तीवरतावाचक

10 Quantifiers सखयावाची পিৰাাণবাচক बबा दिनथथा گريند थनद सखयावाचक

General सामानय সাধাৰণ सरासनसा عمومی अममी सामनय Cardinals गणनासचत সকখযাবাচক गब बसान آنکونہ گريند ओतवन

थनद

गणनावाचक

Ordinals कमसचत াবাচক সকখযাবাচক শ

फार बसान نۍ گريند वनय وٴथनद

करमवाचक

11 Residuals अवशष NA आदा باقيٲتی बाक़याी शष

60

CopyrightTDIL

Foreign word

वदशी शबद

িবেদশী শ

गबन हादरार सोदोब

غٲر ملکی لفظ

गोर मलत लफ़

विदशी शबद

Symbol पीत তীক नसन عالمت अलाम चिनह Unknown अा অ াত मथय ازون अोन अजञात Punctuation वरामाद-च যিত িচন

थाद rsquoसन खािनथ لہجون लहिजवन विरामचिनह

Echowords पवन-शबद নযাতক শ रखा सोदोब پوت دنۍ لفظ पॊ दनय

लफ़

नादानकारी अभयसत

61

CopyrightTDIL

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

62

CopyrightTDIL

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

63

CopyrightTDIL

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

64

CopyrightTDIL

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt

65

CopyrightTDIL

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-

cat=rdquoजावाचत नामrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-

cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo

tag=rdquoPRrdquogt

66

CopyrightTDIL

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश

सवरनामrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-

cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Malyalam (Orthographic variation) then

If language is Malyalam then

Call (POS Schema)

Display (English Hindi and Malyalam Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-

cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-

cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

67

CopyrightTDIL

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo

tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo

tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo

lttag=rdquoPRP rdquoشخصيٲتی

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo

lttag=rdquoPRF rdquoماکوسی

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

68

CopyrightTDIL

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-

cat=rdquoজািতবাচকrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-

cat=rdquoআতবাচকrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg

69

CopyrightTDIL

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

70

CopyrightTDIL

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

71

CopyrightTDIL

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
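The tagged sentences above attach each tag to its word (a noun is followed by N_NN, a postposition by PSP, and so on). The following is a minimal Python sketch of reading such a line back into (word, tag) pairs. It assumes that the tag is the trailing run of ASCII capitals, underscores or hyphens on each whitespace-separated token; that holds for the samples in this section but is not stated by the standard. The names `split_tagged` and `tag_levels` are hypothetical.

```python
import re

# Assumption: the tag is the trailing run of ASCII capitals, underscores or hyphens.
TAGGED_TOKEN = re.compile(r"^(?P<word>.*?)(?P<tag>[A-Z][A-Z_-]*)$")


def split_tagged(line):
    """Split a tagged sentence into (word, tag) pairs; tag is None if absent."""
    pairs = []
    for token in line.split():
        match = TAGGED_TOKEN.match(token)
        if match:
            pairs.append((match.group("word"), match.group("tag")))
        else:
            pairs.append((token, None))
    return pairs


def tag_levels(tag):
    """Break a hierarchical tag such as V_VM_VF into its levels."""
    return [part for part in re.split(r"[_-]+", tag) if part]


# A few tokens copied from the Hindi sample above.
print(split_tagged("दशरनN_NN सPSP RD_PUNC"))
print(tag_levels("V_VM_VF"))
```

A token that carries only a tag, such as RD_PUNC, yields an empty word, and a sentence number yields no tag. Tokens whose tag has been separated from the word by a space would need extra handling.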

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo | Language Name | Language Tag according to ISO 639-3
1 | Hindi | hin
2 | Assamese | asm
3 | Bangla | ben
4 | Bodo | brx
5 | Dogri | doi
6 | Gujarati | guj
7 | Kannada | kan
8 | Kashmiri | kas
9 | Konkani | kok
10 | Maithili | mai
11 | Malayalam | mal
12 | Manipuri | mni
13 | Marathi | mar
14 | Nepali | nep
15 | Oriya | ori
16 | Punjabi | pan
17 | Sanskrit | san
18 | Santhali | sat
19 | Sindhi | snd
20 | Tamil | tam
21 | Telugu | tel
22 | Urdu | urd
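The language tags in this annexure can be held in a simple lookup table. The sketch below, in Python, pairs each language name with its ISO 639-3 code and derives the per-language label attribute in the `<code>-cat` pattern that the schema examples use (`hin-cat`, `mal-cat`, `kok-cat`). The names `ISO_639_3` and `label_attribute` are hypothetical and not part of the standard.

```python
# ISO 639-3 codes for the 22 languages listed in this annexure.
ISO_639_3 = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute that carries the label for a language."""
    return ISO_639_3[language] + "-cat"


print(label_attribute("Malayalam"))
```

Keying the table by language name keeps the lookup readable; a reverse dictionary keyed by code would serve equally well when the input is already an ISO tag.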

CONTRIBUTORS

1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G., University of Hyderabad
7. Dr. Sobha L., AU-KBC, Chennai
8. Menak S.
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

Foreign word | वदशी शबद | িবেদশী শ | गबन हादरार सोदोब | غٲر ملکی لفظ | गोर मलत लफ़ | विदशी शबद
Symbol | पीत | তীক | नसन | عالمت | अलाम | चिनह
Unknown | अा | অ াত | मथय | ازون | अोन | अजञात
Punctuation | वरामाद-च | যিত িচন | थाद ’सन खािनथ | لہجون | लहिजवन | विरामचिनह
Echowords | पवन-शबद | নযাতক শ | रखा सोदोब | پوت دنۍ لفظ | पॊ दनय लफ़ | नादानकारी अभयसत

Languages: Telugu, Malayalam, Tamil, Konkani

SNo English | Hindi | Telugu | Malayalam | Tamil | Konkani
1 Noun | सा | సంజఞ | നാമം | பெயர | नाम
common | जावाचत | జతవచకం | സാമാന നാമം | பொதுப பெயர | जावाचत नाम
Proper | वयवाचत | వయకతవచకం | സംജാ നാമം | சிறபபுப பெயர | वयवाचत नाम
Verbal | कयामलत तद | కరయమలకం | NA | தொழில பெயர | कयामळत नाम
Nloc | दश-ताल साप | దశ-కల సపకషకం | ആധാരിക നാമം | இடப பெயர | थळ -ताळ-साप नाम
2 Pronoun | सवरनाम | సరవనమం | സര വനാമം | பதிலடுப பெயர | सवरनाम
Personal | वयवाचत | వయకతవచకం | രഷ സര വനാമം | மூவிடபபெய | परश सवरनाम
Reflexive | नजवाचत | ఆతమరథకం | നിചവാചി സര വനാമം | தறசுடடுப பதிலடுப பெயர | आतमवाचत सवरनाम
Reciprocal | पारसपरत | పరసపరకం | സംബനവാചി സര വനാമം | பரஸபர பதிலடுப பெயர | सबद सवरनाम
Relative | सबन- वाचत | సంబంధ-వచకం | ാരസിക സര വനാമം | இணைபபு பதிலடுப பெயர | एतमत सवरनाम
Wh-words | पवाचत | పశర నవచకం | േചാദവാചി സര വനാമം | வினாச சொல | पसनाथन सवरनाम
Indefinite | अनयवाचत | NA | சுடடு | अनि सवरनाम
3 Demonstrative | नयवाचत सतवाचत | నరదశకవచకం | നിര േദശകം | நேரசசுடடு | दशरत
Deictic | नदशी | నరదషట | തക സചകം | சுடடு பதிலடுப பெயர | दशरत उर
Relative | सबनवाचत | సంబంధ-వచకం | സംബനവാചി നിര േദശകം | வினாச சொல | सबद दशरत
Wh-words | पवाचत | పశర నవచకం | േചാദവാചി നിര േദശകം | வினை | पसनाथन दशरत
Indefinite | अनयवाचत | NA | NA | துணை வினை | अनि सवरनाम
4 Verb | कया | కరయ | കിയ | முதனமை வினை | कयापद
Auxiliary Verb | सहायत कया | సహయక కరయ | സഹായക കിയ | முறறு வினை | पालवी कयापद
Auxiliary Finite | | | | | (पणर पालवी कयापद)
Auxiliary Non Finite | | | | | (अपणर पालवी कयापद)
Main Verb | मखय कया | మఖయ కరయ | ധാന കിയ | குறை எசசம | मखल कयापद
Finite | परम | సమపక | ര ണ കിയ | வினைப பெயர | नी कयापद
Infinitive | कयाथरत सा | తమననరథకం | കിയാരം | வினை எசசம | सादारण रप
Gerund | कयावाचत | కరయవచకం | NA | பெயரடை | कयावाचत नाम
Non-Finite | गर-परम | అసమపక | അര ണ കിയ | வினையடை | अनी कयापद
Participle Noun | तद परत नाम | NA | NA | பினனுருபு | NA
5 Adjective | वशषण | వశషణం | നാമ വിേശഷണം | இணைபபுச சொல | वशशण
6 Adverb | कया-वशषण | కరయవశషణం | കിയാ വിേശഷണം | இணை இணைபபுச சொல | कयावशशण
7 Post Position | परसगर | పరసరగ | അനേയാഗം | சாரபு இணைபபுச சொல | सबद अवयय
8 Conjunction | योजत | సమచఛయం | സമചയം | நிரபபு இடைசசொல | जोड अवयय
Co-ordinator | समनवयत | సమనధకరణం | ഏേകാിത സമചയം | இடைசசொல | समानानीतरण जोड अवयय
Subordinator | अनीनसथ | వయధకరణం | ആശരസചക സമചയം | முனனிருபபு | आशी जोड अवयय
Quotative | उ-वाचत | అనుకరకం | ഉദാരണവാചി സമചയം | இனபபிரிபபு ஒடடு | अवरण -अथन उर
9 Particles | अवयय | అవయయం | നിാദം | வியபபிடைச சொல | अवयय
Default | वयकम | వయతకరమం | സാമാനം | எதிரமறை | सरभरस अवयय
Classifier | वगनतारत | వరగకరకం | വര ഗകം | மிகுவிபபான | वगरत अवयय
Interjection | वसमयादबोनत | వసమయదబ ధకం | വാേകകം | அளவையடை | उमाळी अवयय
Negation | नतारातमत | నకరతమకం | നിേഷദം | பொது | नहयतार अवयय
Intensifier | ीवत | అతశయరథకం | തീവ നിാദം | எணணுப பெயர | ीवतार अवयय
10 Quantifiers | सखयावाची | సంఖయవచకం | സംഖാവാചി | எணணு முறைப பெயர | सखयादशरत
General | सामानय | సమనయం | ൊതസംഖാവാചി | எஞசியவை | सामानय
Cardinals | गणनासचत | గణనసూచకం | അടിസാന സംഖാവാചി | அயல சொல | सखयावाचत
Ordinals | कमसचत | కరమసూచకం | കര മവാചി | குறியடு | कमवाचत
11 Residuals | अवशष | అవశషం | അവശിഷദം | தெரியாதது | हर
Foreign word | वदशी शबद | వదశ శబదం | അനഭാഷാദം | நிறுததறகுறியடு | वदशी
Symbol | पीत | సంకతం | ചിഹം | இரடடைககிளவி | तर
Unknown | अा | అజఞత | ഇതരദം | NA | अनवळखी
Punctuation | वरामाद-च | వరమం | വിരാമ ചിഹം | NA | वरामतर
Echo-words | पवन-शबद | పరతధవన-శబంద | മാെറാലിവാക | NA | पडसाद उरा

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then
    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if

Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
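The selection steps above (check the script, check the language, show the English, Hindi and language-specific nodes, hide the rest) can be sketched in Python as follows. This is an illustration under stated assumptions, not the standard's implementation: each schema node is modelled as a plain dictionary of attributes, the script-to-language grouping covers only the cases shown in the pseudocode, and the names `SCRIPT_LANGUAGES`, `ALWAYS_SHOWN` and `select_nodes` are hypothetical.

```python
# Script-to-language grouping as it appears in the pseudocode (ISO 639-3 codes).
SCRIPT_LANGUAGES = {
    "Devanagari": {"hin", "brx", "kok"},
    "Malayalam": {"mal"},
    "Perso-Arabic": {"kas"},
    "Bangla": {"asm"},
    "Gujarati": {"guj"},
}

# Attributes that are always displayed: English labels, the Hindi label, the tag.
ALWAYS_SHOWN = {"cat", "subcat", "tag", "hin-cat"}


def select_nodes(node, script, language):
    """Return only the attributes to display for the given script and language."""
    if language not in SCRIPT_LANGUAGES.get(script, set()):
        raise ValueError(f"{language!r} is not listed under script {script!r}")
    shown = ALWAYS_SHOWN | {f"{language}-cat"}
    # Everything not in `shown` is hidden, mirroring Hide (remaining nodes).
    return {name: value for name, value in node.items() if name in shown}


# Hypothetical node carrying labels for several languages at once.
noun = {"cat": "noun", "hin-cat": "संज्ञा", "kok-cat": "नाम", "mal-cat": "നാമം", "tag": "N"}
print(select_nodes(noun, "Devanagari", "kok"))
```

For Hindi the language-specific attribute coincides with `hin-cat`, so only the English and Hindi nodes remain, which matches the first branch of the algorithm.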


Page 61: Tdil Mal Tags

61

CopyrightTDIL

Languages Telugu Malayalam Tamil Konkani SNo English Hindi Telugu Malayalam Tamil Konkani

1 Noun सा సంజఞ നാമം பெயர नाम common जावाचत జతవచకం സാമാന നാമം பொதுப

பெயர जावाचत नाम

Proper वयवाचत వయకతవచకం സംജാ നാമം சிறபபுப பெயர

वयवाचत

नाम Verbal कयामलत

तद కరయమలకం NA தொழில

பெயர कयामळत नाम

Nloc दश-ताल साप

దశ-కల సపకషకం ആധാരിക നാമം இடப பெயர थळ -ताळ-साप नाम

2 Pronoun सवरनाम సరవనమం സര വനാമം பதிலடுப பெயர

सवरनाम

Personal वयवाचत వయకతవచకం രഷ സര വനാമം

மூவிடபபெய परश सवरनाम

Reflexive नजवाचत ఆతమరథకం നിചവാചി സര വനാമം

தறசுடடுப பதிலடுப

பெயர

आतमवाचत

सवरनाम

Reciprocal पारसपरत పరసపరకం സംബനവാചി സര വനാമം

பரஸபர பதிலடுப

பெயர

सबद सवरनाम

Relative सबन- वाचत సంబంధ-వచకం ാരസിക സര വനാമം

இணைபபு பதிலடுப

பெயர

एतमत सवरनाम

Wh-words पवाचत పశర నవచకం േചാദവാചി സര വനാമം

வினாச சொல

पसनाथन सवरनाम

Indefinite अनयवाचत NA சுடடு अनि सवरनाम 3 Demonstrative नयवाचत

सतवाचत నరదశకవచకం നിര േദശകം நேரசசுடடு दशरत

Deictic नदशी నరదషట തക സചകം சுடடு

பதிலடுப பெயர

दशरत उर

Relative सबनवाचत సంబంధ-వచకం സംബനവാചി നിര േദശകം

வினாச சொல सबद दशरत

Wh-words पवाचत పశర నవచకం േചാദവാചി നിര േദശകം

வினை पसनाथन दशरत

Indefinite अनयवाचत NA NA துணை வினை अनि सवरनाम 4 Verb कया కరయ കിയ முதனமை

வினை कयापद

Auxiliary Verb

सहायत कया సహయక కరయ സഹായക കിയ முறறு வினை पालवी कयापद

Auxiliary Finite

(पणर पालवी

62

CopyrightTDIL

कयापद)

Auxiliary Non Finite

(अपणर पालवी कयापद)

Main Verb मखय कया మఖయ కరయ ധാന കിയ குறை எசசம मखल कयापद Finite परम సమపక ര ണ കിയ வினைப பெயர नी कयापद Infinitive कयाथरत सा తమననరథకం കിയാരം வினை எசசம सादारण रप Gerund कयावाचत కరయవచకం NA பெயரடை कयावाचत नाम Non-Finite गर-परम అసమపక അര ണ കിയ வினையடை अनी

कयापद Participle

Noun तद परत नाम NA NA பினனுருபு NA

5 Adjective वशषण వశషణం നാമ വിേശഷണം இணைபபுச

சொல वशशण

6 Adverb कया-वशषण కరయవశషణం കിയാ

വിേശഷണം இணை

இணைபபுச சொல

कयावशशण

7 Post Position

परसगर పరసరగ അനേയാഗം

சாரபு இணைபபுச

சொல

सबद अवयय

8 Conjunction योजत సమచఛయం സമചയം நிரபபு இடைசசொல

जोड अवयय

Co-ordinator समनवयत సమనధకరణం ഏേകാിത സമചയം

இடைசசொல समानानीतरण जोड अवयय

Subordinator अनीनसथ వయధకరణం ആശരസചക

സമചയം

முனனிருபபு आशी जोड अवयय

Quotative उ-वाचत అనుకరకం ഉദാരണവാചി സമചയം

இனபபிரிபபு ஒடடு

अवरण -अथन उर

9 Particles अवयय అవయయం നിാദം வியபபிடைச சொல

अवयय

Default वयकम వయతకరమం സാമാനം எதிரமறை सरभरस अवयय Classifier वगनतारत వరగకరకం വര ഗകം மிகுவிபபான वगरत अवयय Interjection वसमयादबोनत వసమయదబ ధకం വാേകകം அளவையடை उमाळी अवयय Negation नतारातमत నకరతమకం നിേഷദം பொது नहयतार अवयय

Intensifier ीवत అతశయరథకం തീവ നിാദം எணணுப பெயர

ीवतार अवयय

10 Quantifiers सखयावाची సంఖయవచకం സംഖാവാചി எணணு முறைப பெயர

सखयादशरत

63

CopyrightTDIL

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

64

CopyrightTDIL

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt

65

CopyrightTDIL

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-

cat=rdquoजावाचत नामrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-

cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo

tag=rdquoPRrdquogt

66

CopyrightTDIL

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश

सवरनामrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-

cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Malyalam (Orthographic variation) then

If language is Malyalam then

Call (POS Schema)

Display (English Hindi and Malyalam Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-

cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-

cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

67

CopyrightTDIL

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo

tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo

tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo

lttag=rdquoPRP rdquoشخصيٲتی

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo

lttag=rdquoPRF rdquoماکوسی

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

68

CopyrightTDIL

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-

cat=rdquoজািতবাচকrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-

cat=rdquoআতবাচকrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English, Hindi and Gujarati Nodes)

Hide (remaining nodes)

E.g.

<xs:element name=cat POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">

<xs:attribute name=type subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">

<xs:attribute name=type subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">

<xs:element name=cat POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">

<xs:attribute name=type subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">

<xs:attribute name=type subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">

…………

End if
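
The selection steps above can be sketched in Python. This is only an illustration under stated assumptions, not part of the standard: the dictionary layout, the function name select_labels and the placeholder label strings are hypothetical. Only the "<language code>-cat" attribute naming and the rule "display English, Hindi and the selected language, hide the remaining nodes" are taken from the algorithm.

```python
# Hypothetical node: one POS category carrying labels for several languages.
# Attribute names follow the "<ISO 639-3 code>-cat" pattern of the examples;
# the label values are placeholders, not real schema content.
NOUN_NODE = {
    "cat": "noun",
    "tag": "N",
    "hin-cat": "<Hindi label>",
    "asm-cat": "<Assamese label>",
    "guj-cat": "<Gujarati label>",
    "kok-cat": "<Konkani label>",
}

# English category, sub-category and the tag itself are always displayed.
ALWAYS_SHOWN = {"cat", "subcat", "tag"}


def select_labels(node, lang_code):
    """Return the attributes to display for lang_code; all others are hidden."""
    visible = set(ALWAYS_SHOWN)
    visible.add("hin-cat")             # Hindi is displayed for every language
    visible.add(f"{lang_code}-cat")    # plus the selected language
    return {name: value for name, value in node.items() if name in visible}


if __name__ == "__main__":
    print(select_labels(NOUN_NODE, "guj"))
```

For Hindi itself the two language entries coincide, so only the English and Hindi labels remain, which matches the first branch of the algorithm.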


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF RD_PUNC

5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
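
In the sample sentences above, each word is written with its tag attached directly to it, and the tags use only capital Latin letters and underscores. Under that assumption, a tagged sentence can be split back into (word, tag) pairs as sketched below. The function names and the regular expression are illustrative choices, not part of the standard, and the pattern would need adjusting for text whose words themselves end in capital Latin letters.

```python
import re

# Assumption: a tag is a run of capital letters, optionally continued by
# underscore-joined parts (N_NN, V_VM_VNF, N__NN), at the end of the token.
TOKEN_RE = re.compile(r"^(.*?)([A-Z]+(?:_+[A-Z]+)*)$")


def split_token(token):
    """Split one token into (word, tag); tag is None if no tag is attached."""
    match = TOKEN_RE.match(token)
    if match is None:
        return token, None
    return match.group(1), match.group(2)


def parse_sentence(line):
    """Split a whitespace-separated tagged sentence into (word, tag) pairs."""
    return [split_token(token) for token in line.split()]


if __name__ == "__main__":
    for word, tag in parse_sentence("दशरनN_NN मPSP RD_PUNC"):
        print(repr(word), tag)
```

A token that consists of a tag only, such as RD_PUNC, yields an empty word, and a token without a tag, such as a sentence number, yields None as its tag.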


16 REFERENCE

1 ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag according to ISO 639-3

1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
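
The language tags above are what the schema examples use as the prefix of the per-language label attributes (hin-cat, asm-cat, guj-cat and so on). A minimal lookup sketch follows. It uses the standard ISO 639-3 codes for the 22 listed languages; the names LANGUAGE_CODES and label_attribute are hypothetical and only serve the illustration.

```python
# ISO 639-3 codes for the 22 languages of the annexure.
LANGUAGE_CODES = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute name holding the label for this language."""
    return f"{LANGUAGE_CODES[language]}-cat"


if __name__ == "__main__":
    print(label_attribute("Konkani"))
```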


CONTRIBUTORS

1 Ms. Swaran Lata, Department of Information Technology, New Delhi
2 Prof. Girish Nath Jha, JNU, New Delhi
3 Dr. Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof. Uma Maheswara Rao G., University of Hyderabad
7 Dr. Sobha L., AU-KBC, Chennai
8 Menak S.
9 Kalika Bali, Microsoft, Bangalore
10 Prof. Pushpak Bhattacharyya, IIT-Bombay
11 Prof. Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr. Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr. Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi

68

CopyrightTDIL

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-

cat=rdquoজািতবাচকrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-

cat=rdquoআতবাচকrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg

69

CopyrightTDIL

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

70

CopyrightTDIL

15 REFERENCE BASED IMPLEMENTATION

Hindi 1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC 5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

71

CopyrightTDIL

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

72

CopyrightTDIL

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

73

CopyrightTDIL

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |

74

CopyrightTDIL

16 REFERENCE 1 ISO 126201999 Terminology and other language and content resources mdash

Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements httpwwww3orgTR1999NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization httpwwww3orgTRxml-i18n-bp 4 Internationalization Tag Set (ITS) Version 10 httpwwww3orgTR2007REC-its-

20070403

5 ISO 639-3 Language Codes httpwwwsilorgiso639-3codesasp

75

CopyrightTDIL

ANNEXURE-1

LANGUAGE TAGS

SNo Language Name Language Tags according to ISO 639-3

1 Hindi asm 2 Assamese ben 3 Bangla brx 4 Bodo doi 5 Dogri guj 6 Gujarati hin 7 Kannada kan 8 Kashmiri kas 9 Konkani kok 10 Maithili mai 11 Malayalam mal 12 Manupuri mni 13 Marathi mar 14 Nepali nep 15 Oriya ori 16 Punjabi pan 17 Sanskrit san 18 Santhali sat 19 Sindhi snd 20 Tamil tam 21 Telugu tel 22 Urdu urd

76

CopyrightTDIL

CONTRIBUTERS

1 Ms Swaran Lata Department of Information Technology New Delhi 2 Prof Girish Nath Jha JNU New Delhi 3 Dr Somnath Chandra Department of Information Technology New Delhi 4 Dipti Misra Sharma LTRC IIIT-H 5 Somi Ram CDAC NOIDA 6 Prof Uma Maheswara Rao G University of Hyderabad 7 Dr Sobha L AU-KBC Chennai 8 Menak S 9 Kalika Bali Microsoft Bangalore 10 Prof Pushpak Bhattacharyya IIT-Bombay 11 Prof Malhar Kulkarni IIT-Bombay 12 Lata Popale IIT-Bombay 13 Kirtida Shah Gujarati University Ahemadabad 14 Mona Parakh LDCIL Mysore 15 Jyoti Pawar Goa University 16 Madhavi Sardesai Goa University 17 Ramnath 18 Aadil Kak University of Kashmir 19 Nazima University of Kashmir 20 Dr Richa LDCIL Mysore 21 Mazhar Mehdi Hussain JNU New Delhi 22 Mr Prashant Verma W3C India New Delhi 23 Swati Arora W3C India New Delhi

Page 63: Tdil Mal Tags

63

CopyrightTDIL

General सामानय సమనయం ൊതസംഖാവാചി

எஞசியவை सामानय

Cardinals गणनासचत గణనసూచకం അടിസാന സംഖാവാചി

அயல சொல सखयावाचत

Ordinals कमसचत కరమసూచకం കര മവാചി குறியடு कमवाचत 11 Residuals अवशष అవశషం അവശിഷദം தெரியாதது हर

Foreign word

वदशी शबद వదశ శబదం അനഭാഷാദം நிறுததறகுறியடு

वदशी

Symbol पीत సంకతం ചിഹം இரடடைககிளவி

तर

Unknown अा అజఞత ഇതരദം NA अनवळखी Punctuation वरामाद-च వరమం വിരാമ ചിഹം NA वरामतर Echo-words पवन-शबद పరతధవన-శబంద മാെറാലിവാക NA पडसाद उरा

64

CopyrightTDIL

14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

If language is Hindi then

Display (Metadata)

Call (POS Schema)

Display (English and Hindi Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquotag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Bodo then

Call (POS Schema)

Display (English Hindi and Bodo Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo brx-cat=rdquoममाrdquo tag=rdquoNrdquogt

65

CopyrightTDIL

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo brx-cat=rdquoफोलर

दिनथथाrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoम

दिनथथाrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo brx-cat=rdquoमराइrdquo tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo brx-cat=rdquoसब

दिनथथाrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo brx-cat=rdquoगाव

दिनथथाrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

If language is Konkani then

Call (POS Schema)

Display (English Hindi and Konkani Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kok-cat=rdquoनामrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kok-

cat=rdquoजावाचत नामrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kok-

cat=rdquoवयवाचत नामrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kok-cat=rdquoसवरनामrdquo

tag=rdquoPRrdquogt

66

CopyrightTDIL

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kok-cat=rdquoपरश

सवरनामrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kok-

cat=rdquoआतमवाचत सवरनामrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Malyalam (Orthographic variation) then

If language is Malyalam then

Call (POS Schema)

Display (English Hindi and Malyalam Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo mal-cat=rdquoനാമംrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo mal-

cat=rdquoസാമാന നാമംrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoസംജാ നാമംrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo mal-cat=rdquoസര വനാമംrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo mal-

cat=rdquoരഷ സര വനാമംrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo mal-

cat=rdquoനിചവാചി സര വനാമംrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

67

CopyrightTDIL

End if

Else If script is Perso-Arabic then

If language is Kashmiri then

Call (POS Schema)

Display (English Hindi and Kashmiri Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo kas-cat=rdquo ناوت rdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo kas-cat=rdquo عام rdquo

tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo خاص rdquo

tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo kas-cat=rdquo پرناوت rdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo kas-cat=rdquo

lttag=rdquoPRP rdquoشخصيٲتی

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo kas-cat=rdquo

lttag=rdquoPRF rdquoماکوسی

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

68

CopyrightTDIL

Else If script is Bangla then

If language is Assamese then

Call (POS Schema)

Display (English Hindi and Assamese Nodes)

Hide (remaining nodes)

Eg

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo asm-cat=rdquoিবেশষযrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo asm-

cat=rdquoজািতবাচকrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo asm-cat=rdquoসবরনাাrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo asm-

cat=rdquoবযিিবাচকrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo asm-

cat=rdquoআতবাচকrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

Else If script is Gujarati then

If language is Gujarati then

Call (POS Schema)

Display (English Hindi and Gujarati Nodes)

Hide (remaining nodes)

Eg

69

CopyrightTDIL

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if


14 ALGORITHM FOR SELECTION OF NODES

If script is Devanagari then

    If language is Hindi then
        Display (Metadata)
        Call (POS Schema)
        Display (English and Hindi Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" tag="PRF">
        ...
    End if

    If language is Bodo then
        Call (POS Schema)
        Display (English, Hindi and Bodo Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" brx-cat="ममा" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" brx-cat="फोलर दिनथथा" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" brx-cat="म दिनथथा" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" brx-cat="मराइ" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" brx-cat="सब दिनथथा" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" brx-cat="गाव दिनथथा" tag="PRF">
        ...
    End if

    If language is Konkani then
        Call (POS Schema)
        Display (English, Hindi and Konkani Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kok-cat="नाम" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kok-cat="जावाचत नाम" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kok-cat="वयवाचत नाम" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kok-cat="सवरनाम" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if

Else If script is Malayalam (Orthographic variation) then

    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if

Else If script is Perso-Arabic then

    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" kas-cat="ماکوسی" tag="PRF">
        ...
    End if


Else If script is Bangla then

    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" asm-cat="আতবাচক" tag="PRF">
        ...
    End if

Else If script is Gujarati then

    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="सा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जावाचत" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="वयवाचत" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सवरनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="वयवाचत" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="नजवाचत" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
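The selection logic above can be sketched in Python. This is a minimal illustration, not part of the standard: the table `SCRIPT_LANGUAGES`, the functions `visible_labels` and `select_node`, and the placeholder label values are all assumptions introduced here. Only the script/language pairs and the `xxx-cat` attribute naming come from the pseudocode.

```python
# Sketch of the node-selection pseudocode. The table covers only the
# script/language pairs listed in the algorithm; all names are illustrative.
SCRIPT_LANGUAGES = {
    "Devanagari": {"Hindi": "hin", "Bodo": "brx", "Konkani": "kok"},
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# English labels appear as "cat" on elements and "subcat" on attributes.
ENGLISH_LABELS = ["cat", "subcat"]


def visible_labels(script, language):
    """Return the label attributes to display for a script/language pair."""
    code = SCRIPT_LANGUAGES[script][language]  # KeyError for unlisted pairs
    labels = ENGLISH_LABELS + ["hin-cat"]      # English and Hindi always shown
    own_label = f"{code}-cat"
    if own_label not in labels:                # Hindi needs no third label
        labels.append(own_label)
    return labels


def select_node(node, script, language):
    """Keep the name, tag and visible labels of one node; hide the rest."""
    keep = set(visible_labels(script, language)) | {"name", "tag"}
    return {key: value for key, value in node.items() if key in keep}


# Placeholder values stand in for the real translated labels.
noun = {"name": "cat", "cat": "noun", "hin-cat": "<hin>", "brx-cat": "<brx>",
        "kok-cat": "<kok>", "tag": "N"}
print(select_node(noun, "Devanagari", "Bodo"))
```

Keeping the script-to-language table as data means a new language adds one entry rather than another If branch.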


15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX

RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC


Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC


Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD

সপিাN_NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD

સપતરાN_NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX


Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
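The tagged sentences above pair each word with a hierarchical tag such as N_NN or V_VM_VNF, whose levels are joined by underscores. A minimal Python sketch of reading such tokens follows. It assumes a backslash between word and tag, since no separator is visible in the examples; the names `parse_token` and `parse_sentence` are hypothetical, and the sample tokens use ASCII placeholder words.

```python
def parse_token(token, separator="\\"):
    """Split one 'word<separator>TAG' token into the word and its tag levels.

    The separator is an assumption; the examples do not show one.
    """
    word, _, tag = token.rpartition(separator)
    if not word or not tag:
        raise ValueError(f"not a tagged token: {token!r}")
    # Tag levels are joined by underscores (N_NN); dropping empty parts also
    # accepts the doubled underscore used in the Oriya examples (N__NN).
    levels = [part for part in tag.split("_") if part]
    return word, levels


def parse_sentence(line, separator="\\"):
    """Parse a whitespace-separated line of tagged tokens."""
    return [parse_token(token, separator) for token in line.split()]


# ASCII placeholder words; real input would carry words in the native script.
for word, levels in parse_sentence("darshan\\N_NN milta\\V_VM hai\\V_VAUX"):
    print(word, "top-level:", levels[0], "full tag:", "_".join(levels))
```

Splitting the tag into levels lets a consumer compare at the top-level category (N, V, PR) while keeping the subtype for finer checks.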


16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp


ANNEXURE-1

LANGUAGE TAGS

S.No  Language Name  Language Tag (ISO 639-3)

1     Hindi          hin
2     Assamese       asm
3     Bangla         ben
4     Bodo           brx
5     Dogri          doi
6     Gujarati       guj
7     Kannada        kan
8     Kashmiri       kas
9     Konkani        kok
10    Maithili       mai
11    Malayalam      mal
12    Manipuri       mni
13    Marathi        mar
14    Nepali         nep
15    Oriya          ori
16    Punjabi        pan
17    Sanskrit       san
18    Santhali       sat
19    Sindhi         snd
20    Tamil          tam
21    Telugu         tel
22    Urdu           urd
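For implementers, the language tags can be held as a lookup table that pairs each language name with its ISO 639-3 code. The sketch below is illustrative only: `LANGUAGE_TAGS` and `label_attribute` are hypothetical names, and the `xxx-cat` attribute pattern is taken from the schema examples in section 14.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def label_attribute(language):
    """Return the schema attribute that carries labels in `language`."""
    # Follows the hin-cat / brx-cat / mal-cat pattern of the schema examples.
    return f"{LANGUAGE_TAGS[language]}-cat"


print(label_attribute("Malayalam"))
```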


CONTRIBUTORS

1 Ms Swaran Lata, Department of Information Technology, New Delhi
2 Prof Girish Nath Jha, JNU, New Delhi
3 Dr Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof Uma Maheswara Rao G, University of Hyderabad
7 Dr Sobha L, AU-KBC, Chennai
8 Menak S
9 Kalika Bali, Microsoft, Bangalore
10 Prof Pushpak Bhattacharyya, IIT-Bombay
11 Prof Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi


        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kok-cat="परश सवरनाम" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kok-cat="आतमवाचत सवरनाम" tag="PRF">
        ...
    End if
Else If script is Malayalam (Orthographic variation) then
    If language is Malayalam then
        Call (POS Schema)
        Display (English, Hindi and Malayalam Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" mal-cat="നാമം" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" mal-cat="സാമാന നാമം" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" mal-cat="സംജാ നാമം" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" mal-cat="സര വനാമം" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" mal-cat="രഷ സര വനാമം" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" mal-cat="നിചവാചി സര വനാമം" tag="PRF">
        ...
    End if
Else If script is Perso-Arabic then
    If language is Kashmiri then
        Call (POS Schema)
        Display (English, Hindi and Kashmiri Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" kas-cat="ناوت" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" kas-cat="عام" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" kas-cat="خاص" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" kas-cat="پرناوت" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" kas-cat="شخصيٲتی" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" kas-cat="ماکوسی" tag="PRF">
        ...
    End if
Else If script is Bangla then
    If language is Assamese then
        Call (POS Schema)
        Display (English, Hindi and Assamese Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" asm-cat="িবেশষয" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" asm-cat="জািতবাচক" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" asm-cat="সবরনাা" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" asm-cat="বযিিবাচক" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" asm-cat="আতবাচক" tag="PRF">
        ...
    End if
Else If script is Gujarati then
    If language is Gujarati then
        Call (POS Schema)
        Display (English, Hindi and Gujarati Nodes)
        Hide (remaining nodes)
        E.g.
        <xs:element name="cat" POS cat="noun" hin-cat="संज्ञा" guj-cat="સજ" tag="N">
        <xs:attribute name="type" subcat="common" hin-cat="जातिवाचक" guj-cat="િતવચક" tag="NN">
        <xs:attribute name="type" subcat="Proper" hin-cat="व्यक्तिवाचक" guj-cat="વયતવચક" tag="NNP">
        <xs:element name="cat" POS cat="Pronoun" hin-cat="सर्वनाम" guj-cat="સવરાા" tag="PR">
        <xs:attribute name="type" subcat="Personal" hin-cat="व्यक्तिवाचक" guj-cat="ષવચક" tag="PRP">
        <xs:attribute name="type" subcat="Reflexive" hin-cat="निजवाचक" guj-cat="પિતિતિતત" tag="PRF">
        ...
    End if
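
The selection branches above can be sketched in Python as a small filter. This is only an illustration under stated assumptions: the function name `select_nodes`, the dictionary layout and the placeholder label strings are hypothetical and not part of the standard, and only the four script/language branches visible in this part of the algorithm are covered.

```python
# Hypothetical script -> {language: ISO 639-3 code} pairs, limited to the
# branches shown in this part of the algorithm.
SCRIPT_LANGUAGES = {
    "Malayalam": {"Malayalam": "mal"},
    "Perso-Arabic": {"Kashmiri": "kas"},
    "Bangla": {"Assamese": "asm"},
    "Gujarati": {"Gujarati": "guj"},
}

# English and Hindi attributes are always displayed, per the algorithm.
ALWAYS_SHOWN = ("name", "cat", "subcat", "hin-cat", "tag")


def select_nodes(node, script, language):
    """Keep the English, Hindi and target-language attributes; hide the rest."""
    code = SCRIPT_LANGUAGES.get(script, {}).get(language)
    if code is None:
        raise ValueError(f"no branch for script={script!r}, language={language!r}")
    visible = set(ALWAYS_SHOWN) | {f"{code}-cat"}
    return {key: value for key, value in node.items() if key in visible}


# Placeholder labels stand in for the native-script category names.
noun = {
    "name": "cat",
    "cat": "noun",
    "hin-cat": "HINDI-LABEL",
    "mal-cat": "MALAYALAM-LABEL",
    "kas-cat": "KASHMIRI-LABEL",
    "tag": "N",
}
shown = select_nodes(noun, "Malayalam", "Malayalam")
```

Here `shown` keeps `mal-cat` and drops `kas-cat`, which corresponds to "Display (English, Hindi and Malayalam Nodes)" followed by "Hide (remaining nodes)".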

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM

RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN

औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD

सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP

इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM

वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN

ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX

CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN

ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN

ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP

ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO

ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN

ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN

ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF

சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN

மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX

ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN

மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC

நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT

ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX

RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN

கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT

சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF

െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ

മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ

ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC

സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN

ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC

നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN

എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF

RD_PUNC

5 ചതരിമാസതിതിN-NNP ഈDM_DMD ണസലങളെടN_NN

സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF

RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC

জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN-

NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD

সপিাN-NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM

पणC_CCD साQ-QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ

आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP

रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD

सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ

છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD

ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD

સપતરાN-NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN-

NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ

आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ

आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC

नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF

दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

N_NNنجات V_VAUXہے V_VMملتی PSPسے N_NNزيارت PSPکی N_NNPستپوريوں 1 V_VAUXہے N_NNاہميت QT_QTFبڑی PSPکی N_NNتيرته PSPميں N_NNمذہب N_NNہندو 2 V_VAUXہيں N_NNاہم CC_CCDاور N_NNبڑی N_NNتيرته QT_QTFہر PSPتو PSPيوں 3

RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD V_VAUXہے N_NNمقبوليت

CC_CCDيا N_NNشہروں QT_QTCسات N_NNمقامات JJمذہبی QT_QTCساتوں PR_PRPيہ 4اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ

RD_PUNC V_VAUXہيں PSPميں N_NNبرساتموسم CC_CCDکہ V_VAUXہے V_VMگيا V_VMکہا DM_DMDايسا 5

JJفراہم N_NNنجات N_NNزيارت PSPکی N_NNشہروں QT_QTCساتوں DM_DMDان V_VAUXہے V_VMہوتی V_VAUXوالی V_VM_VFکرنے

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD

ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD

ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR

ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
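
The tags in the samples above appear to be hierarchical, with the top-level category followed by its subtype (for example `N_NNP` or `V_VM_VNF`). A minimal Python sketch for splitting such a tag into its levels is given below. Reading the underscore as the level separator, and treating the `N-NNP` and `N__NN` spellings seen in some samples as variants of the same form, are assumptions made for illustration; `split_tag` and `top_category` are hypothetical helper names.

```python
def split_tag(tag):
    """Split a hierarchical POS tag such as 'V_VM_VNF' into its levels."""
    # Some samples use '-' or a doubled '_' between levels; normalise both.
    normalized = tag.replace("-", "_")
    return [level for level in normalized.split("_") if level]


def top_category(tag):
    """Return the top-level category of a tag, e.g. 'N' for 'N_NNP'."""
    return split_tag(tag)[0]


levels = split_tag("V_VM_VNF")
```

Flat tags such as `PSP` or `JJ` come back as a single-level list, so the same helper works for every tag form that occurs in the samples.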

16 REFERENCE

1. ISO 12620:1999, Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources
2. XML Schema Requirements, http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215
3. Best Practices for XML Internationalization, http://www.w3.org/TR/xml-i18n-bp/
4. Internationalization Tag Set (ITS) Version 1.0, http://www.w3.org/TR/2007/REC-its-20070403/
5. ISO 639-3 Language Codes, http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

S.No.  Language Name  Language Tag (ISO 639-3)
1      Hindi          hin
2      Assamese       asm
3      Bangla         ben
4      Bodo           brx
5      Dogri          doi
6      Gujarati       guj
7      Kannada        kan
8      Kashmiri       kas
9      Konkani        kok
10     Maithili       mai
11     Malayalam      mal
12     Manipuri       mni
13     Marathi        mar
14     Nepali         nep
15     Oriya          ori
16     Punjabi        pan
17     Sanskrit       san
18     Santhali       sat
19     Sindhi         snd
20     Tamil          tam
21     Telugu         tel
22     Urdu           urd
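
The ISO 639-3 codes for the languages in this annexure also serve as the prefix of the per-language category attributes in the schema examples (`hin-cat`, `mal-cat`, `kas-cat` and so on). A short Python sketch of that lookup follows. The dictionary name `LANGUAGE_TAGS` and the helper `category_attribute` are hypothetical, introduced only for illustration; the codes themselves are the ISO 639-3 identifiers for the listed languages.

```python
# ISO 639-3 codes for the 22 languages listed in the annexure.
LANGUAGE_TAGS = {
    "Hindi": "hin", "Assamese": "asm", "Bangla": "ben", "Bodo": "brx",
    "Dogri": "doi", "Gujarati": "guj", "Kannada": "kan", "Kashmiri": "kas",
    "Konkani": "kok", "Maithili": "mai", "Malayalam": "mal", "Manipuri": "mni",
    "Marathi": "mar", "Nepali": "nep", "Oriya": "ori", "Punjabi": "pan",
    "Sanskrit": "san", "Santhali": "sat", "Sindhi": "snd", "Tamil": "tam",
    "Telugu": "tel", "Urdu": "urd",
}


def category_attribute(language):
    """Build the schema attribute name for a language, e.g. 'mal-cat'."""
    return f"{LANGUAGE_TAGS[language]}-cat"


malayalam_attribute = category_attribute("Malayalam")
```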

CONTRIBUTORS

1. Ms. Swaran Lata, Department of Information Technology, New Delhi
2. Prof. Girish Nath Jha, JNU, New Delhi
3. Dr. Somnath Chandra, Department of Information Technology, New Delhi
4. Dipti Misra Sharma, LTRC, IIIT-H
5. Somi Ram, CDAC, NOIDA
6. Prof. Uma Maheswara Rao G., University of Hyderabad
7. Dr. Sobha L., AU-KBC, Chennai
8. Menak S.
9. Kalika Bali, Microsoft, Bangalore
10. Prof. Pushpak Bhattacharyya, IIT-Bombay
11. Prof. Malhar Kulkarni, IIT-Bombay
12. Lata Popale, IIT-Bombay
13. Kirtida Shah, Gujarati University, Ahmedabad
14. Mona Parakh, LDCIL, Mysore
15. Jyoti Pawar, Goa University
16. Madhavi Sardesai, Goa University
17. Ramnath
18. Aadil Kak, University of Kashmir
19. Nazima, University of Kashmir
20. Dr. Richa, LDCIL, Mysore
21. Mazhar Mehdi Hussain, JNU, New Delhi
22. Mr. Prashant Verma, W3C India, New Delhi
23. Swati Arora, W3C India, New Delhi

Page 69: Tdil Mal Tags

69

CopyrightTDIL

ltxselement name=cat POS cat=rdquonounrdquo hin-cat=rdquoसाrdquo guj-cat=rdquoસજrdquo tag=rdquoNrdquogt

ltxsattribute name=type subcat=commonrdquo hin-cat=rdquoजावाचतrdquo guj-

cat=rdquoિતવચકrdquo tag=rdquoNNgt

ltxsattribute name=type subcat =Properrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoવયતવચકrdquo tag=rdquoNNPgt

ltxselement name=cat POS cat=rdquoPronounrdquo hin-cat=rdquoसवरनामrdquo guj-cat=rdquoસવરાાrdquo

tag=rdquoPRrdquogt

ltxsattribute name=type subcat =Personalrdquo hin-cat=rdquoवयवाचतrdquo guj-

cat=rdquoષવચકrdquo tag=rdquoPRPgt

ltxsattribute name=type subcat =Reflexiverdquo hin-cat=rdquoनजवाचतrdquo guj-

cat=rdquoપિતિતિતતrdquo tag=rdquoPRFgt

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

End if

70

CopyrightTDIL

15 REFERENCE BASED IMPLEMENTATION

Hindi

1 सपरयN_NNP तPSP दशरनN_NN सPSP मलाV_VM हV_VM मोN_NN RD_PUNC

2 हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC

3 यRP_RPD ोRP_RPD हरQT_QTF ीथरN_NN बड़ाJJ औरCC_CCD अहमJJ हV_VM RD_PUNC लतनCC_CCS साQT_QTC सथानN_NN तPSP बड़ीJJ महाN_NN औरCC_CCD मानयाN_NN हV_VM RD_PUNC

4 यDM_DMD साQT_QTC नमरसथलN_NN साQT_QTC नगरN_NN याRP_RPD सपरयN_NNP तPSP रपN_NN मPSP थथN_NN मPSP वणर V_VM हV_VAUX RD_PUNC

5 ऐसाDM_DMD तहाV_VM गयाV_VAUX हV_VAUX तCC_CCS चमारसN_NNP मPSP इनDM_DMD सपरयN_NNP ताPSP दशरनN_NN मोN_NN पदानN_NN तरनV_VM वालाPSP होाV_VM हV_VAUX RD_PUNC

Punjabi

1 ਸਪਤਪਰੀਆN_NN ਦPSP ਦਰਸ਼ਨN_NN ਨਾਲPSP ਿਮਲਦਾV_VM_VNF ਹV_VAUX ਮਖN_NN

2 ਿਹਦN_NN ਧਰਮN_NN ਿਵਚPSP ਤੀਰਥN_NN ਦਾPSP ਬਹਤQT_QTF ਮਹਤਵN_NN ਹV_VAUX |RD_PUNC

3 ਝRB ਤCC_CCS ਹਰQT_QTF ਤੀਰਥN_NN ਵਡਾJJ ਤCC_CCS ਅਿਹਮJJ ਹV_VAUX CC_CCS ਪਰCC_CCS ਸਤQT_QTC ਸਥਾਨN_NN ਦੀPSP ਬਹਤQT_QTF ਮਹਤਤਾN_NN ਅਤCC_CCD ਮਾਨਤਾN_NN ਹV_VAUX |RD_PUNC

4 ਇਹDM_DMD ਸਤQT_QTC ਧਰਮN_NN ਸਥਾਨN_NN ਸਤQT_QTC ਨਗਰN_NN ਜCC_CCD ਸਪਤਪਰੀਆN_NN ਦPSP ਰਪN_NN ਿਵਚPSP ਗਰਥN_NN ਿਵਚPSP ਦਰਜN_NN ਹਨV_VAUX |RD_PUNC

5 ਇਝV_VM_VNF ਿਕਹਾV_VM_VNF ਿਗਆV_VM_VF ਹV_VM_VNF ਿਕCC_CCS ਚਥQT_QTO ਮਹੀਨ N_NN ਿਵਚPSP ਇਨ PSP ਸਪਤਪਰੀਆN_NN ਦਾPSP ਦਰਸ਼ਨN_NN ਮਖN_NN ਪਦਾਨN_NN ਵਾਲਾPSP ਹਦਾV_VM_VNF ਹV_VAUX |RD_PUNC

Tamil

1 சபதகைN_NN சிபதாV_VM_VNG கிN_NN ிகடகிிறV_VM_VF RD_PUNC

2 இநறN_NNP மதிாN_NN தணணJJ இடஙகN_NN மிமRP_INTF சிிபதN_NN வதயநகவN_NN ஆமV_VAUX RD_PUNC

3 ஒவவதாQT_QTC தணணதமN_NN றN_NN மறமCC_CCD கிதறவமN_NN வதயநறN_NN ஆமV_VAUX ஆனதாCC_CCS ஏQT_QTC இடஙகN_NN மிRP_INTF சிிபதமN_NN மிபதமN_NN வதயநதமV_VM_VF RD_PUNC

4 இநDM_DMD ஏQT_QTC தணணதஙகN_NN ஏQT_QTC நரஙகN_NN அாறCC_CCD சபதகN_NN எனCC_CCS_UT ததஙைகாN_NN வரணகபபV_VM_VNF இாகினினV_VAUX RD_PUNC

5 ௗரமிணாN_NN இநDM_DMD சபதணனN_NN சனமN_NN கிகN_NN வழஙிிறV_VM_VF எனCC_CCS_UT சதாபபV_VM_VNF இாகிிறV_VAUX RD_PUNC

Malayalam

1 ഏഴQT_QTC ണനഗരികളിN_NN സനരശികനതV_VM_VNF െകാണRP_RPD േമാകംN_NN ലഭികനV_VM_VF RD_PUNC

2 ഹിനN_NN മതതിതിN_NN ണസലങളകN_NN വലിയJJ മഹതംN_NN ഉണV_VAUX RD_PUNC

3 എലാQT_QTF തീരാടനസലങളംN_NN വലതംJJ ധാനെെനതംJJ ആണV_VAUX RD_PUNC എങിലംCC_CCD ഈDM_DMD ഏഴQT_QTC സലങളകംN_NN വലിയJJ േശശഠതയംN_NN ആദരവംN_NN ഉണV_VAUX RD_PUNC

4 ഈDM_DMD ഏഴQT_QTC ധരമസലങളംN_NN ഏഴQT_QTC നണങളിN_NN അഥവാCC_CCD ഏഴQT_QTC ണനഗരികളിN_NN എനCC_CCD രീതിയിതിN_NN ഗഗങളിതിN_NN വരണിചിനണV_VM_VF RD_PUNC

5 ചതരിമാസതിതിN_NNP ഈDM_DMD ണസലങളെടN_NN സനരശനംN_NNV േമാകദായകമാെണനN_NN റഞിനണV_VM_VF RD_PUNC

Bangla

1 সপিাN_NNP দশরনN_NN কোV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

2 িহN_NN ধোরN_NN তীেতরাN_NN যেতJJ াহN_NN আেছV_VAUX ৷RD_PUNC

3 যিদওPSP সাQT_QTF তীতরN_NN যেতJJ গপণরN_NN তাওPSP সাতিQT_QTC জায়গাাN_NN িবেশষJJ গN_NN ওCC_CCD াহN_NN আেছV_VAUX ৷RD_PUNC

4 এইDM_DMD সাতিQT_QTC ধারলN_NN সাতQT_QTC নগাN_NN বাCC_CCD সপিাN_NNP নাোN_NN পিািচতN_NN ৷RD_PUNC

5 এটাDM_DMD বলাV_VM_VNG হয়V_VAUX েযRP_RPD চতর দশীেতN_NN এইDM_DMD সপিাN_NNP দশরনN_NN কােলV_VM_VNF োালাভN_NN হয়V_VAUX ৷RD_PUNC

Marathi

1 सापरचयाN_NNP दशरनानN_NN मळोVM मोN_NN PUNC

2 हदJJ नमारमयN_NN ीथराचN_NN खपQT_QTF महवN_NN आहVM PUNC

3 सPR रRP पतयतQT_QTF ीथरN_NN महवाचN_NN आणC_CCD मखयJJ आहVM पणC_CCD साQ_QTC सथानाचN_NN महवN_NN आणC_CCD मानयाN_NN मोठJJ आहVM PUNC

4 हDM साQ_QTC नमरसथळN_NN साQ_QTC नगरN_NN वाC_CCD सपरचयाNNP रपाN_NN थथामयN_NN वणरललJJ आहVM PUNC

5 असPR महटलVM गलVAUX आहVAUX तC_CCD चामारसामयN_NN याC_CCD सपरचNNP दशरनN_NN मोN_NN दणारV_VM_VNF ठरVM PUNC

Gujarati

1 સપતરાN_NNP દશરાચN_NN ાળV_VM છV_VAUX ાકકN_NN

2 હધારાN_NN તચ રN_NN ઘQT_QTF ાહતતવJJ છV_VM

3 આાRP_RPD તકRP_RPD દરકDM_DMD તચ રN_NN ાહાJJ અાCC_CCD ાહતતવણરJJ છV_VM પણCC_CCD સતQT_QTC સાકાચN_NN ાહN_NN અાCC_CCD ાદતN_NN છV_VM

4 આDM_DMD સતQT_QTC ઘારસળN_NN સતQT_QTC ાગરN_NN અવCC_CCD સપતરાN_NNP સવવપN_NN ગકાN_NN વણરવV_VM છV_VAUX

5 એાRP_RPD કહવV_VM કCC_CCS ચા રસાN_NN આDM_DMD સપતરાN_NNP દશરાN_NN ાકકN_NN આપારV_VM હકV_VAUX છV_VAUX

Konkani

1 सपरचN_NNP दशरनN_NN घलयारV_VM_VNF मोN_NN मळटाV_VM_VF RD_PUNC

2 हदN_NNP नमाN_NN थरसथानातN_NN वहडJJ महतवN_NN आसाV_VM_VF RD_PUNC

3 शRB पळोवपातV_VM_VNF गलयारV_VM_VNF सगलचQT_QTF थाN_NN वहडJJ आनीCC_CCD खाशलJJ आसाV_VM_VF पणCC_CCS साQT_QTC सथळाN_NN वहडJJ आनीCC_CCD महतवाचीJJ अशRB मानाV_VM_VF RD_PUNC

4 थथानीN_NN ाDM_DMD सायQT_QTC नमरसथळाचN_NN वणरनN_NN साQT_QTC नगरN_NN वाCC_CCD सपरN_NNP अशRB आसाV_VM_VF RD_PUNC

5 चामारसाN_NN हPR_PRP सपरचN_NNP दशरनN_NN मोN_NN मळोवनV_VM_VNF दवपीV_VM_VNG थाराV_VM_VF अशRB मानाV_VM_VF RD_PUNC

Urdu

1 ستپوريوںN_NNP کیPSP زيارتN_NN سےPSP ملتیV_VM ہےV_VAUX نجاتN_NN

2 ہندوN_NN مذہبN_NN ميںPSP تيرتهN_NN کیPSP بڑیQT_QTF اہميتN_NN ہےV_VAUX

3 يوںPSP توPSP ہرQT_QTF تيرتهN_NN بڑیN_NN اورCC_CCD اہمN_NN ہيںV_VAUX RD_PUNC ليکنPSP ساتQT_QTC مقاماتN_NN کیPSP بڑیN_NN عظمتN_NN اورCC_CCD مقبوليتN_NN ہےV_VAUX

4 يہPR_PRP ساتوںQT_QTC مذہبیJJ مقاماتN_NN ساتQT_QTC شہروںN_NN ياCC_CCD اتس QT_QTC پوريوںN_NN کیPSP شکلN_NN ميںPSP کتابوںN_NN ميںPSP مذکورJJ ہيںV_VAUX RD_PUNC

5 ايساDM_DMD کہاV_VM گياV_VM ہےV_VAUX کہCC_CCD برساتموسمN_NN ميںPSP انDM_DMD ساتوںQT_QTC شہروںN_NN کیPSP زيارتN_NN نجاتN_NN فراہمJJ کرنےV_VM_VF والیV_VAUX ہوتیV_VM ہےV_VAUX

Oriya

1 ସପତପରୀଗଡ଼କN__NN ର PSP ଦରଶନ NN ର PSP ମୋକଷ NN ମଳଥାଏ N__NNV |

2 ହନଦଧରମN__NN ରେ PSP ତୀରଥ NN ର PSP ବଡ଼ JJ ମହତ NN ଅଟେ V__VAUX |

3 ଏପରକ RP__RPD ସବPR__PRL ତୀରଥ NN ବଡ଼ JJ ଏବଂCC__CCD ମଖୟ JJ ଅଟନତ V__VAUX ପରନତ CC__CCS ସାତ QT__QTC ସଥାନଗଡ଼କର N__NN ଶରେଷଠ JJ ମହନୀୟତାN__NN ଓ CC__CCD ମାନୟତାN__NN ଅଟେ V__VAUX |

4 ଏହPR ସାତ QT__QTC ଧରମସଥଳ NN ସାତ QT__QTC ନଗରଗଡ଼କ N__NN ର PSP କଂବା CC__CCD ସପତପରୀଗଡ଼କN__NN ର PSP ରପ JJ ରେ PSP ଗରନଥଗଡ଼କN__NN ରେ PSP ବରଣତ N__NNV ହେଇଅଛ V__VAUX |

5 ଏଭଳ PR କହାଯାଇ V__VM ଅଛ V__VAUX କ CC__CCS ଚରତମାସ N__NN ରେ PSP ଏହ PR ସପତପରୀଗଡ଼କ N__NN ର PSP ଦରଶନ NN ମୋକଷ NN ପରଦାନ V__VAUX କରବାବାଲା NN ହେଇଥାଏ V__VAUX |
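
In most of the sentences above the tag is attached directly to the end of the word. The following Python sketch shows one possible way to separate the two and to count tags in a sentence. It is an illustration only and assumes that a tag is the trailing run of uppercase ASCII letters and underscores in a token and that the word itself does not end in an uppercase ASCII letter. The names `TOKEN`, `split_token`, `tag_counts` and `SAMPLE` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a tag is a trailing run such as N_NN, V_VM_VNF or JJ,
# made of uppercase ASCII letters joined by one or more underscores.
TOKEN = re.compile(r"(.*?)([A-Z]+(?:_+[A-Z]+)*)")

# Hindi sentence 2 from this section, copied as it appears in the text.
SAMPLE = "हदN_NN नमरN_NN मPSP ीथरN_NN ताPSP बड़ाJJ महतवN_NN हV_VM RD_PUNC"


def split_token(token):
    """Split a fused token into (word, tag).

    The word part is empty when the token consists of a tag only.
    """
    match = TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"no tag found in token: {token!r}")
    return match.group(1), match.group(2)


def tag_counts(sentence):
    """Count how often each tag occurs in a whitespace-separated sentence."""
    return Counter(split_token(token)[1] for token in sentence.split())


if __name__ == "__main__":
    for token in SAMPLE.split():
        print(split_token(token))
    print(tag_counts(SAMPLE))
```

Tokens in which the word and the tag are separated by a space would need to be paired up before being passed to `split_token`.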

16 REFERENCE

1 ISO 12620:1999 Terminology and other language and content resources - Specification of data categories and management of a Data Category Registry for language resources

2 XML Schema Requirements http://www.w3.org/TR/1999/NOTE-xml-schema-req-19990215

3 Best Practices for XML Internationalization http://www.w3.org/TR/xml-i18n-bp

4 Internationalization Tag Set (ITS) Version 1.0 http://www.w3.org/TR/2007/REC-its-20070403

5 ISO 639-3 Language Codes http://www.sil.org/iso639-3/codes.asp

ANNEXURE-1

LANGUAGE TAGS

SNo   Language Name   Language Tag according to ISO 639-3

1     Hindi           hin
2     Assamese        asm
3     Bangla          ben
4     Bodo            brx
5     Dogri           doi
6     Gujarati        guj
7     Kannada         kan
8     Kashmiri        kas
9     Konkani         kok
10    Maithili        mai
11    Malayalam       mal
12    Manipuri        mni
13    Marathi         mar
14    Nepali          nep
15    Oriya           ori
16    Punjabi         pan
17    Sanskrit        san
18    Santhali        sat
19    Sindhi          snd
20    Tamil           tam
21    Telugu          tel
22    Urdu            urd
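
The language tags of this annexure lend themselves to a lookup table. The following Python sketch is an illustration only: the names `ISO_639_3`, `code_for` and `name_for` are hypothetical, and the pairs are the 22 languages of the annexure with their ISO 639-3 codes.

```python
# Language name -> ISO 639-3 code for the 22 languages of the annexure.
ISO_639_3 = {
    "Hindi": "hin",
    "Assamese": "asm",
    "Bangla": "ben",
    "Bodo": "brx",
    "Dogri": "doi",
    "Gujarati": "guj",
    "Kannada": "kan",
    "Kashmiri": "kas",
    "Konkani": "kok",
    "Maithili": "mai",
    "Malayalam": "mal",
    "Manipuri": "mni",
    "Marathi": "mar",
    "Nepali": "nep",
    "Oriya": "ori",
    "Punjabi": "pan",
    "Sanskrit": "san",
    "Santhali": "sat",
    "Sindhi": "snd",
    "Tamil": "tam",
    "Telugu": "tel",
    "Urdu": "urd",
}

# Reverse table, valid because every code occurs exactly once.
NAME_BY_CODE = {code: name for name, code in ISO_639_3.items()}


def code_for(name):
    """Return the ISO 639-3 code for a language name (KeyError if unknown)."""
    return ISO_639_3[name]


def name_for(code):
    """Return the language name for an ISO 639-3 code (KeyError if unknown)."""
    return NAME_BY_CODE[code]


if __name__ == "__main__":
    print(code_for("Malayalam"))
    print(name_for("asm"))
```

The same codes serve as attribute prefixes in the schema, as in hin-cat and guj-cat.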

CONTRIBUTORS

1 Ms. Swaran Lata, Department of Information Technology, New Delhi
2 Prof. Girish Nath Jha, JNU, New Delhi
3 Dr. Somnath Chandra, Department of Information Technology, New Delhi
4 Dipti Misra Sharma, LTRC, IIIT-H
5 Somi Ram, CDAC, NOIDA
6 Prof. Uma Maheswara Rao G., University of Hyderabad
7 Dr. Sobha L., AU-KBC, Chennai
8 Menak S.
9 Kalika Bali, Microsoft, Bangalore
10 Prof. Pushpak Bhattacharyya, IIT-Bombay
11 Prof. Malhar Kulkarni, IIT-Bombay
12 Lata Popale, IIT-Bombay
13 Kirtida Shah, Gujarati University, Ahmedabad
14 Mona Parakh, LDCIL, Mysore
15 Jyoti Pawar, Goa University
16 Madhavi Sardesai, Goa University
17 Ramnath
18 Aadil Kak, University of Kashmir
19 Nazima, University of Kashmir
20 Dr. Richa, LDCIL, Mysore
21 Mazhar Mehdi Hussain, JNU, New Delhi
22 Mr. Prashant Verma, W3C India, New Delhi
23 Swati Arora, W3C India, New Delhi
