
Proceedings of 

ICON09 NLP TOOLS CONTEST: INDIAN LANGUAGE DEPENDENCY PARSING 

Hyderabad, India 

 

 

 

 

 

 

1st August – 1st December 2009 

 

 


 

 

Acknowledgment 

 

 

Many people helped on different fronts to make the contest a success. I'd like to thank the annotators: Ms. Rafiya Begum for Hindi, Dr. Soma Paul for Bangla, and Mr. Viswanath Naidu, Mr. Phani Gadde, Mr. Meher Vijay and Mr. Bharat Ambati for Telugu. I'd also like to thank Mr. Bharat Ambati and Mr. Phani Gadde for their unceasing effort in completing various tasks. Mr. Abhijeet Gupta provided all the web support. Prof. Joakim Nivre gave timely feedback for improvements. Discussions with Mr. Prashanth Mannem, Dr. Dipti Misra Sharma, and Prof. Rajeev Sangal were instrumental in making various decisions. Finally, thanks to all the participants for their wonderful effort.

Samar Husain (Chair, ICON09 tools contest)


Table of Contents 

 

Dependency Parsers for Indian Languages. Samar Husain .......... 4

Dependency Parser for Bengali: the JU System at ICON 2009. Aniruddha Ghosh, Pinaki Bhaskar, Amitava Das, and Sivaji Bandyopadhyay .......... 7

Parsing Indian Languages with MaltParser. Joakim Nivre .......... 12

Maximum Spanning Malt: Hiring World's Leading Dependency Parsers to Plant Indian Trees. Daniel Zeman .......... 19

Structure Simplification and Demand Satisfaction Approach to Dependency Parsing in Bangla. Sankar De, Arnab Dhar, and Utpal Garain .......... 25

Experiments in Indian Language Dependency Parsing. Bharat Ram Ambati, Phani Gadde and Karan Jindal .......... 32

Grammar Driven Rules for Hybrid Bengali Dependency Parsing. Sanjay Chatterji, Praveen Sonare, Sudheshna Sarkar and Debashree Roy .......... 38

Constraint based Hindi dependency parsing. Meher Vijay Yeleti and Kalyan Deepak .......... 43

Bidirectional Dependency Parser for Hindi, Telugu and Bangla. Prashanth Mannem .......... 49


Dependency Parsers for Indian Languages

Samar Husain
Language Technologies Research Centre
IIIT-Hyderabad, India
[email protected]

Due to the availability of annotated corpora for various languages over the past decade, data driven parsing has proved to be immensely successful. Unlike English, however, most of the parsers for morphologically rich free word order (MoR-FWO) languages (such as Czech, Turkish, Hindi, etc.) have adopted the dependency grammatical framework. It is well known that for MoR-FWO languages the dependency framework provides ease of linguistic analysis and is much better suited to account for their various structures (Shieber, 1985; Mel'Cuk, 1988; Bharati et al., 1995).

The NLP tools contests are regular events held as part of the International Conference on Natural Language Processing (ICON) and cater to various NLP tasks. This year the contest focused on Indian language dependency parsing. Three languages, namely Telugu, Bangla and Hindi, were explored, and 8 teams participated in the event. Manually annotated data in all the three languages was given to the participating teams. The data was POS tagged, chunked and marked for dependency information. The dependency relations are based on the Paninian grammatical model (Bharati et al., 1995; Begum et al., 2008). The data also contained automatically computed morphological and head information. General statistics for the released training data are shown in Table 1.

Language   No. of Sentences   Word Count   Average Sentence Length
Telugu     1,400              7602         5.43 words
Bangla     980                10305        10.52 words
Hindi      1,500              28522        19.01 words

Table 1. Statistics of the released training data.

The development and test sets for all the three languages had 150 sentences each. The released annotated data, although small, can provide considerable insight into various parsing issues, and the contest is intended as a first step in this direction. As the results suggest, fairly high accuracies have been obtained in spite of the small data size. The teams submitted their results in two rounds. In the first round the dependency tagset was fine-grained and larger than the tagset used in the second round; otherwise the data used in both rounds was identical. The performance of the systems was measured in terms of standard measures such as unlabelled attachment score (UAS) and labelled attachment score (LAS) (Nivre et al., 2007b). The average scores over the two rounds for the different languages are given in Table 2.

Language   UAS     LAS     LS
Telugu     84.78   59.73   61.41
Bangla     81.96   66.97   71.1
Hindi      88.78   72.83   75.59

Table 2. Average scores
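For readers who want to reproduce the evaluation, the following minimal sketch shows how UAS, LAS and label accuracy (LS) are conventionally computed from gold and predicted heads and labels. The tuple-per-word data layout is an illustrative assumption, not the contest's file format.

```python
# Minimal sketch of the standard dependency evaluation metrics (UAS, LAS, LS).
# Each sentence is a list of (gold_head, gold_label, pred_head, pred_label) tuples,
# one per word; this layout is an illustrative assumption, not the contest format.

def attachment_scores(sentences):
    total = uas = las = ls = 0
    for sent in sentences:
        for gold_head, gold_label, pred_head, pred_label in sent:
            total += 1
            if pred_head == gold_head:
                uas += 1                      # correct head -> unlabeled attachment
                if pred_label == gold_label:
                    las += 1                  # correct head and label -> labeled attachment
            if pred_label == gold_label:
                ls += 1                       # correct label regardless of head -> label accuracy
    return 100.0 * uas / total, 100.0 * las / total, 100.0 * ls / total

if __name__ == "__main__":
    # One toy sentence: word 1 attached to word 2 with label k1, word 2 is the root (head 0).
    toy = [[(2, "k1", 2, "k2"), (0, "main", 0, "main")]]
    print(attachment_scores(toy))  # (100.0, 50.0, 50.0)
```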


Table 3 shows the results of the best performing system in the three languages.

Language (Team)          UAS     LAS     LS
Telugu (Mannem)          85.76   65.01   66.21
Bangla (De et al.)       90.32   84.29   85.85
Hindi (Ambati et al.)    90.22   79.33   81.66

Table 3. Best results on the coarse-grained tagset

Many methods were explored. Ghosh et al. (2009) used a CRF based hybrid method. Nivre (2009), Ambati et al. (2009) and Chatterji et al. (2009) used the well-known transition based approach to dependency parsing (Nivre et al., 2007a). Zeman (2009) combined various well-known dependency parsers into a superparser using a voting method. De et al. (2009) and Yeleti and Deepak (2009) used a constraint based approach. Mannem (2009) attacked the problem with bi-directional incremental parsing and perceptron learning. Only teams that submitted results for all the three languages were considered for ranking. De et al. (2009) obtained the best results for Bangla; however, since they participated only for Bangla, they have not been ranked. Consolidated results of all the systems for both rounds can be seen in Figure 1.

 Figure 1. Consolidated results

References

B. R. Ambati, P. Gadde and K. Jindal. 2009. Experiments in Indian Language Dependency Parsing. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.

5

Page 6: ICON09 NLP TOOLS CONTEST: INDIAN LANGUAGE ...ltrc.iiit.ac.in/icon/2009/nlptools/CR/all-papers-tools...ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India

R. Begum, S. Husain, A. Dhwaj, D. M. Sharma, L. Bai, and R. Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of IJCNLP-2008. http://www.iiit.net/techreports/2007_78.pdf

A. Bharati, V. Chaitanya and R. Sangal. 1995. Natural Language Processing: A Paninian Perspective, Prentice-Hall of India, New Delhi, pp. 65-106. ltrc.iiit.ac.in/downloads/nlpbook/nlp-panini.pdf

S. Chatterji, P. Sonare, S. Sarkar and D. Roy. 2009. Grammar Driven Rules for Hybrid Bengali Dependency Parsing. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.

S. De, A. Dhar, and U. Garain. 2009. Structure Simplification and Demand Satisfaction Approach to Dependency Parsing in Bangla. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.

A. Ghosh, P. Bhaskar, A. Das, and S. Bandyopadhyay. 2009. Dependency Parser for Bengali: the JU System at ICON 2009. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.

P. Mannem. 2009. Bidirectional Dependency Parser for Hindi, Telugu and Bangla. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.

I. A. Mel'Cuk. 1988. Dependency Syntax: Theory and Practice, State University Press of New York.

J. Nivre. 2009. Parsing Indian Languages with MaltParser. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.

J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov and E. Marsi. 2007a. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95-135.

J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel and D. Yuret. 2007b. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007.

S. M. Shieber. 1985. Evidence against the context-freeness of natural language. Linguistics and Philosophy, 8, 334–343.

M. V. Yeleti and K. Deepak. 2009. Constraint based Hindi dependency parsing. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.

D. Zeman. 2009. Maximum Spanning Malt: Hiring World's Leading Dependency Parsers to Plant Indian Trees. In Proceedings of ICON09 NLP Tools Contest: Indian Language Dependency Parsing. Hyderabad, India. 2009.


Dependency Parser for Bengali: the JU System at ICON 2009

Aniruddha Ghosh†, Pinaki Bhaskar*, Amitava Das+, Sivaji Bandyopadhyay¥

Department of Computer Science and Engineering
Jadavpur University
Jadavpur, Kolkata 700032, India
[email protected]†, [email protected]*, [email protected]+, [email protected]¥

Abstract

This paper reports on our work in the ICON 2009 NLP Tools Contest: Indian Language Dependency Parsing. We submitted two runs for Bengali. A statistical CRF based model followed by a rule-based post-processing technique has been used. The system has been trained on the NLP Tools Contest: ICON 2009 datasets. The system demonstrated an unlabeled attachment score (UAS) of 74.09%, a labeled attachment score (LAS) of 53.90% and a labeled accuracy score (LS) of 61.71%.

1 Introduction

The Bengali language is characterized by a rich system of inflections (VIBHAKTI), derivation, compound formation (Saha et al., 2004; Dash, 1994; Chakroborty, 2003) and karakas, which is why analysis and generation involving Bengali is a very challenging task. Parsing natural language sentences poses considerable difficulties due to the ambiguous nature of natural languages. Therefore, developing a computational grammar for a natural language is a complicated endeavor. Most of the previous research attempts at parsing Bengali were based on the detection and formation of a proper rule set to identify the characteristics of inter-chunk relations (Sengupta and Chaudhuri, 1993; Saha, 2005). Dependency based methods for syntactic parsing have become quite popular in the processing of natural languages in recent years. The dependency parser developed for Bengali and presented in this report is statistical in nature, and a conditional random field (CRF) model has been used for it. A separate rule based dependency parser for Bengali has also been developed. The output of the baseline CRF based system is filtered by a rule-based post-processing module using the output obtained through the rule based dependency parser.

2 The Dependency parsing system

A parser can be defined as a language model over a word lattice that determines which sequence of words running along a path through the lattice has the highest probability. Capturing the tree structure of a particular sentence has been seen as key to the goal of disambiguation. One way to capture the regularity of chunks over different sentences is to learn a grammar that explains the structure of the chunks one finds. The technique presented in this paper uses both statistical and rule based models to disambiguate between dependency relations. The statistical model works well at capturing the language model, while the grammatical rules are manually developed. The crucial features for detecting dependency relations are identified and used in training the CRF system. The post-processing rules are manually devised by analyzing the training set.

2.1 The Statistical Parser

Probabilistic sequence models, which allow integrating uncertainty over multiple interdependent classifications and collectively determine the most likely global assignment, may be used in a parser. A standard such model, the Conditional Random Field (CRF), has been used here.

2.2 Conditional Random Fields (CRF)

CRFs are undirected graphical models which define a conditional distribution over a label sequence given an observation sequence. We define CRFs as conditional probability distributions P(Y|X) of target language words given source language words. The probability of a particular target language word Y given a source language word X is the normalized product of potential functions, each of the form

exp( Σj λj tj(Yi−1, Yi, X, i) + Σk µk sk(Yi, X, i) )

where tj(Yi−1, Yi, X, i) is a transition feature function of the entire source language word and the target language characters (n-gram) at positions i and i−1 in the target language word; sk(Yi, X, i) is a state feature function of the target language word at position i and the source language word; and λj and µk are parameters to be estimated from training data. Each fj(Yi−1, Yi, X, i) is either a state function s(Yi−1, Yi, X, i) or a transition function t(Yi−1, Yi, X, i). This allows the probability of a target language word Y given a source language word X to be written as

P(Y|X, λ) = (1/Z(X)) exp( Σj λj Fj(Y, X) ),   where Fj(Y, X) = Σi fj(Yi−1, Yi, X, i)

and Z(X) is a normalization factor.

The parameters of the CRF are usually estimated from fully observed training data {(x(k), y(k))}. The product of the above equation over all training words, as a function of the parameters λ, is known as the likelihood, denoted by p({y(k)}|{x(k)}, λ). Maximum likelihood training chooses parameter values such that the logarithm of the likelihood, known as the log-likelihood, is maximized. For a CRF, the log-likelihood is given by

L(λ) = Σk log p(y(k)|x(k), λ).

From the input file in the SSF format, all the morphological information, such as the root word, chunk labels, POS tags, vibhakti and case markers, is retrieved. The dependency relation of a chunk in a sentence depends on this morphological information along with the position in the sentence and the surrounding chunk labels. The CRF statistical tool therefore calculates the probability of the morphological information together with the dependency relations of the previous three and the next three chunks. A quad-gram technique is used in this parser, as most of the sentences have around 10 chunks. The parser can also use the dependency tags that have not yet been marked in a sentence to resolve ambiguity, because the occurrence of the same dependency tag more than once in a sentence has a low probability. The system works fine for simple sentences. As many phrasal structures are possible in complex and compound sentences, the CRF system is prone to make mistakes on them. Thus, during post processing, clauses present in such sentences are identified using conjunctive words (e.g., "ebam") or subordinating words (e.g., "se", "ye"). Such post processing rules are based on linguistic knowledge.
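As an illustration of the context window just described, the sketch below extracts features from the previous three and next three chunks for a given chunk. The dictionary-based chunk representation and its field names are hypothetical; this is not the authors' actual preprocessing code.

```python
# Illustrative sketch (not the authors' code) of the +/-3 chunk context window
# described above. Each chunk is a dict of the morphological clues used as CRF
# features; the field names are hypothetical.

def window_features(chunks, i, window=3):
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(chunks):
            c = chunks[j]
            feats[f"root[{offset}]"] = c["root"]
            feats[f"pos[{offset}]"] = c["pos"]
            feats[f"chunk[{offset}]"] = c["chunk"]
            feats[f"vibhakti[{offset}]"] = c["vibhakti"]
    return feats

chunks = [
    {"root": "AjakAla", "pos": "NN", "chunk": "NP", "vibhakti": "0"},
    {"root": "brishti", "pos": "NN", "chunk": "NP", "vibhakti": "0"},
    {"root": "ha", "pos": "VM", "chunk": "VGF", "vibhakti": "0"},
]
print(window_features(chunks, 1))
```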

The input file in the SSF format includes the POS tags, chunk labels and morphological information. The chunk information in the input files is converted to the B-I-E format, so that the begin (B) / inside (I) / end (E) information for a chunk is associated as a feature with the appropriate words. The chunk tags, in the B-I-E format, of the chunk with which a particular chunk is related through a dependency relation are identified from the training file and noted as an input feature in the CRF based system. The corresponding relation name is another input feature associated with the particular chunk. Each sentence is represented as a feature vector for the CRF machine learning task. After a series of experiments, the following feature set was found to perform well as a dependency clue. The input features associated with each word in the training set are the root word, POS tag, chunk tag, vibhakti, dependent chunk tag and dependency relation.
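A minimal sketch of the B-I-E conversion described above, assuming a simplified (chunk label, word list) representation of a chunk; how a single-word chunk is tagged is an assumption, and this is not the original preprocessing script.

```python
# Minimal sketch, assuming a simple (chunk_label, [words]) representation, of the
# B-I-E conversion described above; not the original preprocessing script.

def bie_tags(chunk_label, words):
    # Assumption: a single-word chunk is tagged B-<label>; the paper does not say.
    if len(words) == 1:
        return [f"B-{chunk_label}"]
    return ([f"B-{chunk_label}"]
            + [f"I-{chunk_label}"] * (len(words) - 2)
            + [f"E-{chunk_label}"])

sentence = [("NP", ["ei", "paWa"]), ("VGF", ["geCe"])]
for label, words in sentence:
    for word, tag in zip(words, bie_tags(label, words)):
        print(word, tag)
# ei B-NP / paWa E-NP / geCe B-VGF
```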

Root Word: Some dependency relations are difficult to identify without the word itself. An example illustrates this best:

AjakAla NN NP X k7t

In this example there is no clue except the word itself: the word is a noun, the chunk label denotes a noun phrase, and there is no vibhakti attached to the word. For such cases, word lists of temporal words, location names and person names have been used for disambiguation (Ekbal et al., 2008). Identification of the k7t relation in particular is very hard, because the word itself will be a common noun or a proper noun, and only the information of whether the word denotes a time or a location helps in the disambiguation.

Part of Speech: The part of speech of a word always plays a crucial role in identifying a dependency relation. For example, dependency relations like k1 and k2 in most cases involve a noun. It has been observed through experiments that not only the POS tag of the present word but also the POS tags of the context words (previous and next) are useful in identifying the dependency relation in which a word takes part.

Chunk label: The chunk is the smallest accountable unit for detecting dependency relations, and the chunk label is therefore an important feature. Since sentences are parsed at the word level during training, chunk labels are associated with the appropriate words as B-X (beginning), I-X (intermediate) and E-X (end), where X is the chunk label.

Vibhakti: Indian languages are mostly non-configurational and highly inflectional. Grammatical functions (GFs) are predicted by case inflections (markers) on the head nouns of noun phrases (NPs) and by postpositional particles in postpositional phrases (PPs). In the following example, the '0_janya' vibhakti inflection of the word "pAoyZAra" leads to the rh (Hetu) relation. However, in many cases the mapping from case marker to GF is not one-to-one. Some example words and the corresponding vibhaktis are shown in Table 1. The classical technique for non-configurational syntactic encoding of GFs (Bresnan, 1982) therefore requires a number of alternations to be thrown in to handle this phenomenon.

Word           Vibhakti
bAMlAyZa       Null
ema e          Null
padZawe        A_As+Ce
ekatA digrI    0
pAoyZAra.      0_janya

Table 1: Words and associated vibhaktis

The output of this statistical parsing/mapping process is a dependency relation along with the root word, vibhakti and chunk tag, and their corresponding dependency relation with the chunk heads. During development, a confusion matrix helped to detect errors. The most prominent errors on the development set, cases where vmod was wrongly identified, are shown in Table 2.

           vmod
k1         42
k2         37
k7p        14
ccof       41

Table 2: Confusion matrix on the development set
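A confusion matrix of this kind can be tallied in a few lines; the sketch below is illustrative and not part of the JU system.

```python
# Illustrative sketch of the confusion-matrix bookkeeping used to find systematic
# errors such as vmod being confused with k1, k2, k7p or ccof.
from collections import Counter

def confusion(gold_labels, pred_labels):
    counts = Counter()
    for g, p in zip(gold_labels, pred_labels):
        if g != p:
            counts[(g, p)] += 1              # count each (gold, predicted) error pair
    return counts

gold = ["k1", "k2", "ccof", "k1"]
pred = ["vmod", "vmod", "vmod", "k1"]
print(confusion(gold, pred))
# Counter({('k1', 'vmod'): 1, ('k2', 'vmod'): 1, ('ccof', 'vmod'): 1})
```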

2.3 Post-Processing

With the help of the confusion matrix, we have devised a set of parsing rules, depending on the nature of the errors, to disambiguate when more than one dependency relation has been identified in a certain situation.

A rule based dependency parser that uses linguistic knowledge has been developed to check the output of the CRF based dependency parser. Depending on specific attributes of the chunk, such as vibhakti/case markers and/or word information, the rule based system derives the dependency relation of the chunk. For each dependency relation tag, syntactic cues are derived from specific linguistic features to identify the dependency relations. Some example rules used are described below:

1. An NP chunk with 0 vibhakti and an NNP or PRP POS tag will be marked with the k1 relation to the nearest verb chunk.

2. A chunk with the "era" vibhakti will be marked with the r6 relation to the next noun chunk.

3. An NP chunk with 0 vibhakti and an NN POS tag will be marked with the k2 relation to the nearest verb chunk.

4. In coordinate sentences, the verb chunk will be marked with the ccof relation to the nearest CCP chunk. If a CCP chunk is surrounded by two NP chunks, then both NP chunks will be marked as ccof with the CCP.

5. In subordinate sentences, the verb chunk of the subordinate clause will be marked with nmod__relc to the chunk of the main clause that it modifies.

6. If a chunk is marked with "0_weke", the k5 relation will be identified.

7. If a chunk is marked with "0_prawi", the rd relation will be identified. The relations k5 and rd are pre-dependent, i.e., the dependent precedes its head.

8. Some verbs, like "kar" or "ha", expect arguments. For these verbs the previous chunk will be the expected argument of the verb chunk, and that chunk will be identified as having the pof relation with the following verb. For example:

(( Suru NN ))_NP (( hayZe VM | SYM ))_VGF

The "ha" verb expects an argument; here "Suru" is the expected argument, so the NP chunk is identified as having the pof relation with the verb chunk.

(( apekRA NN ))_NP (( karawe VM ))_VGF

The NP chunk preceding the "kar" verb is marked with the pof relation with the VGF.

Ambiguity arises when, for a certain vibhakti, multiple possible relations are identified. For example, for a chunk with the "0" vibhakti, two possible output dependency relations are k1 and k2. This ambiguity is resolved using POS tag information: if the POS tag is NNP then the dependency relation will be k1, and if it is NN then it will be k2. If the ambiguity is not resolved by this rule, then the position of the chunk in the sentence is considered: if there are two chunks with the "0" vibhakti, the chunk more distant from the verb chunk will be marked with the k1 relation and the nearer one with the k2 relation.

Sometimes, to resolve the ambiguity, the system checks the category of the root word to identify the relation. For example, the vibhakti "me" identifies multiple relations, e.g., k7, k7p, k7t and k3. This ambiguity can be resolved by examining the root word category and the case marker. If the root word is a time related word (e.g., kAl, waKana, etc.) then the relation will be k7t; if it is a location word then it will be k7p. If the case marker of the root word is 'd' then the relation will be k3; otherwise k7 will be marked.

The outputs of the CRF based and the rule based dependency parsers are matched. The rule based system is given the higher priority, as it is based on syntactic-semantic cues: if there is any mismatch between the two results, the output of the rule based system is taken.
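The merging step just described amounts to letting the rule-based output override the CRF prediction wherever both exist; a minimal sketch, with a hypothetical per-chunk dictionary representation, is given below.

```python
# Sketch of the merging step described above: the rule-based relation, when
# available, overrides the CRF prediction. Inputs are per-chunk relation dicts
# (a hypothetical representation).

def merge(crf_relations, rule_relations):
    merged = dict(crf_relations)
    for chunk_id, relation in rule_relations.items():
        merged[chunk_id] = relation          # rule-based output gets higher priority
    return merged

print(merge({0: "k1", 1: "k7"}, {1: "k3"}))  # {0: 'k1', 1: 'k3'}
```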

3 Error Analysis

During the development stage of the system we studied the various chunk labeling errors committed by the system. Consider the following chunked sentence:

(( wAra PRP ))_NP (( AyZawana NN ))_NP (( o CC ))_CCP (( parimANa NN ))_NP (( xeKe VM ))_VGNF (( buJawe VM ))_VGNF (( asubiXA NN ))_NP (( hayZa VM ))_VGF (( ei DEM paWa NN ))_NP (( hAwi NN ))_NP (( geCe VM । SYM ))_VGF

In this example it is highly ambiguous to derive the syntactic cues for the case inflection of the NP with the words "ei paWa" from the syntactic and morphological information alone. In dependency parsing, morphological information plays a crucial role in identifying case inflection, and a deficiency of morphological information leads to ambiguous identification. As an example:

(( NP <af=kOwUhala,unk,,,,,,>
   " SYM <af=",punc,,,,,,>
   anAbaSyaka JJ <af=anAbaSyaka,adj,,,,,,>
   kOwUhala NN <af=kOwUhala,unk,,,,,,>
))

In this example, no syntactic cues could be formed using the morphological information. Even other morpho-syntactic information, such as the POS tag, chunk tag and root word, does not lead to an unambiguous dependency relation.

3.1 Experimental results

We trained the probabilistic sequence model with morphological features such as the root word, POS tag, chunk tag, vibhakti and dependency relation from the training set. Brief statistics of the dataset are as follows: of the 980 training sentences, 223 are simple sentences and 757 are compound sentences; among the compound sentences, 198 are coordinate sentences and 559 are subordinate sentences.

The accuracy of the CRF based model on the development set is 65%. Based on the nature of the errors in these results, we devised a template with new rules; using the new template in the CRF tool increased the accuracy to 69%. The output of the rule based model is then merged with this CRF result, with the rule based output given higher priority as it is based on syntactico-semantic attributes of the chunk. This merging with the rule based system increases the accuracy to 74%. On the test set, the present Bengali parser obtains an unlabelled attachment score (UAS) of 74.09%, a labelled attachment score (LAS) of 53.90% and a label accuracy score (LS) of 61.71%.


4 Conclusion

This paper reports on our work as part of the NLP Tools Contest at ICON 2009. We have used a statistical CRF based model along with rule based post-processing. The CRF based model obtained an accuracy of 69% in the second run; the rule-based post-processing improved this to 74%.

A properly designed NLP platform for Indian languages must come with an efficient morpho-syntactic unit for parsing words into their constituent morphemes, where the lexical projections of words can be obtained from the projections of the individual morphemes. Phrasal order can vary depending on the corpus. In future work, our aim is to develop a more proficient statistical system that can produce more than one possible parse. A more concise rule set should also be generated covering morpho-syntactic and lexico-syntactic variations.

References

Asif Ekbal, Rejwanul Haque, Amitava Das, Venkateswarlu Poka and Sivaji Bandyopadhyay. 2008. Language Independent Named Entity Recognition in Indian Languages. In Proceedings of the IJCNLP-08 Workshop on Named Entity Recognition for South and South East Asian Languages, pp. 33-40.

Bresnan, J. 1982. Control and Complementation. In J. Bresnan, editor, The Mental Representation of Grammatical Relations. MIT Press, Cambridge, MA, pp. 282-390.

Chakroborty, B. 2003. Uchchotoro Bangla Byakaron. Akshay Malancha.

Dash, K.C. (Ed.). 1994. Indian Semantics. Agamakala Publications, Delhi.

Saha, G.K., Saha, A.B., Debnath, S. 2004. Computer Assisted Bangla POS Tagging. iSTRAN, Tata McGraw-Hill, New Delhi.

Saha, Goutam Kumar. 2005. English to Bangla Translator: The BANGANUBAD. International Journal-CPOL, Vol. 18 (4), pp. 281-290, USA.

Sengupta, P. and B. B. Chaudhuri. 1993. A Morpho-Syntactic Analysis Based Lexical Sub-System. International Journal of Pattern Recognition and Artificial Intelligence, pp. 595-619.


Parsing Indian Languages with MaltParser

Joakim Nivre
Uppsala University
Department of Linguistics and Philology
E-mail: [email protected]

Abstract

This paper describes the application of MaltParser, a transition-based dependency parser, to three Indian languages – Bangla, Hindi and Telugu – in the context of the NLP Tools Contest at ICON 2009. In the final evaluation, MaltParser was ranked second among the participating systems and achieved an unlabeled attachment score close to 90% for Bangla and Hindi, and over 85% for Telugu, while the labeled attachment score was 15–25 percentage points lower. It is likely that the high unlabeled accuracy is achieved thanks to a relatively low syntactic complexity in the data sets, while the low labeled accuracy is due to the limited amounts of training data.

1 Introduction

The NLP Tools Contest at ICON 2009 consisted in training and evaluating dependency parsers for three Indian languages: Bangla, Hindi and Telugu. I participated using the freely available MaltParser system (Nivre et al., 2006a), which implements a transition-based approach to dependency parsing, and which has previously been applied to over twenty languages (Nivre et al., 2006b; Nivre et al., 2007a; Nivre et al., 2007b; Nivre et al., 2008). In this paper, I give a brief introduction to MaltParser (Section 2), describe the process of optimizing the system for the three different languages (Section 3), report the experimental results from the final evaluation (Section 4), and conclude with some ideas for future work (Section 5).

2 MaltParser

MaltParser (Nivre et al., 2006a) implements the transition-based approach to dependency parsing, which has two essential components:

• A transition system for mapping sentences to dependency trees

• A classifier for predicting the next transition for every possible system configuration

Given these two components, dependency parsing can be realized as deterministic search through the transition system, guided by the classifier. With this technique, parsing can be performed in linear time for projective dependency trees and quadratic time for arbitrary (possibly non-projective) trees (Nivre, 2008).

2.1 Transition Systems

MaltParser comes with a number of built-in transition systems, but we limit our attention to the two systems that have been used in the parsing experiments: the arc-eager projective system first described in Nivre (2003) and the non-projective transition system based on the method described by Covington (2001). For a more detailed analysis of these and other transition systems for dependency parsing, see Nivre (2008).

A configuration in the arc-eager projective system contains a stack holding partially processed tokens, an input buffer containing the remaining tokens, and a set of arcs representing the partially built dependency tree. There are four possible transitions (where top is the token on top of the stack and next is the next token in the input buffer):

• LEFT-ARC(r): Add an arc labeled r from next to top; pop the stack.



• RIGHT-ARC(r): Add an arc labeled r from top to next; push next onto the stack.

• REDUCE: Pop the stack.

• SHIFT: Push next onto the stack.

Although this system can only derive projective dependency trees, the fact that the trees are labeled allows non-projective dependencies to be captured using the pseudo-projective parsing technique proposed in Nivre and Nilsson (2005).

The non-projective system uses a similar type of configuration but adds a second temporary stack. There are again four possible transitions:

• LEFT-ARC(r): Add an arc labeled r from next to top; push top onto the second stack.

• RIGHT-ARC(r): Add an arc labeled r from top to next; push top onto the second stack.

• NO-ARC: Push top onto the second stack.

• SHIFT: Empty the second stack by pushing every word back onto the stack; then push next onto the stack.

Unlike the first system, this allows the derivation of arbitrary non-projective dependency trees.

Data fields (columns): FORM LEMMA CPOS POS FEATS DEPREL

Stack: top                    1 2 3 4 5 6
Stack: top−1                  7
Input: next                   8 9 10 11 12 13
Input: next+1                 14 15 16 17
Input: next+2                 18
Input: next+3                 19
Tree: head of top             20 21
Tree: leftmost dep of top     22 23
Tree: rightmost dep of top    24 25 26
Tree: leftmost dep of next    27 28 29
String: predecessor of top    30 31
String: successor of top      32
String: predecessor of next   33
String: successor of next     34
Second stack: top word        35
Second stack: bottom word     36

Table 1: Feature pool for optimization, with columns representing data fields and rows representing words defined relative to the stack, input buffer, partially built tree, input string, and second temporary stack. Features with boldface numbers belong to the baseline model; features 13, 35 and 36 are relevant only for the non-projective transition system.
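To make the transition systems described above concrete, here is a minimal sketch of the arc-eager system in Python; it is an illustration rather than MaltParser's implementation, and the trained classifier is stubbed out by a toy oracle.

```python
# Minimal sketch of the arc-eager transition system described above; this is an
# illustration, not MaltParser. The classifier is stubbed out by `predict`.

def parse(words, predict):
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []  # 0 is the root
    while buffer:
        transition, label = predict(stack, buffer, arcs)
        top, nxt = (stack[-1] if stack else None), buffer[0]
        if transition == "LEFT-ARC" and stack:
            arcs.append((nxt, label, top))   # arc from next (head) to top (dependent)
            stack.pop()
        elif transition == "RIGHT-ARC":
            arcs.append((top, label, nxt))   # arc from top (head) to next (dependent)
            stack.append(buffer.pop(0))
        elif transition == "REDUCE" and stack:
            stack.pop()
        else:                                # SHIFT
            stack.append(buffer.pop(0))
    return arcs

# Toy oracle: always choose RIGHT-ARC with label "main",
# chaining each word to the one before it.
toy_oracle = lambda stack, buffer, arcs: ("RIGHT-ARC", "main")
print(parse(["raama", "vachchaadu"], toy_oracle))
```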

2.2 Classifiers

Classifiers can be induced from treebank data using a wide variety of different machine learning methods, but all experiments reported below use support vector machines with a polynomial kernel, as implemented in the LIBSVM package (Chang and Lin, 2001) included in MaltParser. The task of the classifier is to map a high-dimensional feature vector representation of a parser configuration to the optimal transition out of that configuration.

The features used in our system all represent attributes of tokens and have been extracted from the following fields of the CoNLL data representation (Buchholz and Marsi, 2006): FORM, LEMMA, CPOSTAG, POSTAG, FEATS, and DEPREL. The pool of features used in the experiments is shown in Table 1, where rows denote tokens in a parser configuration – defined relative to the stack, the input buffer, the partially built dependency tree, the input string, and the second temporary stack – and columns correspond to data fields. Each non-empty cell represents a feature, and features are numbered for future reference. Features with boldface numbers are present in the baseline model, while the other features have been explored in feature selection experiments. Note that features 15, 34 and 35 are relevant only for the non-projective transition system. (Features 34 and 35 refer to the second temporary stack, and feature 15 refers to the dependency label of next, which will always be undefined in the projective system.)

3 Optimization

Although MaltParser is a language-independent system in the sense that it can be trained on data from any language, it is usually possible to improve parsing accuracy by optimizing parameters of the transition system and the classifier. In this section, we describe the different stages of optimization performed for Bangla, Hindi and Telugu.

3.1 Data and Methodology

For all three languages, data were provided in the form of one training set and one validation set, which I merged into a single training set and used for ten-fold cross-validation with a pseudo-randomized split. This resulted in a training set of 1130 sentences (7260 tokens) for Bangla, 1651 sentences (15029 tokens) for Hindi, and 1615 sentences (6207 tokens) for Telugu. It is worth noting that the average sentence length in all data sets is small, with 6.4 tokens for Bangla, 9.1 for Hindi, and 3.8 for Telugu.
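A pseudo-randomized ten-fold split of the kind described can be produced as follows; this is a sketch under the assumption of a fixed random seed, not the author's actual script.

```python
# Sketch of a pseudo-randomized ten-fold split of the kind described above
# (deterministic seed so folds are reproducible); not the author's actual script.
import random

def ten_fold_indices(n_sentences, seed=1):
    indices = list(range(n_sentences))
    random.Random(seed).shuffle(indices)
    return [indices[k::10] for k in range(10)]   # fold k holds every 10th index

folds = ten_fold_indices(1130)                   # e.g. the merged Bangla data
print(len(folds), [len(f) for f in folds][:3])   # 10 folds of 113 sentences each
```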

The baseline model for all three languages was MaltParser trained with default settings, including the default feature model illustrated in Table 1. In order to incorporate a change to the model, I required an improvement in both labeled attachment score (LAS) and unlabeled attachment score (UAS).

The development period was divided into two phases, with a preliminary evaluation after the first phase. After this evaluation, new versions of the data sets were distributed for all three languages, where the number of distinct dependency labels had been reduced to twelve coarse-grained labels that were the same across all three languages (although not all labels occurred in all languages). In the final evaluation, participants had to submit results both for the original fine-grained labels and for the new coarse-grained labels.

It would have been possible to optimize MaltParser separately for the two levels of granularity, but I decided to use only the coarse-grained labels during the second development phase, starting from the models that had been optimized for the fine-grained labels during the first phase. All results in the final evaluation are therefore based on models optimized for the coarse-grained labels, which in fact led to a small drop in performance for Bangla and Hindi with fine-grained labels, as compared to the preliminary evaluation.

3.2 Transition Systems

I evaluated three different transition systems. In addition to the arc-eager projective system and the non-projective system described in Section 2.1, I used the arc-standard projective system described in Nivre (2008). I found that the non-projective system gave the highest accuracy for Bangla and Telugu, while the arc-eager projective system worked best for Hindi. I also tried combining the two projective systems with pseudo-projective parsing (Nivre and Nilsson, 2005), which gave a marginal improvement with the arc-eager system for Bangla – but results were still inferior to those obtained with the non-projective system – and no improvement for any of the other combinations. Based on these experiments, I decided to use the non-projective system for Bangla and Telugu and the strictly projective arc-eager system for Hindi in all the following experiments.

3.3 Classifiers

Classifier optimization essentially consists in optimizing two different aspects:

• Feature model

• Learning algorithm parameters

Ideally, the feature model and the parameters of the learning algorithm should be optimized jointly, but in practice we are forced to use some kind of sequential or interleaved optimization. In this case, I started by optimizing the feature model and the learning parameters for the fine-grained dependency labels used during the first development phase. During the second phase, I performed additional feature selection experiments with the coarse-grained labels but without changing any parameters of the learning algorithm.

3.3.1 Feature Models

In order to tune the feature models, I first performed backward feature selection to find out whether any of the features in the default model could be omitted. This resulted in the exclusion of feature number 23 for all three languages: the dependency label of the leftmost dependent of top. This is a feature that is useful for some languages but apparently not for Bangla, Hindi and Telugu.


I then performed several rounds of forward feature selection, first with the fine-grained data sets and later with the coarse-grained data sets, which resulted in the following additions of features:

• Features 2 and 9, the lemma of top and next, respectively, were added for all three languages. For Bangla, feature 15, the lemma of next+1, was added as well.

• Features 3 and 10, the coarse-grained part of speech of top and next, respectively, were added for all three languages.

• Features 5 and 12, the morphological features associated with top and next, respectively, were added for all three languages. For Bangla, feature 17, the morphological features of next+1, was added as well.

• Additional part-of-speech features were added for Bangla (22, 25) and Hindi (21, 25, 28, 31).

• Additional lexical (word form) features were added for Bangla (24), Hindi (27, 30) and Telugu (24, 27).

• Conjoined features were added for Bangla (1&4, 4&11, 8&11) and Hindi (1&4, 1&8, 4&11).¹

The greatest improvement for all languages came from the addition of features 5 and 14, that is, the morphological features associated with top and next, which improved the LAS by almost 10 percentage points for Hindi.

3.3.2 Learning Algorithm Parameters

The learning parameter optimization was done in two steps. The first was to check whether accuracy could be improved by splitting the classification problem into two steps, first choosing the basic transition and then (in the case of LEFT-ARC and RIGHT-ARC) choosing the arc label. This turned out to improve accuracy for all three languages. There was a small additional improvement from having separate label classifiers for the LEFT-ARC and RIGHT-ARC cases.

The second step was to optimize the SVM parameters, in particular the cost parameter C, which controls the tradeoff between minimizing training error and maximizing margin. This parameter was set to 0.75 for Bangla, 1.0 for Hindi, and 0.875 for Telugu.

¹ It should be noted that the use of support vector machines with a degree-2 polynomial kernel implies that all explicit features are conjoined implicitly by the kernel. This includes the explicitly conjoined features, which are thus combined with both simple and conjoined features.
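The cost-parameter search could be reproduced along the following lines. The sketch uses scikit-learn's SVC with a degree-2 polynomial kernel as a stand-in for the LIBSVM classifier inside MaltParser; the synthetic data, function names and candidate C values are assumptions for illustration only.

```python
# Sketch of a grid search over the SVM cost parameter C, as described above,
# using scikit-learn's SVC as a stand-in for LIBSVM (an assumption).
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
import numpy as np

def best_cost(features, transitions, candidates=(0.5, 0.625, 0.75, 0.875, 1.0)):
    scores = {}
    for c in candidates:
        clf = SVC(C=c, kernel="poly", degree=2)
        scores[c] = cross_val_score(clf, features, transitions, cv=10).mean()
    return max(scores, key=scores.get)

# Tiny synthetic example: 40 configurations with 3 features, 2 transition classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > np.median(X[:, 0])).astype(int)   # balanced toy labels
print(best_cost(X, y))
```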

3.4 Final Models

Table 2 shows the cross-validation results for the final optimized models on the data with fine-grained dependency labels, compared to the baseline model. For all three languages, we find the largest improvement in LAS, with 7.4 percentage points for Bangla, 12.9 for Hindi, and 5.0 for Telugu, while the corresponding improvement in UAS is only 3.2 percentage points for Bangla, 7.0 for Hindi, and 1.9 for Telugu.

Table 3 shows the corresponding results with coarse-grained labels. The overall pattern is very similar, but with slightly larger improvements, in particular for Bangla and Hindi (LAS: 10.6, 14.3, 5.3; UAS: 6.3, 9.1, 2.3), and the labeled attachment scores of the optimized models are generally about 4 percentage points higher than for the fine-grained label set.

4 Evaluation

Tables 2 and 3 also show the evaluation results on the final test sets. For Bangla and Hindi, the results are slightly better than the cross-validation results, especially with coarse-grained labels (over 2 percentage points for Bangla and over 4 percentage points for Hindi), but for Telugu they are in fact lower than the baseline results. On the other hand, all test sets are small, consisting of only 150 sentences, so it is natural to expect considerable variance in the results.

Of the systems that submitted test results for all languages, MaltParser was ranked second with respect to the average score over all languages (for LAS, UAS and LA with both fine-grained and coarse-grained labels), but for both Bangla and Telugu there were in fact two other systems that achieved better results. The closest margin to the best performing system was for Hindi, where the MaltParser results were about 1 percentage point below the best results with both fine-grained and coarse-grained labels. It may be significant that Hindi had by far the largest training set, more than twice the size of the two others.


            Cross-Validation (Baseline)   Cross-Validation (Optimized)   Final Test
Language    LAS    UAS    LA               LAS    UAS    LA               LAS    UAS    LA
Bangla      61.7   83.5   63.8             69.7   86.5   72.6             70.5   88.7   72.9
Hindi       56.5   78.9   58.7             69.6   86.0   72.2             73.4   89.8   75.3
Telugu      62.9   87.4   65.8             67.9   89.3   70.9             57.6   84.7   58.5

Table 2: Parsing accuracy for Bangla, Hindi and Telugu with fine-grained dependency labels; cross-validation results with baseline and optimized models on training sets (including development sets); test set results for optimized models trained on the entire training set. LAS = labeled attachment score; UAS = unlabeled attachment score; LA = label accuracy.

            Cross-Validation (Baseline)   Cross-Validation (Optimized)   Final Test
Language    LAS    UAS    LA               LAS    UAS    LA               LAS    UAS    LA
Bangla      63.2   80.3   66.5             73.8   86.6   76.7             76.1   89.0   79.6
Hindi       59.8   79.1   62.3             74.1   86.2   77.2             78.2   89.4   81.1
Telugu      66.8   86.7   70.4             72.1   89.0   75.7             62.4   86.3   63.0

Table 3: Parsing accuracy for Bangla, Hindi and Telugu with coarse-grained dependency labels; cross-validation results with baseline and optimized models on training sets (including development sets); test set results for optimized models trained on the entire training set. LAS = labeled attachment score; UAS = unlabeled attachment score; LA = label accuracy.

For all three languages, there is a wide gap between the labeled and unlabeled attachment scores. For Telugu with fine-grained labels (worst case) it is over 25 percentage points, but even for Hindi with coarse-grained labels (best case) it is more than 10 points, which is more than we typically find for other languages in the dependency parsing literature. It is possible that the more semantically oriented nature of the Paninian annotation scheme makes labeled parsing more difficult than in schemes based on more surface-oriented grammatical functions. However, I believe that stronger explanatory factors can be found in properties of the data sets.

As noted above, the average sentence length is low in all data sets, which means that most sentences will not have a very complex syntactic structure, which in turn favors unlabeled parsing accuracy. On the other hand, the training sets are all very small, which means that many labels are sparsely represented, which in turn has a negative impact on labeled parsing accuracy. If this hypothesis is correct, then we should expect to see an improvement of labeled accuracy as the data sets grow larger, but it is also possible that unlabeled accuracy will in fact drop as longer and more complex sentences are added to the data sets.

Table 4 provides a more detailed analysis of the results by reporting labeled precision and recall for the twelve dependency types in the coarse-grained label set (with unlabeled precision and recall in parentheses), based on cross-validation over the entire training sets. For Bangla and Telugu, the highest accuracy is achieved for the label main, assigned to the root of each dependency tree, which has both precision and recall over 90%. The second highest accuracy is reported for the label ccof, assigned to conjuncts in a coordination, with both precision and recall over 80%. For both these relations, labeled precision/recall is also very close to unlabeled precision/recall. For Hindi, the picture is rather different, which may be partly due to the fact that a different transition system was used for this language. First of all, precision for main is much lower, which is due to fragmented parses with too many root nodes. Secondly, the r6 label, assigned to possessives, has a precision and recall that is higher than for the ccof label and also higher than for the r6 label in the other two languages.

Looking at the karaka relations k1-ext, k2-ext and k7-ext, finally, we see that they have very high unlabeled precision and recall (85–90%) in all languages but substantially lower labeled precision and recall. The same kind of pattern is found also for the vmod-rest label. Whether these patterns are due primarily to an intrinsic difficulty of the Paninian scheme of categorization or to the limited amount of training data for labeled parsing, as hypothesized above, must remain a topic for future research.


             Bangla                        Hindi                         Telugu
             Precision     Recall          Precision     Recall          Precision     Recall
ccof         83.4 (83.2)   83.2 (83.5)     78.6 (80.3)   77.2 (77.7)     80.7 (84.1)   80.1 (83.8)
jjmod        – (–)         – (–)           50.0 (66.7)   9.1 (15.2)      – (–)         – (–)
fragof       – (–)         0.0 (0.0)       – (–)         – (–)           – (–)         – (–)
k1-ext       68.8 (89.5)   70.7 (90.3)     72.4 (90.6)   71.1 (92.3)     60.1 (87.4)   64.4 (90.1)
k2-ext       68.7 (89.4)   76.7 (92.9)     69.1 (92.0)   73.8 (91.8)     63.0 (91.7)   62.9 (92.7)
k7-ext       65.8 (84.5)   64.2 (86.8)     79.3 (90.4)   78.3 (92.0)     66.0 (85.4)   62.2 (88.7)
main         92.3 (92.3)   93.4 (93.4)     77.2 (77.2)   89.7 (89.7)     95.9 (96.0)   96.0 (96.2)
nmod         45.9 (57.1)   26.4 (39.0)     58.0 (64.9)   32.4 (38.8)     49.6 (68.1)   34.9 (44.8)
r6           75.8 (79.6)   77.4 (79.2)     86.0 (86.6)   87.1 (88.8)     67.7 (78.5)   50.6 (62.1)
rbmod        – (–)         – (–)           – (–)         0.0 (0.0)       – (–)         0.0 (25.0)
relc         48.6 (56.8)   35.3 (35.3)     56.0 (60.3)   25.8 (30.2)     40.0 (60.0)   12.5 (50.0)
vmod-rest    67.9 (84.9)   64.0 (86.4)     62.9 (85.0)   61.3 (84.4)     65.8 (85.4)   67.6 (86.9)

Table 4: Labeled precision and recall for coarse-grained dependency types in Bangla, Hindi and Telugu; unlabeled precision/recall in parentheses. (For precision, – means that the parser did not assign this label; for recall, – means that the label did not occur in the data set.)


5 Conclusion

I have presented the work done to optimize MaltParser for Bangla, Hindi and Telugu in the NLP Tools Contest at ICON 2009. Using MaltParser with default settings as the baseline, optimization improved the labeled attachment scores by 7–13 percentage points and the unlabeled attachment scores by 2–5 percentage points on development data. This led to the second best results for Hindi and on average, and the third best results for Bangla and Telugu.

It is not impossible that parsing accuracy can be improved further by performing more extensive feature selection experiments. However, a more substantial increase in accuracy can only come from a substantial increase in the amount of training data, since the data sets used for training and validation in the NLP Tools Contest are really too small for data-driven approaches to syntactic parsing. Having more data available will also make it possible to test the hypothesis that unlabeled parsing accuracy is currently quite high thanks to a relatively low syntactic complexity, while labeled accuracy is very low because of data sparseness.

Another important source of information for further improvement is the growing body of work focusing specifically on parsing Indian languages, exemplified by Bharati et al. (2009a) and Bharati et al. (2009b). By taking into account the special features of these languages, it should be possible to attain higher accuracy than with a completely agnostic language-independent system.

Acknowledgments

I want to thank the organizers of the NLP Tools Contest for their sterling effort from start to finish.

References

Akshar Bharati, Mridul Gupta, Vineet Yadav, Karthik Gali, and Dipti Misra Sharma. 2009a. Simple parser for Indian languages in a dependency framework. In Proceedings of the Third Linguistic Annotation Workshop, pages 162–165.

Akshar Bharati, Samar Husain, Dipti Misra, and Rajeev Sangal. 2009b. Two stage constraint based hybrid approach to free word order language dependency parsing. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT), pages 77–80.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 149–164.

Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: A Library for Support Vector Machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Michael A. Covington. 2001. A fundamental algorithm for dependency parsing. In Proceedings of the 39th Annual ACM Southeast Conference, pages 95–102.

Joakim Nivre and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106.

Joakim Nivre, Johan Hall, and Jens Nilsson. 2006a. MaltParser: A data-driven parser-generator for dependency parsing. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), pages 2216–2219.

Joakim Nivre, Johan Hall, Jens Nilsson, Gulsen Eryigit, and Svetoslav Marinov. 2006b. Labeled pseudo-projective dependency parsing with support vector machines. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 221–225.

Joakim Nivre, Johan Hall, Sandra Kubler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007a. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task of EMNLP-CoNLL 2007, pages 915–932.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gulsen Eryigit, Sandra Kubler, Svetoslav Marinov, and Erwin Marsi. 2007b. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13:95–135.

Joakim Nivre, Igor M. Boguslavsky, and Leonid L. Iomdin. 2008. Parsing the SynTagRus treebank of Russian. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 641–648.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34:513–553.


Maximum Spanning Malt: Hiring World's Leading Dependency Parsers to Plant Indian Trees

Daniel Zeman
Univerzita Karlova v Praze, Ústav formální a aplikované lingvistiky
Malostranské náměstí 25, CZ-11800, Praha
[email protected]

Abstract

We present our system used for participation in the ICON 2009 NLP Tools Contest: dependency parsing of Hindi, Bangla and Telugu. The system consists of three existing, freely available dependency parsers, two of which (MST and Malt) have been known to produce state-of-the-art structures on data sets for other languages. Various settings of the parsers are explored in order to adjust them for the three Indian languages, and a voting approach is used to combine them into a superparser. Since there is nothing novel about the approach used, a substantial part of the paper is devoted to the analysis of errors the system makes on the given data sets.

1 Introduction

Dependency parsing, i.e. sentence analysis that outputs a tree of word-on-word dependencies (as opposed to constituent trees of context-free derivations), has gained growing attention and popularity recently. There are data-driven dependency parsers that can be trained on syntactically annotated corpora (treebanks), and new, previously unseen material can be parsed very efficiently (Nivre, 2009). Most of the successful parsers employ discriminative learning techniques to sort out vast sets of potentially useful features observed in the input text. Thus, for every new training treebank, smart feature engineering is the key to getting the most out of the existing parsers, regardless of how well they performed on other data sets and languages. Now that there are new treebanks available for two Indo-Aryan and one Dravidian language, we took three existing dependency parsers and explored the possibilities of tuning them for the new training data. Both parser configuration and data preprocessing are relevant approaches to the tuning. In addition, we used parser combination to further improve the results.

Throughout the paper we focus mainly on the unlabeled attachment score. Although the parsers produce labeled dependencies, and although labeled attachment score is the main evaluation metric used by the organizers of the NLP tools contest, we do not optimize the system towards label accuracy. The main task of the parsers is to build the dependency structure, and this task can be solved independently of assigning the dependency labels. Note that the opposite implication does not hold: labels are assigned to dependencies, not dependencies to labels. Thus, if the parser creates a better structure, we can hope for better labeling accuracy. On the other hand, the labeling task can later be assigned to a specialized classifier that uses the original input features and the new dependency structure.

The rest of the paper is organized as follows: In Section 2, we describe the respective parsers and the combined parsing system. In Section 3, we report on the experiments we performed, discuss various results on the development set and analyze the errors. In Section 4 we present the official results on the test data. We conclude by summarizing the best configuration we were able to find, and future implications.

2 System Description

Several good trainable dependency parsers have emerged during the past five years. The CoNLL-X (Buchholz and Marsi, 2006) and CoNLL 2007 (Nivre et al., 2007a) shared tasks in multilingual dependency parsing have greatly contributed to the development of the parsers. Some of the parsers are now freely available on the web, some are even open-source. We selected three of the publicly available parsers for our experiments:


2.1 MST Parser

The Maximum Spanning Tree (MST) parser (McDonald et al., 2005) views the sentence as a directed complete graph with edges weighted by a feature scoring function. It finds for the graph the spanning tree that maximizes the weights of the edges. A multi-class classification algorithm called MIRA is used to compute the scoring function.

MST Parser achieved the best unlabeled attachment scores (UAS) for 9 out of the 13 languages of CoNLL-X, and second best scores in two others. Parsing is fast but training the parser takes many hours on large treebanks. On small data, however, multiple quick experiments with different settings are still doable. The parser is implemented in Java and freely available for download.[1]

2.2 Malt Parser

The Malt Parser (Nivre et al., 2007b) is a deterministic shift-reduce parser where input words can be either put on the stack or taken from the stack and combined to form a dependency. The decision which operation to perform is made by an oracle based on various features of the words in the input buffer and the stack. The default machine learning algorithm used to train the oracle is a sort of SVM (support vector machine) classifier (Cristianini and Shawe-Taylor, 2000).

Malt Parser has participated in both CoNLL-X and CoNLL 2007 shared tasks, and although it achieved the best UAS in three languages only, it usually scored among the five best parsers, sometimes with a statistically insignificant difference from the winner. Malt Parser is really fast and its new Java implementation is open-source, freely available for download.[2]

2.3 DZ Parser

In order to combine the two above parsers, we needed a third parser. We picked DZ Parser (Zeman, 2004), which is also reasonably fast and freely available.[3] Although its accuracy, compared to MST or Malt, is worse by a wide margin, this parser proved useful because its only role was to help form a majority whenever MST and Malt disagreed.

DZ Parser builds a model of bigrams of words that occur together in a dependency; most of the time, words are identified by their part of speech tags and morphological features. The parser was originally developed for Czech but it can be retrained for any other language.[4]

[1] http://sourceforge.net/projects/mstparser/
[2] http://maltparser.org/
[3] http://ufal.mff.cuni.cz/~zeman/projekty/parser/

2.4 Voting Superparser

The three parsers are combined using a simple weighted-voting approach similar to Zeman and Žabokrtský (2005), except that the output is guaranteed to be cycle-free. We start by evaluating every parser separately on the development data. The UAS of each parser is subsequently used as the weight of that parser's vote. Dependencies are parent-child relations, and for every node there are up to three candidates for its parent (if all three parsers disagree). Candidates get weighted votes – e.g., if parsers with weights w1 = 0.8 and w2 = 0.7 agree on a candidate, the candidate gets 1.5 votes. Since we have only three parsers, in practice this means that the candidate of the best parser loses only if 1. the other two parsers agree on someone else, or 2. attaching the child to this candidate would create a cycle.

The tree is constructed from the root down. We repeatedly add nodes whose winning parent candidates are already in the tree. If none of the remaining nodes meet this condition, we have to break a cycle. We do so by examining all unattached nodes. At each node we note the votes of its current winning parent. Then we remove the least-scoring winner and go on adding nodes until all nodes are attached or there is another cycle to break.
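The following sketch (not the authors' implementation; the data layout is an assumption) illustrates the weighted voting and the top-down tree construction just described. Each parse is a dict mapping node index to head index, with 0 denoting the artificial root, and the weights are the parsers' development UAS. The cycle break here simply demotes the blocked winner of one pending node, a simplification of the least-scoring-winner rule described above.

```python
def vote(parses, weights):
    """For every node, rank head candidates by the summed weight of the parsers voting for them."""
    ranked = {}
    for node in sorted(parses[0]):
        votes = {}
        for parse, w in zip(parses, weights):
            votes[parse[node]] = votes.get(parse[node], 0.0) + w
        ranked[node] = sorted(votes, key=votes.get, reverse=True)
    return ranked

def build_tree(ranked):
    """Attach nodes from the root down; when no node can be attached, break the cycle."""
    heads, attached = {}, {0}          # node 0 is the artificial root
    pending = set(ranked)
    while pending:
        progress = False
        for node in sorted(pending):
            if ranked[node][0] in attached:   # winning parent already in the tree
                heads[node] = ranked[node][0]
                attached.add(node)
                pending.discard(node)
                progress = True
        if not progress:                      # every pending node waits on a cycle
            victim = min(pending)
            if len(ranked[victim]) > 1:
                ranked[victim].pop(0)         # try the runner-up candidate instead
            else:
                heads[victim] = 0             # last resort: attach to the root
                attached.add(victim)
                pending.discard(victim)
    return heads
```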

3 Experiments

The final test data are blind; any error analysis on them is therefore impossible. That is why all scores given in this section were measured on the development data. All three treebanks follow the same annotation scheme and each of them is available in two flavors:

• the nomorph variety contains word forms, chunk labels, dependency links and dependency labels

• the morph variety is augmented by automatically assigned lemmas, part of speech tags and values of morphological features (gender, number, person, case, postposition and tam – tense+aspect+modality)

[4] Of course there are other dependency parsers that successfully participated in the CoNLL shared tasks and are available for download. One alternative worth mentioning is the ISBN Parser (Titov and Henderson, 2007) at http://flake.cs.uiuc.edu/~titov/.

3.1 Morphology

Table 1 shows baseline results on the nomorph data. Both MST and Malt parsers were invoked in projective mode, Malt with the default Nivre arc-eager algorithm.

      MST    Malt   DZ     Vote
hi    80.32  81.84  62.00  82.48
bn    82.00  84.71  71.02  83.11
te    77.63  80.89  70.52  80.59

Table 1: Baseline UAS of the four parsers on nomorph development data. Language codes follow ISO 639: hi = Hindi, bn = Bangla, te = Telugu.

There are several ways to use the additional information from the morph data. The easiest way, exploitable by all three parsers, is to combine the chunk label, the POS tag and the features into one tag string. The results (not presented here) are very poor. Although there may be tagging errors, the most likely cause is data sparseness. In Table 2 we illustrate this by showing the numbers of unique values in the various attributes of treebank words.

      occ    frm   lem   cl  pos  feat
hi    13779  3973  3134  10  33   714
bn    6449   2997  2336  14  30   367
te    5494   2462  1403  12  31   453

Table 2: Size of the training corpora: occ – word occurrences, frm – distinct forms, lem – lemmas, cl – chunk labels, pos – part of speech tags, feat – feature value combinations.

To fight sparseness, we could either restrict the tag to selected information, or split the information into multiple features learnt separately, or both. We first restricted the tag string to selected information. Parsing on other treebanks showed that POS with case is especially useful (Zeman, 2004). The other feature we selected is called vibhakti, which partially corresponds to case suffix and partially to postposition.[5]

Table 3 presents the results of restricting the tag string to POS+case+vibhakti. This move especially improves the MST Parser, which now outperforms Malt on Hindi and Bangla. Malt improves on Hindi but drops behind on Bangla and Telugu. DZ Parser also improves on Hindi and deteriorates elsewhere, but there is an interesting observation: even though its own scores are worse, the worsened output actually improves the voting results (provided the configurations of MST and Malt are fixed). So it seems that the newly introduced errors are less important (because taken care of by the more powerful parsers) while some difficult parts of the data are now covered better.

[5] The Indian treebanks at hand are unusual in that nodes do not always map to words. They represent chunks, with function words such as postpositions hidden in node attributes.

      MST    Malt   DZ     Vote
hi    86.16  85.84  75.12  87.12
bn    85.70  77.31  54.38  85.82
te    79.85  77.78  45.78  79.70

Table 3: UAS on refined morph data (POS tag, case and vibhakti concatenated).
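As a rough illustration (field names here are assumptions, not the authors' code), the two tag-string strategies compared above can be sketched as follows: concatenating every morphological attribute into one symbol versus keeping only POS, case and vibhakti.

```python
def full_tag(chunk_label, pos, feats):
    """Concatenate everything into one tag string; very sparse, as Table 2 suggests."""
    return "|".join([chunk_label, pos] + [f"{k}={v}" for k, v in sorted(feats.items())])

def restricted_tag(pos, feats):
    """Keep only POS, case and vibhakti, the combination used for Table 3."""
    return "|".join([pos, feats.get("case", ""), feats.get("vib", "")])
```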

Finally, we returned to nomorph with Malt on Bangla and Telugu, where the POS+case+vibhakti tags did not help. The results are given in Table 4.

      MST    Malt   DZ     Vote
hi    86.16  85.84  75.12  87.12
bn    85.70  84.71  54.38  86.19
te    79.85  80.89  45.78  82.37

Table 4: UAS on mixed data: MST and DZ use POS+case+vibhakti for all languages; Malt uses that for Hindi only, elsewhere it uses just POS.

3.2 Nonprojectivity

Nonprojectivity is a property of the dependency structure and the word order (Hajičová et al., 2004) that makes parsing more difficult. All three parsers can produce nonprojective structures and all three treebanks are nonprojective. However, except for Hindi, the proportion of nonprojective dependencies is so small that one can hardly imagine that running the parsers in nonprojective mode would bring any improvement.

Experiments with different configurations of the Malt Parser revealed that a nonprojective algorithm (read: an algorithm capable of producing nonprojective structures) led to the best results. However, a different nonprojective algorithm won, by an insignificant margin, in Telugu. See Table 6 for more details. Finally, Table 7 presents the results of the voted parser when the best parsing algorithm of Malt Parser is chosen for each language. This is also the configuration we used to parse the test set for the official evaluation round 2.

Although the MST Parser can also run in nonprojective mode, due to time constraints we were not able to retrain it in that mode for the official evaluation round. Bharati et al. (2008) reported that it did help for Hindi.

      Edges  Sentences
hi    1.83   13.93
bn    0.96    5.49
te    0.45    1.31

Table 5: Percentage of nonprojective dependencies, and of sentences containing at least one nonprojectivity.

Algorithm       hi     bn     te
nivreeager      85.84  84.71  80.89
nivrestandard   86.56  85.45  80.15
covproj         86.32  85.57  80.30
covnonproj      86.96  84.09  80.30
stackproj       86.56  85.08  80.89
stackeager      81.76  83.85  81.04
stacklazy       87.60  84.71  80.59

Table 6: The seven parsing algorithms of Malt Parser on the three languages. Note however that not all differences are significant.

      MST    Malt   DZ     Vote
hi    86.16  87.60  75.12  88.00
bn    85.70  85.57  54.38  86.56
te    79.85  81.04  45.78  82.52

Table 7: UAS of the voting parser when Malt uses different algorithms for different languages: stacklazy for Hindi, covproj for Bangla and stackeager for Telugu.

3.3 Error Patterns

The accuracy of the dependencies is relatively high and it is difficult to trace repetitive error patterns. In Hindi, many wrong attachments seem to be long-distance, and verbs, conjunctions, root and NULL nodes are frequently involved. Frequent words should perhaps be available to the parsers as parts of tag strings: for instance, Hindi िक (ki) "that" or तो (to) are wrongly attached because the parser only sees the general CC tag. On a similar note, problems with coordination, also observed e.g. by Zeman (2004), occur here, too: भाई और भाभी (bhāī aura bhābhī) "brother and his wife" is correctly recognized as coordination rooted by the conjunction और; however, the conjunction node lacks the information about its noun children and fails to attach as the subject of the verb.

The tag string should contain both the chunk label and the POS tag. So far we wrongly assumed that POS always determines the chunk label. It is often so but not always, as exemplified in the Bangla chunk sequence তেব সদীপ ওেক একিদন আডােল েডেক বেলিছল েকৗতহল েদখােল তিম উচেত উঠেত অিনেমষ (tabe sudīpa oke ekadina ārāle deke balechila kautūhala dekhāle tumi umcute uthate animeša). The words েডেক and েদখােল are tagged VGNF|VM while বেলিছল and উঠেত are VGF|VM. The parser gets them wrong, and this could be caused by it seeing only VM in the tag.

In Telugu, an extraordinary number of sentences follow the SOV order so strongly that the last node (the verb) is almost always attached to the root and most other nodes are attached directly to the last node. An example chunk sequence where this rule would lead to 100 % accuracy follows: రషటరంల రంగరడడ మదక నజమబద జలలలల పంటను గపప పండసుతననరు (rāštramlo ramgāreddi medak nijāmābād jillālalo pamtanu goppo pamdistunnāru). In the light of such examples it seems reasonable to provide the parsers with an additional feature telling whether a particular dependency observes the "naïve Telugu" structure. Note however that this will not help with the other two languages. While 73.75 % of Telugu dependencies follow this rule, it is only 39.52 % in Bangla and 35.71 % in Hindi.

3.4 Voting Potential

In order to see how much can potentially be gained from parser combination, we summarized the attachments that at least one of the parsers got correct. This oracle accuracy gives an upper limit for the real scores we can achieve. It corresponds to the case that for every word, an oracle correctly tells which parser to ask about the word's parent. Table 8 presents the oracle accuracies together with the percentage of unique correct attachments that only one parser delivered. These figures give some idea of how similar the errors of the respective parsers are to each other. Malt parser has the most unique know-how in all three languages, which could be explained by its focus on local features. Both MST and DZ can reach for global, sentence-wide relations. Note, however, that the development data set is small and the percentages correspond to 42 (Malt/Bangla) or fewer words.

      Oracle  UqMST  UqMalt  UqDZ
hi    93.92   2.96   3.12    1.84
bn    94.20   4.32   5.18    1.97
te    88.00   2.37   5.48    2.07

Table 8: Oracle accuracy for the three languages, and unique correct attachments (%) proposed by a single parser.

4 Official Evaluation

Finally, we present the official evaluation of our voting superparser, as measured by the organizers on the test data. For this purpose, the parsing system has been retrained on both the training data and the development data. The results are shown in Table 9.

      UAS              LAA              LAS
hi    88.32 (3:90.14)  72.66 (4:76.38)  68.25 (4:74.48)
bn    86.68 (4:90.32)  71.28 (4:81.27)  66.60 (5:79.81)
te    82.50 (5:86.28)  54.20 (4:61.58)  50.94 (4:60.55)

Table 9: Official scores on the test data: unlabeled attachment score (UAS), label assignment accuracy (LAA) and labeled attachment score (LAS). The numbers in parentheses are the rank of our system and the score of the best system w.r.t. the given metric.

The second and final evaluation round also featured refined training data "with coarser tags" (as described by the contest organizers). These new "tags" were the dependency labels, so the labeling part of the task was easier. Even UAS was affected, as the parsers now saw different training material; note however that the changes in UAS are minimal, so we do not report the results here.

5 Related and Future Work

There is a large body of work on parser combination. A summary can be found in Nivre and McDonald (2008), whose approach is also related to ours w.r.t. the selection of parsers. However, their feature-based integration of MST and Malt parsers is much more sophisticated than our lightweight voting. Further improvement of accuracy can be expected if MST-Malt integration is applied to the Indian treebanks.

There is still much room for experiments with various Malt Parser configurations (e.g. the special root handling) and deeper feature engineering for both Malt and MST: the parsers could work with POS, case and vibhakti separately rather than concatenated into one string.

Labeling of the dependencies is another problem that deserves more attention. We have concentrated on the unlabeled attachment score so far and, for the sake of the official evaluation, we simply pushed the MST labels through. A separate postprocessing classifier would probably produce better results, as suggested in the README file of the MST Parser.

6 Conclusion

We have described our system of voting parsers, as applied to the ICON 2009 NLP Tools Contest task. We showed that case and vibhakti are important features at least for parsing Hindi, while their usability in Bangla and Telugu is limited by data sparseness. Providing these features to MST and DZ in all languages, and to Malt in Hindi only, yielded the best combined parser. We also discussed several error patterns that could lead to further improvements of the parsing system in the future.

Acknowledgements

We are enormously grateful to the developers of the MST and Malt parsers for making their software available to the research community. The research has been supported by the grant MSM0021620838 (Czech Ministry of Education).

References

Akshar Bharati, Samar Husain, Bharat Ambati, Sambhav Jain, Dipti Misra Sharma, and Rajeev Sangal. Two semantic features make all the difference in parsing accuracy. In Proceedings of ICON 2008, Pune, India, December 2008.

Sabine Buchholz and Erwin Marsi. CoNLL-X shared task on multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), pages 149–164, New York City, June 2006. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W/W06/W06-2920.

Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge, UK, 2000.

Eva Hajičová, Jiří Havelka, Petr Sgall, Kateřina Veselá, and Daniel Zeman. Issues of projectivity in the Prague Dependency Treebank. The Prague Bulletin of Mathematical Linguistics, 81, 2004.

Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 523–530, Vancouver, British Columbia, Canada, October 2005. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/H/H05/H05-1066.

Joakim Nivre. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359, Suntec, Singapore, August 2009. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P/P09/P09-1040.

Joakim Nivre and Ryan McDonald. Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL-08: HLT, pages 950–958, Columbus, Ohio, June 2008. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P/P08/P08-1108.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Praha, Czechia, June 2007a. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/D/D07/D07-1096.

Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülşen Eryiğit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):95–135, 2007b.

Ivan Titov and James Henderson. Fast and robust multilingual dependency parsing with a generative latent variable model. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 947–951, Praha, Czechia, June 2007. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/D/D07/D07-1099.

Daniel Zeman. Parsing with a Statistical Dependency Model. PhD thesis, Univerzita Karlova v Praze, Praha, Czechia, 2004.

Daniel Zeman and Zdeněk Žabokrtský. Improving parsing accuracy by combining diverse dependency parsers. In Proceedings of the Ninth International Workshop on Parsing Technologies (IWPT), pages 171–178, Vancouver, British Columbia, Canada, 2005. Association for Computational Linguistics. ISBN 1-932432-58-2.


Structure Simplification and Demand Satisfaction Approach to Dependency Parsing in Bangla

Sankar De
Gupta College of Technological Sciences, Asansol-713301
[email protected]

Arnab Dhar
Indian Statistical Institute, 203, BT Road, Kolkata 700108
[email protected]

Utpal Garain
Indian Statistical Institute, 203, BT Road, Kolkata 700108
[email protected]

Abstract

A constraint based dependency parsing (Nivre et al. 2004, 2005, 2007) has been attempted and applied to Bangla, which is a free word order language. The Paninian grammatical model (Bharati et al., 1993, 1995), which is very effective for free word order languages (e.g. Indian languages), has been used for this purpose. The approach is to simplify complex and compound sentential structures first, then to parse the simple structures so obtained by satisfying the Karaka demands of the Demand Groups (verb groups), and to rejoin such parsed structures with appropriate links and Karaka labels. The parser has been trained with a Treebank of 1000 annotated sentences and then evaluated with un-annotated test data of 150 sentences. The evaluation shows that the proposed approach achieves 90.32% and 79.81% accuracies for unlabeled and labeled attachments, respectively.

1 Introduction

Bangla is a morphologically rich free word order language. Parsing such languages, although very challenging, can be better handled with a dependency based framework and the Paninian grammatical model. As conceived by the semantic model of Paninian grammar, every verbal root (dhaatu) denotes an action consisting of: (i) an activity and (ii) a result. The result is the state which, when reached, means the action is complete. The activity consists of actions carried out by different participants or Karakas (mostly noun groups) involved in the action. The Karakas have a direct relation to the verb. The Paninian model used only six such Karakas, but we have used some more participants which are not Karakas exactly, but maintain some relations to the verb or other lexical items in the sentence, for example relations such as purpose (marked as 'rt'), reason (marked as 'rh'), and genitive (marked as 'r6'). These are called relations other than Karakas (Begum et al., 2008a). The Karakas and relations other than Karakas (the dependency tag-set) used in this paper are described in Appendix A. So for a very simple sentence (single verb group) like S1, the verb group is the root of the dependency tree connecting some noun groups with appropriate Karaka labels (Begum et al., 2008a, 2008b). E.g. consider the sentence

S1. rAma BAwa KAyZa
    Ram rice eats
    'Ram eats rice.'

The parsed output is a tree rooted at the verb group KAyZa, with rAma attached as k1 and BAwa as k2.

Figure 1: Parsed output of S1

Simple sentences are parsed using the demand frames, with transformation rules to handle the situation better. About 500 demand frames or Karaka frames, including mixed verbs, main verbs and their causative forms, have been considered in the present study.

2 The Parsing Approach

This section describes the broad algorithmic approach of the parser. It also describes how compound and complex structures have been handled with the help of grammatical rules.

2.1 The Algorithm

Input: A sentence with all morphological and chunking information.
Output: A dependency tree having the chunked phrases as nodes.

Step 1: If the sentence is compound, divide the sentence to get two or more simple or complex sentences. Pass each of them to Step 2 one by one. Otherwise pass the sentence to Step 2.
Step 2: If the sentence is complex, divide the sentence to get two or more simple sentences. Pass each of them to Step 3 one by one. Otherwise pass the sentence to Step 3.
Step 3: Parse the simple sentence.
Step 4: Rejoin the parsed sentences divided in Step 2 with proper links and labels.
Step 5: Rejoin the parsed sentences divided in Step 1 with proper link and label.
Step 6: Return the parsed sentence.
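A minimal sketch of this control flow is given below (not the authors' code). The language-specific operations are passed in as callables, since their concrete implementations (splitting at conjuncts and clause markers, parsing a simple sentence, rejoining with 'ccof' and relative-clause links) are described only informally in the following subsections.

```python
def parse_sentence(sentence, split_compound, split_complex, parse_simple,
                   rejoin_complex, rejoin_compound):
    """Steps 1-6 of the algorithm, with language-specific helpers injected."""
    parsed_clauses = []
    for clause in split_compound(sentence):             # Step 1: split at sentence-level conjuncts
        simples = split_complex(clause)                 # Step 2: split at relative-clause markers
        parsed = [parse_simple(s) for s in simples]     # Step 3: demand-satisfaction parsing
        parsed_clauses.append(rejoin_complex(parsed))   # Step 4: rejoin clauses (e.g. nmod_relc)
    return rejoin_compound(parsed_clauses)              # Steps 5-6: rejoin under the conjunct (ccof)
```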

2.2 Parsing of Compound Sentences

Sentences which have sentence-level coordinate conjuncts are treated as compound sentences and handled in Step 1 of the algorithm. E.g. consider the sentence below.

S2. (rAma BAwa KAyZa) ebaM (syAma ruti KAyZa.)
    Ram rice eats and Shyam bread eats.
    'Ram eats rice and Shyam eats bread.'

In the above sentence, two simple sentences, shown within braces, are joined with the sentence-level conjunct ebaM (and) to form a compound sentence. Our approach is to identify these sentence-level conjuncts and divide the sentence to make the parsing task easier. After parsing the two simple sentences, the roots of the two sentences are linked to the conjunct with the 'ccof' relation as follows:

Figure 2: Rejoining simple structures to form the original compound structure.

2.3 Parsing of Complex Sentences

Sentences having relative clauses are considered as complex sentences and handled in Step 2 of the algorithm. E.g. consider the sentence below.

S3. (ye Celeti seKAne base Ace) (se AmAra BAi hayZa)
    (Who boy the there sitting is) (he my brother is)
    'The boy who is sitting there is my brother.'

The first part of the sentence is a relative clause which modifies 'se' (he). 'ye' and 'se' are grammatical markers of the relative clause and the main clause, respectively. Likewise, 'yeKAne-seKAne' (where-there), 'yAr-wAr' (whose-his) etc. are such clause markers. With the help of these clause markers, a complex sentence is divided into multiple simple sentences, which are then parsed in Step 3 and rejoined in Step 4, as shown in Fig. 3.

3 Demand Frames and their Transformations

Simple sentences are parsed with a demand satisfaction approach. Here comes the role of Demand Frames or Karaka Frames. A Demand Frame or Karaka Frame for a verb indicates the demands a verb makes, i.e. which Karakas it takes to form a meaningful sentence.

Figure 3: Joining two clauses

A mapping is specified between Karaka relations and Vibhaktis (post-positions, suffixes). It depends on the verbal semantics and the tense, aspect and modality (TAM) label. Each TAM specifies a transformation rule, depending on which the basic frame is changed. The Demand Frame of a verb also specifies which Karakas are mandatory or optional for the verb and which Vibhaktis (post-positions) they take.

The most important aspect regarding the Karaka-vibhakti mapping is that it depends on the verb and its TAM label. The mapping is represented by two structures: the default Karaka frame and the Karaka frame transformation (De et al. 2009). The default Karaka frame for a verb or a class of verbs gives the mapping corresponding to the TAM known as basic to the verb. For the basic TAM label, the Karaka frame specifies the Vibhaktis permitted for the Karaka relations. We have chosen the TAM corresponding to the present indefinite tense as the basic TAM. For other TAM labels, there are Karaka frame transformation rules. Thus, for a given verb with some TAM label, the appropriate Karaka frame can be obtained using the basic frame (Bhattacharya, 1993) and the corresponding transformation rules.

3.1 Implementation

Development of Karaka frames is discussed below.

Verb: gon
Type: Transitive
English gloss: to count

Table 1: Karaka Frame for verb 'gon' (to count)

S4. rAmaф (k1) sanXyAyZa (k7t) AkAse (k7p) wArAф (k2) gone.
    Ram evening-Loc sky-Loc star counts
    'Ram counts stars in the sky in the evening.'

S5. rAmaф (k1) hAwe (/haat diye) tAkAф (k2) gone.
    Ram hand-Loc (/hand with) money counts.
    'Ram counts money with his hand.'

In the above examples, k1 and k2 in both the sentences are mandatory (m); without them the sentences seem to be incomplete. The other Karakas are desirable (d).

3.2 Transformations

Depending on the TAM of the verb, the basic frame may change. We take the 'we_ha' TAM for the above verb. The example S4 then becomes:

S6. rAmakeф (k1) sanXyAyZa (k7t) AkAse (k7p) wArAф (k2) gunwe hay.
    Ram-dat evening-Loc sky-Loc star count-to have.
    'Ram has to count stars in the sky in the evening.'

So the transformation rule for the 'we_ha' TAM is: the vibhakti 'ф' for Karaka 'k1' is changed to 'ke'. We have prepared an exhaustive TAM list for Bangla, and transformation rules, where any exist, have been framed for each of them.
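As a toy illustration of the two structures described above (the data layout and most vibhakti values are assumptions; only the k1 change under 'we_ha' comes from the running example), a default Karaka frame and a TAM-driven transformation could be represented as follows.

```python
# Default Karaka frame for the basic TAM (present indefinite); 'ф' is the null vibhakti.
DEFAULT_FRAMES = {
    "gon": {  # 'to count'
        "k1":  {"vibhakti": "ф", "necessity": "mandatory"},
        "k2":  {"vibhakti": "ф", "necessity": "mandatory"},
        "k7t": {"vibhakti": "e", "necessity": "desirable"},   # value illustrative
        "k7p": {"vibhakti": "e", "necessity": "desirable"},   # value illustrative
    }
}

# Transformation rules keyed by TAM label; 'we_ha' changes the karta's vibhakti to 'ke'.
TAM_RULES = {
    "we_ha": {"k1": {"vibhakti": "ke"}},
}

def transformed_frame(verb, tam):
    """Return the Karaka frame for this verb after applying the TAM transformation, if any."""
    frame = {k: dict(v) for k, v in DEFAULT_FRAMES[verb].items()}
    for karaka, change in TAM_RULES.get(tam, {}).items():
        frame[karaka].update(change)
    return frame
```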

4 Constraint Graph

For a given sentence, after the word groups have been formed, the verb groups are identified. Then each of the source groups is tested against the Karaka restrictions in each Karaka frame (transformed according to the TAM rules). When testing a source group against the Karaka restrictions of a demand group, the vibhakti information is checked, and if it is found satisfactory the source group becomes a candidate for that Karaka of the demand group. This can be shown in the form of a Constraint Graph (CG). Nodes of the graph are the word groups, and there is an arc from a verb group to a source group labeled by a Karaka if the source group satisfies the Karaka restrictions in the Karaka chart.

A restricted CG can be obtained by following certain rules, as mentioned below:

i) Leftness rule: In most Indian languages, source groups occur before the demand groups, although some exceptions are seen. This can be handled by including an extra column in the demand frame, src_pos (source position), which may be 'l' (can occur at the left of a demand group) or 'r' (can occur at the right of a demand group).

ii) gnp agreement: The 'Karta Karaka' (k1) always agrees in gnp for non-passive sentences, and k2 for passive ones.

iii) The lexical type of the source must match that in the Karaka frame.

For example, consider the sentence S7 and let us see how its CG is constructed.

S7. Amiф xupure BAwaф KAi
    I noon-Loc rice eat (1st person)
    'I eat rice at noon.'

Demand Frame for KA:

Table 2: Karaka Frame for verb 'KA' (to eat)

No transformation is needed, as there is no applicable TAM rule. The (restricted) CG corresponding to the above sentence is shown in Figure 4.

Figure 4: CG for S7

5 Constraints

A parse is a sub-graph of the constraint graph containing all the nodes of the CG and satisfying the following constraints:

C1. For each of the mandatory Karakas in a Karaka frame for each demand group, there should be exactly one outgoing edge labeled by the Karaka from the demand group.

C2. For each of the desirable or optional Karakas in a Karaka frame for each demand group, there should be at most one outgoing edge labeled by the Karaka from the demand group.

C3. There should be exactly one incoming edge into each source group.

If several sub-graphs of the CG satisfy the above conditions, the sentence is probably ambiguous. We have tried to resolve ambiguities by applying more grammatical constraints, as discussed later.

6 Parsing as a Bipartite Graph Matching Problem

The parsing problem can now be reduced to a bipartite graph matching problem. The bipartite graph (BG) G is defined as a three-tuple (D, S, E), where D = {(di, kj) : for each demand group di and each of its Karaka demands kj in its demand frame}, S is the set of all source groups {s}, and E is the set of edges from D to S, i.e. E = {(di, kj, st) : an attachment is possible from (di, kj) to a source group st}, with D ∩ S = ф. For a weighted BG, E is redefined as E = {(di, kj, st, w)}, where w is called the weight of the edge. We have set a weight of 2 for mandatory Karakas and 1 for desirable ones, so that a mandatory Karaka surely gets an arc.

Now we define a matching M on the bipartite graph as M ⊆ E with the property that no two edges of M share a node (a one-to-one mapping). The matching problem is to find a maximal matching of G, i.e., a matching with the largest number of edges. A maximal matching is called complete if every node in D and S has an edge (one-to-one and onto).

6.1 Constructing the Initial Bipartite Graph

The initial bipartite graph is constructed in three stages:

i) For every source node s in the constraint graph, form a node s in S.

ii) For every demand node d in the constraint graph and for every Karaka k in the Karaka frame for d, form a node (d, k) in D.

iii) For every edge (d, s) in the constraint graph labeled by Karaka k, create an edge e = (d, k, s, w) in E, where w = 2 if k is mandatory and w = 1 if k is desirable; w is called the weight of the edge e.
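The three-stage construction can be transcribed almost directly into code. The sketch below is an assumption about the data layout (not the authors' implementation): the constraint graph is a list of (demand_group, karaka, source_group) edges, and frames[d] maps each Karaka of demand group d to 'mandatory' or 'desirable'.

```python
def initial_bipartite_graph(constraint_graph, source_groups, frames):
    """Build (D, S, E) with weight 2 for mandatory Karakas and 1 for desirable ones."""
    S = set(source_groups)                                   # stage (i)
    D = {(d, k) for d in frames for k in frames[d]}          # stage (ii)
    E = []
    for d, k, s in constraint_graph:                         # stage (iii)
        w = 2 if frames[d][k] == "mandatory" else 1
        E.append((d, k, s, w))
    return D, S, E
```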

7 Parsing of Simple Sentences

Pre-assumptions: i) Chunking has been done with full accuracy. ii) Demand Frames are complete in the sense that the set of arc labels (Karaka labels) is exhaustive, the Karaka-vibhakti mapping is accurate, and the transformation rules are exhaustive for a particular verb. iii) Demand Frames are exhaustive in the sense that every verb used in the language has a Demand Frame.

Algorithm-2: parse_simple_sentence

Input: chunked simple sentence with all morphological information.
Output: a maximal matching M.

Step 1. Identify demand groups in the sentence.
Step 2. Load the demand frame for each one.
Step 3. Transform the demand frames according to TAM rules, if there are any.
Step 4. Form the initial weighted bipartite graph Gw = {D, S, E}.
Step 5. Set matching M ← ф  // initially M contains no edge
Step 6. Search for a node (di, kj) in D such that it has a unique outgoing edge (di, kj, st, 2)  /* for mandatory demands only */
Step 7. Set M ← M + (di, kj, st, 2); E ← E − (di, kj, st, 2)
Step 8. Remove all other edges incident on st.
Step 9. Search for a node sp in S such that it has a unique incoming edge (dm, kn, sp, w).
Step 10. Set M ← M + (dm, kn, sp, w); E ← E − (dm, kn, sp, w)
Step 11. Remove all other edges originating at (dm, kn).
Step 12. Repeat Step 6 to Step 11 until E = ф or no nodes are found in Steps 6 and 9.
Step 13. If E ≠ ф then apply a grammatical constraint, discussed in Section 7.1, to resolve for an edge e. Add the edge e to M, subtract e and the other related edges from E. Go to Step 6.
Step 14. Return M.
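A condensed sketch of the forced-edge loop (Steps 5-12) is given below, again under the data layout assumed earlier (D as a set of (d, k) pairs, E as a set of (d, k, s, w) edges); Step 13, the grammatical ambiguity resolution of Section 7.1, is deliberately left out.

```python
def maximal_matching(D, S, E):
    """Repeatedly commit edges that are forced: a mandatory demand with a single
    candidate (Step 6) or a source group with a single incoming edge (Step 9)."""
    M, E = set(), set(E)
    while True:
        forced = None
        for d, k in D:                                            # Step 6
            out = [e for e in E if e[0] == d and e[1] == k and e[3] == 2]
            if len(out) == 1:
                forced = out[0]
                break
        if forced is None:
            for s in S:                                           # Step 9
                inc = [e for e in E if e[2] == s]
                if len(inc) == 1:
                    forced = inc[0]
                    break
        if forced is None:
            break                    # Step 13 (ambiguity resolution) would take over here
        d, k, s, w = forced
        M.add(forced)                                             # Steps 7 / 10
        # Steps 8 / 11: the matching is one-to-one, so drop competing edges on both sides.
        E = {e for e in E if e[2] != s and (e[0], e[1]) != (d, k)}
    return M
```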

7.1 Ambiguity Resolution

For resolving ambiguities we had to use some constraints. The relative Karaka position played an important role here. For this purpose, there is a thumb rule in Bangla: the Karakas in a sentence occur in a sequence like k1 k7 k3 k5 k2 k4 v. We have used this with a little modification: k1 k7t k7p k3 k5 k2/k2p k1s k2g/k4 v.

Here we discuss an example which is solved by ambiguity resolution.

S8. Amiф bikAle seKAne base CilAm
    I afternoon-Loc there-Loc sitting was.
    'I was sitting there in the afternoon.'

The restricted CG corresponding to the above sentence is shown in Figure 5.

Figure 5: CG for S8

Before ambiguity resolution, the sentence gives two parses, as shown in Figures 6 and 7.

Figure 6: Parse-1 for S8

Figure 7: Parse-2 for S8

In the ambiguity resolution stage, we apply the constraint that k7t occurs before k7p. Thus the final parse becomes Parse-1.

8 Evaluation and Discussions

As mentioned before, the parser has been trained with a Treebank of 1000 annotated sentences and then evaluated with un-annotated test data of 150 sentences. The evaluation shows that the proposed approach achieves accuracies of 90.32%, 79.81%, and 81.27% for unlabeled attachment, labeled attachment and label scores, respectively. This evaluation was done by the ICON 2009 NLP Tools Contest Committee and shows the best accuracy among the participating teams for Bangla.

Analysis of errors shows that the possible reasons for errors are: i) Lack of exhaustiveness of the demand frames: Bangla uses a lot of mixed verbs like 'mane ha' (seems), 'banXa kar' (stop), etc. Preparing demand frames for all of them is very tedious, though we prepared some. As a result, the parser often encounters unknown verbs and produces an incorrect parse. ii) Almost all Vibhaktis are used with almost all Karakas in Bangla, i.e. the Karaka–Vibhakti mapping is very inconsistent. This makes the parsing job very challenging for Bangla. iii) The absence of verbs in Bangla sentences is also a common phenomenon; handling these types of sentences is another challenging task. iv) In Bangla, sometimes the same verbal suffix (dhaatu vibhakti) is used for different gnp values. For example, consider the sentences (a) wini balena (he says) and (b) Apani balena (you say). The Karta in (a) is third person singular [i.e. wini (he)] whereas in (b) it is second person singular [i.e. Apani (you)]. But in both sentences the verb form is the same, i.e. balena (bal-ena). In such cases, checking verb agreement becomes difficult, and so does the parsing of the sentences. v) The suffix for non-finite verbs ('e') is the same as that of finite verbs in third person ordinary. So when the parser encounters a verb with suffix 'e', it is confused.

The errors can be minimized by taking care of the following. i) Increasing the volume of the training data set: with a larger training data set we will encounter more mixed verbs and prepare demand frames for each of them. As a result, the probability of facing unknown verbs will decrease, producing better results. ii) Introducing the 'ha' verb: when the parser encounters a sentence without a verb, we introduce the 'ha' (to be) verb in the sentence, because in Bangla the 'ha' verb in its present indefinite form is omitted in most cases. However, we cannot assign any gnp information to the introduced 'ha' verb, so a little difficulty remains in parsing. iii) When the parser attaches a label to a dependency relation, maximum difficulty arises for the groups (k7p, k7t, k7) and (k2, k2g, k2p), because all Karakas of a group take the same Vibhaktis. When we use a common Karaka label for a group, e.g. k7 for (k7p, k7t, k7) and k2 for (k2, k2g, k2p), the label attachment accuracy increases remarkably.

9 Conclusion

An approach for dependency parsing in Bangla is presented in this paper. Though this type of parsing has been attempted before for some Indian languages (Bharati et al. 2008a, 2008b, 2009a, 2009b; Husain et al. 2009), an extensive study for Bangla had not been addressed before. The approach presented here is expected to work well for all morphologically rich free word order languages, like many of the Indian languages. One major requirement is to use the lexical resources (demand frames) for the corresponding language. We plan to try it for other Indian languages in the near future. As this approach is fully grammar driven, more grammatical rules should be applied for better accuracy. In the next stage we plan to realize a statistical approach after obtaining sufficient annotated data, and then integrate the positive aspects of the two approaches to produce better performance.

10 References

A. Bharati, S. Husain, M. Vijay, K. Deepak, D. M. Sharma and R. Sangal, 2009a. Constraint Based Hybrid Approach to Parsing Indian Languages. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 23), Hong Kong, 2009.

A. Bharati, S. Husain, D. M. Sharma and R. Sangal, 2009b. Two stage constraint based hybrid approach to free word order language dependency parsing. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT09), Paris, 2009.

A. Bharati, S. Husain, B. Ambati, S. Jain, D. M. Sharma and R. Sangal, 2008a. Two semantic features make all the difference in parsing accuracy. In Proceedings of the 6th International Conference on Natural Language Processing (ICON-08), CDAC Pune, India, 2008.

A. Bharati, S. Husain, D. M. Sharma and R. Sangal, 2008b. A Two-Stage Constraint Based Dependency Parser for Free Word Order Languages. In Proceedings of the COLIPS International Conference on Asian Language Processing 2008 (IALP), Chiang Mai, Thailand.

A. Bharati, V. Chaitanya and R. Sangal, 1995. Natural Language Processing: A Paninian Perspective. Prentice-Hall of India, New Delhi.

A. Bharati and R. Sangal, 1993. Parsing Free Word Order Languages in the Paninian Framework. In Proceedings of ACL.

J. Nivre, J. Hall, S. Kubler, R. McDonald, J. Nilsson, S. Riedel and D. Yuret, 2007. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007.

J. Nivre, 2005. Dependency Grammar and Dependency Parsing. MSI report 05133. Växjö University: School of Mathematics and Systems Engineering.

J. Nivre, J. Hall and J. Nilsson, 2004. Memory-based dependency parsing. In Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL), pages 49–56.

K. Bhattacharya, 1993. Bengali-Oriya Verb Morphology: A Contrastive Study. Das Gupta & Co. Private Limited, Kolkata.

R. Begum, S. Husain, D. Sharma and L. Bai, 2008a. Developing Verb Frames for Hindi. In Proceedings of LREC 2008.

R. Begum, S. Husain, A. Dhwaj, D. Sharma, L. Bai and R. Sangal, 2008b. Dependency annotation scheme for Indian languages. In Proceedings of IJCNLP.

S. De, A. Dhar and U. Garain, 2009. Karaka Frames and Their Transformations for Bangla Verbs. In 31st All-India Conference of Linguists, 2009 (to appear).

S. Husain, P. Gadde, B. Ambati, D. M. Sharma and R. Sangal, 2009. A modular cascaded approach to complete parsing. In Proceedings of the COLIPS International Conference on Asian Language Processing 2009 (IALP), Singapore, 2009.


Appendix A. Dependency Tag-set

k1          karta (doer/agent/subject)
k2          karma (object/patient)
k3          karana (instrument)
k4          sampradaana (recipient)
k5          apaadaana (source)
k7t         kaalaadhikarana (location in time)
k7p         deshadhikarana (location in space)
k7          vishayaadhikarana (location elsewhere)
ras         upapada_sahakaarakatwa (associative)
rd          prati upapada (direction)
rh          hetu (cause-effect)
k*u         saadrishya (similarity)
rt          taadarthya (purpose)
k1s         vidheya karta (karta samanadhikarana)
k2s         vidheya karma (karma samanadhikarana)
r6          shashthi (possessive)
pk1         prayojaka karta (causer)
mk1         madhyastha karta (causer2)
jk1         prayojya karta (causee)
nmod        noun modifiers
jjmod       adjectival modifiers
adv         kriyaavisheshana ('manner adverbs' only)
rad         address words
ccof        conjunct-of relation
pof         part-of relation
nmod_relc   noun modifier of the type relative clause
jjmod_relc  adjectival modifier of the type relative clause
rbmod_relc  adverbial modifier of the type relative clause


Experiments in Indian Language Dependency Parsing

Bharat Ram Ambati, Phani Gadde and Karan Jindal
Language Technologies Research Centre,
International Institute of Information Technology, Hyderabad, India – 500032
{ambati, phani.gadde}@research.iiit.ac.in, [email protected]

Abstract

In this paper we present our experiments in parsing three Indian languages, namely Hindi, Telugu and Bangla. We explore two data-driven parsers, Malt and MST, and compare the results of both parsers. We describe the data and parser settings used in detail. Some of these are specific to one particular Indian language or to all of them. We report our results on the test data of the ICON tools contest. The averages of the best unlabeled attachment, labeled attachment and labeled accuracies are 88.43%, 71.71% and 73.81% respectively.

1 Introduction

Parsing is one of the major tasks that help in understanding natural language. It is useful in several natural language applications: machine translation, anaphora resolution, word sense disambiguation, question answering and summarization are a few of them. This has led to the development of grammar-driven, data-driven and hybrid parsers. Due to the availability of annotated corpora in recent years, data driven parsing has achieved considerable success. The availability of phrase structure treebanks for English has seen the development of many efficient parsers.

Indian languages are morphologically rich free word order languages. It has been suggested that free word order languages can be handled better using the dependency based framework than the constituency based one (Hudson, 1984; Shieber, 1985; Mel'čuk, 1988; Bharati et al., 1995). As a result, dependency annotation using the Paninian framework has been started for Indian languages (Begum et al., 2008). There have been some previous attempts at parsing Hindi following a constraint based approach (Bharati et al. 1993, 2002, 2008b). Due to the availability of a treebank for Hindi, some attempts have been made at building statistical (Bharati et al., 2008a; Husain et al., 2009; Ambati et al., 2009) and hybrid parsers (Bharati et al., 2009). In all these approaches both syntactic and semantic cues are explored to reduce the confusion between ambiguous dependency tags.

In this paper we describe in detail our experiments in parsing three Indian languages, namely Hindi, Telugu and Bangla. Some of these are specific to one particular Indian language or to all of them, and some are general to any kind of language. We explore two data-driven parsers, Malt and MST, and compare the results of both parsers. We report our results on the ICON tools contest test data. The averages of the best unlabeled attachment, labeled attachment and labeled accuracies are 88.43%, 71.71% and 73.81% respectively.

The paper is arranged as follows: in Section 2, we present general information about the contest data. In Section 3, we describe our approach to parsing. Section 4 describes the data and parser settings for all three languages. We present our results in Section 5 and conclude the paper in Section 6.

2 Tools Contest

In the ICON tools contest 2009, we are provided with treebanks of three Indian languages, namely Hindi, Telugu and Bangla.[1] Hindi and Bangla are Indo-Aryan languages, and Telugu is a Dravidian language which is agglutinative in nature. Table 1 shows the general statistics of the data.

              Hindi  Telugu  Bangla
Sentences     1800   1756    1279
Words         25420  9482    12183
Unique Words  5465   3201    4283
Chunks        16185  6753    8221

Table 1. Statistics of the data.

[1] Hindi, Telugu and Bangla are official languages of India. They are the fourth, seventh and fourteenth most widely spoken languages in the world respectively. For complete details, see http://en.wikipedia.org/wiki/Ethnologue_list_of_most-spoken_languages

In the contest we are provided with two types of data for all three languages. The two differ only in the tagset: one is a fine-grained and the other a coarse-grained dependency tagset. There are around 30 tags for Telugu and Bangla and 50 tags for Hindi in the fine-grained tagset, whereas in the coarse-grained tagset there are 12 tags for all three languages.

3 Approach

We used two data driven parsers, Malt[2] (Nivre et al., 2007a) and MST[3] (McDonald et al., 2005b), for our experiments.

Malt is a classifier based shift/reduce parser. It uses the arc-eager, arc-standard, covington projective and covington non-projective algorithms for parsing (Nivre, 2006). History-based feature models are used for predicting the next parser action (Black et al., 1992). Support vector machines are used for mapping histories to parser actions (Kudo and Matsumoto, 2002). It uses graph transformation to handle non-projective trees (Nivre and Nilsson, 2005).

MST uses the Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) maximum spanning tree algorithm for non-projective parsing and Eisner's algorithm for projective parsing (Eisner, 1996). It uses online large margin learning as the learning algorithm (McDonald et al., 2005a).

Malt provides an xml file where we can specify the features for the parser. For MST, these features are hard coded. The accuracy of the labeler of MST is very low. We tried to modify the code but couldn't get better results, so we used a maximum entropy classification algorithm, maxent[4], for labeling: first we ran MST for the unlabeled dependency tree, and on the output of MST we used the maximum entropy algorithm for labeling (Dai et al., 2009).

[2] Malt Version 1.2
[3] MST Version 0.4b
[4] http://homepages.inf.ed.ac.uk/lzhang10/maxent_toolkit.html

4 Settings

4.1 Input Data

Both the parsers take CoNLL format as input. So, we have taken data in CoNLL format for our experiments. The FEATS column of each node in the data has 6 fields. These are six morpho-logical features namely category, gender, num-ber, person, vibhakti5 or TAM6 markers of the node. We experimented considering different combinations of these fields for both the parsers. For all the three languages vibhakti and TAM fields gave better results than others. This is sim-ilar to the settings of Bharati et al. (2008a). They showed that for Hindi, vibhakti and TAM mark-ers help in dependency parsing where as gender, number, person markers won't.

4.2 Malt Settings

Malt provides options for four parsing algorithms arc-eager, arc-standard, covington projective, covington non-projective. We experimented with all the algorithms for all the three languages for both the tagsets.

Tuning the SVM model was difficult; we tried various parameters but could not find any fixed pattern. Finally, we tested the performance by adapting the CoNLL shared task 2007 (2007b) settings used by the same parser for various lan-guages (Hall et. al, 2007). For feature model also after exploring general useful features, we exper-imented taking different combinations of the set-tings used in CoNLL shared task 2007 for vari-ous languages. Table 2, shows the best settings for all the three languages for both the tagsets.

                 Algorithm     SVM Settings
Hindi - fine     arc-eager     Turkish settings
Telugu - fine    arc-eager     Portuguese settings
Bangla - fine    arc-eager     Portuguese settings
Hindi - coarse   arc-standard  English settings
Telugu - coarse  arc-eager     Czech settings
Bangla - coarse  arc-eager     Czech settings

Table 2. Malt Settings

[5] Vibhakti is a generic term for preposition, post-position and suffix.
[6] TAM is the Tense, Aspect and Modality marker.


4.3 MST Settings

MST provides options for two algorithms, projective and non-projective. It also provides options to select features over a single edge (order=1) or over pairs of adjacent edges in the tree (order=2). We can also specify k-best parses while training using the training-k attribute. Table 3 shows the best settings for all three languages and both tagsets.

                       algorithm       training-k  order
Hindi (coarse, fine)   non-projective  5           2
Telugu (coarse, fine)  non-projective  1           1
Bangla (coarse, fine)  non-projective  5           2

Table 3. MST Settings

With the original MST parser, the labeled accuracy is very low. This is because only minimal features are used for labeling: features from the FEATS column of the CoNLL format are not used, so the vibhakti and TAM markers, which are crucial for labeling, are not considered by the parser. We modified the code so that the vibhakti and TAM markers are used (Bharati et al., 2008a). We also tried to add much richer context features, like sibling features, but that modification of the code is a little complex, since it would require changing the entire data structure used for the labeling task. Because of this we used maxent for labeling.

4.4 Maximum Entropy Settings

The unlabeled dependency tree given by MST is passed to maxent for labeling. We did several experiments with different options provided by the maxent tool; the best results are obtained when the number of iterations is 50.

As we have complete tree information, we have taken edge features as well as context features. The nodes and the features experimented with are:

Nodes
• CN: Current node
• PN: Parent node
• RLS: Right-most left sibling
• LRS: Left-most right sibling
• CH: Children

Features
• W: Lexical item
• R: Root form of the word
• P: Part-of-speech tag
• CP: Coarse POS tag
• VT: Vibhakti or TAM markers
• D: Direction of the dependency arc
• SC: Number of siblings
• CC: Number of children
• DS: Difference in positions of the node and its parent
• PL: POS list from the dependent to the tree's root through the dependency path

Table 4 shows the best settings for each of the three languages.

Table 4, shows the best settings for all the three languages individually.

Features

Hindi (fine, coarse)

D, SC, DS, CCCN: W,R,P,CP,VT,R+PPN: R,P,CP,VTRLS: R,CP,VTLRS: R,CP,VTCH: R,VT

Telugu (fine, coarse)

D, SC, DSCN: W,R,P,CP,VT,R+PPN: R,P,CP,VTRLS: R,CP,VTLRS: R,CP,VTCH: R,CP,VT

Bangla (fine, coarse)

D, SC, DSCN: W,R,P,CP,VTPN: R,P,CP,VTRLS: R,CP,VTLRS: R,CP,VT

Table 4. maxent Settings (CN: W; represents lex-ical item (W) of the current node (CN))
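The following is a rough sketch (the key names and data layout are assumptions, not the authors' code) of how the edge and context features of Table 4 could be turned into one maxent training event for a node, given its parent in the MST output tree.

```python
def label_features(node, parent, num_siblings):
    """node and parent are dicts with hypothetical keys; returns a feature list for maxent."""
    feats = {
        "D":  "left" if node["position"] < parent["position"] else "right",  # arc direction
        "SC": num_siblings,                                                   # number of siblings
        "DS": abs(node["position"] - parent["position"]),                     # distance to parent
        "CN_W": node["form"],  "CN_R": node["root"], "CN_P": node["pos"],
        "CN_CP": node["cpos"], "CN_VT": node["vib_tam"],
        "PN_R": parent["root"], "PN_P": parent["pos"],
        "PN_CP": parent["cpos"], "PN_VT": parent["vib_tam"],
    }
    return [f"{k}={v}" for k, v in feats.items()]
```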

4.5 Some general Hindi-specific features

For Hindi we also explored using TAM classes instead of TAM markers (Bharati et al., 2008a). TAM markers which behave similarly are grouped into a class. This reduced the number of features and the training time, but there was no significant improvement in the accuracies. Genitive vibhakti markers (kA, ke, kI) are the major clue for the 'r6' relation, but in the case of pronouns the vibhakti feature does not provide this information; instead, the category given by morph has a special value 'sh_P' for such cases. Using this information and the suffix of the pronoun, we supplied kA, ke or kI as the vibhakti feature accordingly.

4.6 Clause Boundary Information as fea-ture for Hindi

4.6.1 Why Clause Boundary?

Traditionally, a clause is defined as a phrase con-taining at least a verb and a subject. It can be an independent clause or a dependent clause, based on whether it can stand alone when taken in iso-lation or not respectively. By the definition itself, the words inside a clause form a set of modifier-modified relations, thereby forming a meaningful unit, like a sentence. This makes most of the de-pendents of the words in a clause to be the words in the same clause, or we can say that the depen-dencies of the words in a clause are localized to the clause boundary.

Given a sentence, the parser has to disam-biguate between several words in the sentence, to find the parent of a particular word. Making the parser use the clause boundary information en-ables the parser to localize its decisions to some extent, which will result in a better performance. The search space of the parser can be reduced by a good extent if we solve a relatively small prob-lem of identifying the clauses. This is the whole idea behind partial parsing, where we sacrifice completeness for efficiency and still get valuable information about the sentence.

Interestingly, it has been shown recently that most of the non-projective cases in Hindi are inter-clausal (Mannem and Chaudhry, 2009). Identifying clause boundaries should therefore prove helpful in parsing non-projective structures; the same holds for long-distance dependencies.

4.6.2 Clause Boundary Identifier

We used the Stage 1 parser of Husain et al. (2009) as a pre-processing step to obtain clause boundary information. It uses Malt to identify only the intra-clausal relations. To achieve this, a special dummy node named _ROOT_ is introduced, which becomes the head of the sentence; all the clauses are attached to this dummy node with a dummy relation, so in effect the parser produces only intra-clausal relations. We carried out extensive experiments on Hindi with different definitions of a clause to arrive at an optimal definition for partial parsing. Since these experiments still need to be analyzed in more detail for Bangla and Telugu, we limited the use of clause boundary information to Hindi parsing.

Since the above tool parses all the clauses, we obtain information about the internal structure of each clause in addition to the clause boundaries. We performed several experiments to utilize the information given by the partial parser. The best results were obtained when the clause boundary information, together with the head information for each clause, was given as a feature to each node. The accuracies of these clausal features are given in Table 5.

                              Precision   Recall
Clause Boundary Information   84.83%      91.23%
Head Information              92.42%      99.40%

Table 5. Accuracies of the clausal features being used.

Since clause inclusion features for each node cannot be given directly to either Malt or MST, we modified the parsers to handle this kind of feature. The best results using clause boundary information were obtained with a modified version of MST. We first experimented with giving only the clause inclusion (boundary) information to each node, which helps the parser reduce its search space during parsing decisions. We then also provided head and non-head information (whether or not a node is the head of its clause). This information helps in handling complex sentences containing several clauses, where each verb has its own argument structure. The best MST accuracy reported in this paper uses both of these features.

For Hindi, the clause boundary features were used during unlabeled parsing; the resulting output is then given to maxent for labeling.
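As an illustration of how such information could be threaded into the pipeline (not the exact mechanism used here), the clause id and clause-head flag produced by the partial parser can be appended to the FEATS column of each CoNLL token before unlabeled parsing. The field names and inputs below are assumptions.

# Sketch: add clause id and clause-head flag to the FEATS column (0-indexed
# column 5) of CoNLL-format rows before running the unlabeled parser.
# `clause_of` and `is_clause_head` are assumed outputs of the partial parser.

def add_clause_features(conll_rows, clause_of, is_clause_head):
    out = []
    for i, row in enumerate(conll_rows):
        cols = row.split("\t")
        extra = "clause=%d|clausehead=%s" % (clause_of[i],
                                             "y" if is_clause_head[i] else "n")
        cols[5] = extra if cols[5] == "_" else cols[5] + "|" + extra
        out.append("\t".join(cols))
    return out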

5 Experiments and Results

We merged the training and development data and performed 5-fold cross-validation to tune the parsers. The best settings extracted from these cross-validation experiments were then applied to the contest test data, which contains 150 sentences for each of the three languages. Table 6 shows the individual and average results on both the coarse-grained and fine-grained tagsets for all three languages using both approaches.


                 Hindi-  Telugu-  Bangla-  Average-  Hindi-   Telugu-  Bangla-  Average-  Average-
                 fine    fine     fine     fine      coarse   coarse   coarse   coarse    total
Malt        UAS  90.14   86.28    88.45    88.29     90.22    85.25    90.22    88.56     88.43
            LAS  74.48   60.55    72.63    69.22     79.33    65.01    78.25    74.20     71.71
            L    76.38   61.58    75.34    71.10     81.66    66.21    81.69    76.52     73.81
MST+maxent  UAS  91.26   84.56    87.51    87.78     90.48    85.42    87.41    87.77     87.77
            LAS  73.79   57.12    69.61    66.84     79.15    63.12    75.03    72.43     69.64
            L    76.04   58.15    73.15    69.11     81.75    64.67    79.19    75.20     72.16

Table 6. Results on Test Data

6 Conclusions and Future Directions

For all the languages, Malt performed better than MST+maxent. We would like to carry out a proper error analysis once the gold test data is released.

We modified the MST implementation to handle vibhakti and TAM markers for labeling. We observed that even during unlabeled parsing, some features that might not be useful are being used. We would like to modify the implementation further so that we can also experiment with feature selection for unlabeled parsing.

Acknowledgments

We would like to express our gratitude to Dr. Dipti Misra Sharma and Prof. Rajeev Sangal for their guidance and support. We would also like to thank Mr. Samar Husain for his valuable suggestions. Mr. Sambhav Jain worked on a series of initial experiments on Hindi parsing.

References

B. R. Ambati, P. Gade and C. GSK. 2009. Effect of Minimal Semantics on Dependency Parsing. In Proceedings of RANLP 2009 Student Research Workshop.

R. Begum, S. Husain, A. Dhwaj, D. M. Sharma, L. Bai, and R. Sangal. 2008. Dependency annotation scheme for Indian languages. In Proceedings of IJCNLP-2008. http://www.iiit.net/techreports/2007_78.pdf

A. Bharati, S. Husain, B. Ambati, S. Jain, D. Sharma, and R. Sangal. 2008a. Two semantic features make all the difference in parsing accuracy. In Pro-ceedings of ICON-08.

A. Bharati, S. Husain, D. M. Sharma, and R. Sangal. 2008b. A Two-Stage Constraint Based Dependency Parser for Free Word Order Languages. In Proceedings of the COLIPS International Conference on Asian Language Processing 2008 (IALP). Chiang Mai, Thailand. 2008.

A. Bharati, S. Husain, D. M. Sharma, and R. Sangal. 2009. Two stage constraint based hybrid approach to free word order language dependency parsing. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT09). Paris. 2009.

A. Bharati, R. Sangal, T. P. Reddy. 2002. A Con-straint Based Parser Using Integer Programming, In Proceedings of ICON, 2002. www.iiit.net/techreports/2002_3.pdf

A. Bharati, V. Chaitanya and R. Sangal. 1995. Natu-ral Language Processing: A Paninian Perspective, Prentice-Hall of India, New Delhi, pp. 65-106. ltr-c.iiit.ac.in/downloads/nlpbook/nlp-panini.pdf

A. Bharati and R. Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework. Proc. of ACL:93.

E. Black, F. Jelinek, J. D. Lafferty, D.M.Magerman, R. L.Mercer, and S. Roukos. 1992. Towards his-tory-based grammars: Using richer models for probabilistic parsing. In Proc. of the 5th DARPA Speech and Natural Language Workshop, pages 31–37.

Y.J. Chu and T.H. Liu. 1965. On the shortest arbores-cence of a directed graph. Science Sinica, 14:1396–1400.

Q. Dai, E. Chen, and L. Shi. 2009. An iterative ap-proach for joint dependency parsing and semantic role labeling. In Proceedings of the 13th Confernce on Computational Natural Language Learn-ing(CoNLL-2009), June 4-5, Boulder, Colorado, USA.June 4-5.

J. Edmonds. 1967. Optimum branchings. Journal of Research of the National Bureau of Standards, 71B:233–240.


J. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceed-ings of COLING-96, pages 340–345.

J. Hall, J. Nilsson, J. Nivre, G. Eryigit, B. Megyesi, M. Nilsson and M. Saers. 2007. Single Malt or Blended? A Study in Multilingual Parser Optimiza-tion. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, 933—939.

S. Husain, P. Gadde, B. Ambati, D. M. Sharma and Rajeev Sangal. 2009. A modular cascaded ap-proach to complete parsing. In the Proceedings of COLIPS International Conference on Asian Lan-guage Processing(IALP), 2009.

R. Hudson. 1984. Word Grammar, Basil Blackwell, 108 Cowley Rd, Oxford, OX4 1JF, England.

T. Kudo and Y. Matsumoto. 2002. Japanese depen-dency analysis using cascaded chunking. In CoNLL-2002. pp. 63–69.

P. Mannem and H. Chaudhry. 2009. Insights into Non-projectivity in Hindi. In Proceedings of ACL-IJCNLP Student paper workshop.

R. McDonald, K. Crammer, and F. Pereira. 2005a. Online large-margin training of dependency parsers. In Proceedings of ACL 2005. pp. 91–98.

R. McDonald, F. Pereira, K. Ribarov, and J. Hajic. 2005b. Non-projective dependency parsing using spanning tree algorithms. Proceedings of HLT/EMNLP, pp. 523–530.

I. A. Mel'čuk. 1988. Dependency Syntax: Theory and Practice, State University, Press of New York.

J. Nivre and R. McDonald. 2008. Integrating Graph-Based and Transition-Based Dependency Parsers. In Proc. Of ACL-2008.

J. Nivre, J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov and E Marsi. 2007a. Malt-Parser: A language-independent system for data-driven dependency parsing. Natural Language En-gineering, 13(2), 95-135.

J. Nivre and J. Hall and S. Kubler and R. McDonald and J. Nilsson and S. Riedel and D. Yuret. 2007b. The CoNLL 2007 Shared Task on Dependency Parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007.

J. Nivre. 2006. Inductive Dependency Parsing. Springer.

J. Nivre and J. Nilsson. 2005. Pseudo-projective de-pendency parsing. In Proc. of ACL-2005, pages 99–106.

S. M. Shieber. 1985. Evidence against the context-freeness of natural language. In Linguistics and Philosophy, p. 8, 334–343.


Grammar Driven Rules for Hybrid Bengali Dependency Parsing

Sanjay Chatterji, Praveen Sonare, Sudeshna Sarkar (Computer Science and Engineering, IIT Kharagpur) and Devshri Roy (Computer and Informatics Centre, IIT Kharagpur), Kharagpur, W.B., India

Abstract

This paper describes a hybrid approach for parsing Bengali sentences based on the dependency tagset and Treebank released in the ICON 2009 tool contest. A data driven dependency parser is considered as a baseline system. Some handcrafted rules are identified based on the error patterns in the output of the baseline system.

1 Introduction

Parsing is a method of analyzing the grammatical structure of a sentence. Dependency parsing indicates the dependency relations between the words in a parse tree. Both data driven and grammar driven approaches have been used for parsing a sentence. Data driven parsers need a large amount of manually annotated parsed data, called a Treebank, and the availability of such data for Bengali is limited. A data driven parser trained on the Bengali Treebank data released by the ICON tool contest, 2009 was found to have limited accuracy. On the other hand, most of the modern grammar-driven dependency parsers (Karlsson et al., 1995; Bharati et al., 1993, 2002, 2008) parse by eliminating the parses which do not satisfy a given set of constraints. They require rules to be developed for each layer. In the work of Akshar Bharati et al. (PACLIC, 2009; IWPT09, 2009), the grammar driven approach is complemented by a controlled statistical strategy to achieve high performance and robustness.

Developing a grammar driven parser requires a deep knowledge of the dependency relations of the phrases, and the creation of rules to disambiguate the relations is a very challenging task. We use a hybrid approach for shallow level dependency parsing of Bengali texts, where the output of the data driven parser is postprocessed by applying certain rules to improve the parser accuracy.

Bengali is the seventh most widely spoken language in the world (Ethnologue, 2009), yet the work done on parsing Bengali text is quite limited. In this paper, we discuss a hybrid system developed for shallow level dependency parsing of Bengali sentences. The system uses the openly available data driven dependency parser MaltParser version 1.2 (Nivre et al., 2006) as a baseline system, trained on the Bengali Treebank released by the ICON tool contest, 2009. We have suggested a few rules based on the error patterns of the MaltParser output, and the output of the baseline system is postprocessed based on these rules to improve the accuracy of the system. These rules are not complete in handling all the error patterns; more study is required to prepare a comprehensive rule set to further improve the results.

2 System Architecture

Figure 1 shows the system architecture of our system. The freely available MaltParser version 1.2 is used as the baseline system. The MaltParser is trained with the training data released by the ICON tool contest 2009, and the trained parser is tested with the development data given by the ICON tool contest 2009. The data is tagged with the dependency relations given in the AnnCorra (Sharma et al.) tagset. We have done the postprocessing in 4 stages; these stages are also carried out on the output of the MaltParser on the development data and are explained in Section 4. We have evaluated the output of each stage using the official CoNLL-07 shared task evaluation script eval07.pl.

Figure 1: System Architecture

3 Confusion Matrix

As discussed in Section 2, the MaltParser is trained and tested with the given data set. The baseline scores obtained are 54.62% for labeled attachment and 79.28% for unlabeled attachment. We examined the confusion matrix and identified the tags for which most errors were made, and then worked out postprocessing rules to correct some of the errors. The original confusion matrix is a 28×28 matrix; a portion of it is given in Table 1. The tag nomenclature is as follows: k1 – Karta (doer/agent/subject), k2 – Karma (object), k7p – Deshadhikarana (location in space), r6 – Shashthi (possessive), pof – Part of relation, k7t – Kaalaadhikarana (location in time), nmod_relc – Noun modifier of the type relative clause, rt – Taadarthya (purpose), vmod – Verb modifier, k7 – Vishayaadhikarana (location elsewhere) and k5 – Apaadaana (source).

             k1            k2            k7p           r6            pof
             (Identified)  (Identified)  (Identified)  (Identified)  (Identified)
k1 (True)    92            23            4             9             14
k2 (True)    10            47            0             0             14
k7p (True)   12            5             11            2             7
r6 (True)    14            5             2             29            0
pof (True)   1             4             0             0             18

Table 1: A portion of the confusion matrix

In the confusion matrix, the rows represent the true tags and the columns represent the tags identified by the parser. For example, the true tag k1 appears 92 times as k1, 23 times as k2, 4 times as k7p, 9 times as r6 and 14 times as pof in the MaltParser output. The diagonal elements represent correctly identified tags while the non-diagonal elements represent wrongly identified tags. Note the pof (Identified) column of the matrix: MaltParser has identified 18 pof correctly, while 14 k1, 14 k2 and 7 k7p are wrongly tagged as pof.

In order to improve the accuracy we try to minimize the non-diagonal figures in the matrix. We observed that in many cases the tags "k1", "k2" and "k7p" were identified as "pof", and in some cases they were identified as "r6". We also observed that in many cases the parser did not identify the root of the sentence correctly. To minimize the non-diagonal figures of the matrix, the parsed output is examined carefully to discover different rules, and the MaltParser output data is postprocessed.
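A label confusion matrix of this kind can be computed directly from the gold and system CoNLL files. The sketch below is our illustration (not the authors' script); it counts gold-label versus predicted-label pairs, assuming the two files are token-aligned line by line.

# Sketch: build a dependency-label confusion matrix (gold rows, predicted
# columns) from two parallel CoNLL files. DEPREL is column 8 (0-indexed 7).
from collections import defaultdict

def label_confusions(gold_path, pred_path):
    matrix = defaultdict(lambda: defaultdict(int))
    with open(gold_path) as g, open(pred_path) as p:
        for gline, pline in zip(g, p):
            gline, pline = gline.strip(), pline.strip()
            if not gline:                      # sentence boundary
                continue
            gold_rel = gline.split("\t")[7]
            pred_rel = pline.split("\t")[7]
            matrix[gold_rel][pred_rel] += 1
    return matrix

# Example: how often was gold k1 predicted as pof?
# m = label_confusions("dev-gold.conll", "dev-malt.conll")
# print(m["k1"]["pof"])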

4 Postprocessing

The output of the MaltParser is postprocessed in four steps to improve the quality of the parsed output. In the first step, we try to identify the phrases that are the root of the parse tree; a TAM marker based approach is discussed in Section 4.1. The TAM marker of a verb phrase is a concatenation of the suffix and postpositions which indicate the tense, aspect and modality of the verb phrase; it does not indicate the person or any other feature. For example balalAma, balala and balalena have the same tense (past), aspect (simple) and modality (indicative), so they have the same TAM marker la. In the next step, we try to identify the possessive relations of a sentence; in the tagset, possessive relations are denoted by "r6", and we identify them with a rule based approach. We observed that most of the errors occur due to the wrong identification of "k1" and "k2" as the "pof" relation, and we resolve some conflicts of "pof" relations with k1 and k2 in step 3. Finally, we postprocess the noun chunks based on their postpositions and suffix markers in step 4.

4.1 TAM based root identification

We observed that in many cases the parser is not able to identify the root of the sentence correctly. To identify the root, we looked into sentences and tried to find some clues. We observed that when a verb occurs with certain TAM markers it becomes the root of the sentence. Consider the Bengali sentence "sirila balala caluna", which means "Siril said lets go". There are two verb chunks, balala and caluna, and the correct root is identified by some rules.

By observing different sentences, we detected some TAMs of verb phrases which uniquely identify them as the root of the sentence. We prepared a priority based TAM list that contains 150 such TAMs. For each simple sentence, we find the TAM attached to the verb; if the TAM occurs in the prepared TAM list, we assign that verb as the root of the sentence. In a complex sentence more than one phrase with different TAMs can occur, and one of these phrases will be the root of the sentence. The TAMs are arranged in the priority based TAM list in order of their priority: if two entries of the TAM list co-occur in a sentence, then the verb with the higher priority TAM becomes the root of the sentence.

In the above example the verb chunks balala and caluna have the TAMs la and ka respectively. caluna is in the third formal person; caluka is in the second person and has the same tense, aspect and modality as caluna, so caluka and caluna have the same TAM marker ka. The MaltParser identifies caluna as the root of the parse tree and the phrase balala as a verb modifier. In the priority based TAM list la comes before ka, so we changed the phrase balala to be the root of the parse tree.

There are some more TAMs not listed in the priority list; those TAMs cannot uniquely identify the root of the tree. More study is required to use them for identification of the root of the sentences.
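A minimal sketch of the priority-based selection described above is given here for illustration only; the actual 150-entry list and its ordering are not reproduced, and the three entries shown are just those mentioned in the paper's examples.

# Sketch of TAM-priority-based root selection. PRIORITY_TAMS is ordered from
# highest to lowest priority; the real list in the paper has ~150 entries.
PRIORITY_TAMS = ["la", "Be_As+Ce", "ka"]          # illustrative entries only

def pick_root(verb_chunks):
    """verb_chunks: list of (chunk_id, tam_marker). Return the chunk chosen
    as root, or None if no TAM in the sentence is in the priority list."""
    best = None
    for chunk_id, tam in verb_chunks:
        if tam in PRIORITY_TAMS:
            rank = PRIORITY_TAMS.index(tam)
            if best is None or rank < best[0]:
                best = (rank, chunk_id)
    return best[1] if best else None

# For "sirila balala caluna": pick_root([("balala", "la"), ("caluna", "ka")])
# returns "balala", since "la" precedes "ka" in the priority list.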

4.2 Genitive Marker Based Possessive Relation Identification

According to the dependency tagset, r6 indicates a possessive relation. It takes ra, era, xera etc. genitive markers. But in many cases, karta (k1) takes the same markers. To differentiate r6 from k1 in a sentence, we look at the case of the word along with the genitive markers. Only if the case of a word with a genitive marker is oblique is that word identified as possessive (r6).

Rule: if the case of a word with a genitive or possessive marker is oblique, assign r6 for that word and attach it to the next noun or NST chunk.

For example, in rAmera Cele bAdZi jAbe (Ram's son will go home), the case of rAmera is oblique and so it is identified as possessive (r6). If the case of the word is direct, we do not consider that word as possessive (r6). In the example rAmera Kixe peyZeCe (rAma is hungry), rAmera is not r6; here rAmera is karta (k1).

4.3 Resolving “pof” misidentification

The "pof" (part of) dependency relation does not take any suffix. We selected the chunks which were identified as "pof" with respect to a verb chunk but which have a suffix or postposition, and changed the corresponding relation to some other relation based on the following rules.

Rule 1: if the suffix is yZa, we or e, assign k7p for that word
Rule 2: if the suffix is ke, assign k2 for that word
Rule 3: if the suffix is tA, assign k1 for that word

Table 2 shows some of the cases where Rule 1 is applied. MaltParser identified the noun phrase entries of this table as pof; since the suffix is e, we or yZa, the noun phrase dependency tag is changed to k7p according to Rule 1. The postposition or TAM is not used in this rule but can be used in other rules. In this step of postprocessing we have not modified the attachments, so they are not included in the table.

Head Word   Suffix   TAM   POS   Changed relation
koNe        e        me    NN    k7p
bAWarume    e        $     NN    k7p
mAWAyZa     yZa      me    NN    k7p
anuBabe     e        me    NN    k7p
e           e        $     NN    k7p
klASarume   e        $     NN    k7p
kaWAyZa     yZa      me    NN    k7p
bAdZiwe     we       $     NN    k7p
trene       e        me    NN    k7p

Table 2: List of some phrases for Rule 1

4.4 Postposition and Suffix Marker Based Rules for Dependency Relation Identification

For morphologically rich free word order languages the postposition, suffix and TAM markers are very important for identifying the proper dependency relations (Bharati et al., 2008). The suffixes and postpositions which identify a particular relation are exploited in this phase of postprocessing. In many cases a suffix or postposition is associated with a particular relation, even though there are some exceptions. We made many rules for identification of dependency relations based on the suffixes and postpositions. Some of the rules are written below, followed by a small sketch of how such rules can be applied to the parser output.

Rule 1: if the postposition is CAdZA, assign vmod for that word
Rule 2: if the suffix is kAle, assign k7p for that word
Rule 3: if the postposition is janya, assign rt for that word
Rule 4: if the postposition is xiyZe AND the POS tag is NN, assign vmod for that word
Rule 5: if the postposition is xiyZe AND the POS tag is NST, assign k7 for that word
Rule 6: if the postposition is Weke, assign k5 for that word
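The rules of Sections 4.3 and 4.4 amount to a lookup from (suffix, postposition, POS, current label) to a corrected dependency label. The following is a hedged sketch of such a post-processor; it encodes only the rules listed above, leaves the attachments untouched as the paper does, and the chunk field names are assumptions.

# Sketch: relabel chunks in the MaltParser output using the suffix and
# postposition rules of Sections 4.3 and 4.4. Only the relation label is
# changed; attachments are left as they are.

def relabel(chunk):
    """chunk has illustrative fields: deprel, suffix, postposition, pos."""
    # Section 4.3: chunks wrongly labeled pof
    if chunk.deprel == "pof":
        if chunk.suffix in ("yZa", "we", "e"):
            return "k7p"            # Rule 1
        if chunk.suffix == "ke":
            return "k2"             # Rule 2
        if chunk.suffix == "tA":
            return "k1"             # Rule 3
    # Section 4.4: postposition / suffix based rules
    if chunk.postposition == "CAdZA":
        return "vmod"               # Rule 1
    if chunk.suffix == "kAle":
        return "k7p"                # Rule 2
    if chunk.postposition == "janya":
        return "rt"                 # Rule 3
    if chunk.postposition == "xiyZe":
        if chunk.pos == "NN":
            return "vmod"           # Rule 4
        if chunk.pos == "NST":
            return "k7"             # Rule 5
    if chunk.postposition == "Weke":
        return "k5"                 # Rule 6
    return chunk.deprel             # no rule fired: keep the parser's label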

5 Evaluation

Training data size: 980 Sentences

Test data size: 150 Sentences

Tagset: As given in AnnCorra (Sharma et. al.)

Baseline System: MaltParser version 1.2

Evaluation Module: eval07.pl

Input Bengali sentence: nijera|PRP[c-o|v-era] klAsera|NN[c-o|v-era] (sAmane xiyZe)|NST[c-d|v-0_xiyZe] yaKana|PRP[c-d|v-0] se|PRP[c-d|v-0] (neme AsaCe)|VM[c-$|v-Be_As+Ce] waKana|PRP[c-d|v-0] GatanAtA|NN[c-$|v-tA] Gatala|VM[c-$|v-la].

English translation: Near his own class, when he was getting down, then the incident took place.

In the above Bengali input sentence the chunks with more than one word are shown in round brackets. The POS tag of the head of each chunk is separated by a pipe (|). The features used in postprocessing, namely case (c) and vibhakti or TAM (v) of the chunk, are shown near each chunk in square brackets. Here PRP – pronoun, NN – noun, NST – space or time, VM – verb. The actual dependency relations between the chunks of the input sentence are the following.

r6(nijera, klAsera)

r6(klAsera, sAmane xiyZe)

k7(sAmane xiyZe, neme AsaCe)

k7t(yaKana, neme AsaCe)

k1(se, neme AsaCe)

nmod_relc(neme AsaCe, waKana)

k7t(waKana, Gatala)

k1(GatanAtA,Gatala)

root(Gatala)

The dependency relations of the chunks of the above example as given by MaltParser version 1.2 are shown below (the wrong relations and incorrect attachments/arguments were marked in bold in the original layout).

r6(nijera, klAsera)

k1(klAsera, neme AsaCe)

k7p(sAmane xiyZe, neme AsaCe)

k7t(yaKana, neme AsaCe)

k1(se, neme AsaCe)

root(neme AsaCe)

k7t(waKana, Gatala)

pof(GatanAtA,Gatala)

k2(Gatala, neme AsaCe)

In the above example there are two verb chunks, neme AsaCe and Gatala, with TAMs Be_As+Ce and la respectively. Of these, la comes earlier than Be_As+Ce in the priority based TAM list, so in the TAM based root identification phase of postprocessing (explained in Section 4.1) we changed the dependency tag of the phrase Gatala to root of the sentence and changed its attachment to 0. In the r6 identification phase (explained in Section 4.2) we found that the phrase klAsera has oblique case and suffix era, so its dependency tag is changed to r6 and its attachment is changed to the next NST phrase. In the pof resolving phase (explained in Section 4.3) we found that the phrase GatanAtA was made pof; it has the suffix tA, so Rule 3 is applied and its tag is changed to k1. In the postposition and suffix based postprocessing (explained in Section 4.4) we found that the phrase with head sAmane has the suffix xiyZe and POS tag NST, so Rule 5 is applied and its tag is changed to k7. The final relations of the above example after all four postprocessing steps are given below. One relation is still incorrect: neme AsaCe remains attached as a root instead of as nmod_relc under waKana.

r6(nijera, klAsera)

r6(klAsera, sAmane xiyZe)

k7(sAmane xiyZe, neme AsaCe)

k7t(yaKana, neme AsaCe)

k1(se, neme AsaCe)

root(neme AsaCe)

k7t(waKana, Gatala)

k1(GatanAtA,Gatala)

root(Gatala)

The improvements in accuracy from the postprocessing stages are shown in Table 3. The first two postprocessing phases improve all three scores, but the last two stages do not improve the unlabeled attachment score. This is because we do not change any attachments in these two stages; only dependency tag values are changed, so only the labeled scores improve.

            Labeled att. (%)   Unlabeled att. (%)   Label acc. (%)
Baseline    54.62              79.28                58.45
Postpro 1   56.72              80.89                60.67
Postpro 2   58.45              82.61                62.52
Postpro 3   59.93              82.61                64.00
Postpro 4   60.79              82.61                65.23

Table 3: Evaluation results of all stages (Postpro – postprocessing, att – attachment, acc – accuracy)

6 Conclusion

The postprocessing rules we have suggested can be applied to the output of any data driven system. We have tried both MaltParser and MSTParser, and postprocessing gave the highest accuracy with the first one. Neither of these systems can use all the features (Bharati et al., 2008). Our rules are mainly hard constraints (Bharati et al., PACLIC, 2009; Bharati et al., IWPT09, 2009) based on grammatical features that in general cannot be broken. If a system is built which uses all the features, then the rules may become less effective.

These rules are not complete in handling all the error patterns. The rules are built based on the patterns of errors in the MaltParser output, so applying them to the output of some other system may not be effective. More study is required to prepare a comprehensive rule set to further improve the results.

References

Bharati A., Husain S., Ambati B., Jain S., Sharma D. M. and Sangal R. Two semantic features make all the difference in parsing accuracy. In Proceedings of the 6th International Conference on Natural Language Processing (ICON-08), CDAC Pune, India. 2008.

Bharati A., Husain S., Sharma D. M. and Sangal R. Two stage constraint based hybrid approach to free word order language dependency parsing. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT09). Paris. 2009.

Bharati A., Husain S., Vijay M., Deepak K., Sharma D. M. and Sangal R. Constraint Based Hybrid Approach to Parsing Indian Languages. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 23). Hong Kong. 2009.

Bharati, A., R. Sangal, and T. P. Reddy. A Constraint Based Parser Using Integer Programming. In Proceedings of ICON. 2002.

Bharati, A. and R. Sangal. Parsing Free Word Order Languages in the Paninian Framework. In Proceedings of ACL. 1993.

Ethnologue: Languages of the World, 16th edition, edited by M. Paul Lewis, 2009.

Karlsson, F., A. Voutilainen, J. Heikkilä and A. Anttila (eds). Constraint Grammar: A language-independent system for parsing unrestricted text. Mouton de Gruyter. 1995.

Nivre J., Hall J. and Nilsson J. MaltParser: A Data-Driven Parser-Generator for Dependency Parsing. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC 2006), May 24-26, 2006, Genoa, Italy, pp. 2216-2219.

Sharma D. M., Sangal R., Bai L. and Begam R. AnnCorra: TreeBanks for Indian Languages (version 1.9).


Constraint based Hindi dependency parsing

Meher Vijay Yeleti, Kalyan Deepak Language Technologies Research Centre

IIIT-Hyderabad, India. [email protected], [email protected]

Abstract

The paper describes the overall design of a new two stage constraint based hybrid approach to dependency parsing. We define the two stages and show how different grammatical constructs are parsed at the appropriate stage. This division leads to selective identification and resolution of specific dependency relations at the two stages. Furthermore, we show how the use of hard constraints and soft constraints helps us build an efficient and robust hybrid parser. The experiments carried out for soft constraints are elucidated in detail. Finally, we evaluate the implemented parser on the ICON tools contest Hindi data. The best labeled and unlabeled attachment accuracies for Hindi are 62.20% and 85.55% respectively.

1 Introduction

Due to the availability of annotated corpora for various languages since the past decade, data driven parsing has proved to be immensely successful. Unlike English, however, most of the parsers for morphologically rich free word order (MoR-FWO) languages (such as Czech, Turkish, Hindi, etc.) have adopted the dependency grammatical framework. It is well known that for MoR-FWO languages, the dependency framework provides ease of linguistic analysis and is much better suited to account for their various structures (Shieber, 1985; Mel'Cuk, 1988; Bharati et al., 1995). The state of the art parsing accuracy for many MoR-FWO languages is still low compared to that of English. Parsing experiments (Nivre et al., 2007; Hall et al., 2007) for these languages have pointed towards various reasons for this low performance. For Hindi¹, (a) difficulty in extracting relevant linguistic cues, (b) non-projectivity, (c) lack of explicit cues, (d) long distance dependencies, (e) complex linguistic phenomena, and (f) small corpus size have been suggested (Bharati et al., 2008) as reasons for the low performance. The approach proposed in this paper shows how one can minimize these adverse effects, and argues that a hybrid approach can prove to be a better option for parsing such languages. There have been, in the past, many attempts at parsing using constraint based approaches; some of the constraint based parsers known in the literature are Karlsson et al. (1995), Maruyama (1990), Bharati et al. (1993, 2002), Tapanainen and Järvinen (1998), Schröder (2002), and, more recently, Debusmann et al. (2004). Some attempts at parsing Hindi using a data driven approach are Bharati et al. (2008b) and Husain et al. (2009). Later, in Section 4, we compare the results of data-driven Hindi parsing with those of our approach.

We show how the use of hard constraints (H-constraints) and soft constraints (S-constraints) helps us build an efficient and robust hybrid parser. Specifically, H-constraints incorporate the knowledge base of the language and S-constraints are used as weights that are automati-cally learnt from an annotated treebank. Finally, we evaluate the implemented parser on Hindi and compare the results with that of two data driven dependency parsers.

The paper is arranged as follows: Section 2 describes in detail the proposed approach for parsing free word order languages. Section 3 dis-cusses the types of constraints used. We describe the experiments performed and report the results in Section 4.

1 Hindi is a verb final language with free word order and a rich case marking system. It is one of the official languages of India, and is spoken by ~800 million people.


2 Approach

We try to solve the task of dependency parsing using a hybrid approach: a grammar driven approach is complemented by a controlled statistical strategy to achieve high performance and robustness. The overall task of dependency parsing is attacked using modularity, wherein specific tasks are broken down into smaller linguistically motivated sub-tasks. Figure 1 shows the output of each of these sub-tasks.

2.1 Background

Data driven parsing is usually a single stage process wherein a sentence is parsed at one go. Many attempts have, however, tried to divide the overall task into sub-task. One trend has been to first identify dependencies and then add edge labels over them (McDonald et al., 2005, Chen et al., 2007). The other trend has been towards per-forming smaller linguistically relevant tasks as a precursor to complete parsing (Abney, 1997; Bharati et al., 1995; Attardi and Dell’Orletta, 2008; Shiuan and Ann, 1996).

In our approach we divide the task of parsing into the following sub-tasks (layers):

1. POS tagging and chunking (POSCH),
2. Constraint based hybrid parsing (CBHP),
3. Intra-chunk dependency (IRCH) identification.

(a) POSCH is treated as pre-processing for the task of parsing. A bag represents a set of adjacent words which are in dependency relations with each other and are connected to the rest of the words by a single incoming dependency arc. Thus a bag is an unexpanded dependency tree connected to the rest only by means of its root. A noun phrase or noun group chunk is a bag in which there are no verbs, and vice versa for verb chunks. The relations among the words in a chunk are not marked, which allows us to ignore local details while building the sentence level dependency tree. In general, all the nominal inflections and nominal modifications (an adjective modifying a noun, etc.) are treated as part of a noun chunk; similarly, verbal inflections and auxiliaries are treated as part of the verb chunk (Bharati et al., 2006).

(b) CBHP takes the POS tagged and chunked sentence as input and parses it in two stages. The parser makes use of a knowledge base of the language along with syntactico-semantic preferences to arrive at the final parse. Broadly, modularity in CBHP works at two layers (cf. Figure 3): (1) the sentence analysis layer, and (2) the parse selection layer. We discuss this approach to parsing in the following sections.

(c) IRCH dependencies are finally identified as a post-processing step to (a) and (b). Once this is done, the chunks can be removed and we get the complete dependency tree. We will not discuss IRCH in this paper.

In the dependency trees (b) and (c) shown in Figure 1, each node is a chunk and each edge represents the relation between the connected nodes, labeled with a suitable relation². After removing the chunks, in (d) each node is a lexical item of the sentence.

2 All the relations marked by the parser are syntactico-semantic labels. For a detailed analysis see Bharati et al. (1995). Many relations shown in the diagrams of this paper are described in Begum et al. (2008a). For the complete tagset description, see http://ltrc.iiit.ac.in/MachineTrans/research/tb/DS-guidelines/DS-guidelines-ver2-28-05-09.pdf


Eg. 1: mohana ne tebala para apani kitaaba ’Mohan’ ‘ERG’ ‘table’ ‘on’ ‘his’ ‘book’

rakhii Ora vaha so gayaa ‘kept’ ‘and’ ‘he’ ‘sleep’ ‘PRFT’

‘Mohan placed his book on the table and slept’

From (a) to (d) in Figure 1, outputs of each of the previously discussed layers have been shown. Note that one can use any of these outputs inde-pendently. More importantly, (b) is a partial parse obtained after the 1st stage of CBHP, and (c) is the output after the 2nd stage of CBHP. We’ll elaborate on this in the following sections. To test the performance of the proposed parser we use gold POS tagged and chunked data, instead of using the outputs of POS tagger and chunker.

2.2 Constraint Parsing

Constraint based parsing using integer programming has been successfully tried for Indian languages (Bharati et al., 1993; 2002). Under this scheme the parser exploits the syntactic cues present in a sentence and forms constraint graphs (CG) based on the generalizations present. It uses such notions as basic demand frames and transformation frames (Bharati et al., 1995) to construct the CG. It then translates the CG into an integer programming (IP) problem, and the solutions to the problem provide the possible parses for the sentence. We follow the approach used by Bharati et al. (1995, 2008a) for formulating the constraints as an IP problem and solving them to get the parses.
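As a toy illustration of the general idea (not the authors' actual formulation), arc selection can be posed as a 0-1 integer program: one binary variable per candidate (child, parent) arc, a constraint that every chunk takes exactly one parent, and demand-frame restrictions as further (in)equalities. The sketch below uses the PuLP library with invented scores, and omits the structural (tree/acyclicity) constraints for brevity.

# Toy 0-1 integer-programming formulation of arc selection (illustration only).
# Each x[c, p] = 1 means chunk c attaches to chunk p.
from pulp import LpProblem, LpVariable, LpMaximize, LpBinary, lpSum

def parse_ip(nodes, candidate_arcs, score):
    """nodes: chunk ids; candidate_arcs: (child, parent) pairs allowed by the
    hard constraints; score: dict mapping each arc to a preference weight."""
    prob = LpProblem("dependency_parse", LpMaximize)
    x = {a: LpVariable("x_%s_%s" % a, cat=LpBinary) for a in candidate_arcs}

    # Objective: maximize the total score of the selected arcs
    prob += lpSum(score[a] * x[a] for a in candidate_arcs)

    # Each chunk with candidate parents takes exactly one parent
    for c in nodes:
        incoming = [x[a] for a in candidate_arcs if a[0] == c]
        if incoming:
            prob += lpSum(incoming) == 1

    # Demand-frame constraints (e.g. "a verb takes at most one k1") would be
    # added here as further inequalities.
    prob.solve()
    return [a for a in candidate_arcs if x[a].value() == 1]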

2.3 Two Stage Parsing

The proposed parser tries to analyze the given input sentence, which has already been POS tagged and chunked, in 2 stages; it first tries to extract intra-clausal³ dependency relations. These generally correspond to the argument structure of the verb, noun-noun genitive relations, infinitive-verb relations, infinitive-noun relations, adjective-noun and adverb-verb relations, etc. In the 2nd stage it then tries to handle more complex relations such as conjuncts, relative clauses, etc. What this essentially means is a 2-stage resolution of dependencies, where the parser selectively resolves the dependencies of various lexical heads at their appropriate stage, for example verbs in the 1st stage and conjuncts and inter-verb relations in the 2nd stage. The key ideas are: (1) there are two layers (stages), (2) the 1st stage handles intra-clausal relations and the 2nd stage handles inter-clausal relations, (3) the output of each layer is a linguistically valid partial parse that becomes, if necessary, the input to the next layer, and (4) the output of the final layer is/are the desired full parse(s). These form the sentence analysis layer in the overall design. Figure 3 shows this clearly.

3 A clause is a group of words such that the group contains a single finite verb chunk.

The 1st stage output for example 2 is shown in figure 2 (a).

Eg. 2: mai ghar gayaa kyomki mai ’I’ ’home’ ’went’ ’because’ ’I’ bimaar thaa ’sick’ ‘was’ ‘I went home because I was sick’

In figure 2a, the parsed matrix clause subtree 'mai ghar gayaa' and the subordinate clause are attached to _ROOT_. The subordinating conjunct 'kyomki' (because) is also attached to _ROOT_. _ROOT_ ensures that the parse we get after each stage is connected; it takes all the analyzed 1st stage sub-trees along with unprocessed nodes as its children. The dependency tree thus obtained in the 1st stage is partial, but linguistically sound. Later, in the 2nd stage, the relationships between the various clauses are identified. The 2nd stage parse for the above sentence is shown in figure 2b: at the end of the 2nd stage, the subordinate conjunct kyomki gets attached to the matrix clause and takes the root of the subordinate clause as its child. Similar to example 2, the analysis of example 1 is shown in Figure 1. Note that under normal conditions the 2nd stage does not modify the parses obtained from the 1st stage; it only establishes the relations between the clauses. However, sometimes, under very strict conditions, repair is possible (Bharati et al., 2008a).

Figure 2: (a) 1st stage output for Eg. 2, (b) 2nd stage final parse for Eg. 2


3 Hard and Soft Constraints

Both the 1st and 2nd stages described in the previous section use linguistically motivated constraints. The hard constraints (H-constraints) reflect those aspects of the grammar that in general cannot be broken; they comprise the lexical and structural knowledge of the language. The H-constraints are converted into an integer programming problem and solved (Bharati et al., 2002, 2008a); the solution(s) is/are valid parse(s). The soft constraints (S-constraints), on the other hand, are learnt as weights from an annotated treebank⁴. They reflect the preferences that a language has towards various linguistic phenomena and are used to prioritize the parses and select the best parse. Both H- and S-constraints reflect the linguistic realities of the language and together can be thought of as the grammar of a language. Figure 3 schematically shows the overall design of the proposed parser and places these constraints in that context.

Figure 3: Schematic representation of CBHP

3.1 Hard Constraints

The core language knowledge currently being considered, which cannot be broken without the sentence being called ungrammatical, is named H-constraints. There can be multiple parses which satisfy these H-constraints; this indicates the ambiguity in the sentence if only the limited knowledge base is considered. Stated another way, H-constraints are insufficient to rule out multiple analyses of a given sentence, and more knowledge (semantics, other preferences, etc.) is required to curtail the ambiguities. Moreover, many sentences are syntactically ambiguous unless one uses some pragmatic knowledge, etc.; for all such constructions there are multiple parses. As described earlier, H-constraints are used during intra-clausal (1st stage) and inter-clausal (2nd stage) analysis (cf. Figure 3). They are used to form a constraint graph which is converted into integer programming equalities (or inequalities); these are then solved to get the final solution graph(s) (Bharati et al., 2008a). Some of the H-constraints are: (1) structural constraints (ensuring the solution graph is a tree, removing implausible language specific ungrammatical structures, etc.), (2) the lexicon (linguistic demands of various heads), and (3) other lexical constraints (some language specific characteristics), etc.

4 For details on the corpus type, annotation scheme, tagset, etc. see Begum et al. (2008a).

3.2 Soft Constraints

The S-constraints on the other hand are the con-straints that can be broken, and are used in the language as preferences. These are used during the prioritization stage. Unlike the H-constraints that are derived from a knowledge base and are used to form a constraint graph, S-constraints have weights assigned to them. These weights are automatically learnt using a manually anno-tated dependency treebank. The weights are used to score the parse trees. The tree with the maxi-mum overall score is the best parse. Some such S-constraints are, (1) Order of the arguments, (2) Relative position of arguments w.r.t. the verb, (3) Agreement, (4) Structural preferences/General graph properties (mild non-projectivity, valency, dominance, etc.), etc.

Some of the graphical S-constraints that we have used are, 1) {child POS, parent POS}, 2) {child POS, GGP POS}, 3) {child POS, child Width}, 4) {child POS, parent Depth}, 5) {parent POS, GP POS}, where POS is the Part-of-Speech tag and GP is the grandparent and GGP is the great grandparent of the child and parent Depth and child Width are the depths and widths of the parent and child sub-tree respectively.

Parses obtained after the 2nd stage satisfy all the relevant H-constraints. We score these parses using the S-constraints and select the parse with the maximum score. The score ζ of a parse p is calculated as follows:

ζ(p) = ζ(R_p)                                  (1)

where ζ is a recursive scoring function and R_p is the root node of the parse p;

ζ(n) = Σ_e [ ζ(e) + k · ζ(C_ne) ]              (2)

where C_ne is the child of node n along edge e and k is a parameter;

ζ(e) = Σ_i [ k¹_i · { P(r|γ_i) + P(γ_i) } ]    (3)

where P(r|γ_i) is the probability of the relation on edge e being r given γ_i, the ith S-constraint; P(γ_i) is the probability of occurrence of γ_i; and k¹_i is a weight associated with γ_i;

P(r|γ_i) = C(γ_i, r) / C(r)                    (4)

where C(γ_i, r) is the count of co-occurrences of relation r and γ_i, and C(r) is the count of occurrences of the relation r in the training data. These counts are calculated from the training data for each S-constraint and stored.

The ranking function tries to select a parse p for a sentence such that the overall accuracy of the parser is maximized. The parameters k and k¹_i in (2) and (3) above are set using maximum likelihood estimation. Note that the scoring function considers the structure of the parse along with the linguistic constraints under which this structure can occur.
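The recursion in equations (1)-(3) is easy to state in code. The following is a small sketch under the assumption that the relative frequencies P(r|γ_i) and P(γ_i) have already been estimated from the treebank; all container and accessor names are illustrative.

# Sketch of the parse-scoring function of equations (1)-(3). `p_rel_given`
# maps (i, gamma_value, relation) to P(r | gamma_i); `p_gamma` maps
# (i, gamma_value) to P(gamma_i); `k` and `k1` are the tuned parameters.

def score_parse(parse, p_rel_given, p_gamma, k, k1):
    return score_node(parse.root, parse, p_rel_given, p_gamma, k, k1)    # eq. (1)

def score_node(node, parse, p_rel_given, p_gamma, k, k1):
    total = 0.0
    for child in parse.children(node):                                   # eq. (2)
        rel = parse.relation(child)
        total += score_edge(child, rel, parse, p_rel_given, p_gamma, k1)
        total += k * score_node(child, parse, p_rel_given, p_gamma, k, k1)
    return total

def score_edge(child, rel, parse, p_rel_given, p_gamma, k1):
    total = 0.0
    for i, gamma in enumerate(parse.s_constraints(child)):               # eq. (3)
        total += k1[i] * (p_rel_given.get((i, gamma, rel), 0.0)
                          + p_gamma.get((i, gamma), 0.0))
    return total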

4 Experiments and Results

Initially the two stage constraint based parser is used to parse the sentences. It outputs multiple parses for each of the sentence. These parses are ranked using the scoring technique discussed in section 3.2. Three variations of the scoring tech-nique are explained below. The probabilities re-quired are calculated from the training data.

4.1 Method1

In this method the parses are scored with a single S-constraint using the scoring function discussed in Section 3.2 with i = 1. The parse with the best score is the required output. In this case there may be multiple parses with the same highest score under a single S-constraint; the output is then the first parse among all parses having the highest score. To solve the problem of multiple parses with the highest score, the second method is used. For this method we tried different values of k in equation (2), such as 0.1, 0.5, 1, 2 and 5. The best results are for k = 2, so in all the other methods the experiments are done with the k value fixed at 2.

4.2 Method2

In the second method, initially only one S-constraint is used to score the parses with the scoring function, as in Method 1. If there are multiple parses with the highest score then a second S-constraint is used to resolve the tie, and so on until there is a unique parse or all the S-constraints have been used; if there are still multiple parses, the first one is the output. In this method, the order in which we use the S-constraints is important and affects the accuracy of the parser. The order we have used is the descending order of accuracy of the individual S-constraints (found on the Hindi development data with Method 1).

4.3 Method3

In the third method all the soft constraints are used in parallel with weights k¹_i associated with each of them, and the parses are scored using the scoring technique discussed in Section 3.2. The boosting loss function algorithm discussed in Collins (2000) is used to learn the weights for each of the S-constraints.

This algorithm runs on the training data and learns the weights for each S-constraint. This has several parameters: 1) the margins for each parse of each sentence, 2) Number of iterations (N) to find the best possible weights.

Margins are initialized with the best single S-constraint score obtained from the method1. The algorithm is run with different values of N. After N=2000 weights of the features did not change considerably. So the value of N is fixed as 2000.

4.4 Parameters

The parameters that are finalized after experi-menting with all the possible values and used in all of the above methods are shown below in ta-ble 1.

Parameter                      Value
Value of k                     2
Best S-constraint (Method 1)   {C-POS, P-POS}
Value of N                     2000
Margins initialization         {C-POS, P-POS}

Table 1. Parameters, where C-POS and P-POS are the child and parent POS respectively

4.5 Results

We tried all three methods on the development data, and Method 2 gave the best results. We therefore submitted the output of Method 2 on the test data for the tools contest evaluation. The second method is used starting with the best S-constraint {C-POS, P-POS}. Our results on the Hindi test data are shown in Table 2.

            UA      LA      L
Method 2    85.55   62.20   65.88

Table 2. Results on the Hindi test data

Acknowledgements

We would like to thank Mr. Samar Husain for his valuable guidance and suggestions. We would also like to thank Dr. Dipti Misra Sharma and Dr. Rajeev Sangal for their valuable suggestions and support.

References

Abney, S. Partial Parsing via Finite-State Cascades. Natural Language Engineering, 2(4):337–344. 1997.

Attardi, G. and F. Dell'Orletta. Chunking and Dependency Parsing. LREC Workshop on Partial Parsing: Between Chunking and Deep Parsing. Marrakech, Morocco. 2008.

Begum, R., S. Husain, A. Dhwaj, D. Sharma, L. Bai, and R. Sangal. 2008a. Dependency annotation scheme for Indian languages. Proc. of IJCNLP08.

Begum, R., S. Husain, D. Sharma and L. Bai. 2008b. Developing Verb Frames in Hindi. Proc. of LREC08.

Bharati, A., S. Husain, D. M. Sharma, and R. Sangal. 2008a. A Two-Stage Constraint Based Dependen-cy Parser for Free Word Order Languages. In Pro-ceedings of the COLIPS IALP.

Bharati, A., S. Husain, B. Ambati, S. Jain, D. Sharma and R. Sangal. 2008b. Two Semantic features make all the difference in parsing accuracy. Proc. of ICON-08.

Bharati, A., D. M. Sharma, L. Bai and R. Sangal. AnnCorra: Annotating Corpora Guidelines for POS and Chunk Annotation for Indian Languages. LTRC-TR31. 2006.

Bharati, A., R. Sangal, and T. P. Reddy. 2002. A Con-straint Based Parser Using Integer Programming, In Proc. of ICON.

Bharati, A., V. Chaitanya and R. Sangal. 1995. Natu-ral Language Processing: A Paninian Perspective, Prentice-Hall of India, New Delhi.

Bharati, A. and R. Sangal. 1993. Parsing Free Word Order Languages in the Paninian Framework. Proc. of ACL: 93.

Chen, W., Y. Zhang and H. Isahara. A Two-Stage Parser for Multilingual Dependency Parsing. In Proc. of the CoNLL Shared Task Session of EMNLP-CoNLL. 2007.

Debusmann, R., D. Duchier and G. Kruijff. 2004. Extensible dependency grammar: A new metho-dology. Proceedings of the Workshop on Recent Advances in Dependency Grammar.

Hall, J., J. Nilsson, J. Nivre, G. Eryigit, B. Megyesi, M. Nilsson and M. Saers. 2007. Single Malt or Blended? A Study in Multilingual Parser Optimi-zation. Proc. of EMNLP-CoNLL shared task 2007.

Husain, S., P. Gadde, B. Ambati, D. Sharma and R. Sangal. 2009. A modular cascaded approach to complete parsing. In Proc. of COLIPS IALP-09. Singapore.

Karlsson, F., A. Voutilainen, J. Heikkilä and A. Anttila, (eds). 1995. Constraint Grammar: A lan-guage-independent system for parsing unrestricted text. Mouton de Gruyter.

Maruyama, H. 1990. Structural disambiguation with constraint propagation. In Proc. of ACL:90.

McDonald, R., F. Pereira, K. Ribarov, and J. Hajic. 2005. Non-projective dependency parsing using spanning tree algorithms. Proc. of HLT/EMNLP.

Mel'Cuk, I. A. 1988. Dependency Syntax: Theory and Practice, State University Press of New York.

Nivre, J., J. Hall, J. Nilsson, A. Chanev, G. Eryigit, S. Kübler, S. Marinov and E Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. NLE.

Shieber, S. M. 1985. Evidence against the context-freeness of natural language. In Linguistics and Philosophy, p. 8, 334–343.

Shiuan, P.Li. and C. Ting Hian Ann. A Divide-and-Conquer Strategy for Parsing. In Proc. of IWPT. 1996.

Schröder, I. 2002. Natural Language Parsing with Graded Constraints. PhD thesis, Hamburg Univ.

Tapanainen, P. and T. Järvinen. 1997. A non-projective dependency parser. Proc. of the 5th Conference on Applied Natural Language Processing, pp. 64–71.


Bidirectional Dependency Parser for Hindi, Telugu and Bangla

Prashanth Mannem
Language Technologies Research Center
International Institute of Information Technology
Hyderabad, AP 500032, India

[email protected]

Abstract

This paper describes the dependency parser we used in the NLP Tools Contest, 2009 for parsing Hindi, Bangla and Telugu. The parser uses a bidirectional parsing algorithm with two operations, proj and non-proj, to build the dependency tree. The parser obtained Labeled Attachment Scores of 71.63%, 59.86% and 67.74% for Hindi, Telugu and Bangla respectively on the treebank with fine-grained dependency labels. With coarse-grained labels the dependency parser achieved 76.90%, 70.34% and 65.01% respectively.

1 Introduction

Shen and Joshi (2008) proposed an incremental bidirectional dependency parsing algorithm which does a greedy search over the sentence by picking the relation between two words with the best score each time. It builds the tree incrementally without following a strict direction (left-to-right or right-to-left).

In this work, we apply this bidirectional parsing approach using two actions to build the dependency tree for a sentence. The two actions are used to distinguish nodes that have an outgoing non-projective arc from those that do not. Our system achieved Labeled Attachment Scores of 71.63%, 59.86% and 67.74% for Hindi, Telugu and Bangla respectively with fine-grained dependency labels.

2 Our approach

Non-projectivity is a common phenomenon in Indian languages due to their relatively free word order. A relation is non-projective if the yield/projection of the parent in a relation is not contiguous. 8% of the Hindi sentences in the released treebank are non-projective, and some of the sentences have multiple non-projective arcs within a sentence. For Bangla, the corresponding number is 6.28%. The Telugu treebank has very few non-projective structures (0.9%) due to the kind of corpus chosen for treebank annotation. The average sentence lengths in the Hindi, Bangla and Telugu treebanks are 9.1 chunks, 6.4 chunks and 3.8 chunks respectively; the smaller sentence length in Telugu explains the low occurrence of non-projectivity in the Telugu treebank.

Figure 1: Example of a dependency tree built using the two actions proj and non-proj

Non-projectivity has been handled in incremental parsers earlier. Nivre and Nilsson (2005) used a pseudo-projective parsing technique to recover non-projective dependencies in a post-processing step in a transition based parser. Nivre (2009) proposed a transition system which can parse arbitrary non-projective trees by swapping the order of words in the input while parsing.

In our approach to dependency parsing, we use two different operations to connect nodes that have outgoing non-projective arcs and those that do not. A dependency tree is built by connecting nodes with the two operations proj and non-proj. The operation non-proj is used to connect the head in a non-projective relation to its parent; proj is used to connect all the other nodes in a dependency tree to their parents. In the example dependency tree given in Figure 1, there is a non-projective arc from NP1 to VG2, so the head of the non-projective arc (NP1) is connected to its parent (VG1) with the operation non-proj. The child in the non-projective arc (VG2) is connected using a proj operation itself. Nodes connected through non-proj indicate scope for an outgoing non-projective arc from them. During parsing, the set of candidate parents for a node includes all the nodes which satisfy projectivity and also those nodes which have been connected through a non-proj operation.

In this work, we apply the bidirectional parsing approach described in Shen and Joshi (2008) to Indian language dependency parsing using the two operations defined above. The original implementation of the parsing algorithm by Libin Shen performed unlabeled LTAG dependency parsing by training on the LTAG-spinal treebank. We extend the parser to do labeled dependency parsing by learning from dependency treebanks.

Mannem et al. (2009) extracted an LTAG-spinal treebank for Hindi from a dependency treebank, using the adjunction operation to handle non-projective structures and attachment to handle projective relations other than coordination structures. They trained the bidirectional LTAG dependency parser (Shen and Joshi, 2008) on the extracted Hindi LTAG-spinal treebank with good results. Our work is inspired by their approach of handling non-projectivity through a separate adjunction operation.

3 Labeled Bidirectional Dependency Parser

Shen and Joshi (2008) proposed a bidirectional incremental parsing algorithm which searches the sentence for the most confident dependency hypothesis in both directions (left-to-right and right-to-left). The search can start at any position and can expand the partial results in any direction. The order of search is learned automatically.

The implementation of this parsing framework by Libin Shen could only do unlabeled LTAG dependency parsing, and the training data was the LTAG-spinal treebank in LTAG-spinal format. We extended the implementation to perform labeled dependency parsing on CoNLL-format data.

In the rest of the section, we give an overview of the parsing algorithm along with the training and inference processes presented in Shen and Joshi (2008). We mention our extensions to the original parsing framework wherever appropriate.

3.1 Parsing Algorithm

We are given a linear graph G = (V, E) with vertices V = {v_i} and edges E(v_{i-1}, v_i), together with a hidden structure U = {u_k}. Each element of the hidden structure, u_k = (v_{s_k}, v_{e_k}, l_k), where vertex v_{e_k} depends on vertex v_{s_k} with label l_k ∈ L, is what we want to find (the parse tree). L is the full list of dependency labels occurring in the corpus. A sentence is a linear graph with an edge between adjacent words. A fragment is a connected sub-graph of G(V, E). Each fragment x is associated with a set of hypothesized hidden structures, or fragment hypotheses for short: Y^x = {y^x_1, y^x_2, ...}. Each y^x is a possible fragment hypothesis of x. A fragment hypothesis represents a possible parse analysis for a fragment. Initially, each word with its POS tag constitutes a fragment.

Let x_i and x_j be two fragments such that x_i ∩ x_j = ∅ and the two are directly connected via an edge in E. Let y^{x_i} be a fragment hypothesis of x_i, and y^{x_j} a fragment hypothesis of x_j. We can combine the hypotheses of two nearby fragments with one of the labels from L. Suppose we choose to combine y^{x_i} and y^{x_j} with an operation R_{type,label,dir} to build a fragment hypothesis for x_k = x_i ∪ x_j. The output of the operation is

y^{x_k} = R_{type,label,dir}(x_i, x_j, y^{x_i}, y^{x_j}) ⊇ y^{x_i} ∪ y^{x_j}

where type ∈ {proj, non-proj} is the type of the operation, label is the dependency label from L, and dir ∈ {left, right} indicates whether the left or the right fragment is the parent. y^{x_i} and y^{x_j} stand for the fragment hypotheses of the left and right fragments x_i and x_j.

An operation R on fragment hypotheses R.y^{x_i} and R.y^{x_j} generates a new hypothesis y(R) for the new fragment, which contains both the fragments R.x_i and R.x_j. The score of an operation is defined as

s(R) = W · φ(R)


where s(R) is the score of the operation R, computed as the dot product of a weight vector W and φ(R), the feature vector of R. The score of the new hypothesis is the sum of the score of the operation and the scores of the fragment hypotheses involved:

score(y(R)) = s(R) + score(R.y^{x_i}) + score(R.y^{x_j})
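As a small illustration, the two scoring equations can be realized with a sparse feature representation; the sketch below assumes features are strings and the weight vector W is a dictionary (both assumptions of ours).

def operation_score(W, features):
    # s(R) = W . phi(R) for a sparse binary feature vector phi(R)
    return sum(W.get(f, 0.0) for f in features)

def hypothesis_score(W, features, score_left, score_right):
    # score(y(R)) = s(R) + score(R.y^{x_i}) + score(R.y^{x_j})
    return operation_score(W, features) + score_left + score_right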

The feature vector φ(R) is defined on R.y^{x_i} and R.y^{x_j}, as well as on the context hypotheses. If φ(R) only contains information from R.y^{x_i} and R.y^{x_j}, it is called a level-0 feature dependency; if the features contain information from the hypotheses of nearby fragments, it is called a level-1 feature dependency. A chain is used to represent a set of fragments such that the hypotheses of each fragment always have feature dependency relations with some other fragments within the same chain. Furthermore, each fragment can belong to only one chain. A set of related fragment hypotheses is called a chain hypothesis. For a given chain, each fragment contributes a fragment hypothesis to build a chain hypothesis. Beam search with a predefined beam width is used to keep the top k chain hypotheses for each chain. The score of a chain hypothesis is the sum of the scores of the fragment hypotheses in that chain hypothesis. For a chain hypothesis c,

score(c) = Σ_{fragment hypothesis y^x of c} score(y^x)

A cut T of a given sentence, T = {c_1, c_2, ..., c_m}, is a set of chains satisfying

• exclusiveness: (∪ c_i) ∩ (∪ c_j) = ∅ for all i ≠ j, and

• completeness: the fragments of all the chains in T together cover V.

Furthermore, H_T = {H_{c_i} | c_i ∈ T} is used to represent the sets of fragment hypotheses for all the fragments in cut T. At every point in the parsing process, a priority queue Q of candidate operations is maintained. Q contains all the possible operations for the fragments and their hypotheses in cut T, and s(R) is used to order the operations in Q.

With the above formal notation, we now list the inference and learning algorithms in Algorithm 1 and Algorithm 2.

3.2 Decoding

Algorithm 1 describes the procedure of building hypotheses incrementally on a given linear graph G = (V, E). Parameter k is used to set the beam width of the search, and the weight vector W is used to compute the score of an operation.

Algorithm 1 LBDP: Inference Algorithm
INPUT: graph G(V, E) and weight vector W;
INITIATE cut T, hypotheses H_T, queue of candidate operations Q;
while Q is not empty do
    operation y ← arg max_{op ∈ Q} score(op, W);
    UPDATE T, H_T, Q with y;
end while

First, the cut T is initialized by treating each vertex in V as a fragment and a chain. Then the initial hypotheses for each vertex/fragment/chain are set using the POS tag of each word. The priority queue Q collects all the possible operations over the initial cut T and hypotheses H_T. While Q is not empty, the operation with the highest score according to the weight vector W is selected, and the cut along with its hypotheses is updated with the resulting chain hypothesis. The candidate queue Q is then updated by removing operations that depend on chain hypotheses removed from H_T and adding new operations that depend on the newly added chain hypotheses.
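A much-simplified Python sketch of this best-first control flow is given below. It ignores fragments, chains and the re-scoring of context-dependent features, and simply applies the highest-scoring candidate arc whose dependent is still unattached; score_fn and candidate_arcs are placeholders supplied by the caller.

import heapq

def greedy_parse(n_words, candidate_arcs, score_fn):
    # candidate_arcs: iterable of (head, dep, label) with words numbered 1..n_words
    # and 0 standing for the artificial root.
    heap = [(-score_fn(h, d, lab), h, d, lab) for (h, d, lab) in candidate_arcs]
    heapq.heapify(heap)
    head_of = {}
    while heap and len(head_of) < n_words:
        _, h, d, lab = heapq.heappop(heap)
        if d in head_of:
            continue                  # this word already has a parent
        head_of[d] = (h, lab)
    return head_of

# e.g. greedy_parse(3, [(0, 1, 'root'), (1, 2, 'k1'), (1, 3, 'k2')], lambda h, d, l: 1.0)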

Algorithm 2 LBDP: Training Algorithm
W ← 0;
for round = 1..T, i = 1..n do
    LOAD graph G_r(V, E), hidden structure H_r;
    INITIATE cut T, hypotheses H_T, queue Q;
    while Q is not empty do
        operation y ← arg max_{op ∈ Q} score(op, W);
        if compatible(H_r, y) then
            UPDATE T, H_T, Q with y;
        else
            y* ← searchCompatible(Q, y);
            promote(y*);
            demote(y);
            UPDATE Q with W;
        end if
    end while
end for

3.3 Training

For each given training sample (G_r, H_r), where H_r is the gold-standard hidden structure of graph G_r, the cut T, its hypotheses H_T and the candidate queue Q are initialized. The gold standard H_r is used to guide the search. The candidate operation y with the highest score in Q is selected. If y is compatible with H_r, the cut T, the hypotheses H_T and Q are updated. If y is incompatible with H_r, y is treated as a negative sample, and a positive sample y* compatible with H_r is searched for in Q; if no such compatible hypothesis is found, the highest-scoring hypothesis in Q that is compatible with H_r is searched for instead. The weight vector W is then updated with y and y*, and at the end the candidate queue Q is updated with the new weights to re-compute the operation scores. Perceptron learning with margin is used for training and the voted perceptron for inference. Algorithm 2 lists the training procedure.
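A minimal sketch of a perceptron update with margin, in the spirit of the promote/demote step above, is shown below; the sparse-feature representation, margin value and learning rate are illustrative assumptions.

def perceptron_margin_update(W, pos_feats, neg_feats, margin=1.0, lr=1.0):
    # Promote the gold-compatible hypothesis y* and demote the incompatible y
    # whenever y* does not outscore y by at least the margin.
    score = lambda feats: sum(W.get(f, 0.0) for f in feats)
    if score(pos_feats) < score(neg_feats) + margin:
        for f in pos_feats:
            W[f] = W.get(f, 0.0) + lr      # promote y*
        for f in neg_feats:
            W[f] = W.get(f, 0.0) - lr      # demote y
    return W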

(a) Unigram features involving pa and ch
pa.wrd                 ch.wrd
pa.pos                 ch.pos
pa.cnk                 ch.cnk
pa.afx                 ch.afx
(pa.wrd + pa.pos)      (ch.wrd + ch.pos)
(pa.wrd + pa.cnk)      (ch.wrd + ch.cnk)
(pa.wrd + pa.afx)      (ch.wrd + ch.afx)

(b) Bigram features involving pa and ch
(pa.wrd + ch.wrd)
(pa.wrd + pa.pos + ch.wrd + ch.pos)
(pa.wrd + pa.afx + ch.wrd + ch.afx)
(pa.pos + pa.afx + ch.pos + ch.afx)
(pa.cnk + pa.afx + ch.cnk + ch.afx)
(pa.wrd + pa.afx + pa.pos + ch.wrd + ch.afx + ch.pos)
(pa.wrd + pa.afx + pa.cnk + ch.wrd + ch.afx + ch.cnk)

(c) Features involving all the siblings sb of ch
(pa.pos + ch.pos + sb.pos)
(pa.wrd + pa.pos + ch.pos + sb.pos)
(pa.wrd + pa.afx + ch.afx + sb.pos)
(pa.pos + pa.afx + ch.pos + ch.afx + sb.pos)
(pa.pos + ch.pos + sb.dlab)
(pa.wrd + pa.pos + ch.pos + sb.dlab)
(pa.wrd + pa.afx + ch.afx + sb.dlab)
(pa.pos + ch.pos + sb.pos + sb.dlab)
(pa.wrd + pa.pos + ch.pos + sb.pos + sb.dlab)

(d) Features involving the context words b.i (i ranges from -2 to +2)
(pa.pos + ch.pos + b.i.pos)
(pa.pos + ch.pos + b.i.cnk)
(pa.pos + ch.pos + b.i.afx)
(pa.wrd + pa.pos + ch.pos + b.i.pos)
(pa.wrd + pa.pos + ch.pos + b.i.cnk)
(pa.wrd + pa.pos + ch.pos + b.i.afx)

Table 1: Features used by the parser for all the three languages

4 Features

We take the head words of chunks in the SSF tree as input to the system, so each word in the input sentence has a chunk label in addition to its POS tag and affix information. The vibhakti (post-position) included in the input SSF feature vector is taken as the affix for our system.

The parser was tried out with different feature sets and tuned by doing a 5-fold cross-validation on the entire data for Hindi. The feature set that performed best for Hindi was used for the other two languages. The full list of features is given in Table 1. In the table, .cnk denotes the word's chunk, .dlab the word's predicted dependency label, .wrd the word form, and .afx the affix information.

Tables 1a and 1b list the features defined on the parent (pa) and child (ch) in a relation. Table 1c lists the features involving the siblings of ch (the other children of pa), and Table 1d the features for the context words. These context words can come either from the input sentence or from the partial trees surrounding the current hypothesis (level-1, level-2 feature dependencies in Section 3.1). We call these two contexts sentence context (SC) and local context (LC).
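To illustrate how such templates turn into concrete features, the Python sketch below instantiates a few of the unigram and bigram templates from Table 1; the node representation (a dict with wrd, pos, cnk and afx fields) and the feature-string format are our own assumptions, not the parser's internals.

def pair_features(pa, ch):
    feats = []
    # (a) unigram templates on parent (pa) and child (ch)
    for side, node in (('pa', pa), ('ch', ch)):
        for attr in ('wrd', 'pos', 'cnk', 'afx'):
            feats.append('%s.%s=%s' % (side, attr, node[attr]))
        feats.append('%s.wrd+pos=%s|%s' % (side, node['wrd'], node['pos']))
        feats.append('%s.wrd+afx=%s|%s' % (side, node['wrd'], node['afx']))
    # (b) a few bigram templates combining parent and child
    feats.append('wrd|wrd=%s|%s' % (pa['wrd'], ch['wrd']))
    feats.append('wrd+pos|wrd+pos=%s_%s|%s_%s' %
                 (pa['wrd'], pa['pos'], ch['wrd'], ch['pos']))
    feats.append('pos+afx|pos+afx=%s_%s|%s_%s' %
                 (pa['pos'], pa['afx'], ch['pos'], ch['afx']))
    return feats

# e.g. with hypothetical chunk heads
pa = {'wrd': 'khaayaa', 'pos': 'VM', 'cnk': 'VGF', 'afx': ''}
ch = {'wrd': 'raam', 'pos': 'NNP', 'cnk': 'NP', 'afx': 'ne'}
print(pair_features(pa, ch)[:4])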

5 Data

Annotated training and development data for Hindi, Telugu and Bangla were released as part of the contest. There were some errors in the data, because of which a few sentences had to be discarded. The final data (training + development) contained 1651 Hindi, 1606 Telugu and 1129 Bangla sentences. In the entire corpus, 71 Bangla (6.28%), 233 Hindi (8%) and 16 Telugu (0.9%) sentences were non-projective.

For the testing phase of the contest, the parser was trained on the entire released data with the best-performing feature set, and the unannotated test data was parsed with the resulting model.

6 Experiments and Results

The Labeled Bidirectional Dependency Parser (LBDP) with the features listed in Table 1 was used to develop models for the three languages. This feature set was arrived at by doing a 5-fold cross-validation on the released data (training + development) for Hindi. The sentence context (SC) and local context (LC) were varied over a range of 0 to +/-2; the best cross-validation accuracy was obtained with an SC of 2 and an LC of 1. This feature set was used for all the accuracies reported in this section.

The annotated data for all three languages was released by the organizers with fine-grained and coarse-grained dependency labels separately. The unannotated test data released by the organizers had 150 sentences for each of the three languages. The accuracies achieved by our system on the data with fine-grained dependency labels are given in Table 2. UAS is the Unlabeled Attachment Score, LAS is the Labeled Attachment Score and LA is the Labeled Accuracy; UAS, LAS and LA are standard evaluation metrics for dependency parsing (Nivre et al., 2007).

On fine-grained tagset

Language   UAS      LAS      LA
Hindi      88.24%   71.63%   73.70%
Bangla     85.33%   67.74%   69.93%
Telugu     86.11%   59.86%   60.72%
Average    86.56%   66.41%   68.11%

Table 2: Accuracies on data annotated with fine-grained dependency labels

Though the available annotated data for all three languages is small, the average UAS (86.56%) is high. This is because gold-standard chunks were used during training and testing: the task involved identifying only the inter-chunk dependencies, given the correct chunks. Full-fledged dependency parsing, involving dependencies between all the words in a sentence, would result in lower scores across all the languages. The average LAS (66.41%) is, however, considerably low when compared to the average UAS (86.56%). This is because of the small size of the available treebanks and the large number of dependency labels used in the annotation (>30). The dependency labels used in the treebanks are syntactico-semantic in nature, making labeling even more difficult than labeling with purely syntactic labels.

The organizers also released treebanks for the three languages with coarse-grained dependency labels; Table 3 shows the performance of the parser on this data. The average UAS drops to 85.79% with the coarse-grained dependency labels, which suggests that the fine-grained dependency labels help unlabeled dependency parsing even though their own accuracy is low. The average LAS increases to 70.75% because of the reduction in the number of dependency labels.

On coarse-grained tagset

Language   UAS      LAS      LA
Hindi      88.06%   76.90%   79.24%
Bangla     83.56%   70.34%   73.05%
Telugu     85.76%   65.01%   66.21%
Average    85.79%   70.75%   72.83%

Table 3: Accuracies on data annotated with coarse-grained dependency labels

For the contest, we experimented with both a purely projective parser using only the proj operation and a non-projective system with both proj and non-proj operations. In the strictly projective bidirectional incremental parser, when a non-projective relation is encountered during training, the parser skips the rest of the sentence and jumps to the next sentence. The results of the parsers trained on the training data and tested on the development data are given in Table 4.

           Fine-grained            Coarse-grained
Language   Proj.      Non-proj.    Proj.      Non-proj.
Hindi      56.97%     57.82%       61.85%     62.56%
Bangla     52.83%     53.15%       57.40%     58.49%
Telugu     40.15%     39.89%       45.48%     43.83%

Table 4: LAS of the projective and non-projective parsers on development data with fine-grained and coarse-grained dependency labels

Table 4 clearly shows that the non-projective parser works better for Hindi and Bangla, while the projective parser works better for Telugu. This is because of the low occurrence (<1%) of non-projective arcs in Telugu.

7 Conclusion and Future Work

In this work, we proposed an approach to dependency parsing which builds the tree using two actions, proj and non-proj. non-proj is used to connect nodes that lead to non-projective arcs; a node connected with this operation is available/visible for combination beyond its projective scope. proj is used to connect the rest of the nodes. A bidirectional dependency parsing algorithm is used with these two operations in our system. The parser was trained on the entire released data and obtained Labeled Attachment Scores of 71.63%, 67.74% and 59.86% for Hindi, Bangla and Telugu respectively with the fine-grained dependency tagset. With the coarse-grained dependency tagset, the LAS for the three languages is 76.90%, 70.34% and 65.01% respectively.

The number of actions for labeled dependency parsing balloons to 2*30 (2 actions and approximately 30 dependency labels) operations, so 60 hypotheses are created for every dependency during parsing. This makes training and parsing inefficient. To decrease the inference time, we restricted the hypotheses to include only those dependency labels that were seen in the training data for the POS tag of the child in the hypothesis. Since the training data for these three languages is not large, this restriction could have missed certain dependency labels during testing. We could try other pruning mechanisms and see whether the accuracy as well as the efficiency improves.
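The label pruning described above could be realized as follows: collect, from the training data, the set of dependency labels observed for each child POS tag, and only generate hypotheses with those labels at parse time. This is a minimal Python sketch under our own assumptions about the data structures, including the fallback to the full label set for unseen POS tags.

from collections import defaultdict

def build_label_filter(training_arcs):
    # training_arcs: iterable of (child_pos, dependency_label) pairs
    allowed = defaultdict(set)
    for child_pos, label in training_arcs:
        allowed[child_pos].add(label)
    return allowed

def candidate_labels(allowed, child_pos, all_labels):
    # fall back to the full label set for POS tags not seen in training,
    # since the restriction may otherwise miss valid labels at test time
    return allowed.get(child_pos, set(all_labels))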

References

Prashanth Mannem, Aswarth Abhilash, and Akshar Bharati. 2009. LTAG-spinal treebank and parser for Hindi. In Proceedings of the International Conference on NLP (ICON), Hyderabad, India.

J. Nivre and J. Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106.

Joakim Nivre, Johan Hall, Sandra Kübler, Ryan McDonald, Jens Nilsson, Sebastian Riedel, and Deniz Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Prague, Czech Republic. Association for Computational Linguistics.

Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359, Suntec, Singapore.

Libin Shen and Aravind Joshi. 2008. LTAG dependency parsing with bidirectional incremental construction. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 495–504, Honolulu, Hawaii, October. Association for Computational Linguistics.
