the tiger treebank - uni-saarland.de2002:tt.pdf · this paper reports on the tiger treebank, ......
TRANSCRIPT
TheTIGERTreebank
SabineBrants�, StefanieDipper
�, Silvia Hansen
�, WolfgangLezius
�, GeorgeSmith
�
�ComputationalLinguistics,SaarlandUniversity, Postfach15 11 50 Saarbrucken,Germany�
sabine,hansen� @coli.uni-sb.de�Instituteof NaturalLanguageProcessing(IMS), StuttgartUniversity, Germany�
dipper, lezius� @ims.uni-stuttgart.de�Institut fur Germanistik,PotsdamUniversity, Germany
Abstract
This paperreportson the TIGER Treebank,a corpusof currently 35.000syntacticallyan-notatedGermannewspapersentences.We describewhat kind of information is encodedin thetreebankandintroducethe differentrepresentationformatsthat areusedfor the annotationandexploitation of the treebank.We explain the differentmethodsusedfor the annotation:interac-tive annotation,usingthe tool Annotate, andLFG parsing. Furthermore,we give an accountofthe annotationschemeusedfor the TIGER treebank.This schemeis an extendedandimprovedversionof theNEGRA annotationschemeandwe illustratein detailthelinguistic extensionsthatweremadeconcerningtheannotationin theTIGER project.Themaindifferencesareconcernedwith coordination,verb-subcategorization,expletivesaswell aspropernouns. In addition, thepaperalsopresentsthequerytool TIGERSearchthatwasdevelopedin theprojectto exploit thetreebankin anadequateway. We describethequerylanguagewhich wasdesignedto facilitateasimpleformulationof complex queries;furthermore,we shortly introduceTIGERin, a graphicaluserinterfacefor queryinput. Thepaperconcludeswith asummaryandsomedirectionsfor futurework.
1 Introduction
Corpus-basedmethodsplay an importantrole in empiricallinguisticsaswell asin machinelearningmethodsin NLP. In thesetwo areasof research,largenaturallanguagecorpora,enrichedwith syntacticinformation,areneeded.Thus,in recentyears,therehasbeenanincreasinginterestin theconstructionof thesesyntacticallyannotatedcorpora,commonlycalled‘treebanks’(Lezius,2001).
For German,thefirst initiative in thefield of treebankswastheNEGRACorpus((Skut,Brants,Krenn,& Uszkoreit, 1998)and(Brants,Skut,& Uszkoreit, 1999)),which containssyntacticallyinterpretednewspapertexts. Furthermore,thereis theVerbmobilCorpus(Wahlster, 2000),whichcoverstheareaof spokenlanguage.
This paperreportson theTIGER Treebankproject,which aimsat building the largestandmostex-haustively annotatedtreebankfor German.Theannotationformatandschemearebasedon theNE-GRA corpus;however, theTIGER TreebankexceedstheNEGRA corpusin sizeaswell asin detailof annotation.SincetheNEGRACorpusis ratherrestrictedin its size(20,000syntacticallyannotatedsentences)and the VerbmobilCorpusin its domains(i.e. spontaneousspeechfor the appointmentnegotiationdomain),the constructionof the TIGER Treebankasa comprehensive resourcefor theGermanlanguagewasanecessarystepto overcomethesedrawbacks.
This paperis structuredin thefollowing way: Section2 describestheannotationformatandprovidesgeneralinformationon theannotationscheme.Furthermore,it containsa shortoverview of treebank
1
initiatives for languagesother thanGerman. In Section3, the differentmethodsusedfor the anno-tationof the treebankarepresented.The linguistic extensionsthatweremadein theTIGER projectconcerningtheannotationschemearecoveredin Section4. Section5 givesanoverview of thequerylanguageandquerytool thatweredevelopedin theprojectfor theexploitationof thetreebank.Finally,Section6 summarizesthepaperandsketchessomeideasfor futurework.
2 The TIGER Corpus
Thebasisof theTIGERTreebankaretexts from theGermannewspaperFrankfurterRundschau. Onlycompletearticleswereused,which were taken from all kind of domains1 so as to cover a broaderrangeof languagevariations. At the currentstage,the corpuscontains35.000syntacticallyanno-tatedsentences.In thesecondprojectphase,this amountis to beextendedto approximately45.000sentences.
2.1 Levels of annotation
In theNEGRA aswell asin theTIGER corpus,a hybrid framework is usedwhich combinesadvan-tagesof dependency grammarandphrasestructuregrammar. Thesyntacticstructureis representedbya tree. Thebranchesof a treemaycross,allowing theencodingof local andnon-localdependenciesandeliminatingtheneedfor traces.This approachhasconsiderableadvantagesfor free-word orderlanguagessuchas German,which show a large variety of discontinuousconstituency types(Skut,Krenn,Brants,& Uszkoreit,1997).
The linguistic annotationof eachsentencein the TIGER Treebankis representedon a numberofdifferentlevels (seefigure1): Part-of-speechinformationis encodedin terminalnodes(on thewordlevel). Non-terminalnodesarelabeledwith phrasecategories.Theedgesof a treerepresentsyntacticfunctions. Furthermore,a supplementaryannotationon the word level facilitatesthe encodingofinformationon lemmataandmorphology2. For part-of-speechtagging,theStuttgart-Tubingen-Tagset(Schiller, Teufel, & Stockert, 1999) is usedin a slightly modifiedversion((Kramp & Preis,2000)and(Smith& Eisenberg, 2000)). Informationon lemmataandmorphologywasnot annotatedin theNEGRA corpus;this is anew featurethatwasaddedto theannotationin theTIGERproject.
Syntacticstructuresareratherflat andsimplein orderto reducethepotentialfor attachmentambigui-ties.Thedistinctionbetweenargumentsandadjuncts,for instance,is notexpressedin theconstituentstructure,but is insteadencodedby meansof syntacticfunctions.
Apart from the annotationof morphologyand lemmata,anotherannotationlevel wasaddedto theTIGER corpus:Secondaryedges,i.e. labelleddirectedarcsbetweenarbitrarynodes,areusedto en-codecoordinationinformation.Currently, thesesecondaryedgesareonly employedfor theannotationof coordinatedsentencesandverbphrases;anotherpotentialusemight be thesystematicannotationof attachmentambiguities.
1We only excludedregional news and sportsnews becauseexperiencesfrom the pastshowed that thesetexts oftencontainedtables,enumerations,etc. insteadof completesentences.
2Theexamplein Figure1, however, containsno lemmataannotation,but a literal translationinstead.
2
0 1 2 3 4 5 6 7 8 9
500 501
502
503
Neben
APPR�
Dat
besides
denART�
Def.*.Dat.Pl
the�
Mitteln
NN
Neut.Dat.Pl.*
means
desART�
Def.Neut.Gen.Sg
of_the�
Theaters
NN
Neut.Gen.Sg.*
theater�
benutzteVVFIN�
3.Sg.Past.Ind�
used
Moran
NE
*.Nom.Sg
Moran
dieART�
Def.Fem.Akk.Sg
the�
Toncollage
NN
Fem.Akk.Sg.*
sound_collage�
.
$.�−−
NK NK NK NK
AC NK NK
NP
GR
PP
MO HD SB
NP
OA
S
Figure1: Differentlevelsof annotation
2.2 Annotation formats
In the TIGER project, we useseveral annotationformatsfor corpusstorage,export andquerying.Thereexist scriptsthatenablethetransformationfrom oneformatto another.
First of all, the annotatedsentencesarestoredand maintainedin a MySQL database;informationabouttheannotationis containedin tables.An additionaloutputformat is usedfor theexport of thesentences.Thedatabaseentries(words,morphologicaltags,terminalnodes,non-terminalnodesandedges)canbe exportedto a tablestoredin a line-orientedandASCII-basedformat (Brants,1997).The major advantageof this export format is that it is easily readablefor humansaswell aseasilyprocessablefor machines.Sentenceboundariesare identified throughsentencestartandend tags.Furthermore,informationon sentenceorigins,editorsandusedtagsis storedat thebeginningof eachexport file.
Basedon thisexport format,theTIGERcorpuscanbetransferredinto a third format,namelyTIGERXML (Lezius, Biesinger, & Gerstenberger, 2002a). A TIGER XML file is typically split up intoheaderandbody. Thecorpusheadercontainsmeta-informationon thecorpus(suchascorpusname,date,author, etc.) anda declarationof the tagsthat areusedfor morphology, part-of-speech,non-terminal nodesand edges. In the corpusbody, directedacyclic graphsareusedas the underlyingdatamodelto encodethe linguistic annotation.Words,part-of-speechtags,morphologicaltagsandlemmataoccurasattributesof the element‘terminal’, whereasnon-terminalsarerepresentedin anadditionalelementcalled‘nonterminal’ referingto the correspondingterminalID. Secondaryedgesareencodedexplicitly aswell. By usingan XML format, the TIGER Treebankis exchangableandusablewith a large rangeof tools. TheXML format is alsothebasisfor theuseof thecorpusquerytool TIGERSearch(Lezius& Konig,2000).
2.3 Comparable treebank initiatives
Oneof the first andbestknown treebanksis the PennTreebankfor the English language(Marcuset al., 1994),which consistsof about1 million wordsof newspapertext. It containspart-of-speechtaggingandroughsyntacticandsemanticannotation.A bracketingformatis usedto encodepredicate-argumentstructureandtrace-fillermechanismsareusedto representdiscontinuousphenomena.Othercomparabletreebanksfor Englishare,for instance,theSusanneCorpus(Sampson,1995)(containingdetailedpart-of-speechinformation and phrasestructureannotation),the LancasterParsedCorpus(Leech,1992) (representingphrasestructureannotationby meansof labelledbracketing) and the
3
British part of the InternationalCorpusof English (Greenbaum,1996) (about1 million words ofBritish Englishthatweretagged,parsedandafterwardschecked).
For languagesotherthanEnglish,a fairly well-known treebankis thePragueDependency Treebankfor Czech(Hajic, 1999).It containsabout450.000tokensandis annotatedonthreelevels:onthemor-phologicallevel (tags,lemmata,word forms), on the syntacticlevel (usingdependency syntax)andon the tectogrammaticallevel (encodingfunctionssuchasActor, Patient,etc.). Recently, treebankprojectsfor otherlanguageshave cometo life aswell, e.g. for French(Abeille, Clement,& Kinyon,2000), Italian (Bosco,Lombardo,Vassallo,& Lesmo,2000),Spanish(Moreno,Grishman,Lopez,Sanchez,& Sekine,2000),Turkish(Oflazer, Hakkani-Tur, & Tur, 1999),Russian(Boguslavsky, Grig-orieva, Grigoriev, Kreidlin, & Frid, 2000)andBulgarian(Simov et al., 2002). More initiatives forlinguistically interpretedcorporacanbe found in (Uszkoreit, Brants,& Krenn, 1999)and(Abeille,Brants,& Uszkoreit,2000).
3 Annotation methods
Weusetwo differentmethodsfor thesyntacticannotationof theTIGERcorpus:InteractiveannotationandLFG parsing.Thefirst oneis acombinationof probabilisticparsingandhumanintervention(Sec-tion 3.1). After theparsingis completed,morphologicalannotationis performedsemi-automatically,using the given syntacticannotationfor disambiguation.For the secondmethod,a symbolicLFGgrammaris usedto parselargepartsof thecorpus;theoutputis disambiguatedby a humanannotator(Section3.2).
3.1 Interactive Tagging and Parsing
Interactive annotationis anefficient combinationof automaticparsingandhumanannotation.Insteadof having anautomaticparseraspreprocessoranda humanannotatoraspostprocessor, thetwo stepsareinterwovenin ourapproach.Theparsergeneratesasmallpartof theannotation,which is immedi-atelypresentedvisually to thehumanannotator, who caneitheraccept,corrector rejectit. Basedontheannotator’s decision,theparserproposesthenext partof theannotation,which is againsubmittedto theannotator’s judgement.Thisprocessis repeateduntil theannotationof thesentenceis complete.
The advantageof this interactive methodis that the humandecisionscanbe usedby the automaticparser. Thus,errorsmadeby the automaticparserat lower levels arecorrectedinstantlyanddo not‘shinethrough’onhigherlevels.Thechancesgrow thattheautomaticparserproposescorrectanalyseson higherlevels.
Theinteractive annotationworksonseverallayers.Thelowestoneis thepart-of-speechlayer. Higherlayersaredefinedby the depthof the syntacticstructure. Eachlayer is representedby a differentMarkov Model, hencethe nameCascadedMarkov Models(Brants,1999). The first stepin the an-notationprocessis thegenerationof part-of-speechtags. This stepis performedusingthestatisticaltaggerTnT (Brants,2000b). In additionto thetags,TnT alsogeneratesprobabilitiesthathelp to de-cideon thereliability of a proposedtag. The lower theprobabilityof alternative tags,thehigherthereliability of thebesttagfor aword(Brants& Skut,1998).Approximately84%of all tagassignmentsareclassifiedasreliableby thetagger. Theremaining16%needto beproof-readby humanannotators.
Oncethepart-of-speechtaggingis done,Markov modelsfor higherlayersstartprocessing.Hypothet-ical phrasesaregenerated,andtheonewith thehighestprobability is displayedto theannotator. Thestructurecanbeaccepted,rejectedor manuallycorrectedby theannotator. Interventionby thehuman
4
Figure2: Theannotationtool Annotate
annotatorimmediatelychangesthe setof hypothesesusedby the parser. The syntacticstructureisbuilt phraseby phrase,bottomup. About71%of thephrasessuggestedby theparserarecorrect,17%needminor intervention(i.e., at mostoneconstituentneedsto be addedor deleted).The remaining12%requiremajorinterventionby thehumanannotator.
Bothtaggerandparserareentirelytrainedonpreviously (manually)annotateddata.No manualgram-maror lexicon developmentarenecessary. The annotationschemeis learntautomaticallyby taggerandparser. In caseof changesin the annotationscheme,only a small amountof dataneedsto bechangedmanually. The taggerandparserarethentrainedon thechangeddataandareimmediatelyreadyfor annotationwith thenew scheme.
Theannotationis performedwith thehelpof thetool Annotate(Figure2), a graphicaluserinterfacewith acomprehensive setof treemanipulationfunctionsanddatabaseaccess(Plaehn& Brants,2000).AnnotaterunstheTnT taggerandtheCascadedMarkov Modelsin thebackground.
In orderto achieveahigh level of consistency andto avoid mistakes,weuseavery thoroughapproachto theannotation:First, eachsentenceis annotatedindependentlyby two annotators.With thesup-port of scripts,they afterwardscomparetheir annotationsandcorrectobvious mistakes. Remainingdifferencesaresubmittedto adiscussionbetweentheannotators.Althoughthisprocessis rathertime-consuming,it hasproven to be highly beneficialfor theaccuracy of theannotation(Brants,2000a).Furthermore,it alsosupportsthecontinuousimprovementof theannotationscheme:It is in thediscus-sion betweentheannotatorsthat discrepanciesbetweenthe annotationschemeandthe databecomeobvious. If this happensto be the case,new rulesandbettertestsfor operationalizationareaddedto theannotationscheme.Thus,thereis a cross-fertilizationbetweenthecorpusandtheannotationscheme.
Morphology and Lemmata For theanalysisof lemmataandmorphologicaltags,we useanaddi-tionalgraphicaluserinterface.This is interleavedwith atool calledTigerMorphwhichwasdeveloped
5
Figure3: LFG c- andf-structure
by BertholdCrysmann.TigerMorphdisambiguatestheoutputof amorphologicalanalyseronthebasisof thealreadyexisting syntacticstructure.It proposeslemmataandmorphologicaltagsfor thewordsof a sentence,proceedingfrom left to right. The annotationof morphologyandlemmataresemblestheinteractive annotationof thesyntacticstructuredescribedabove. Ambiguoustagsarepresentedtotheannotator, who thendecideswhich oneis thecorrectalternative. This informationis returnedtoTigerMorphandusedfor thedisambiguationof themorphologicalanalysisfor theremainingwords.
3.2 Annotation by LFG Parsing
As analternative to interactive taggingandparsing,abroadcoveragesymbolicLFG grammar(LexicalFunctionalGrammar(Bresnan,1982))is usedto parsepartsof thecorpus(Dipper, 2000).Usually, theLFG grammaroutputsseveralanalysesfor acorpussentence.Theoutputis first filteredby agrammarinternal rankingmechanismandthendisambiguatedby a humanannotator. A transfercomponentconvertstheselectedanalysisinto theTIGERformat(Zinsmeister, Kuhn,& Dipper, 2002).
Oneadvantageof this approachis theaccuracy of thegrammar’s output. An LFG analysisis alwayssyntacticallyconsistent.It doesnot containinconsistenciessuchas,e.g.,missingsubject-verbagree-ment,in caseof which theparsewould have failed. On theotherhand,thegrammarcertainlyis noterror-free. But thoseerrorswhich do occuraresystematicandhenceeasierto correctthanerrorsthatoccurwith manualannotation.
Parsing The LFG grammarappliedin parsinghasbeendevelopedin the ParGramprojectat theUniversityof Stuttgart,usingtheXerox Linguistic Environment(XLE) (ParGram,2002).Theoutputof an LFG grammarbasicallyconsistsof two representations,the constituentstructure(c-structure)of thesentencebeingparsed,andits functionalstructure(f-structure).C-structureencodesinforma-tion aboutmorphology, constituency, and linear ordering. F-structurerepresentsinformationaboutpredicateargumentstructure,aboutmodification,andabouttense,moodetc. An exampleof c- andf-structureis given in Figure3 for thesentenceEin Mann kommt,der lacht (‘a manis comingwholaughs’).
Disambiguation Almost every sentenceof a newspapercorpusis syntacticallyambiguous.Hencetheoutputof apurelysymbolicgrammarhasto bedisambiguated,i.e.ahumanannotatorhasto select
6
thecorrectanalysis.This taskis supportedby XLE which allows for ‘packing’ thedifferentreadingsinto onecomplex f-structurerepresentation.3
On average,however, a sentenceof the TIGER corpusreceives several thousandsof LFG analy-ses. Clearly it is impossibleto disambiguatethoseanalysesmanually. ThereforeXLE provides a(non-statistical)mechanismfor suppressingcertainambiguitiesautomatically(Frank,King, Kuhn,&Maxwell, 1998).By meansof thismechanism,theaveragenumberof solutionsdropsdown to 17, themedianbeing2.
Conversion into TIGER format All informationthatis requiredby theTIGER annotationschemeis containedin c- andf-structurerepresentationsof LFG. ComparetheLFG representation(Figure3)with theTIGER representation(Figure4) of thesentenceEin Mannkommt,der lacht.
(i) LFG c-structurecontainscategorial information(e.g.,NP, CP), lemmas(Mann), part-of-speechtags(+Noun +Common), andmorphologicaltags(+Sg +Masc +Nom); in Figure3, lemmaandtagsareonly shown for the terminalMann. In theTIGER scheme,this informationis encodedin aslightly differentterminology:thenodesandtagsmentionedabove correspondto TIGER nodesNP,S, thepart-of-speechtagNN, andthemorphologicaltagMasc.Nom.Sg, respectively.
(ii) LFG f-structurerepresentsdependency relationssuchas the headargumentrelationSUBJ andtheheadmodifierrelationADJUNCTrel (‘relative clause’).Notethatin c-structure,NP andCP (therelative clause)do not form a constituent;however, their f-structures,SUBJ andADJUNCTrel, arelinked.This linking informationis encodedby acrossingbranchin TIGER,cf. Figure4.
0�
1 2 3�
4 5�
500�
501�
502�
EinART
a
MannNN
Masc.Nom.Sg
man
kommtVVFIN
comes�
,$,
der�
PRELS
who
lacht�
VVFIN
laughs
SB�
HD
NK NK
S�
RC
NP
SB�
HD
S�
Figure4: TIGER representation
Often,thereis aone-to-onecorrespondencebetweenLFG andTIGER representations.In thesecasesthetransfercomponentsimply convertsoneformat into another, e.g.+Sg +Masc +Nom is mappedto Masc.Nom.Sg, CP to S, SUBJ to SB, etc. However, in othercasesthe transferhasto combineinformationbothfrom c- andf-structure,asin thecaseof theextraposedrelative clausein Figure3/4.Herethe transfercomponentmakesuseof the f-structurelink betweenSUBJ andADJUNCTrel toform a (discontinuous)constituent.Thecategorial label(NP) is derivedfrom c-structure.
3FurthermoreXLE providesvariousbrowsingtoolsapplyingto c-structureaswell asto f-structurewhichcanbeusedformanualdisambiguation(cf. (King, Dipper, Frank,Kuhn,& Maxwell, To Appear)wherethesetoolsaredescribedin detail).
7
Results and Outlook Whenparsedwith thecurrentgrammarversion,50%of thecorpussentencesreceiveatleastoneanalysis;approx.70%of theparsedsentencesreceivethecorrectanalysis(possiblyamongothers).4 About2,000sentencesof theTIGERcorpushasbeenannotatedthis way.
Toenlargecoverage,XLE allowsfor partialparses,providing,e.g.,N orPchunks.In first experiments,N chunkswerefoundwith a precisionof 89%anda recall of 67%; for P chunks,theprecisionwas96%,andtherecallwas79%(Schrader, 2001). Furthermore,to minimizemanualeffort, a statisticaldisambiguationtool canbeintegrated(Riezleret al., 2002).
4 Extensions in the TIGER annotation scheme
The annotationin TIGER is basedon the annotationschemethat wasusedfor the NEGRA corpus(Brantset al., 1999).This annotationschemecovereda broadvarietyof phenomena.However, therewasstill room for improvementin its linguistic adequacy. A vital part of the work in the TIGERprojectis thelinguistic extensionof this annotationscheme.In thefollowing, themajorchangesthatweremadein the TIGER projectarepresented.A moredetailedaccountof thesechangesandanevaluationof theimprovedannotationschemecanbefoundin (Brants& Hansen,2002).
4.1 Coordination
An essentiallinguisticextensionin theTIGERannotationschemewasmadeconcerningtheannotationof coordinatedsentencesandverbphrases.In theNEGRA corpus,argumentsthataresharedby bothverbconjunctsof a coordination,but that areonly mentionedonce,werestructurallylinked only tothenearestpart of the coordination.Thus,theNEGRA annotationis in many casesnot suitablefortheextractionof subcategorizationinformation. In theTIGER treebank,thesesharedargumentsareprovided with secondaryedgesin orderto representtheir syntacticrelationto themoredistantverbconjuncts.
0 1 2 3 4 5 6
500 501
502 503
504
Der
ART
the�
Mann
NN
man
liest�
VVFIN
reads
und�KON
and�zerknüllt�VVFIN
rumples
die�
ART
the�
Zeitung�
NN
newspaper
NK NK NK NK
NP
SB HD HD
NP
OA
S
CJ CD
S
CJ
CS
503 502
0 1 2 3 4 5 6
500 501
502 503
504
Der
ART
the�
Mann
NN
man
liest�
VVFIN
reads
und�KON
and�zerknüllt�VVFIN
rumples
die�
ART
the�
Zeitung�
NN
newspaper
NK NK NK NK
NP
SB HD HD
NP
OA
S
CJ CD
S
CJ
CS
SB
OA
Figure5: Coordinationwith sharedargumentsin NEGRA(left) andTIGER(right)
Figure5 illustratesthedifferencebetweenthe NEGRA andtheTIGER treatmentof thesecases.IntheexamplesentenceDer Mannliestundzerknullt dieZeitung(‘the manreadandrumpledthenews-paper’), the commonsubjectof both verbsis the NP der Mann, the commonobject is the NP dieZeitung. However, in theNEGRA annotationscheme,sharedargumentsarelinked only to thenear-estverb (cf. left part of Figure5). The structureof the treewould be exactly the sameif the first
410% of thesentencesfailedbecauseof gapsin themorphologicalanalyser;6% failedbecauseof storageoverflow ortimeouts(with limits setto 100MB storageand100secondsparsingtime). 10%of theparsedsentenceswerenotevaluatedwrt. thecorrectanalysisbecausethey receivedmorethan20 analyses.
8
verbwereintransitive anddid not have die Zeitungasits object(e.g. Der Mann lacht undzerknulltdie Zeitung(‘the manlaughsandrumplesthe newspaper’)). In contrast,the right part of Figure5shows theannotationof thesentenceaccordingto theTIGER annotationscheme,makinguseof thesecondaryedgesthat wereintroduced.Thus,the TIGER schemeallows the differentiationbetweentransitive andintransitive verbsin coordinations.
4.2 Verb-subcategorization
TheNEGRA corpusprovidesno distinctionsbetweenprepositionalphraseswith respectto their syn-tactic functions;all PPs occuringin sentencesor verbphrasesareunexceptionallymarked with thelabelMO (modifier). In theTIGER project,two additionalfunctionlabelsfor PPs wereintroduced:prepositionalobjects(OP) andcollocationalverb constructions(CVC). The label OP is appliedtoconstructionslike auf jemandenwarten(‘to wait for somebody’).Thesephenomenaaremarked bythefactthattheprepositionauf (‘on’) haslost its lexical meaning.
0 1 2 3 4 5 6 7 8 9 10 11
500 501 502
503 504
505
Verhandlungsführer
NN!
negotiators"beider#PIAT$
of_both%Parteien$
NN!
parties&haben'VAFIN have'
sich(PRF$
themselves)
nach"APPR*
according_to+Medienberichten,
NN!
press reports&auf+
APPR*
on%einen-ART*
a+Gesetzesvorschlag.
NN!
draft law/
geeinigt0VVPP agreed+
.1$.2
NK NK AC NK AC3
NK NK
NK
NP
GR OA
PP
MO
PP
MO HD
NP
SB HD
VP
OC
S
Figure6: Annotationof PPsin NEGRA:MO
0 1 2 3 4 5 6 7 8 9 10 11
500 501 502
503 504
505
Verhandlungsführer
NN!
negotiators
beider#PIAT$
of_both%Parteien
NN!
parties&haben
VAFIN have
sich(PRF$
themselves)
nach
APPR*
according_to+Medienberichten
NN! auf+
APPR* einen-
ART* Gesetzesvorschlag
.NN! geeinigt0
VVPP .
$.2
NK NK AC NK AC3
NK NK
NK
NP
AG OA
PP
MO
PP
OP HD
NP
SB HD
VP
OC
S
press reports& on% a+ draft law/
agreed+
Figure7: Annotationof PPsin TIGER: MO andOP
Figure6 exemplifiesthefactthatNEGRAdid notallow thedistinctionbetweencomplementsandad-junctsonthelevel of edgelabels5. In contrast,theTIGERannotation(Figure7) mirrorsthefunctionaldifferencebetweenthe first PP andthe secondPP in the useof differentedgelabels: the first PPis functionally independentof theverbandservesasanadverbial; it still receivesthe labelMO. ThesecondPP representsa typical examplefor a prepositionalobject(OP) in German:theprepositionauf (‘on’) hascompletelylost its lexical meaningandis purelyfunctional.
5A correctEnglishtranslationof theexamplesentenceis the following: ‘Accordingto pressreports,negotiatorsfrombothpartieshave agreedon a draft law’.
9
Theothernewly introducededgelabel for prepositionalphrases,CVC (collocationalverbconstruc-tion), servesto labelverb+ PP constructionsin which themainsemanticinformationis containedinthe nounof the PP, not in the verb. This label canonly be usedwith a very limited classof verbs(usuallya semanticallyweakverbwith anoriginally directionalor local meaning,e.g. stellen,kom-men, etc. (‘to put’, ‘to come’)) thatoccurin connectionwith anequallylimited classof prepositions(mostlyzuandin (‘to’ and‘in’)). A typicalexamplefor this is theGermancollocationalexpressioninKraft treten(literal translation:‘to stepinto force’, meaning:‘to take effect’).
4.3 Expletives
In the TIGER corpus,finer distinctionswith regardto the usageof es, the Germanexpletive, havebeenintroduced.In theNEGRA annotationscheme,only onelabel (PH, meaningplace-holder)wasusedfor thenon-semanticusageof es; in theTIGERscheme,we distinguishthreetypes:
4 Vorfeld es: This typeof esis usedto fill thefirst positionof asentence,calledtheVorfeldslot. Itis markedwith thelabelPH (place-holder).Example:Esnahtein Gewitter (literal translation:‘it approachesa thunderstorm’;meaning:‘a thunderstormis approaching’).As soonasanothercomponentoccupiestheVorfeldslot, theesdisappears:Ein Gewitter naht.
4 Correlative es: This secondtypeof esis alwayscorrelatedto somepropositionalargumentinthesentence.It is usuallyoptional. Example:Mich freutes,dass... (‘it makesmehappy that...’). This typeis alsolabeledasPH but canbeeasilydistinguishedfrom thefirst typebecauseit always occursin connectionwith a propositionalsisternodefunctioning as RE (repeatedelement)(cf. Figure8).
4 Expletive es: Thelasttypeof esfunctionsasa non-thematicargument,e.g. in connectionwithweatherverbs:Heuteregnetes(‘today, it is raining’). It receivesthelabelEP (expletive).
0 1 2 3 4 5 6
500
501
502
Mich
PPER
me
freut
VVFIN5
makes_happy
es6PPER
it
,
$,7 dass
8KOUS
that9
er6PPER
he
kommt
VVFIN5comes:
CP SB HD
PH
S
RE
OA HD
NP
SB
S
Figure8: Correlative es
4.4 Proper nouns
In theNEGRA aswell asin theTIGER corpus,theparentlabelPN is usedto markordinarypropernouns,suchas‘GerhardSchroder’. Thesinglecomponentsreceive theedgelabelPNC (propernouncomponent). Furthermore,the label PN is also usedfor multi-token company names,newspapernames(e.g. ‘The SanFranciscoChronicle’)etc. In theTIGER annotationscheme,theusageof this
10
label wasextendedto cover titles of films, books,exhibitions etc. that have a complex, sometimessentence-like structure.Occurencesof thesephenomenaarefirst annotatedstructurallyandthenre-ceiveanadditionalunaryparentlabelPN. Theexamplesin Figure9 illustratethedifferentannotationsin NEGRA andTIGER.
0 1 2 3 4 5 6 7
500
501
502
Der;ART
the<
Film=NN
movie>"
$(? Einer
@PIS
oneAflogB
VVFIN
flewB
übersC
APPRART
over_theAKuckucksnestD
NN
cuckoo’s_nestE"
$(?
AC NK
SB HD
PP
MO
NK NK
S
NK
NP
0 1 2 3 4 5 6 7
500
501
503
502
DerFARTGtheH
FilmINNJ
movieK"
$(L Einer
MPISNoneO
flogP
VVFINQ
flewP
übersR
APPRARTGover_theO
KuckucksnestS
NNJ
cuckoo’s_nestT"
$(L
ACU
NK
SB HD
PP
MO
S
PNC
NK NK
PN
NK
NP
Figure9: Treatmentof structuredpropernounsin NEGRA (left) andTIGER (right)
Thus,theTIGERannotationpermitstheidentificationof structuresthatfunctionasnames,but donotfeaturethecorrespondingpart-of-speechtagNE (propernoun)in oneof their terminalnodes.
5 TIGERSearch
SyntacticallyannotatedcorporasuchastheTIGER treebankprovide a wealthof informationwhichcanonly beexploitedwith anadequatequerytool andquerylanguage.Thus,apowerful searchenginefor treebankshasbeendevelopedwithin theTIGERproject(Lezius,2002).
Thesearchengineis freely availablefor researchpurposesandcanbedownloadedfrom theTIGERwebsite. To supportall popularplatforms,thetool is implementedin Java. Input filters areprovidedfor many populartreebankformats.A completelist of supportedformatsisgivenin (Lezius,Biesinger,& Gerstenberger, 2002b). In addition,an interfaceformat, TIGER-XML (Mengel& Lezius,2000;Leziuset al., 2002a),hasbeendefined.This formatcanbeusedto import treebanksin otherformats,includingformatswhich have not yet beendeveloped.Thesearchengineis thusa tool which canbeusedby theentirecommunity.
5.1 Query Language
The query language(Konig & Lezius, 2002a,2002b)hasbeendesignedto fulfill two conflictingrequirements:On the one hand, it is closeto grammarformalisms,thus easyto learn. It allowsmodular, understandablecode,evenfor complex queries.A usercanposequeriesintuitively, mappinglinguisticdescriptionsdirectly into thequerylanguage.Ontheotherhand,its expressivenesshasbeenconstrainedto guaranteeefficient queryprocessing.
The querylanguageconsistsof threelevels. On the nodelevel, nodescanbe describedby Booleanexpressionsover feature-value pairs. The following query is matchedby the terminal node lacht(‘laughs’) in Figure4:
[word="lacht" & pos="VVFIN"]
11
On the noderelation level, descriptionsof two or more nodesare combinedby a relation. Sincegraphsaretwo-dimensionalobjects,we needonebasicrelationfor eachdimension.Theseareimme-diateprecedence(“.”) for thehorizontaldimensionandimmediatedominance(“>”) for theverticaldimension.6 Therearealsoderivednoderelationssuchasunderspecifieddominanceor siblings:
>* dominance(minimumpathlength1)>n dominancein V steps( VXWZY )>m,n dominancein [ steps( \^]Z[_]ZV )>L labeleddominance(edgelabelL)>@l leftmostterminalsuccessor(‘left corner’).* precedence(min. numberof intervals: 1)$ siblings
For example,thefollowing queryis matchedby asubgraphof Figure4 (theNP node):
[cat="NP"] >RC [cat="S"]
Finally, on the graph descriptionlevel we allow Booleanexpressionsover noderelations,withoutnegation.For example,asubgraphof Figure4 (thesecondS node)satisfiesthefollowing query:
([cat="S"] > [pos="PRELS"]) &([cat="S"] > [pos="VVFIN"])
Variablescanbe usedto expresscoreferenceof nodesor featurevalues.For example,the two nodedescriptions[cat="S"] in theabove querycould refer to differentnodes.A reformulationof thequeryusingvariablespreventsthis:
(#n:[cat="S"] > [pos="PRELS"]) &(#n > [pos="VVFIN"])
Thefollowing querydescribesa clausethat comprisesa personalpronounanda finite verb (cf. Fig-ure5).
#v:[cat="S"] &#t1:[pos="PPER"] & #t2:[pos="VVFIN"] &(#v > #t1) & (#v > #t2) & arity(#v,2)
In addition,theusercandefinetypehierarchies.Subtypesmayalsobeconstants,e.g.in thecaseofpart-of-speechsymbols.Hereis a partof a typehierarchyfor theSTTStagset:
nominal := noun,properNoun,pronoun.noun := "NN".properNoun := "NE".pronoun := "PPER","PDS","PRELS", ...
6Theprecedenceof two innernodesis definedastheprecedenceof their leftmostterminalsuccessors(Konig & Lezius,2000).
12
Figure10: Visualizationof queryresults
Thishierarchycanbeusedto formulatequeriesin amoreadequateway:
[pos=nominal] .* [pos="VVFIN"]
Therearealsoseveralusefulpredicatessuchasdiscontinuous(#n),continuous(#n) (phrasedoes/doesnot containcrossingbranches)or arity(#n,num) (phrasecomprisesVa`b\ children).Thefollowing examplequerydeterminesextraposedrelative clausesin theTIGER treebank(cf. Fig-ure4):
(#n:[cat="NP"] >RC [cat="S"]) &discontinuous(#n)
To simplify theformulationof morecomplex queries,templatescanbedefined.Thesearedescribedin (Konig & Lezius,2002a,2002b).
5.2 Query Tool
To ensureefficient queryprocessingwe have chosenanindexed-basedapproach.A corpus,encodedin a varietyof externalformats,e.g.,NEGRA/TIGERformat,bracketingformator anXML treebankformat(Mengel& Lezius,2000;Leziusetal., 2002a)is importedandindexed.Many partialsearchesareperformedduringindexing in orderto saveprocessingtimeduringqueryprocessing.Theindexingof a corpusis realizedin a tool calledTIGERRegistry, thecorpusquerytool is calledTIGERSearch.To increaseperformancewe have also implementedqueryoptimizationstrategiesandsearchspacefilters. Thequeryprocessingstrategy is describedin detail in (Lezius,2002).
In order to facilitate corpusexploration, a combinationof searchingfor phenomenaand browsingthroughacorpus,we have developedasophisticatedgraphicaluserinterfacefor theboththeTIGER-
13
Figure11: Graphicalqueryinput
Registry andtheTIGERSearchtool. We have alsodevelopedspecialcorpusvisualizationandqueryinput strategies.
TheTIGERSearchGUI comprisesagraphviewerto view thematchingsentencesof aquery. Figure10illustratesthevisualizationof acorpusgraphthatmatchestheexamplequeryabove. Userscanbrowsethroughthe forestof matchingsentencesusinga navigation bar andexport their favourite matches.Matchescanbe exportedin the TIGER XML format, but alsoasan interactive SVG image. Thus,matchforestscanbeviewedin a formatthatdoesnotdependon theTIGERSearchsoftwaresuite.
We have alsodevelopeda graphicalqueryinput front-endwhich enablesusersto ‘draw’ queriesin avery intuitive way (Voorman,2002). Queriesareexpressedby combiningnodesandnoderelations.Figure11 illustrateshow theexamplequeryabove canbeexpressedusingthegraphicalqueryeditor.
6 Summary and outlook
In this paper, we presentedthe TIGER treebank,the largestandmost comprehensive treebankforthe Germanlanguage.We explainedthe different levels of annotation:part-of-speechtags,phrasecategoriesandsyntacticfunctions.Furthermore,informationaboutlemmataandmorphologyis alsoencodedin the corpus. The methodsof annotation- interactive annotationwith AnnotateandLFGparsing- aswell asthe different representationformatsusedfor the TIGER treebankweredemon-strated.Wealsogave ashortoverview of relatedwork in comparabletreebankprojects.
Moreover, wealsodescribedtheTIGERannotationscheme,which is basedontheannotationschemeusedfor theNEGRAcorpus.Thepaperalsooutlinedthemostimportantextensionsin theTIGERan-notationscheme,which concerntheuseof secondaryedgesin coordinations,verb-subcategorization,finer distinctionsconcerningthe Germanexpletive esanda different treatmentof structuredpropernouns.
14
ThelastsectionpresentedTIGERSearch,aquerytool thatwasdevelopedin theprojectandwhichcanbe usedto exploit the TIGER treebankandseveral othertreebankformats. We explainedthe querylanguage,thatwasdesignedto poseintuitive queriesto thetreebank.We alsoshortly introducedthegraphicalqueryinputTIGERin.
Futurework will beconcernedwith theextensionof theTIGERtreebankto approximately80.000sen-tencesaltogetherandadditionalimprovementsin theannotationscheme.For instance,weinvision theintroductionof furtherdistinctionsconcerningverbalargumentsin orderto facilitatetheidentificationof thematicroles(Smith,2000).For thebeginningof 2003,a first releaseis plannedwhich will con-tain 10.000sentencescompletelyannotated(part-of-speechtagging,syntacticstructure,morphology,lemmata)accordingto thenew TIGER annotationschemeandthoroughlycheckedfor consistency.
All thetoolspresentedin thispaperarefreelyavailablefor researchpurposes.For furtherinformationon thecorpus,thecorpustoolsandhow to obtainthem,pleasereferto theprojectwebpage:
http://www.coli.uni-sb.de/cl/projects/tiger.
References
Abeille, A., Brants,T., & Uszkoreit, H. (Eds.). (2000). Proceedingsof the COLING-2000Post-ConferenceWorkshopon LinguisticallyInterpretedCorpora LINC-2000.Luxembourg.
Abeille,A., Clement,L., & Kinyon,A. (2000).Building a treebankfor French.In Proceedingsof theSecondInternationalConferenceon Language ResourcesandEvaluationLREC-2000(pp. 87– 94). Athens,Greece.
Boguslavsky, I., Grigorieva, S., Grigoriev, N., Kreidlin, L., & Frid, N. (2000). Dependency tree-bankfor Russian:Concept,tools, typesof information. In 18th InternationalConferenceonComputationalLinguisticsCOLING-2000.Saarbrucken,Germany.
Bosco,C., Lombardo,V., Vassallo,D., & Lesmo,L. (2000). Building a treebankfor Italian: A data-drivenannotationschema.In Proceedingsof theSecondInternationalConferenceonLanguageResourcesandEvaluationLREC-2000(pp.99– 106). Athens,Greece.
Brants,S.,& Hansen,S. (2002).Developmentsin theTIGERannotationschemeandtheir realizationin thecorpus.In Proceedingsof theThird Conferenceon Language ResourcesandEvaluationLREC-02.LasPalmasdeGranCanaria,Spain.
Brants,T. (1997). TheNEGRAExport Format (CLAUS ReportNo. 98). Saarbrucken, Germany:Dept.of ComputationalLinguistics,SaarlandUniversity.
Brants,T. (1999). Tagging and parsing with CascadedMarkov Models - automationof corpusannotation. Saarbrucken, Germany: GermanResearchCenterfor Artificial IntelligenceandSaarlandUniversity: Saarbrucken Dissertationsin ComputationalLinguistics and LanguageTechnology(Bd. 6).
Brants,T. (2000a). Inter-annotatoragreementfor a Germannewspapercorpus. In ProceedingsofSecondInternationalConferenceonLanguage ResourcesandEvaluationLREC-2000.Athens,Greece.
Brants,T. (2000b).TnT – A StatisticalPart-of-SpeechTagger. In Proceedingsof theSixthConferenceon AppliedNatural Language ProcessingANLP-2000.Seattle,WA.
15
Brants,T., Hendriks,R., Kramp, S., Krenn, B., Preis,C., Skut, W., & Uszkoreit, H. (1999). DasNEGRA-Annotationsschema(Tech.Rep.).Saarbrucken,Germany: Dept.of ComputationalLin-guistics,SaarlandUniversity.
Brants,T., & Skut,W. (1998). Automationof treebankannotation.In Proceedingsof New Methodsin Language ProcessingNeMLaP-98.Sydney, Australia.
Brants,T., Skut,W., & Uszkoreit,H. (1999).Syntacticannotationof aGermannewspapercorpus.InProceedingsof theATALATreebankWorkshop(pp.69–76).Paris,France.
Bresnan,J. (Ed.). (1982).TheMentalRepresentationof GrammaticalRelations.MIT Press.
Dipper, S. (2000). Grammar-basedCorpusAnnotation. In Proceedingsof LINC-2000(pp. 56–64).Luxembourg.
Frank,A., King, T. H., Kuhn,J.,& Maxwell, J. (1998). Optimality TheoryStyleConstraintRankingin Large-scaleLFG Grammars.In Proceedingsof theLFG98Conference. Brisbane,Australia:CSLI OnlinePublications,http://www-csli.stanford.edu/publications.
Greenbaum,S. (Ed.). (1996). ComparingEnglishworldwide: TheInternationalCorpusof English.Oxford,UK: ClarendonPress.
Hajic, J. (1999). Building a syntacticallyannotatedcorpus: The PragueDependency Treebank.In E. Hajicova (Ed.), Issuesof valencyand meaning. Studiesin honourof Jarmila Panevova.Prague,CzechRepublic:CharlesUniversityPress.
King, T. H., Dipper, S.,Frank,A., Kuhn,J.,& Maxwell, J. (To Appear).Ambiguity ManagementinGrammarWriting. Journalof Language andComputation. (Specialissue)
Konig, E., & Lezius, W. (2000). A descriptionlanguagefor syntacticallyannotatedcorpora. InProceedingsof COLING-2000(pp.1056–1060).Saarbrucken,Germany.
Konig,E.,& Lezius,W. (2002a).TheTIGERlanguage - A DescriptionLanguage for SyntaxGraphs.Part 1: User’s Guidelines.(Tech.Rep.). IMS, University of Stuttgart. (http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/papers/tigerLanguage.ps.gz)
Konig,E.,& Lezius,W. (2002b).TheTIGERlanguage - A DescriptionLanguage for SyntaxGraphs.Part 2: Formal Definition. (Tech.Rep.). IMS, University of Stuttgart. (http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERSearch/papers/tigerLangForm.ps.gz)
Kramp,S.,& Preis,C. (2000).Konventionenfur dieVerwendungdesSTTSim NEGRA-Korpus(Tech.Rep.).Saarbrucken,Germany: Dept.of ComputationalLinguistics,SaarlandUniversity.
Leech,G. (1992).TheLancasterParsedCorpus.ICAMEJournal, 16(124).
Lezius,W. (2001).Baumbanken. In K.-U. Carstensen,C. Ebert,C. Endriss,S.Jekat,R. Klabunde,&H. Langer(Eds.),ComputerlinguistikundSprachtechnologie - eineEinfuhrung(pp.377– 385).Heidelberg, Germany: SpektrumAkademischerVerlag.
Lezius,W. (2002).Ein WerkzeugzurSucheaufsyntaktisch annotiertenTextkorpora. IMS, Universityof Stuttgart.(PhDthesis,in preparation)
Lezius,W., Biesinger, H., & Gerstenberger, C. (2002a).TIGER-XMLQuick ReferenceGuide(Tech.Rep.).IMS, Universityof Stuttgart.
16
Lezius,W., Biesinger, H., & Gerstenberger, C. (2002b).TIGERRegistry Manual(Tech.Rep.). IMS,Universityof Stuttgart.
Lezius,W., & Konig, E. (2000). Towardsa searchenginefor syntacticallyannotatedcorpora. InProceedingsof theFifth KONVENSConference. Ilmenau,Germany.
Marcus,M., Kim, G., Marcinkiewicz, M., MacIntyre,R., Bies,A., Gerguson,M., Katz,K., & Schas-berger, B. (1994).ThePennTreebank:Annotatingpredicateargumentstructure.In Proceedingsof theARPA HumanLanguage Technology Workshop.SanFrancisco,CA: MorganKaufman.
Mengel,A., & Lezius,W. (2000). An XML-basedencodingformat for syntacticallyannotatedcor-pora. In Proceedingsof LREC-2000(pp.121–126).Athens,Greece.
Moreno,A., Grishman,R., Lopez,S., Sanchez,F., & Sekine,S. (2000). A treebankof Spanishandits applicationto parsing.In Proceedingsof theSecondInternationalConferenceon LanguageResourcesandEvaluationLREC-2000(pp.107– 112). Athens,Greece.
Oflazer, K., Hakkani-Tur, D., & Tur, G. (1999).Designfor aTurkishtreebank.In Proceedingsof theWorkshoponLinguisticallyInterpretedCorpora LINC-99. Bergen,Norway.
ParGram.(2002).TheParGramProject. (URL: http://www2.parc.com/istl/groups/nltt/pargram/)
Plaehn,O.,& Brants,T. (2000).Annotate- anefficient interactive annotationtool. In ProceedingsoftheSixthConferenceon AppliedNatural Language ProcessingANLP-2000.Seattle,WA.
Riezler, S.,King, T. H., Kaplan,R.,Crouch,R.,Maxwell, J.,& Johnson,M. (2002).ParsingtheWallStreetJournalusingaLexical-FunctionalGrammarandDiscriminative EstimationTechniques.In Proceedingsof theACL-02,.Philadephia,PA.
Sampson,G. (1995). Englishfor thecomputer. TheSUSANNEcorpusandanalyticscheme. Oxford,UK: ClarendonPress.
Schiller, A., Teufel,S.,& Stockert,C. (1999).Guidelinesfur dasTagging deutscher Textcorpora mitSTTS(Tech.Rep.).Universityof Stuttgart,Universityof Tubingen.
Schrader, B. (2001).ModifikationeinerdeutschenLFG-Grammatikfur Partial Parsing. Studienarbeit,Universityof Stuttgart.
Simov, K., Osenova, P., Slavcheva, M., Kolkovska, S., Balabanova, E., Doikoff, D., Ivanova, K.,Simov, A., & Kouylekov, M. (2002). Building a linguistically interpretedcorpusof Bulgarian:the BulTreeBank. In Proceedingsof Third InternationalConferenceon Language ResourcesandEvaluationLREC-2002(pp.1729–1736).LasPalmasdeGranCanaria,Spain.
Skut,W., Brants,T., Krenn,B., & Uszkoreit,H. (1998).A linguistically interpretedcorpusof Germannewspapertext. In Proceedingsof the Conferenceon Language Resourcesand EvaluationLREC-98(pp.705–711).Granade,Spain.
Skut,W., Krenn,B., Brants,T., & Uszkoreit, H. (1997). An annotationschemefor freeword orderlanguages.In Proceedingsof ANLP-97.Washington,D.C.
Smith,G. (2000).Encodingthematicrolesvia syntacticcategoriesin aGermantreebank.In Proceed-ingsof theWorkshoponSyntacticAnnotationof Electronic Corpora. Tubingen,Germany.
17
Smith,G.,& Eisenberg, P. (2000).KommentarezurVerwendungdesSTTSim NEGRA-Korpus(Tech.Rep.).Universityof Potsdam.
Uszkoreit,H., Brants,T., & Krenn,B. (Eds.).(1999).Proceedingsof theWorkshopon LinguisticallyInterpretedCorpora LINC-99. Bergen,Norway.
Voorman,H. (2002). TIGERin– Graphische Eingabevon Suchanfragen in TIGERSearch. IMS,Universityof Stuttgart.(DiplomaThesis,in preparation)
Wahlster, W. (Ed.). (2000). Verbmobil: Foundationsof Speech-to-Speech Translation. Heidelberg,Germany: Springer.
Zinsmeister, H., Kuhn,J., & Dipper, S. (2002). Utilizing LFG Parsesfor TreebankAnnotation. InProceedingsof theLFG02Conference. Athens,Greece.
18