Multi-layered XML-based Annotation for Integrated NL Processing
Anette Frank, Language Technology Lab
DFKI GmbH, Saarbrücken, Germany
Japanese-German Workshop on NLP, Sapporo, Japan July 4-5, 2003
Background
Whiteboard – Multilevel Annotation for Dynamic Free Text Processing
H.Uszkoreit, B.Crysmann, A.Frank, B.Kiefer, G.Neumann, J.Piskorski, U.Schäfer, F.Xu,(M.Becker, and H.-U.Krieger)
Major project goals
• Integration of shallow and deep linguistic processing
  – Processing of unrestricted free text
  – Variable-depth text analysis
• XML-based system architecture
  – Uniform way of representing and combining results of various NLP components
  – Flexible software infrastructure for NLP-based applications
• Applications
  – Grammar & controlled language checking
  – Intelligent information extraction
Deep NLP (DNLP):
• Fine-grained analysis
• High precision (if correctly disambiguated)
• High ambiguity rates
• Insufficient robustness (coverage, ill-formed input)
• Insufficient efficiency

Shallow NLP (SNLP):
• Partial analysis
• Insufficient precision
• Tamed ambiguity
• High robustness (coverage, ill-formed input)
• High efficiency

Goal of integrated 'hybrid' processing: combine the robustness and efficiency of shallow analysis with the precision and fine-grainedness of deep syntactic analysis.
Motivation — Annotation-based Integration of Shallow and Deep NLP —
Whiteboard Annotation Machine & Transformer (Schäfer 2003)
• Managing shallow and deep analyses in multi-layer XML architecture
• XSLT queries to XML standoff annotations for flexible, efficient integration
Lexical Integration (Crysmann et al. 2002)
• SPPC-HPSG interface: building HPSG lexicon entries "on the fly"
  – Named entities, open-class categories (nouns, adjectives, adverbs, ...)
• HPSG-GermaNet integration: association with HPSG lexical sorts
→ coverage and robustness
Phrasal integration for ‘hybrid’ syntactic processing (Frank et al. 2003)
• Integration of shallow topological field parsing and deep HPSG parsing
→ efficiency and robustness
Integration of Shallow and Deep Analysis in WHAT: an XML-based Annotation Architecture
Integration of Shallow and Deep NLP — XML/XSLT-based system architecture —
• Multi-layer XML standoff annotation for integration of NLP components
– Standoff annotation allows for combination of overlapping hierarchies
– Access to results of alternative NLP components, for flexible use in applications
• XSLT-based system architecture WHAT: Whiteboard Annotation Transformer (Schäfer 2003)
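As a concrete illustration of the standoff idea, the sketch below keeps the base text untouched and lets each component contribute a layer of (start, end, label) spans; the text, layer names and labels here are invented for illustration and do not reflect the actual Whiteboard markup.

```python
# Standoff annotation sketch: layers point into the unmodified base
# text by character offsets, so overlapping hierarchies produced by
# different NLP components can coexist over the same text.
# (Illustrative only; names/labels are not the Whiteboard formats.)
text = "Anette Frank works at DFKI GmbH"

# Each layer is a list of (start, end, label) spans over `text`.
layers = {
    "ne":     [(0, 12, "person"), (22, 31, "company")],
    "chunks": [(0, 12, "NP"), (13, 31, "VP")],
}

def project(layer):
    """Materialise a layer's spans as (label, surface string) pairs."""
    return [(label, text[s:e]) for s, e, label in layers[layer]]

print(project("ne"))  # [('person', 'Anette Frank'), ('company', 'DFKI GmbH')]
```

Because layers only reference offsets, the NE span and the chunk span covering the same words never conflict in one tree.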
[Architecture diagram: shallow and deep NLP components write XML standoff annotation into a multilayer chart; the WHAM (with WHAT) provides the programming interface to NLP-based applications]
[Diagram: WHAT internals — a query against the XML standoff markup is constructed from a component-specific XSLT template library and run through an XSLT processor to produce the result]
– XSLT queries to XML standoff markup
  • Template library for 3 types of queries: V(alue), N(ode sets), D(ocument)
– Flexible, efficient access for online/offline integration of NLP components
  • ACT: Accessing, Computing, Transforming
– Portability
Integration of Shallow and Deep NLP — XSLT-based queries for annotation-based integration —
• Through V(alue) and N(ode) queries:
  – Morphology and stemming of unknown words (unknown in the HPSG lexicon)
  – PoS tagging
  – Compounds
  – Named entities (spans and semantic types)
• Through D(ocument), V(alue) and N(ode) queries:
  – Chunks
  – Topological structure (spans, types)
• Example: return Named Entity type from SPPC_XML: getValue.NE.type(I4)
<query name="getValue.NE.type">
  <!-- returns the type of named entity -->
  <xsl:param name="index"/>
  <xsl:template match="/WHITEBOARD/SPPC_XML//NE[@id=$index]">
    <xsl:value-of select="@type"/>
  </xsl:template>
</query>
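For readers without an XSLT processor at hand, the lookup the query performs can be emulated in a few lines of Python with ElementTree; this is only a stand-in for the real XSLT machinery, and the sample standoff document is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample standoff document in the shape the query addresses.
DOC = """
<WHITEBOARD>
  <SPPC_XML>
    <NE id="I3" type="person"/>
    <NE id="I4" type="location"/>
  </SPPC_XML>
</WHITEBOARD>
"""

def get_ne_type(doc, index):
    """Emulates getValue.NE.type: return the type attribute of the
    named entity whose id matches `index`, or None if absent."""
    root = ET.fromstring(doc)
    for ne in root.find("SPPC_XML").iter("NE"):
        if ne.get("id") == index:
            return ne.get("type")
    return None

print(get_ne_type(DOC, "I4"))  # location
```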
Integration of Shallow and Deep NLP — Lexical integration —
• Building HPSG lexicon entries “on the fly”
– XML-encoding of typed feature structures
– Mapping lexical information from SPPC to HPSG typed feature structures
• Lexical syntactic and semantic information
– Mapping GermaNet semantic classes to HPSG sorts (Siegel et al., 2001)
– Subcategorisation acquisition from parsed corpora
Increase of coverage and robustness at lexical level
– Increase of fully lexically covered sentences: 43% (on the NEGRA corpus)
– Increase of parsed sentences due to lexical coverage: 8.9%
Shallow processing (SPPC)
– Morphological and compound analysis
– PoS tagging
– Named Entity recognition

Deep syntactic processing (HPSG)
– Subcategorisation
– Argument structure
– Lexical semantic sorts
Integration of Shallow and Deep NLP — Syntactic integration —
• Using robust, efficient shallow parsing to pre-partition the deep parser's search space → efficiency; to select partial analyses from the deep parser's chart → robustness
• Constraining the search space of a chart-based parser
• External knowledge sources deliver
  – compatible subtrees to be checked for compatibility with deep parsing
  – additional information (categorial, featural constraints) for constituents
• Prioritisation scheme: constituents (chart edges) of the deep parser are rewarded if compatible, penalised if incompatible with external constraints
• Best-first filter on ambiguous output
Challenge: shallow analysis needs to provide reliable, compatible structures
The Shallow-Deep Mapping Problem — Problems and Solutions —
The shallow-deep mapping problem
• Chunk parsing is not isomorphic to deep syntactic structure ("attachments")
  [Die Programme [die [sie] benutzen, [um [ihre Ergebnisse] zu verbreiten]
  [The programs [that [they] use, [in order to [their results] distribute]
  [Figure: NP and CL chunks vs. the corresponding deep syntactic structure]
The Shallow-Deep Mapping Problem — Problems and Solutions —
The shallow-deep mapping problem
• Chunk parsing is not isomorphic to deep syntactic structure ("attachments")
• "Bottom-up" chunk parsing is not constrained by sentence macro-structure
  Peter eats pizza and Mary drinks wine
  [Figure: alternative CL chunk groupings for the two coordinated clauses]
Stochastic Topological Field Parsing (Becker and Frank 2002)
• High degree of compatibility with deep syntactic structure
• Flat, partial macro-structure: robustness, coverage, efficiency, precision
Stochastic Topological Field Parsing — Topological field model of German syntax —
Theory-neutral macro-structure of complex sentences: sentence type | Vorfeld (VF) | left sentence bracket (LK) | Mittelfeld (MF) | right sentence bracket (RK) | Nachfeld (NF)

V2:     Fritz kennt die Freunde seines Sohns, die zur Party kommen.
        Fritz hat die Freunde seines Sohns kennengelernt, die zur Party kommen.
V1:     Hat Fritz die Freunde s. Sohns kennengelernt, die zur Party kamen?
        Kennt Fritz die Freunde s. Sohns, die zur Party kommen?
Vletzt: weil Fritz die Freunde s. Sohns kennt
        wer die Freunde seines Sohns kennt, die zur Party kommen
[Figure: topological structure (VF, LK, MF, RK, NF; embedded CL) and its mapping to deep syntactic structure]
Stochastic Topological Field Parsing — A corpus-based approach (Becker & Frank 2002) —
Non-lexicalised PCFG trained from (converted) NEGRA corpus
• Flat phrasal fields VF, MF, NF: sequences of POS tags (and CL nodes)
• Parameterised categories: CL–V2/–V1/–SUBCL/–REL/–WH, ...; RB–INF/–FIN
• Explicit clausal embedding structure
[Example topological parse of "Daher wies Souza die Polizei an, den Häuptling zu fassen, der sich versteckt hält" ('Thus Souza ordered the police to capture the chieftain who keeps himself hidden'): a CL-V2 clause with VF-TOPIC, LB-VFIN, MF, RB-PTK and NF, embedding a CL-INF (MF, RB-VINF) and a CL-REL (VF-REL, MF, RB-VFIN)]
Stochastic Topological Field Parsing — Performance —
Best model [para+, bin+, pnct+, prun+]
• High accuracy (93% / 88%) at high coverage (up to 100%)
• High rate of perfect matches (fully correct): 80% / 72%
• Efficiency: 0.12 secs/sentence (LoPar parser, Schmid 2000)
PoS input     coverage   perfect match   LP in %   LR in %   0CB in %   2CB in %
perfect       100.0      80.4            93.4      92.9      92.9       98.9
TnT tagger     99.8      72.1            88.3      88.2      87.8       97.9
Evaluation: ignoring parameters and punctuation (length ≤ 40 words)
Integrated Shallow and Deep Parsing — TopP meets HPSG —
[Figure: topological parse of "Der Zehnkampf hätte eine andere Dimension gehabt, wenn er dabei gewesen wäre" ('The decathlon would have had another dimension if he had been there'): CL-V2 with VF-TOPIC, LB-VFIN, MF, RB-VPART and NF containing a CL-SUBCL (LB-COMPL, MF, RB-VFIN); below, the corresponding HPSG analysis (NP-NOM, S/NP-NOM, EPS, NP-ACC, CP-MOD, ...)]
Integrated Shallow and Deep Parsing — Bridging structural non-isomorphisms —
<MAP_CONSTR id="T10" constr="extrapos_rk+nf" left="W7" right="W13"/>

XSLT-based extraction of map constraints to guide deep parsing
Flattening phrasal fields
Integrated Shallow and Deep Parsing — XML/XSLT-based integration: TopP meets HPSG —

[Pipeline diagram: shallow lexical processing (SPPC) with chunk insertion → bracket extraction → HPSG parsing (prioritisation)]
<TOPO2HPSG type="root" id="5608">
  <MAP_CONSTR id="T1" constr="v2_cp" left="W1" right="W13"/>
  <MAP_CONSTR id="T2" constr="v2_vf" left="W1" right="W2"/>
  <MAP_CONSTR id="T3" constr="vfronted_vfin+rk" left="W3" right="W3"/>
  <MAP_CONSTR id="T4" constr="vfronted_vfin+vp+rk" left="W3" right="W13"/>
  <MAP_CONSTR id="T5" constr="vfronted_vp+rk" left="W4" right="W13"/>
  <MAP_CONSTR id="T6" constr="vfronted_rk-complex" left="W7" right="W7"/>
  <MAP_CONSTR id="T7" constr="vl_cpfin_compl" left="W9" right="W13"/>
  <MAP_CONSTR id="T8" constr="vl_compl_vp" left="W10" right="W13"/>
  <MAP_CONSTR id="T9" constr="vl_rk_fin+complex+f" left="W12" right="W13"/>
  <MAP_CONSTR id="T10" constr="extrapos_rk+nf" left="W7" right="W13"/>
</TOPO2HPSG>
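The bracket extraction step amounts to reading these map constraints into (type, start, end) triples that the parser's priority heuristics can match against chart edges. A minimal Python sketch (the real system does this with XSLT; only two constraints are shown):

```python
import xml.etree.ElementTree as ET

TOPO2HPSG = """<TOPO2HPSG type="root" id="5608">
  <MAP_CONSTR id="T2" constr="v2_vf" left="W1" right="W2"/>
  <MAP_CONSTR id="T10" constr="extrapos_rk+nf" left="W7" right="W13"/>
</TOPO2HPSG>"""

def extract_brackets(doc):
    """Read MAP_CONSTR elements into (constraint-name, left, right)
    triples over word positions ('W7' -> 7)."""
    root = ET.fromstring(doc)
    return [(mc.get("constr"),
             int(mc.get("left").lstrip("W")),
             int(mc.get("right").lstrip("W")))
            for mc in root.findall("MAP_CONSTR")]

print(extract_brackets(TOPO2HPSG))
# [('v2_vf', 1, 2), ('extrapos_rk+nf', 7, 13)]
```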
Shaping the Deep Parser’s Search Space — Bracket conditions from shallow topological parsing —
• Interface to shallow components: labelled brackets
  – Provide information about constituent start and end positions
  – Bracket names (types) associated with additional constraints
• HPSG parser PET: agenda-based chart parser
  – Flexible priority heuristics for the parsing tasks (i.e. possible combinations of edges)
  – Matching start, connecting and end positions of new tasks against brackets
• Bracket information is used to modify task priorities
  – Reward tasks consistent with bracket information
  – Penalise tasks building incompatible chart edges
  – No pruning, but shaping of the search space!
Shaping the Deep Parser's Search Space — Matching brackets and chart edges —

[Figure: event types when matching a chart edge against a bracket x: Match, Crossing, Right (Left)-match Inside, Right (Left)-match Outside]
Shaping the Deep Parser’s Search Space — Conditions and Effects —
• Additional constraints on bracket types for prioritisation
  – Constituent matching conditions
    • "Match" and "Cross": brackets compatible with HPSG constituents
    • "Right Inside" and "Right Outside": partially specified constituents
  – HPSG grammar constraints
    • Allowed/disallowed HPSG grammar rules
    • Necessary/forbidden HPSG feature structure configurations
  – Positive vs. negative priority effects: rewarding vs. penalising
• Changing priorities
  – If both match conditions and grammar constraints are fulfilled
  – Confidence values can be used to modulate the strength of the effect
p̃(t) = p(t) · (1 ± conf_ent(br_x) · conf_pr(x))
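In code, the update is a multiplicative reward or penalty on the task's priority, scaled by the product of the two confidence values; a sketch under the formula's assumptions, not PET's actual implementation:

```python
def adjust_priority(p, conf_ent, conf_pr, compatible):
    """p~(t) = p(t) * (1 +/- conf_ent(br_x) * conf_pr(x)):
    reward tasks compatible with bracket x, penalise incompatible
    ones. The change never prunes tasks, it only reorders them."""
    delta = conf_ent * conf_pr
    return p * (1 + delta) if compatible else p * (1 - delta)

print(adjust_priority(100.0, 0.5, 0.5, True))   # 125.0
print(adjust_priority(100.0, 0.5, 0.5, False))  # 75.0
```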
Confidence Measures — Accuracy of map-constraints —
• Static confidence measure: precision of bracket type x: conf_pr(x)
  – Precision/recall of brackets extracted from the best topological parse, measured against brackets extracted from the evaluation corpus (Becker & Frank 2002): precision 88.3%, recall 87.8%
34 bracket types      prec ≥ 90%   prec ≥ 80%   prec ≤ 50%
avg. precision        93.1         88.9         41.26
% of bracket mass     53.5         77.7         2.7
% of bracket types    26.5         50.0         20.6
– Threshold pr = 0.7 excludes 22.8% of bracket mass and 32.35% of bracket types; it still includes chunk brackets (with 71.1% precision)
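A bracket type whose static precision falls below the threshold is simply withheld from prioritisation. Sketched below with invented bracket-type names and precision values (only the 71.1% chunk figure echoes the slide):

```python
def usable_brackets(brackets, conf_pr, threshold=0.7):
    """Keep only brackets whose type-level precision meets the
    threshold, so unreliable bracket types never influence task
    priorities. A bracket is a (type, left, right) triple."""
    return [b for b in brackets if conf_pr.get(b[0], 0.0) >= threshold]

# Invented example values, not measured figures (except chunk: 0.711).
conf_pr = {"v2_vf": 0.95, "np_chunk": 0.711, "rare_type": 0.41}
brackets = [("v2_vf", 1, 2), ("rare_type", 4, 6), ("np_chunk", 3, 5)]
print(usable_brackets(brackets, conf_pr))
# [('v2_vf', 1, 2), ('np_chunk', 3, 5)]
```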
Confidence Measures — Tree Entropy —
• Experiment
I. Effect of varying entropy thresholds on precision/recall of topological parsing
   – precision: proportion of selected parses that are perfect matches
   – recall: proportion of perfect matches that are selected
   – coverage: perfect matches above/below the entropy threshold count as in/out of coverage
II. Determining the optimal entropy threshold, trading coverage for precision
Uniform distribution, high entropy → very uncertain
Spike distribution, low entropy → very certain
• Entropy of a parse distribution delivers a measure of how certain the parser is about its best analysis for a given sentence (e.g. Hwa 2000)
• Confent : Tree entropy as a confidence measure for the quality of the best topological parse and extracted bracket constraints
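Tree entropy is just the Shannon entropy of the (normalised) parse probability distribution; a brief sketch:

```python
import math

def tree_entropy(parse_probs):
    """Shannon entropy (bits) of a parse probability distribution:
    near 0 for a spiked distribution (parser is certain about its
    best parse), maximal for a uniform one (very uncertain)."""
    total = sum(parse_probs)
    return -sum((p / total) * math.log2(p / total)
                for p in parse_probs if p > 0)

print(tree_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 (uniform over 4 parses)
print(tree_entropy([0.97, 0.01, 0.01, 0.01]))  # low: parser is confident
```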
Optimal entropy threshold: ent = 0.236 maximising f-measure ( = 0.5) on training set
Effect of threshold ent = 0.236 on the test set:

              perf. match %   LP %    LR %    coverage %
ent = 1       70              87.6    87.8    100
ent = 0.236   80.5            93.3    91.2    80.6
Experiments carried out on the (split) evaluation corpus of Becker and Frank (2002)

Varying entropy thresholds [1, 0]:
• ent = 1: no filtering
• lowering ent increases precision, decreases recall and coverage
Experiments— Data and setup —
Data
• 5060 NEGRA sentences (24.57% of the NEGRA corpus, as covered by HPSG)
• avg. length: 8.94 words (w/o punctuation); lexical ambiguity: 3.05 entries/word
Setup
• Performance measurement (absolute run-time, number of tasks)
• Baseline: HPSG parsing w/ PoS guidance, but w/o topological information
• Testing various integration parameters
  – topological brackets / chunk brackets
  – confidence weights for topological information:
    • bracket precision (P) (± thresholded)
    • tree entropy (E) (± thresholded)
Results

Baseline: HPSG parsing w/ PoS guidance
Confidence weights [0, 1]: P(T) = (thresholded) bracket precision; E(T) = (thresholded) tree entropy
Heuristic weights on task priorities: ½ = increase/decrease by half; 1 = increase to double / decrease to zero

Integration of topological brackets:

upper-b        msec (1st)   tasks   factor
baseline       675          4749    —
–P –E ½        310          2353    2.17
–P –E 1        320          2377    2.10
+P –E ½        306          2288    2.21
PT –E ½        294          2268    2.30
–P +E ½        302          2330    2.23
–P ET ½        337          2503    2.00

PT with topological & chunk brackets:

PT –E ½        312          2379    2.16

PT with chunk brackets only:

PT –E ½        611          4234    1.10

Observations:
• Heuristic weights: with the weight set high, wrong topological information can mislead the parser
• Confidence weights: PT and E work best; with ET, the threshold cuts out the entire tree, while some of its brackets can be correct
• No improvement by adding chunks; chunks w/o topological brackets yield almost no improvement over the baseline
Observations — Monitoring efficiency gains by sentence length —
Efficiency gains/losses by sentence length
• Baseline vs. PT –E ½
• Distribution: # sentences / sentence length
• Outliers: 963 sentences (len 3, len 11.09)
• Observation: conflicting topological/HPSG parses; cross-validation effects
Impact of guidance by PoS, chunks, or topological parsing
• Baseline includes PoS-prioritisation
• Chunk-based constraints rather poor
• Topological constraints (span and grammar constraints): highest impact
Related work: Daum et al. 2003
PoS- and chunk-based prioritisation in dependency parsing
Observations — Guidance from PoS, chunks, and topological brackets —
                    our work   Daum et al. 2003
–PoS +PoS           1.13       2.26
+PoS +PoS +Ch       1.1        1.21
+PoS +PoS +TopP     2.3        n.a.
Conclusion and Outlook

• Data-driven integration of shallow and deep parsing, mediated by an XML multi-layer annotation architecture
  – XSLT-based integration: efficient, fine-grained dovetailing of shallow and deep constraints
  – Shallow macro-structural constraints yield substantial performance gains
  – Focus on annotation-based system architecture and efficiency
• Further integration scenarios target
  – Robustness
    • Topological information for fragment recovery from the deep parser's chart
    • Pruning failed input sentences for reparsing (snipping adjunct clauses, ...)
  – Precision
    • Confidence-based filtering: tree entropy, decision tree learning
  – Fine-grainedness of analysis
    • Projecting robust semantic structures from shallow trees