![Page 1: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/1.jpg)
Example-Based Treebank Querying
Liesbeth Augustinus
Vincent Vandeghinste
Frank Van Eynde
CLARIN Sofia, 2012-10-28
![Page 2: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/2.jpg)
NEDERBOOMS
• Exploitation of Dutch treebanks for research in linguistics
• CLARIN-VL Project
• Centre for Computational Linguistics (CCL)
• Dutch Grammar and Language Use (NGTG)
• Goals:
– User-friendly tools
– Access to large data files
![Page 3: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/3.jpg)
NEDERBOOMS
How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?
![Page 4: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/4.jpg)
QUERYING LASSY
Existing search tools:
• dtsearch (Kloosterman 2007)
• Dact (de Kok 2010)
stand-alone tools
query language: XPath = standard query language for XML trees
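As a small illustration of what an XPath query over an XML tree looks like, the sketch below uses Python's standard `xml.etree.ElementTree` module, which supports only a limited XPath subset. The XML fragment is a hand-made toy loosely modelled on Alpino-style `node` elements; its attribute values are assumptions for illustration, not real LASSY data.

```python
import xml.etree.ElementTree as ET

# Hand-made toy fragment (hypothetical, not taken from LASSY).
TOY_XML = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="np" rel="su">
      <node rel="mod" pos="adj" root="politiek" word="politieke"/>
      <node rel="hd" pos="noun" root="discussie" word="discussies"/>
    </node>
  </node>
</alpino_ds>
"""

root = ET.fromstring(TOY_XML)

# Select every <node> descendant whose cat attribute equals "np".
np_nodes = root.findall(".//node[@cat='np']")

# Collect the words of the direct children of each matching NP.
np_words = [[child.get("word") for child in np] for np in np_nodes]
print(np_words)
```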
![Page 5: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/5.jpg)
QUERYING LASSY
Some examples:
• “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies”
//node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]]
• “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. Hij zegt dat ze spoedig zullen kennis maken”
//node[node[@rel="hd" and @pos="verb" and @begin < ../node[@rel="vc"]/node[@rel="svp" and @pos="part"]/@begin] and node[@rel="vc" and node[@rel="svp" and @begin < ../node[@rel="hd" and @pos="verb"]/@begin]]]
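To make the conditions of the first (NP) query concrete, here is a minimal sketch that checks the same three conditions by hand in Python. It deliberately avoids an XPath engine, since the standard library's `xml.etree.ElementTree` does not support nested predicates joined with `and`. The XML is a hand-made toy (the second NP and all attribute values are assumptions), and the helper names `has_child` and `is_politiek_np` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-made toy data (hypothetical, not taken from LASSY).
TOY_XML = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="np" rel="su">
      <node rel="mod" pos="adj" root="politiek" word="politieke"/>
      <node rel="hd" pos="noun" root="discussie" word="discussies"/>
    </node>
    <node cat="np" rel="obj1">
      <node rel="mod" pos="adj" root="lang" word="lange"/>
      <node rel="hd" pos="noun" root="dag" word="dagen"/>
    </node>
  </node>
</alpino_ds>
"""

def has_child(node, **attrs):
    """True if some direct <node> child carries all the given attribute values."""
    return any(
        all(child.get(name) == value for name, value in attrs.items())
        for child in node.findall("node")
    )

def is_politiek_np(node):
    """Mirror the three conditions of the NP query."""
    return (
        node.get("cat") == "np"
        and has_child(node, rel="mod", pos="adj", root="politiek")
        and has_child(node, rel="hd", pos="noun")
    )

root = ET.fromstring(TOY_XML)
matches = [n for n in root.iter("node") if is_politiek_np(n)]

# Join the words of each matching NP into a readable phrase.
phrases = [" ".join(c.get("word") for c in n.findall("node")) for n in matches]
print(phrases)
```

With a full XPath 1.0 engine the query string could be evaluated directly; the manual version is only meant to show what each bracketed condition tests.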
![Page 7: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/7.jpg)
QUERYING LASSY
XPath
– Not user-friendly
– Knowledge of Alpino grammar necessary
= problematic for non-technical linguists
• Verify theory through data with corpus or treebank examples
• Time consuming, requires some effort
How to make interaction between computational linguistics and theoretical linguistics possible?
![Page 9: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/9.jpg)
GrETEL
Greedy Extraction of Trees for Empirical Linguistics
• Search tool based on example sentences
• Input = natural language
• No explicit knowledge of formal query language nor Alpino grammar required
• Bridge gap between descriptive and computational linguistics
• Available online http://nederbooms.ccl.kuleuven.be (optimised for Mozilla Firefox)
![Page 10: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/10.jpg)
GrETEL - online
![Page 11: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/11.jpg)
GrETEL - online
![Page 12: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/12.jpg)
GrETEL - input
Green versus red word order in Dutch
– green: past participle – auxiliary
De NAVO stelt dat ze er alles aan gedaan heeft
– red: auxiliary – past participle
De NAVO stelt dat ze er alles aan heeft gedaan
“NATO claims that it has done everything in its power” (deredactie.be)
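The green/red distinction comes down to the relative position of the past participle and the auxiliary. A minimal sketch, assuming the two verb positions are already known (here they are simply looked up by word form in the two example sentences; the function name `word_order` is hypothetical):

```python
def word_order(participle_pos: int, auxiliary_pos: int) -> str:
    """Label a two-verb cluster by the relative order of its verbs."""
    if participle_pos == auxiliary_pos:
        raise ValueError("participle and auxiliary must be different tokens")
    # green: past participle precedes the auxiliary; red: the reverse.
    return "green" if participle_pos < auxiliary_pos else "red"

green_tokens = "De NAVO stelt dat ze er alles aan gedaan heeft".split()
red_tokens = "De NAVO stelt dat ze er alles aan heeft gedaan".split()

green_label = word_order(green_tokens.index("gedaan"), green_tokens.index("heeft"))
red_label = word_order(red_tokens.index("gedaan"), red_tokens.index("heeft"))
print(green_label, red_label)
```

In a treebank the positions would come from the parse rather than from a word lookup, which is what makes a syntactic query necessary.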
![Page 13: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/13.jpg)
GrETEL - input
>> parsed with Alpino
![Page 15: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/15.jpg)
GrETEL - annotation
>> info added to Alpino parse
![Page 17: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/17.jpg)
GrETEL – query info
input example
Alpino parse
query tree
XPath query
treebanks
![Page 23: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/23.jpg)
GrETEL – query tree
>> subtree extraction
query tree >> XPath generator >> XPath expression
//node[@cat="ssub" and node[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]] and node[@rel="hd" and @pos="verb" and @root="heb"]]
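The step from query tree to XPath expression can be sketched as a recursive walk over the tree. This is an illustrative sketch only, not GrETEL's actual generator: the query tree is written here as a hypothetical XML fragment that keeps just the attributes to be matched, and the function names `predicate` and `to_xpath` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical query tree: only the attributes to be matched are kept.
QUERY_TREE = """
<node cat="ssub">
  <node rel="vc" cat="ppart">
    <node rel="hd" pos="verb"/>
  </node>
  <node rel="hd" pos="verb" root="heb"/>
</node>
"""

def predicate(node):
    """Return the bracketed conditions for one node of the query tree."""
    # One condition per attribute, in the order the attributes were written.
    conditions = [f'@{name}="{value}"' for name, value in node.attrib.items()]
    # One nested condition per child node, built recursively.
    conditions += [f"node{predicate(child)}" for child in node.findall("node")]
    return "[" + " and ".join(conditions) + "]" if conditions else ""

def to_xpath(query_tree):
    """Turn the root of a query tree into an XPath expression."""
    return "//node" + predicate(query_tree)

xpath = to_xpath(ET.fromstring(QUERY_TREE))
print(xpath)
```

Run on this toy query tree, the sketch produces the same expression as shown on the slide; attributes left out of the query tree (such as the root of the participle) simply impose no condition.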
![Page 28: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/28.jpg)
GrETEL – treebanks
LASSY Small in PostgreSQL database
>> no local installation required
![Page 30: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/30.jpg)
GrETEL – results
>> (adapted) query
>> quantitative information
![Page 33: Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28](https://reader036.vdocuments.site/reader036/viewer/2022070307/551a7235550346b52d8b4fd0/html5/thumbnails/33.jpg)
GrETEL – results
>> tree viewer
>> list of results