The CHAOS Project: Theory and Practice
Fabio Massimo Zanzotto
Department of Computer Science, Systems and Production
University of Roma “Tor Vergata”
People
INVESTIGATORS: Roberto Basili, Fabio Massimo Zanzotto, Maria Teresa Pazienza
FORMER CONTRIBUTORS: Daniele Pighin, Daniele Previtali, Alessandro Bahgat, Marco Pennacchiotti, Massimo Di Nanni, Michele Vindigni, Luigi Mazzucchelli, Paola Velardi, Paolo Zirilli, Alessandro Cucchiarelli, Alessandro Marziali, Fabrizio Grisoli, Gianluca De Rossi
Outline
- Theory: Customizable parsing architectures
  - XDG: eXtended Dependency Graph
  - Task oriented parsing design
- Practice: System Implementation and Use
  - A component-based approach
  - An object-oriented platform
    - Linguistic data
    - Processing modules
  - How to use the parser in an application
- Demo!!!
Theory
Customizable parsing architectures
Motivation
The Chaos Project unofficially began in ’96, building on the long tradition of ARIOSTO (Basili, Pazienza, Velardi) at the University of Rome “Tor Vergata” (RTV).
Aim:
- building robust parsers for Italian and for English that
  - use verb sub-categorization (syntactic) lexicons induced from corpora
  - can be used in applications
Constraints:
- use the long tradition at RTV
“Social” background:
- Microtheories for microphenomena
- Language analysis can be reduced to a cascade of modules (e.g., FSA)
- Application-oriented language analysis (e.g., IE)
- Robust (formerly, shallow) parsing approaches
Motivation
[Mr. Gaubert] [contributed] [real estate] [valued] [at $ 25 million] [to the assets] [of Independent American]
Clause labels in the figure: Inf(S1), Inf(S2)
Sub-categorization frames: contribute-NP-PP(to), value-NP-PP(at)
Motivation (found on vinyl supports)
Different NLP applications have different performance constraints in terms of:
- Accuracy
- Throughput
Customizable parsing architectures are reusable in different application scenarios if:
- the architectural design supports performance control
Customizable parsing architectures (found on vinyl supports)
Modularization:
- clarifies the interdependency between different kinds of syntactic information (grammatical/lexicalized)
- makes it possible to control
  - throughput, via eliciting modules
  - quality, via a clear relation between modules (prerequisites/contributions)
Modular approach
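- Syntactic parser: $SP(S, K) = I$, or $SP(S) = I$ with the knowledge $K$ left implicit
- Syntactic parsing module: $P_i(S_i, K_i) = S_{i+1}$, or $P_i(S_i) = S_{i+1}$
- Modular syntactic parser: $SP = P_n \circ \dots \circ P_2 \circ P_1$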
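The composition of modules can be illustrated with a minimal Java sketch. This is not the CHAOS API: `ModularParserSketch`, `ParsingModule` and `compose` are hypothetical names, and each module's knowledge K_i is assumed to be captured inside the function itself.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch: a modular parser as the composition of its modules, P_1 applied first. */
final class ModularParserSketch {

    /** A parsing module P_i maps a representation S_i to S_{i+1}. */
    interface ParsingModule<S> extends UnaryOperator<S> {}

    /** Builds SP so that the first module in the list is applied first and the last one last. */
    static <S> ParsingModule<S> compose(List<ParsingModule<S>> modules) {
        return input -> {
            S current = input;
            for (ParsingModule<S> module : modules) {
                current = module.apply(current);
            }
            return current;
        };
    }

    public static void main(String[] args) {
        // Toy modules that only record that they ran.
        ParsingModule<String> tagger = s -> s + " +POS";
        ParsingModule<String> chunker = s -> s + " +chunks";
        ParsingModule<String> parser = compose(List.of(tagger, chunker));
        System.out.println(parser.apply("sentence")); // prints: sentence +POS +chunks
    }
}
```

Dropping or reordering entries in the list is then the only change needed to try a different architecture.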
Modular approach
To push a modular approach we need:
- a suitable annotation scheme
- a classification of the processing modules
A suitable annotation scheme
Requirements:
- Modularization
  - a stable representation of partially analyzed structures
- Lexicalization
  - a clear representation of the (semantic) head of a given structure, able to activate the lexicalized rule
XDG: Extended Dependency Graph
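XDG combines constituency-based and dependency-based formalisms:
$$XDG = (C, D)$$
$$C = \{(c, t, h) \mid c \subseteq S,\ t \in T_C,\ h \in c\}$$
$$D = \{(c_1, c_2, t) \mid c_1, c_2 \in C,\ t \in T_D\}$$
where $S$ is the sentence, and $T_C$ and $T_D$ are used here as names for the sets of constituent types and dependency types.
Nice property: it can store persistent ambiguity (for interpretations projected by the same nodes)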
XDG: Extended Dependency Graph
- C are constituents, each with
  - a syntactic head
  - a potential semantic governor
- D are dependencies among constituents
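One possible in-memory form of such a graph is sketched below. It is illustrative only: the class, record and method names are assumptions rather than the CHAOS object model, constituents are taken to be contiguous token spans, and the type labels ("VPK", "PPK", "VP_PP") are borrowed from labels that appear elsewhere in these slides.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of XDG = (C, D); names are assumptions, not the CHAOS API. */
final class XdgSketch {

    /** Constituent (c, t, h): token span [start, end), type t, and head index h inside the span. */
    record Constituent(int start, int end, String type, int head) {
        Constituent {
            if (start < 0 || end <= start) {
                throw new IllegalArgumentException("span must be non-empty");
            }
            if (head < start || head >= end) {
                throw new IllegalArgumentException("head must lie inside the span");
            }
        }
    }

    /** Dependency (c1, c2, t): governor, dependent, and dependency type. */
    record Dependency(Constituent governor, Constituent dependent, String type) {}

    private final List<Constituent> constituents = new ArrayList<>();
    private final List<Dependency> dependencies = new ArrayList<>();

    Constituent addConstituent(int start, int end, String type, int head) {
        Constituent c = new Constituent(start, end, type, head);
        constituents.add(c);
        return c;
    }

    Dependency addDependency(Constituent governor, Constituent dependent, String type) {
        if (!constituents.contains(governor) || !constituents.contains(dependent)) {
            throw new IllegalArgumentException("both ends must be constituents of this graph");
        }
        Dependency d = new Dependency(governor, dependent, type);
        dependencies.add(d);
        return d;
    }

    /** All stored attachments of one dependent; more than one means the ambiguity is still open. */
    List<Dependency> attachmentsOf(Constituent dependent) {
        List<Dependency> found = new ArrayList<>();
        for (Dependency d : dependencies) {
            if (d.dependent().equals(dependent)) {
                found.add(d);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Token indices for "Mr. Gaubert contributed real estate valued at $ 25 million to the assets ..."
        XdgSketch g = new XdgSketch();
        Constituent contributed = g.addConstituent(2, 3, "VPK", 2);
        Constituent valued = g.addConstituent(5, 6, "VPK", 5);
        Constituent toTheAssets = g.addConstituent(10, 13, "PPK", 12);
        // Both attachments are kept side by side until a later module decides.
        g.addDependency(contributed, toTheAssets, "VP_PP");
        g.addDependency(valued, toTheAssets, "VP_PP");
        System.out.println(g.attachmentsOf(toTheAssets).size()); // prints 2
    }
}
```

Keeping competing dependencies over the same nodes in one list is how this sketch models persistent ambiguity.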
Classification of parsing modules
$$P_i(XDG_i, K_i) = P_i(XDG_i) = XDG_{i+1}$$
The classification is performed according to:
- the type of information K used
- how they manipulate the sentence representation
Task oriented parsing design
Given:
- the NLP application requirements R
- the test-bed T
- a pool of parsing modules PM
The design activity is:
- the search for a combination of the parsing modules PM that fits R on T
NLP application requirements
Target phenomena:
- e.g., VP_PP, NP_PP, etc.
Metrics:
- Recall R per sentence
- Precision P per sentence
- F-measure per sentence
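The per-sentence metrics can be computed as in the sketch below. Several things are assumed rather than taken from the slides: links are encoded as plain strings and already restricted to the target phenomena, the F-measure is the balanced F1, and an empty set yields a score of 0. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of per-sentence precision, recall and balanced F-measure over dependency links. */
final class SentenceMetricsSketch {

    record Scores(double precision, double recall, double fMeasure) {}

    /** Compares the links predicted for one sentence against the gold links of the test-bed. */
    static Scores score(Set<String> gold, Set<String> predicted) {
        Set<String> correct = new HashSet<>(predicted);
        correct.retainAll(gold); // links that are both predicted and in the gold annotation
        // Assumed convention: an empty denominator gives 0 rather than an undefined value.
        double p = predicted.isEmpty() ? 0.0 : (double) correct.size() / predicted.size();
        double r = gold.isEmpty() ? 0.0 : (double) correct.size() / gold.size();
        double f = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new Scores(p, r, f);
    }

    public static void main(String[] args) {
        Set<String> gold = Set.of("VP_PP:contributed>to", "VP_PP:valued>at", "NP_PP:assets>of");
        Set<String> predicted = Set.of("VP_PP:contributed>to", "VP_PP:valued>to");
        System.out.println(score(gold, predicted));
    }
}
```

A candidate combination of modules could then be judged by averaging these scores over the sentences of T and comparing the result with the thresholds in R.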
CHAOS: Levels of Analysis
Levels: POS, Chunks, Clauses, Dependencies
Example sentence: Strategies to use with questions you cannot answer
POS tags: NNS TO VB IN NNS PRP MD VB
Chunk types: NPK VPK PPK NPK VPK
Verb dependencies and Clause Boundaries
[Mr. Gaubert] [contributed] [real estate] [valued] [at $ 25 million] [to the assets] [of Independent American]
Clause labels in the figure: Inf(S1), Inf(S2)
Sub-categorization frames: contribute-NP-PP(to), value-NP-PP(at)
Verb dependencies and Clause Boundaries
The algorithm:
- Initial hypotheses:
  - Minimal boundaries of the clauses in the sentence
  - Derived hierarchy
- Until all verbs have been analyzed:
  - Take the rightmost verb v not yet analyzed
  - Take the lexicalized rules R(v) for the verb v
  - Find the dependencies of v
  - Augment the clause boundaries
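The loop above can be sketched in Java under strong simplifying assumptions: verbs only look rightward for arguments, each frame slot is filled by the first unclaimed chunk that matches it, and the lexicalized rules R(v) are reduced to a flat list of slots per verb. The slides do not specify these details, and every name below is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical sketch of the rightmost-verb-first loop; not the CHAOS implementation. */
final class ClauseBoundarySketch {

    /** A chunk with a type ("NP", "VP", "PP") and a key: the verb lemma for VP, the preposition for PP. */
    record Chunk(String type, String key, String text) {
        /** Slot label used to match sub-categorization frames, e.g. "NP" or "PP(to)". */
        String slot() {
            return type.equals("PP") ? "PP(" + key + ")" : type;
        }
    }

    /** Arguments attached to one verb and its clause span [start, end] in chunk indices. */
    record Clause(List<Integer> arguments, int start, int end) {}

    /** Returns one clause per verb, keyed by the index of the verb chunk. */
    static Map<Integer, Clause> analyze(List<Chunk> chunks, Map<String, List<String>> frames) {
        boolean[] claimed = new boolean[chunks.size()];
        Map<Integer, Clause> clauses = new LinkedHashMap<>();
        // Visit the verbs from the rightmost to the leftmost.
        for (int v = chunks.size() - 1; v >= 0; v--) {
            Chunk verb = chunks.get(v);
            if (!verb.type().equals("VP")) {
                continue;
            }
            // Open slots of the verb's frame; an unknown verb has none.
            List<String> open = new ArrayList<>(frames.getOrDefault(verb.key(), List.of()));
            List<Integer> arguments = new ArrayList<>();
            int end = v; // minimal boundary: the verb chunk alone
            for (int i = v + 1; i < chunks.size(); i++) {
                if (claimed[i]) {
                    continue; // already attached to a verb further right
                }
                if (open.remove(chunks.get(i).slot())) {
                    claimed[i] = true;
                    arguments.add(i);
                    end = i; // augment the clause boundary
                }
            }
            clauses.put(v, new Clause(arguments, v, end));
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
            new Chunk("NP", "", "Mr. Gaubert"),
            new Chunk("VP", "contribute", "contributed"),
            new Chunk("NP", "", "real estate"),
            new Chunk("VP", "value", "valued"),
            new Chunk("PP", "at", "at $ 25 million"),
            new Chunk("PP", "to", "to the assets"),
            new Chunk("PP", "of", "of Independent American"));
        Map<String, List<String>> frames = Map.of(
            "contribute", List.of("NP", "PP(to)"),
            "value", List.of("NP", "PP(at)"));
        System.out.println(analyze(chunks, frames));
    }
}
```

On the example sentence, "valued" claims the "at" phrase first, so "contributed" skips it and reaches "to the assets"; the clause of "valued" ends up nested inside the clause of "contributed". The NP slot of "value" stays open because this sketch never looks to the left of the verb.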
Practice
System Implementation and Use
A Computational Framework
- Object-oriented backbone
  - Objects for the different data
  - Objects for the different sub-processes
- Linguistic sub-processors as libraries
  - Coexisting languages: Java, C++, C, Prolog
System implementation
- A component-based approach
- An object-oriented platform
  - Linguistic data
    - Textual entities: Text, Paragraphs
    - XDG
  - Linguistic processors
A Component-based Approach
Advantages:
- Computational efficiency
- Rapid prototyping
- Integration of different technologies
- Easy reuse
Linguistic processors
- Tokenizer, Complex Tokenizer
- Dictionary lookup modules
  - Yellow page look-up
  - Morphology analyzer
- Named Entity Recognition
- Part-of-speech tagging
- Chunker
- Verb shallow analyzer
- Shallow analyzer
Linguistic modules
Each process is encapsulated in an object:
- initialize(): loads lexicons and rules (general or domain-specific)
- finalize(): dismisses the process rules and lexicons
- run(): enriches the input with the contributions of the process
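A minimal Java sketch of this lifecycle follows. The interface and the toy module are hypothetical and do not reproduce the CHAOS classes; in particular, the slide's finalize() is called dispose() here, an assumption made only to avoid overriding java.lang.Object.finalize().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the initialize / run / finalize lifecycle of a linguistic module. */
final class ModuleLifecycleSketch {

    /** R is the representation that every module enriches. */
    interface LinguisticModule<R> {
        void initialize();   // load lexicons and rules
        R run(R input);      // enrich the input with this module's contribution
        void dispose();      // release lexicons and rules (finalize() on the slide)
    }

    /** Toy module: marks tokens found in a small in-memory lexicon. */
    static final class LexiconTagger implements LinguisticModule<List<String>> {
        private Set<String> lexicon; // null until initialize() has been called

        @Override
        public void initialize() {
            lexicon = Set.of("comprare", "comunicare");
        }

        @Override
        public List<String> run(List<String> tokens) {
            if (lexicon == null) {
                throw new IllegalStateException("initialize() must be called before run()");
            }
            List<String> enriched = new ArrayList<>();
            for (String token : tokens) {
                enriched.add(lexicon.contains(token) ? token + "/V" : token);
            }
            return enriched;
        }

        @Override
        public void dispose() {
            lexicon = null;
        }
    }

    /** Initializes every module, runs them in order, and always releases their resources. */
    static <R> R runAll(List<LinguisticModule<R>> modules, R input) {
        modules.forEach(LinguisticModule::initialize);
        try {
            R current = input;
            for (LinguisticModule<R> module : modules) {
                current = module.run(current);
            }
            return current;
        } finally {
            modules.forEach(LinguisticModule::dispose);
        }
    }

    public static void main(String[] args) {
        List<LinguisticModule<List<String>>> modules = List.of(new LexiconTagger());
        System.out.println(runAll(modules, List.of("voglio", "comprare"))); // prints [voglio, comprare/V]
    }
}
```

Separating loading from running lets an application pay the cost of reading lexicons once and then call run() on many inputs.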
Linguistic processors
Microtheories for microphenomena
Each processor implements its own theory:
- it has its own language for describing rules
- it is written in its own programming language
Processor: Yellow page look-up, Morphology analyzer
Dictionary:
compra comprare d(a) v.tran.sempl 2.sing.imper.pres ~:u:~
compra comprare d(a) v.tran.sempl 3.sing.ind.pres ~:u:~
comprai comprare d(a) v.tran.sempl 1.sing.ind.pass_rem ~:u:~
comprammo comprare d(a) v.tran.sempl 1.plur.ind.pass_rem ~:u:~
compran comprare d(a) v.tran.sempl 3.plur.ind.pres ~:u:~
comprando comprare d(a) v.tran.sempl geru.pres ~:u:~
comprano comprare d(a) v.tran.sempl 3.plur.ind.pres ~:u:~
Processor: Chunker
Rules:
…
constituent_class([_cst1, _cst2, _cst3], 'VerFin', _mor, 1, 3) :-
    verb_finite(_cst1),
    verb_to_have(_cst1),
    verb_past_particle(_cst2),
    verb_to_be(_cst2),
    verb_past_particle(_cst3),
    common_morfology(_cst1, _mor).
…
Processor: Verb Shallow Analyser
Sub-categorization lexicon:
…
pattern(comprare, [
    [(oggetto,Post),(per,Post)],
    [(oggetto,Post),(da,Post),(per,Post)],
    [(oggetto,Post),(a,Post),(per,Post)],
    [(oggetto,Post)]]).
pattern(comprendere, [[(oggetto,Post)], [], [(oggetto,Post)]]).
pattern(comprimere, [[(oggetto,Post)], [(oggetto,Post)]]).
pattern(compromettere, [[(con,Post)], [(oggetto,Post)]]).
pattern(comunicare, [
    [],
    [(con,Post)],
    [(a,Post)],
    [(oggetto,Post),(a,Post)],
    [(oggetto,Post)]]).
…
Implemented Italian Shallow Grammar
- Constituent Categories
  - Part-of-Speech Tags
  - Chunk Types
- Dependency Categories
  - Dependency Categories over Chunk Types
A survival user guide
- Stand-alone version: chaosparser -h
- Client-server version: chaosserver -h, chaosclient -h
- XDG editor and actual GUI: chaosgui
Using CHAOS in applications
In Java applications:
    ConfigurationHandler.initialize();
    ConfigurationHandler.parseKBPropFile("LANGUAGE", "KB");
    Parser ms = new Parser();
    ms.initialize();
In non-Java applications, use one of the available output forms:
- XDG in XML
- XDG in Prolog
- XDG in QLF (in Prolog)
Perspective
- Building a statistical Italian parser
- Increasing the Italian annotated corpora
  - Reusing existing corpora: TUT, SITAL, VIT
Tools
- XDG editor (DEMO!!!!)
- Syntactic annotation transformer