Using the English Resource Grammar to extend fact extraction capabilities
v1.1
International Technology Alliance in Network & Information Sciences
David Mott, IBM UK
Stephen Poteet, Anne Kao, Ping Xue, Boeing Research & Technology
Ann Copestake, University of Cambridge
ITA Fall Meeting, October 2013
Research Objectives
• Extraction of facts in Controlled English from Natural Language documents
  • express the document in a formal but still readable way
  • extracted facts can be used to infer new information
• Facilitate configuration of NL processing tools in CE
  • human analyst can be more involved in the NL processing
  • a common model of linguistics, grammar, and semantics
• Provide rationale for linguistic and analytic processing
  • human can better understand and review the reasoning
  • facilitate evaluation of the quality of the reasoning
We are not tasked with creating fundamental breakthroughs in the theory of NL processing
Supporting the analyst
[Diagram: labels include doc27, Reference data, other data, Structured data, Linked data web, NLP, CE Tools, CE Facts, Analyst's Conceptual Model, Inference, Rationale, Argumentation, Query, Assumptions, Uncertainty, Requirements, Product]
The analyst does not have time to read all the reports
Working Scenario
Imagine you are an analyst in a team, being asked to provide high value information about events on the ground
Based upon reports and background reference material:
• You want to extract basic facts from these reports and to infer new information
• You want to have “new ideas” and implement this quickly without IT involvement
• You want to understand and review the collaborative reasoning of the team which may contain differing skills
02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds
Source: SYNCOIN simulated reports. Graham, Rimland, & Hall. (2011). A COIN-inspired Synthetic Dataset for Qualitative Evaluation of Hard and Soft Fusion Systems. Proc. 14th International Conference on Information Fusion, Chicago, IL.
The state of the BPP11 research
We are using CE
• as the target language for expressing facts
• as the shared model of the concepts being expressed
• as the language to configure NL systems
  • Detecting structures in phrases
  • Mapping language expressions to concepts
• as the way to reveal reasoning performed by a collaborative team
[Diagram: labels include Text, Phrase structures, Generic Semantics, Domain Semantics, Facts, High Value Facts, Controlled English, Analyst's Reasoning]
Motivation for using DELPH-IN linguistics
• Collaborate with DELPH-IN consortium, to extend our NL and fact extraction capabilities
  • ERG is a high-coverage, high-precision English grammar, developed over 20 years
  • MRS is the representation of semantics
  • PET parser is an efficient parser
• Explore Controlled English as possible facilitator for the use of DELPH-IN linguistic resources
• Provide opportunity to research into deeper semantic processing
  • contribute to the NL research community
[Diagram: labels include Typed Feature Structures; English Resource Grammar, Stanford; Linguistic Knowledge Builder, Cambridge; PET parser; Minimal Recursion Semantics, Cambridge; Japanese, German, Norwegian, Thai, Chinese, Spanish, ...; Translation]
Integrating CE and the ERG
• Use ERG (and PET) to parse sentences and provide phrase structures
• Use MRS to express generic semantics
• Represent domain semantics in MRS, by extending generic semantics
• Research into the integration of domain semantics and linguistic processing
[Diagram: labels include Text, Phrase structures, Generic Semantics, Domain Semantics, Facts, High Value Facts, Controlled English, Analyst’s Reasoning, ERG, MRS?]
Raw ERG system output
PARSE TREE (syntax)
MRS (semantics)
We will turn this into CE
Defining the ERG lexicon in CE
Transformation between the ERG structures (Typed Feature Structures) and CE
there is a count noun named checkpoint_n1 that is written as the word |checkpoint| and is a form of the noun sense ‘_checkpoint_n_1_rel’.
checkpoint_n1 := n_-_c_le & [ ORTH < "checkpoint" >, SYNSEM [ LKEYS.KEYREL.PRED "_checkpoint_n_1_rel", PHON.ONSET con ] ].
Is this easier to understand?
Mapping between generic semantics and specific semantics
the noun sense ‘_checkpoint_n_1_rel’ expresses the entity concept ‘checkpoint’.
the noun sense ‘_carpet_n_1_rel’ expresses the entity concept ‘carpet’.
The user has to define this link
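The correspondence between an ERG lexical entry and its CE form can be sketched as a small templating step. This is a minimal illustration, not the authors' transformation: the `LexEntry` class, the `LEX_TYPE_TO_CE` table and both function names are hypothetical, and only the pairing of `n_-_c_le` with "count noun" is taken from the slide.

```python
from dataclasses import dataclass

# Hypothetical lookup table: only this one pairing appears on the slide.
LEX_TYPE_TO_CE = {"n_-_c_le": "count noun"}


@dataclass(frozen=True)
class LexEntry:
    """A simplified view of one ERG lexical entry (illustrative only)."""
    name: str      # e.g. checkpoint_n1
    lex_type: str  # e.g. n_-_c_le
    orth: str      # e.g. checkpoint
    pred: str      # e.g. _checkpoint_n_1_rel


def lexical_entry_to_ce(entry: LexEntry) -> str:
    """Render a lexical entry as a CE sentence in the style shown on the slide."""
    ce_type = LEX_TYPE_TO_CE[entry.lex_type]
    return (
        f"there is a {ce_type} named {entry.name} that "
        f"is written as the word |{entry.orth}| and "
        f"is a form of the noun sense '{entry.pred}'."
    )


def sense_link_to_ce(pred: str, concept: str) -> str:
    """Render the user-defined link from a noun sense to an entity concept."""
    return f"the noun sense '{pred}' expresses the entity concept '{concept}'."


checkpoint = LexEntry("checkpoint_n1", "n_-_c_le", "checkpoint", "_checkpoint_n_1_rel")
print(lexical_entry_to_ce(checkpoint))
print(sense_link_to_ce(checkpoint.pred, "checkpoint"))
```

Keeping the sense-to-concept link as a separate function mirrors the slide's point that this link is supplied by the user rather than derived from the grammar.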
Defining ERG grammar rules in CE
Subcomponents of phrase are “head daughter” followed by “non head” daughter
basic_head_initial := basic_binary_headed_phrase & [ HD-DTR #head, NH-DTR #non-head, ARGS < #head, #non-head > ].
there is a linguistic frame named f1 that defines the basic-head-initial PH and
has the sequence ( the sign A0 , and the sign A1 ) as subcomponents and
has the statement that ( the basic-head-initial PH has the sign A0 as HD-DTR and has the sign A1 as NH-DTR ) as semantics.
[Diagram: a basic-head-initial with ARGS (a list whose 0TH and 1ST elements are each a sign), HD-DTR (a thing) and NH-DTR (a thing)]
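The coreference tags `#head` and `#non-head` in the TDL rule say that the same sign fills two positions at once. A minimal sketch of that idea follows, using Python object identity to stand in for structure sharing. The `Sign` class and the `basic_head_initial` function are hypothetical and are not part of the ERG or the CE tooling.

```python
from dataclasses import dataclass


@dataclass
class Sign:
    """Stand-in for a linguistic sign; a real TFS carries many more features."""
    label: str


def basic_head_initial(head: Sign, non_head: Sign) -> dict:
    """Build a phrase whose ARGS list shares its elements with HD-DTR and NH-DTR.

    Sharing is modelled by reusing the same Python objects, mirroring the
    #head and #non-head tags in the TDL definition.
    """
    return {
        "HD-DTR": head,
        "NH-DTR": non_head,
        "ARGS": [head, non_head],  # head daughter first, then non-head daughter
    }


phrase = basic_head_initial(Sign("A0"), Sign("A1"))
# The 0th argument is the head daughter itself, not a copy of it.
print(phrase["ARGS"][0] is phrase["HD-DTR"])
print(phrase["ARGS"][1] is phrase["NH-DTR"])
```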
Three stage approach to defining MRS in CE
1. Generate raw representation of:
   • elementary predications (EPs) as objects with predicate and arguments
   • scope information between EPs
   • features of the entities involved
2. Extract intermediate, but generic, concepts describing the raw MRS:
   • patterns of quantification
   • …
3. Transform into domain specific CE concepts using links between the predicate and the CE concept.
   • …
Step 1 - CE version of raw MRS
x5 – “I”
x9 – “new carpet”
x5 “needs” x9
Still needs to be turned into more understandable concepts…
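Step 1 can be pictured as rendering each elementary predication as one CE sentence. The sketch below is an assumption about how such a rendering might look, not the Prolog generator used in the project. The `ElementaryPredication` class and `ep_to_ce` function are hypothetical, while the sentence pattern, identifiers and predicate names follow the CE examples in this deck.

```python
from dataclasses import dataclass
from typing import Tuple

# Ordinal names as used in the deck's CE sentences; longer argument lists
# would need this tuple extended.
ORDINALS = ("zeroth", "first", "second", "third")


@dataclass(frozen=True)
class ElementaryPredication:
    ep_id: str             # e.g. "#ep7_5"
    predicate: str         # e.g. "_carpet_n_1_rel"
    args: Tuple[str, ...]  # variables in argument order, e.g. ("x9_8",)


def ep_to_ce(ep: ElementaryPredication) -> str:
    """Render one EP as a raw-MRS CE sentence."""
    clauses = [f"is an instance of the mrs predicate '{ep.predicate}'"]
    clauses += [
        f"has the thing {var} as {ORDINALS[i]} argument"
        for i, var in enumerate(ep.args)
    ]
    return f"the mrs elementary predication {ep.ep_id} " + " and ".join(clauses) + "."


eps = [
    ElementaryPredication("#ep7_3", "_udef_q_rel", ("x9_8",)),
    ElementaryPredication("#ep7_5", "_carpet_n_1_rel", ("x9_8",)),
]
for ep in eps:
    print(ep_to_ce(ep))
```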
3 Steps to Domain Semantics
Raw:
the mrs elementary predication #ep7_3 is an instance of the mrs predicate ‘_udef_q_rel’ and has the thing x9_8 as zeroth argument.
the mrs elementary predication #ep7_5 is an instance of the mrs predicate ‘_carpet_n_1_rel’ and has the thing x9_8 as zeroth argument.
the mrs elementary predication #ep7_3 equals modulo quantifiers the mrs elementary predication #ep7_5.
Intermediate (rule to detect quantifier pattern in MRS):
there is an indefinite quantification named q2 that is on the thing x9_8 and has the mrs predicate ‘_carpet_n_1_rel’ as sense.
Domain:
the mrs predicate ‘_carpet_n_1_rel’ expresses the entity concept ‘carpet’.
if ( there is an indefinite quantification Q that is on the thing T and has the mrs predicate MRS as sense ) and ( the mrs predicate MRS expresses the entity concept EC ) then ( the thing T is an EC ).
the thing x9_8 is a carpet.
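The final step, from an indefinite quantification plus an "expresses" link to a domain fact, can be sketched as a single rule application. This is an illustrative reading of the CE rule on this slide, not the CE Store's inference engine. The function names, the tuple layout and the a/an heuristic are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple

# Intermediate fact: (quantification name, thing, mrs predicate used as sense)
Quantification = Tuple[str, str, str]


def infer_entity_types(
    quantifications: Iterable[Quantification],
    expresses: Dict[str, str],
) -> Set[Tuple[str, str]]:
    """Apply: if Q is on T with sense MRS, and MRS expresses EC, then T is an EC."""
    facts = set()
    for _name, thing, predicate in quantifications:
        concept = expresses.get(predicate)
        if concept is not None:  # no link defined means no domain fact
            facts.add((thing, concept))
    return facts


def to_ce(thing: str, concept: str) -> str:
    """Render a domain fact as CE (simple a/an heuristic, an assumption)."""
    article = "an" if concept[0].lower() in "aeiou" else "a"
    return f"the thing {thing} is {article} {concept}."


quantifications = [("q2", "x9_8", "_carpet_n_1_rel")]
expresses = {"_carpet_n_1_rel": "carpet"}
for thing, concept in sorted(infer_entity_types(quantifications, expresses)):
    print(to_ce(thing, concept))
```

A predicate with no entry in `expresses` simply yields no domain fact, which matches the deck's point that the user has to define the link.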
Facts extracted from example sentence
02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds
If other reports can add to information on the man x5_8 then we may know who is requiring new carpets, and could predict future events?
This requires a number of linguistic and domain specific steps
Discussion
• DELPH-IN community have developed excellent Natural Language capabilities
• We are integrating the “ERG system” and expressing lexicon, grammar rules and semantics in CE
• However in the ERG system, the semantics are not completely separated from the linguistic structures
  • we propose intermediate semantic structures in CE, for bridging the gap between generic and domain semantics
• We are introducing domain semantics to represent facts in CE
  • provides a “target” for output of the ERG system
  • opportunity to explore how this can affect parsing of sentences
• Much needs to be done
  • improve integration
  • extend intermediate MRS
  • obtain rationale
  • feedback of semantic reasoning into the parsing mechanisms
  • to help adding/understanding of rules
Extra
Information Flow
[Diagram: labels include Text, ERG rules & types, ERG lexicon, PET parser, MRS, PET parse tree, Stanford Parser, shallow processing, Parse tree as CE, Raw MRS as CE, CE lexicon, CE linguistic frames, Conceptual model, CE facts; annotation: “Use same transformation to be consistent”]
Red links have been partially implemented
Rationale
“the group of things x10 has the entity concept survey as categorisation.”
The rationale from the elementary predicates is:
How do we get the rationale FOR the elementary predicates?
• could follow the parse tree + the TFS definitions, but need a link between parse tree and MRS, which is so far not available
A layered Conceptual Model
• Meta Model: Concept, Entity Concept, Relation Concept, Conceptual Model (relations: belongs to, has as domain)
• Semiotics: Thing, Meaning, Symbol (relations: stands for, expresses)
• General Semantics: Agent, Spatial Entity, Temporal Entity, Situation, Container (relations: has as agent role, is contained in)
• Linguistic: Sentence, Phrase, Word, Noun, Fragment, Linguistic Frame (relations: has as dependent, is parsed from, expresses)
• Analyst's Domain Model: Place, Person, Village, Communication, IED, Facility, .... (relations: is located in, monitors)
Our Semiotic Triangle, based on [Ogden, C. K. and Richards, I. A. (1923). ]
The ERG system architecture
• PET is run under Linux (Debian) in an Oracle VirtualBox image
• A Prolog program provides a web service for parsing sentences and turning the result into CE
• Aiming to integrate with our CE Store
[Diagram: labels include sentence, PROLOG web service, PET parser with ERG, parse tree and MRS, PROLOG CE generator, CE parse tree and MRS]
Feedback of domain reasoning to the parsing?
• We want the domain to affect the parse, e.g.:
  • creating new lexical entries and grammar rules prior to parsing
• But we also want arbitrary domain reasoning to affect the parse at runtime
• Could this:
  • rule out inconsistent parses
  • provide disambiguations, and dialog context?
[Diagram: labels include ERG/PET, DOMAIN REASONER, facts, constraints on linguistic phenomena, ERG, DOMAIN MODEL, lexical entries, grammar rules]
Linking text to domain situations
Working out the “requirer”
This can only be done by analysis of the communications as a whole (including anaphoric reference)
02/03/10 - ET: 0855hrs -- Cell phone call from unidentified male (7115452376) in Bayaa to an unidentified male (7438604901) in Saydiyah //MGRSCOOR: 38S MB 37 77//. The caller stated: “I will need new carpet for my house.” The receiver asked: “How big is the house?” The reply was: “I have a large family.” The receiver said, “I will see what I can do.” The call lasted 15 seconds
[Annotations on the report: STEP C, STEP A]
• Step C needs knowledge of the structure of the report and of communications
• Step A needs linguistic knowledge
Example CE rules
DOMAIN RULE
if
( the communication C has the agent A as initiator ) and
( the agent A is located in the place P )
then
( the communication C is from the place P ).
LINGUISTIC RULE
if
( the mrs elementary predication EP is an instance of the mrs predicate '_in_p_rel'
and has the thing T as first argument
and has the thing C as second argument )
then
( the thing T is contained in the container C ).
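The two rules can be read as simple pattern matches over facts. The sketch below shows one possible encoding and is not how the CE Store evaluates rules. The function names, the data layouts and the identifiers `e3`, `x_caller`, `x_bayaa`, `call1` and `caller1` are all hypothetical. How a containment fact would be turned into an "is located in" fact is not shown on the slide, so the sketch keeps the two rules separate.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

EP = Tuple[str, Sequence[str]]  # (mrs predicate, arguments indexed from zero)


def apply_linguistic_rule(eps: Iterable[EP]) -> Set[Tuple[str, str]]:
    """'_in_p_rel' with first argument T and second argument C gives (T, C):
    the thing T is contained in the container C."""
    return {
        (args[1], args[2])
        for predicate, args in eps
        if predicate == "_in_p_rel" and len(args) >= 3
    }


def apply_domain_rule(
    initiators: Dict[str, str],  # communication -> initiating agent
    locations: Dict[str, str],   # agent -> place
) -> Set[Tuple[str, str]]:
    """Communication C initiated by agent A located in place P gives (C, P):
    the communication C is from the place P."""
    return {
        (comm, locations[agent])
        for comm, agent in initiators.items()
        if agent in locations
    }


# Hypothetical facts, loosely modelled on the phone-call report.
print(apply_linguistic_rule([("_in_p_rel", ("e3", "x_caller", "x_bayaa"))]))
print(apply_domain_rule({"call1": "caller1"}, {"caller1": "Bayaa"}))
```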
Domain Situations
[Diagram: situations: a requirement, a production, a delivery, a usage. Entities: an agent, the material. Relation labels: is requested by, is requested from, is produced by, is delivered by, is delivered to an agent, is performed by, needs, has as material. Annotation: “are these the same agent?”]
CE representation for parse tree
Defining ERG grammar rules in CE
Analysis of the rules for hd_cmp_u_c
basic_head_initial := basic_binary_headed_phrase &
  [ HD-DTR #head,
    NH-DTR #non-head,
    ARGS < #head, #non-head > ].
Ordered sequence of subcomponents, Head daughter followed by non head daughter
headed_phrase := phrase &
  [ SYNSEM.LOCAL [ CAT [ HEAD head & #head, HC-LEX #hclex ],
                   AGR #agr, CONJ #conj ],
    HD-DTR.SYNSEM.LOCAL local & [ CAT [ HEAD #head, HC-LEX #hclex ],
                                  AGR #agr, CONJ #conj ] ].
Some info is passed up from head daughter to “this” phrase
Calling ERG system from Word