Multilingual Event Extraction and Semi-automatic Acquisition of Related Resources

Hristo Tanev, Joint Research Centre, Ispra, Italy

Uploaded by htanev on 09-Jul-2015


Description: How to create a multilingual event extraction system


Page 1: Multilingual Event Extraction and Semi-automatic acquisition of related resources

Multilingual Event Extraction and Semi-automatic Acquisition of Related Resources

Hristo Tanev, Joint Research Centre, Ispra, Italy


NEXUS: News Event eXtraction Using language Structures


Event Extraction

Event extraction was introduced as a language processing task at MUC-2 in 1989

An event is something that happens; an event description is a template that describes an event

The goal of automatic event extraction is to fill an event description template automatically from a text or a set of texts

An event description usually includes:
• Event type
• Time and place of the event
• Participating entities, which have specific roles that depend on the event type, e.g. perpetrator, victim, instrument
• Cause
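One way to picture this template is as a small record type. The Python sketch below is hypothetical: `EventDescription`, `Participant` and the role strings are illustrative names chosen here, not structures taken from NEXUS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    """An entity taking part in the event, labelled with its role."""
    text: str  # surface phrase, e.g. "more than 50 people"
    role: str  # role that depends on the event type, e.g. "victim"


@dataclass
class EventDescription:
    """Hypothetical event template whose slots are filled by extraction."""
    event_type: str
    time: Optional[str] = None
    place: Optional[str] = None
    participants: List[Participant] = field(default_factory=list)
    cause: Optional[str] = None

    def with_role(self, role: str) -> List[Participant]:
        """Return all participants that fill the given role."""
        return [p for p in self.participants if p.role == role]


event = EventDescription(
    event_type="terrorist attack",
    time="18 June 2008",
    place="Baghdad, Iraq",
    participants=[
        Participant("more than 50 people", "victim"),
        Participant("car bomb", "instrument"),
    ],
)
print([p.text for p in event.with_role("victim")])  # ['more than 50 people']
```

Unfilled slots stay `None` (or an empty list), which mirrors entries such as "Perpetrators: not reported" in a filled template.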


Event Extraction in the Context of EMM

The purpose of automatic event extraction from online news is to facilitate the crisis-management efforts of the European Commission and other related political institutions

• NEXUS detects security-related events and disasters
• NEXUS monitors online news in near real time in English, French, Spanish, Italian, Russian, Portuguese, and Arabic (after automatic translation into English)
• Medical NEXUS detects news about disease outbreaks in English (soon to be deployed in French)


EMM Event Extraction from Online News

News cluster:

Car bomb kills 50 in Iraq (Hindustan Times, Wednesday, June 18, 2008 5:07:00 AM CEST): A car bomb blast in northern Baghdad left more than 50 people dead and 80 wounded on Tuesday, a police source said…

Biggest blast in months leaves at least 50 dead in Iraq (reliefWeb, Wednesday, June 18, 2008 5:05:00 AM CEST): A car bomb blast in northern Baghdad, the largest in months, left more than 50 people dead and 80 wounded on Tuesday, a police source said...


EMM Event Extraction from Online News

Event Description

• Date: 18 June 2008
• Place: Baghdad, Iraq
• Event type: terrorist attack
• Number killed: 50
• Number wounded: 80
• Number kidnapped: 0
• Perpetrators: not reported
• Weapons: car bomb
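A cluster usually contains several reports of the same event, and their figures can disagree, while the event description keeps a single value per slot. The slides do not describe how NEXUS reconciles the reports; the sketch below simply assumes a majority vote per slot, with ties resolved in favour of the larger number. `aggregate_slot` and `aggregate_cluster` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List, Optional


def aggregate_slot(values: List[Optional[int]]) -> Optional[int]:
    """Choose one value for a slot from per-article extractions.

    Assumption: majority vote, ties broken in favour of the larger number.
    Articles that report nothing for the slot (None) are ignored.
    """
    reported = [v for v in values if v is not None]
    if not reported:
        return None
    counts = Counter(reported)
    # Compare (frequency, value): the most frequent wins, then the larger value.
    return max(counts, key=lambda v: (counts[v], v))


def aggregate_cluster(articles: List[Dict[str, Optional[int]]]) -> Dict[str, Optional[int]]:
    """Aggregate every slot that appears in at least one article of the cluster."""
    slots = sorted({slot for article in articles for slot in article})
    return {slot: aggregate_slot([a.get(slot) for a in articles]) for slot in slots}


# Per-article counts as they might be extracted from the two reports above.
cluster = [
    {"killed": 50, "wounded": 80},
    {"killed": 50, "wounded": 80},
]
print(aggregate_cluster(cluster))  # {'killed': 50, 'wounded': 80}
```

A vote is only one possible policy; preferring the most recent report or the most reliable source would fit the same interface.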


EMM Event Extraction Architecture

[Diagram: News → Entity Match, Geo-Tagging, Clustering → NEXUS (Text Processing: NER, Parsing, Pattern Matching; Information Aggregation) → Events → Visualization]


Partial Parsing

Example of a multilingual rule that recognizes noun phrases such as "a French volunteer and an Italian military"

    coordination_rule :> ( person_group & [NAME: #name1, AMOUNT: "1" #amount1]
                           (token & [SURFACE: ","]?
                            person_group & [NAME: #name2, AMOUNT: "1" #amount2])?
                           (token & [SURFACE: ","]?
                            person_group & [NAME: #name3, AMOUNT: "1" #amount3])?
                           conjunction
                           person_group & [NAME: #name4, AMOUNT: "1" #amount4] ):c

    c: person_group & [NAME: #final, AMOUNT: #amount, NUMBER: "p"]
       & #final := ConcForSum(#name1, #name2, #name3, #name4)
       & #amount := ConcForSum(#amount1, #amount2, #amount3, #amount4).
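To make the effect of the rule concrete, here is a hypothetical Python re-implementation over a sequence in which person groups have already been chunked. It assumes that `ConcForSum` joins the filled slots into a sum expression with `+` (so two singular groups yield the amount `1+1`) and that the conjunction list is a small multilingual sample; neither detail is specified on the slide, and this is not the rule engine itself.

```python
from typing import List, Optional, Union

# A token is either a plain word or a person_group chunk built by an earlier
# grammar level, e.g. {"NAME": "French volunteer", "AMOUNT": "1"}.
Token = Union[str, dict]

# Hypothetical multilingual conjunction list (English, French, Italian, Spanish).
CONJUNCTIONS = {"and", "et", "e", "y"}


def conc_for_sum(*values: str) -> str:
    """Assumed behaviour of ConcForSum: join the filled slots with '+'."""
    return "+".join(values)


def is_singular_group(tok: Token) -> bool:
    return isinstance(tok, dict) and tok.get("AMOUNT") == "1"


def coordinate(tokens: List[Token]) -> Optional[dict]:
    """Match  group (","? group)? (","? group)? conjunction group  at tokens[0]
    and return one plural person_group, or None if the rule does not apply."""
    if not tokens or not is_singular_group(tokens[0]):
        return None
    groups = [tokens[0]]
    i = 1
    # Up to two optional middle groups, each optionally preceded by a comma.
    for _ in range(2):
        j = i + 1 if i < len(tokens) and tokens[i] == "," else i
        if j < len(tokens) and is_singular_group(tokens[j]):
            groups.append(tokens[j])
            i = j + 1
        else:
            break
    # Mandatory conjunction followed by the final singular group.
    if (i + 1 < len(tokens) and isinstance(tokens[i], str)
            and tokens[i] in CONJUNCTIONS and is_singular_group(tokens[i + 1])):
        groups.append(tokens[i + 1])
        return {
            "type": "person_group",
            "NAME": conc_for_sum(*(g["NAME"] for g in groups)),
            "AMOUNT": conc_for_sum(*(g["AMOUNT"] for g in groups)),
            "NUMBER": "p",  # the coordinated phrase is plural
        }
    return None


volunteer = {"NAME": "French volunteer", "AMOUNT": "1"}
soldier = {"NAME": "Italian military", "AMOUNT": "1"}
print(coordinate([volunteer, "and", soldier]))
# {'type': 'person_group', 'NAME': 'French volunteer+Italian military', 'AMOUNT': '1+1', 'NUMBER': 'p'}
```

Keeping the amount as an unevaluated sum expression lets a later step decide how to resolve it, for example by evaluating `1+1` to 2 victims.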


Annotating Participating Entities

One of the most important tasks is labelling person groups and other phrases with event-specific semantic roles, e.g. Perpetrator, Dead victim, Displaced people, Weapons used

• Linear patterns work well for English
• We also use linear patterns for Russian
• More elaborate event extraction grammars for Arabic, Italian, French, Spanish and Portuguese
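The idea behind a linear pattern is a fixed word context on one side of a person-group slot that signals a role. A minimal sketch, assuming person groups have already been recognised and bracketed as `[...]`; the patterns and role names below are illustrative examples, not entries from the NEXUS pattern base.

```python
import re
from typing import List, Tuple

# Hypothetical linear patterns: a fixed word context next to a person-group
# slot, mapped to the semantic role it signals. Person groups are assumed to
# be pre-chunked as "[...]" by an earlier processing step.
GROUP = r"\[(?P<pg>[^\]]+)\]"
PATTERNS: List[Tuple[str, str]] = [
    (GROUP + r" were killed", "dead victim"),
    (r"killed " + GROUP, "dead victim"),
    (GROUP + r" were (?:injured|wounded)", "injured"),
    (r"(?:injured|wounded) " + GROUP, "injured"),
    (GROUP + r" were kidnapped", "kidnapped"),
]


def annotate_roles(sentence: str) -> List[Tuple[str, str]]:
    """Return (person group, role) pairs for every pattern match in the sentence."""
    found = []
    for pattern, role in PATTERNS:
        for m in re.finditer(pattern, sentence):
            found.append((m.group("pg"), role))
    return found


print(annotate_roles("A car bomb killed [more than 50 people] and wounded [80 civilians]."))
# [('more than 50 people', 'dead victim'), ('80 civilians', 'injured')]
```

Because each pattern is a flat word sequence, it is easy to review by hand, which is why such patterns suit English word order well and serve only as a starting point for languages with freer word order.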


Event-specific Grammars

Rule: <person-group> [introduce-passive] Verb[baseform: rimanere]? Adv? Verb[sem: injured-obj, passive-voice] <person-group> : injured

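Examples matched by the rule:
• Cinque persone sono state ferite (Five people were injured)
• Cinque persone sono state gravemente ferite (Five people were seriously injured)
• Cinque persone sono rimaste ferite (Five people were left injured)

For details see [Zavarella et al., Event Extraction for Italian Using a Cascade of Finite State Grammars, FSMNLP 2008]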
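The rule above can be approximated with a regular expression over the raw Italian string, which is enough to cover the three example sentences. This is a simplification under stated assumptions: the word lists are small hypothetical samples standing in for the grammar's categories, and a real cascade of finite state grammars works on morphologically analysed tokens rather than on plain strings.

```python
import re
from typing import Optional

# Hypothetical word lists standing in for the grammar's categories.
NUMBER_WORDS = r"(?:\d+|due|tre|quattro|cinque|sei|dieci)"
PERSON_NOUNS = r"(?:persone|civili|soldati|agenti)"
INTRODUCE_PASSIVE = r"(?:sono(?: state| stati)?|è stat[oa])"
RIMANERE = r"rimast[eiao]"             # participle forms of "rimanere"
ADVERB = r"(?:gravemente|lievemente|leggermente)"
INJURED_VERB = r"ferit[eiao]"          # participle with sem: injured

# <person-group> [introduce-passive] rimanere? Adv? injured-verb  ->  injured
INJURED_RULE = re.compile(
    rf"\b(?P<group>{NUMBER_WORDS} {PERSON_NOUNS}) {INTRODUCE_PASSIVE}"
    rf"(?: {RIMANERE})?(?: {ADVERB})? {INJURED_VERB}\b",
    re.IGNORECASE,
)


def extract_injured(sentence: str) -> Optional[str]:
    """Return the person group labelled as injured, or None if the rule fails."""
    m = INJURED_RULE.search(sentence)
    return m.group("group") if m else None


for s in [
    "Cinque persone sono state ferite",
    "Cinque persone sono state gravemente ferite",
    "Cinque persone sono rimaste ferite",
]:
    print(extract_injured(s))  # Cinque persone
```

For "sono rimaste ferite" the optional "state" fails, so the engine backtracks and lets "rimaste" fill the optional `rimanere` slot, mirroring the `Verb[baseform: rimanere]?` element of the rule.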


Multilingual Lexical Acquisition

Automatic learning of language-specific lexical resources

Statistical, weakly supervised approaches that make use of large quantities of unannotated news

We learn patterns, keywords and keyphrases, which can be manually validated, rather than statistical models such as SVMs

• Pattern learning
• Learning domain-specific lexica
• Learning semantic classes


Linear Pattern Learning

For English we use the linear patterns, as the algorithm learns them

We learned more 3000 linear patterns for English For Italian and other languages, linear patterns

are staring point for grammar development
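The slides do not spell out the pattern learning algorithm. As a hedged illustration of the weakly supervised idea, the sketch below assumes a few known slot fillers (seed phrases denoting, say, dead victims) and a set of unannotated sentences: every left or right word context around a seed becomes a candidate pattern, and candidates are ranked by how many distinct seeds they surround, ready for manual validation. `learn_linear_patterns` and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def learn_linear_patterns(
    sentences: List[str], seeds: Set[str], width: int = 2, min_support: int = 2
) -> List[Tuple[str, int]]:
    """Rank candidate linear patterns by the number of distinct seeds they surround.

    A candidate is "<k left words> [X]" or "[X] <k right words>" for k = 1..width,
    where [X] is the slot. Candidates supported by fewer than `min_support`
    distinct seeds are discarded.
    """
    support: Dict[str, Set[str]] = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for seed in seeds:
            seed_words = seed.lower().split()
            n = len(seed_words)
            for i in range(len(words) - n + 1):
                if words[i:i + n] != seed_words:
                    continue
                for k in range(1, width + 1):
                    if i - k >= 0:
                        support[" ".join(words[i - k:i]) + " [X]"].add(seed)
                    if i + n + k <= len(words):
                        support["[X] " + " ".join(words[i + n:i + n + k])].add(seed)
    ranked = [(p, len(s)) for p, s in support.items() if len(s) >= min_support]
    # Most strongly supported first; alphabetical order breaks ties.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


sentences = [
    "the blast killed five civilians on tuesday",
    "gunmen killed two soldiers near the border",
    "two soldiers were killed in an ambush",
    "five civilians were killed by a roadside bomb",
]
seeds = {"five civilians", "two soldiers"}
print(learn_linear_patterns(sentences, seeds))
# [('[X] were', 2), ('[X] were killed', 2), ('killed [X]', 2)]
```

Requiring support from several distinct seeds filters out contexts that occur with only one filler by chance; a human validator would still reject over-general candidates such as "[X] were".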


Learning Semantic Classes

Sometimes, it is necessary to learn specific semantic classes, e.g. vehicles, disasters, weapons, facilities

We built a statistical system for automatic acquisition of semantic classes

The system is language-independent, only a list of language-specific stop words is used


Ontopopulis

INPUT:

feelings: hatred, love, fear, sadness

contrasting classes: taste, (style, outlook), character, thoughts


Extracting New Terms

Newly learnt terms are ranked and then given to the user for evaluation

Top 20 terms from the category feelings:

grief, sorrow, sadness, condolences, fear, disappointment, regret, sympathy, shock, hatred, gratitude, frustration, anger, deep sorrow, profound, dismay, condolence, satisfaction, profound grief, deep grief
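The slides show only the input to Ontopopulis and its ranked output. The sketch below is a hypothetical distributional stand-in, not Ontopopulis itself: each term is represented by the non-stop-words around it (the stop word list being the only language-specific resource, as stated for the semantic class learner above), candidates are scored by cosine similarity to the seed terms of the target class, and the best similarity to any contrasting class is subtracted. For brevity, candidate terms are given explicitly and restricted to single words.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical English stop word list: the only language-specific resource.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "with", "over", "after", "has", "they"}


def context_vector(term: str, sentences: List[str], window: int = 2) -> Counter:
    """Count non-stop-words within `window` positions of each occurrence of `term`."""
    vec: Counter = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word != term:
                continue
            for ctx in words[max(0, i - window):i] + words[i + 1:i + 1 + window]:
                if ctx not in STOP_WORDS:
                    vec[ctx] += 1
    return vec


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(terms: List[str], sentences: List[str]) -> Counter:
    """Sum the context vectors of a class's seed terms."""
    total: Counter = Counter()
    for term in terms:
        total.update(context_vector(term, sentences))
    return total


def rank_candidates(candidates: List[str], seeds: List[str],
                    contrasting: Dict[str, List[str]], sentences: List[str]) -> List[str]:
    """Score = similarity to the target class minus best similarity to a contrasting class."""
    target = centroid(seeds, sentences)
    others = [centroid(terms, sentences) for terms in contrasting.values()]
    scores = {}
    for cand in candidates:
        vec = context_vector(cand, sentences)
        penalty = max((cosine(vec, other) for other in others), default=0.0)
        scores[cand] = cosine(vec, target) - penalty
    return sorted(scores, key=lambda c: (-scores[c], c))


SENTENCES = [
    "they expressed deep sadness over the loss",
    "they expressed deep grief over the loss",
    "people felt fear after the attack",
    "people felt anger after the attack",
    "the dish has a sweet taste and smell",
    "the dish has a bitter flavour and smell",
]
print(rank_candidates(["grief", "anger", "flavour"], ["sadness", "fear"],
                      {"taste": ["taste"]}, SENTENCES))
# ['anger', 'grief', 'flavour']
```

The contrasting class pushes "flavour" to the bottom: it shares the context "smell" with "taste" and nothing with the feeling seeds, so its score is negative.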


Using Learnt Semantic Classes for Event Extraction

We use Ontopopulis to learn terms, which we then add to our domain-specific dictionaries

Some rules that require a domain-specific dictionary:
• Rules for parsing person reference noun phrases, such as "two engineers"
• Rules that detect the weapons used: killed with a [WEAPON] (killed with a gun)
• Rules that detect the vehicles used: [PEOPLE] in a [VEHICLE] were stopped (three men in a boat were stopped)
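Rules like these combine fixed words with slots that only accept entries from a domain dictionary. A minimal sketch, assuming the dictionaries are plain Python sets filled with validated learnt terms; the entries, the `[PEOPLE]` definition (a number followed by a person noun, as in "two engineers") and the helpers `compile_rule` and `apply_rule` are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical dictionaries, e.g. filled with validated learnt terms.
PERSON_NOUNS = {"engineers", "men", "soldiers", "tourists"}
NUMBER_WORDS = {"two", "three", "four", "five"}
WEAPONS = {"gun", "knife", "machete", "rifle"}
VEHICLES = {"boat", "car", "truck", "van"}


def alternation(words: Iterable[str]) -> str:
    """Regex alternation over dictionary entries, longest entries first."""
    return "|".join(sorted(map(re.escape, words), key=len, reverse=True))


# Each slot only accepts dictionary material; [PEOPLE] is a person-reference
# NP such as "two engineers" (a number followed by a person noun).
SLOTS: Dict[str, str] = {
    "PEOPLE": rf"(?:\d+|{alternation(NUMBER_WORDS)}) (?:{alternation(PERSON_NOUNS)})",
    "WEAPON": alternation(WEAPONS),
    "VEHICLE": alternation(VEHICLES),
}


def compile_rule(template: str):
    """Compile e.g. 'killed with a [WEAPON]' into a regex with one named group per slot."""
    # With a capturing group, re.split alternates literal text and slot names.
    parts = re.split(r"\[([A-Z]+)\]", template)
    body = "".join(
        rf"(?P<{part}>{SLOTS[part]})" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def apply_rule(template: str, sentence: str) -> Optional[Dict[str, str]]:
    """Return the slot fillers of the first match, or None."""
    m = compile_rule(template).search(sentence)
    return m.groupdict() if m else None


print(apply_rule("killed with a [WEAPON]", "He was killed with a gun on Monday."))
# {'WEAPON': 'gun'}
print(apply_rule("[PEOPLE] in a [VEHICLE] were stopped", "Police said three men in a boat were stopped."))
# {'PEOPLE': 'three men', 'VEHICLE': 'boat'}
```

Since the slots are driven by the dictionaries, extending coverage means adding validated terms rather than editing the rules.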


NEXUS Evaluation for English

Detection task              Accuracy
Dead counting               70%
Injured counting            57%
Event classification        80%
Geo-tagging (country)       90%
Geo-tagging (place name)    61%


NEXUS Multilingual Evaluation

F1 measure    Dead    Wounded    Kidnapped    Arrested
Italian       0.87    0.62       -            0.67
Portuguese    0.69    0.51       0.67         0.47


Evaluation of Ontopopulis

Accuracy (%) top 20    Person    Weapon    Politician    Vehicle    Watercraft    Edged weapon    Crime    Building
Portuguese             90        60        75            85         70            20              85       75
Spanish                95        60        -             -          -             -               -        -
