ROSeAnn Presentation


DESCRIPTION

Abstract: A growing number of resources are available for enriching documents with semantic annotations. While originally focused on a few standard classes of annotations, the ecosystem of annotators is now becoming increasingly diverse. Although annotators often have very different vocabularies, with both high-level and specialist concepts, they also have many semantic interconnections. We will show that both the overlap and the diversity in annotator vocabularies motivate the need for semantic annotation integration: middleware that produces a unified annotation on top of diverse semantic annotators. On the one hand, the diversity of vocabularies allows applications to benefit from the much richer vocabulary of an integrated annotator. On the other hand, we present evidence that the most widely-used annotators on the web suffer from serious accuracy deficiencies: the overlap in vocabularies from individual annotators allows an integrated annotator to boost accuracy by exploiting inter-annotator agreement and disagreement. The integration of semantic annotations leads to new challenges, both compared to usual data integration scenarios and to standard aggregation of machine learning tools. We overview an approach to these challenges that performs ontology-aware aggregation. We introduce an approach that requires no training data, making use of ideas from database repair. We experimentally compare this with a supervised approach, which adapts maximum entropy Markov models to the setting of ontology-based annotations. We further experimentally compare both of these approaches with ontology-unaware supervised approaches, and with individual annotators.

TRANSCRIPT

Page 1: ROSeAnn Presentation

ROSeAnn

Aggregating Semantic Annotators

Luying Chen, Stefano Ortona, Giorgio Orsi, and Michael Benedikt <[email protected]>

Department of Computer Science

The University of Oxford

DIADEM: domain-centric intelligent automated data extraction methodology

Page 2: ROSeAnn Presentation

Plenty of data on the web

Pages 3-5: ROSeAnn Presentation

Plenty of data on the web

But the web is also text: News Feeds, Posts, Tweets


155. Specific events and factors were of particular importance in the decline of ABCPs. Firstly, some conduits had large ABS holdings that experienced huge declines. When investors stopped rolling over ABCPs, these conduits had to rely on guarantees provided by banks which were too large for the banks providing them. While these banks received support to meet their obligations, investor confidence was nonetheless damaged. Secondly, structures in other ABCP markets around the world unsettled investors, including different guarantee agreements and single-seller extendible mortgage conduits. Thirdly, general concerns about the banking sector have caused investors to buy less bank related product.

Table 3 - European ABCP issuance

       Q1     Q2     Q3     Q4     Total
2004   34.7   36.2   44.5   51.3   166.7
2005   58.1   63.4   61.6   55.2   238.4
2006   74.7   84.1   96.5   111.8  367.1
2007   148.8  142.3  156.7  186.1  633.9
2008   120.9  106                  226.8

Source: Moody’s, Dealogic, ESF

[Chart 5 omitted] Source: Société Générale Corporate & Investment Banking (market overview, 19 September 2008)

Credit Derivatives Markets

156. The credit derivatives markets comprise a number of instruments. Credit default swaps represent, by far, the single most significant credit derivative instrument in terms of volume. Other credit derivative instruments are not covered in this consultation paper [43].

[43] Examples of credit derivatives not included in the scope of this consultation paper are total return swaps and credit linked notes.

PDFs

Page 7: ROSeAnn Presentation

Entity recognition ecosystem

Page 8: ROSeAnn Presentation

Understanding their behaviour

Collect all original entity types:
- Company, Country, Movie, …
- Person, Organization, Location, …
- City, Company, StateOrCounty, …

Organise them into a taxonomy:

Thing
  Location
    City
    Country
    StateOrCounty
  Organization
    Company
  Person
  Movie

Add disjointness constraints (see the sketch below):
- Organization disjointWith Person
- Organization disjointWith Location
- Movie disjointWith Person
- Person disjointWith Location
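As a concrete illustration of this construction, here is a minimal Python sketch of a merged taxonomy with inherited disjointness. All data structures and helper names are illustrative, not ROSeAnn's actual implementation.

# Minimal sketch of the merged taxonomy with disjointness constraints.
# All names are illustrative; this is not ROSeAnn's actual data model.

from itertools import product

SUBCLASS = {                    # direct subclass edges: child -> parent
    "City": "Location", "Country": "Location", "StateOrCounty": "Location",
    "Company": "Organization",
    "Location": "Thing", "Organization": "Thing",
    "Person": "Thing", "Movie": "Thing",
}

DISJOINT = {                    # declared disjointness (symmetric)
    frozenset({"Organization", "Person"}),
    frozenset({"Organization", "Location"}),
    frozenset({"Movie", "Person"}),
    frozenset({"Person", "Location"}),
}

def ancestors(c):
    """All superclasses of c, including c itself."""
    out = {c}
    while c in SUBCLASS:
        c = SUBCLASS[c]
        out.add(c)
    return out

def subsumes(sup, sub):
    """True iff sub ⊑ sup."""
    return sup in ancestors(sub)

def disjoint(c, d):
    """Disjointness propagates to subclasses: if C ⊓ D ⊑ ⊥,
    then C' ⊓ D' ⊑ ⊥ for all C' ⊑ C and D' ⊑ D."""
    return any(frozenset({a, b}) in DISJOINT
               for a, b in product(ancestors(c), ancestors(d)))

assert subsumes("Location", "City")
assert disjoint("Company", "Person")     # inherited from Organization ⊓ Person ⊑ ⊥
assert not disjoint("Company", "Movie")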

Page 9: ROSeAnn Presentation

Entity extractors: observations

Observation 1: low accuracy

[Bar charts omitted: precision, recall, and F-score (0 to 1) per entity type: Person, Date, Movie and Location, Sport, Movie]

(*) Results obtained on Reuters http://about.reuters.com/researchandstandards/corpus/

Page 10: ROSeAnn Presentation

Entity extractors: observations

Observation 2: vocabulary is limited and overlapping

[Venn diagram omitted: overlap of annotator vocabularies (Saplo, Extractiv, AlchemyAPI, Lupedia, Zemanta) on concepts such as Person, Country, Region, Scientist, Planet, Museum, Brand, Product, Ocean, Company]

Page 11: ROSeAnn Presentation

Analysis of conflicts

Conflicts are frequent → reconciliation

Observation 3: they disagree on concepts and spans

Page 12: ROSeAnn Presentation

ROSeAnn: Reconcile Opinions of Semantic Annotators

Goals:
- compute logically consistent annotations
- maximize the agreement among annotators

Page 13: ROSeAnn Presentation

Supervised: MEMM

Train a MEMM sequence labeller.

Features (token-based):
- entity type
- subclass / disjointness
- span (B/I/O encoding)

Inference: most likely and logically-consistent labelling for the sequence (Viterbi + pruning)
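To illustrate the inference step, here is a minimal Viterbi decoder that prunes malformed B/I/O transitions. The label set and emission scores are hypothetical stand-ins for a trained MEMM's per-token log-scores, and a full implementation would also prune labels that violate the ontology's disjointness axioms.

# Illustrative Viterbi decoder with consistency pruning on B/I/O labels.
# Emission scores stand in for the MEMM's per-token log-probabilities.

import math

LABELS = ["O", "B-Person", "I-Person", "B-Organization", "I-Organization"]

def consistent(prev, cur):
    """I-X may only continue a B-X or I-X of the same type."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], cur)
    return True

def viterbi(emissions):
    """emissions[t][label] = log-score of label at token t."""
    trellis = [{l: (emissions[0].get(l, -math.inf), None) for l in LABELS}]
    for t in range(1, len(emissions)):
        row = {}
        for cur in LABELS:
            row[cur] = max(
                ((trellis[t - 1][p][0] + emissions[t].get(cur, -math.inf), p)
                 for p in LABELS if consistent(p, cur)),
                key=lambda sp: sp[0])
        trellis.append(row)
    label = max(trellis[-1], key=lambda l: trellis[-1][l][0])
    path = [label]
    for t in range(len(emissions) - 1, 0, -1):   # backtrack
        label = trellis[t][label][1]
        path.append(label)
    return path[::-1]

# The raw scores prefer I-Organization on the second token, but that
# transition is pruned, so the consistent reading B-Person I-Person wins.
scores = [{"B-Person": -0.2, "B-Organization": -0.9},
          {"I-Person": -0.7, "I-Organization": -0.4}]
print(viterbi(scores))   # ['B-Person', 'I-Person']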

Page 14: ROSeAnn Presentation

Unsupervised: Weighted Repair

Judgement aggregation:
- experts give opinions about a set of (logical) statements
- compute a logically-consistent, aggregated judgement

Database repairs / consistent query answering:
- database instance + constraints (schema, dependencies)
- answers computed on (minimal) repairs

Page 15: ROSeAnn Presentation

Unsupervised: Weighted Repair

Propositions:
- ontological constraints Σ
- annotations (as facts)

Base support: annotator A annotates a span S with concept C ∈ Σ.

AtomicScore(C) = sum over annotators Ai of:
  +1 for every Ai annotating S with some C′ ⊑ C
  −1 for every Ai annotating S with some C′ such that C′ ⊓ C ⊑ ⊥,
     or failing to annotate S with any C′ (in Ai's vocabulary) with C ⊑ C′
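A minimal sketch of this computation, using the small Σ from the worked example on the following slides. Helpers are inlined so the snippet runs standalone; all names are illustrative.

# Sketch of the base-support computation on the example Σ.

SUP = {"Chef": {"Chef", "Person", "LegalEntity"},          # concept -> its superclasses
       "Person": {"Person", "LegalEntity"},                #   (reflexively closed)
       "Organisation": {"Organisation", "LegalEntity"},
       "LegalEntity": {"LegalEntity"}}
DISJ = {frozenset({"Person", "Organisation"})}             # Person ⊓ Organisation ⊑ ⊥

def subsumes(sup, sub):
    return sup in SUP[sub]                                 # sub ⊑ sup

def disjoint(c, d):
    return any(frozenset({a, b}) in DISJ for a in SUP[c] for b in SUP[d])

def atomic_score(C, opinions, vocabulary):
    """opinions: annotator -> concept asserted on span S (None = silent)."""
    score = 0
    for ai, cp in opinions.items():
        if cp is not None and subsumes(C, cp):
            score += 1            # Ai asserted some C' ⊑ C
        elif cp is not None and disjoint(cp, C):
            score -= 1            # Ai asserted some C' with C' ⊓ C ⊑ ⊥
        elif not (cp and subsumes(cp, C)) and \
                any(subsumes(v, C) for v in vocabulary[ai]):
            score -= 1            # Ai knows a C' ⊒ C but stayed silent
    return score

opinions = {"A1": "Person", "A2": "Organisation", "A3": "Chef"}
vocabulary = {a: {c} for a, c in opinions.items()}
for C in ["Person", "Organisation", "Chef", "LegalEntity"]:
    print(C, atomic_score(C, opinions, vocabulary))   # +1, -1, 0, +3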

Page 16: ROSeAnn Presentation

Unsupervised: Weighted Repair

Initial solution: conjunction of all types:

φ: C1 ∧ C2 ∧ … ∧ Cn

Repair operations (op):

- ins(Ci): insertion of a Ci not already in φ

- del(Ci): deletion of a Ci from φ (and all its subclasses) and ins(¬Ci)

Solution (S):
- non-conflicting: operations in S do not "override" each other
- non-redundant: no insertion/deletion of types already implied
- consistent with Σ
- maximally agreed: max( Σ_{ins(C)∈S} AtomicScore(C) − Σ_{del(C)∈S} AtomicScore(C) )
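A brute-force sketch of the search, under stated simplifications: it enumerates consistent truth assignments over the concepts rather than operation sequences, scores each candidate by summing AtomicScore over all literals (with AtomicScore(¬C) = −AtomicScore(C), which reproduces the worked example on the next slides), and breaks ties by the number of changed concepts. The AtomicScore values are taken from that example.

# Brute-force sketch of the weighted-repair search over the example Σ.

from itertools import product

SUP = {"Chef": {"Chef", "Person", "LegalEntity"},
       "Person": {"Person", "LegalEntity"},
       "Organisation": {"Organisation", "LegalEntity"},
       "LegalEntity": {"LegalEntity"}}
DISJ = {frozenset({"Person", "Organisation"})}
SCORE = {"LegalEntity": 3, "Person": 1, "Organisation": -1, "Chef": 0}  # AtomicScores
INITIAL = {"Person", "Organisation", "Chef"}   # φ: conjunction of asserted types

def consistent(true_set):
    # true concepts must be upward-closed and free of disjoint pairs
    if not all(SUP[c] <= true_set for c in true_set):
        return False
    return not any(frozenset({a, b}) in DISJ
                   for a in true_set for b in true_set if a != b)

best = None
for bits in product([True, False], repeat=len(SCORE)):
    ts = {c for c, keep in zip(SCORE, bits) if keep}
    if not consistent(ts):
        continue
    agr = sum(SCORE[c] if c in ts else -SCORE[c] for c in SCORE)
    ops = len(ts ^ INITIAL)                    # insertions + deletions
    if best is None or (agr, -ops) > (best[0], -best[1]):
        best = (agr, ops, ts)

# (5, 2, {'LegalEntity', 'Person', 'Chef'}):
# the repair { ins(LegalEntity), del(Organisation) } wins with agreement +5.
print(best)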

Page 17: ROSeAnn Presentation

Weighted Repair: Example

Σ:
  Person ⊓ Organisation ⊑ ⊥
  Chef ⊑ Person
  Person ⊑ LegalEntity
  Organisation ⊑ LegalEntity

text text text, Jamie Oliver and some text here
(A1: Person, A2: Organisation, A3: Chef)

φ: Person ∧ Organisation ∧ Chef

AtomicScore(Person) = +2 {A1,A3} − 1 {A2} = +1

AtomicScore(Organisation) = +1 {A2} − 2 {A1,A3} = −1

AtomicScore(Chef) = +1 {A3} − 1 {A2} = 0

AtomicScore(LegalEntity) = +3 {A1,A2,A3} = +3

Page 18: ROSeAnn Presentation

Weighted Repair: Example

φ: Person ∧ Organisation ∧ Chef

S1 = { ins(LegalEntity), del(Person), del(Organisation), del(Chef) }

φ1: LegalEntity ∧ ¬Person ∧ ¬Organisation ∧ ¬Chef

Agr(S1) = +3 − 1 + 1 − 0 = +3

S2 = { ins(LegalEntity), del(Organisation) }

φ2: LegalEntity ∧ Person ∧ ¬Organisation ∧ Chef

Agr(S2) = +3 + 1 + 1 + 0 = +5

Page 19: ROSeAnn Presentation

Weighted Repair: Breaking ties

φ: Person ∧ Organisation ∧ Chef

S2 = { ins(LegalEntity), del(Organisation) }

φ2: LegalEntity ∧ Person ∧ ¬Organisation ∧ Chef

S3 = { ins(LegalEntity), del(Organisation), del(Chef) }

φ3: LegalEntity ∧ Person ∧ ¬Organisation ∧ ¬Chef

Same agreement (AtomicScore(Chef) = 0): prefer the solution with fewer operations, S2.

Page 20: ROSeAnn Presentation

Entity extractors: evaluation

Corpora:
- MUC7 NER task [300 docs, 7 types, ~18k entities]
- Reuters (sample) [250 docs, 215 types, ~50k entities]
- FOX (Leipzig) [100 docs, 3 types, 395 entities]
- Web [20 docs, 5 types, 624 entities]

Evaluation:

Precision_Ω = |Inst_AN(C+) ∩ Inst_GS(C+)| / |Inst_AN(C+)|

Recall_Ω = |Inst_AN(C+) ∩ Inst_GS(C+)| / |Inst_GS(C+)|

micro- and macro-averages; 10-fold cross-validation
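A sketch of these measures with micro- and macro-averaging. The data layout is illustrative, and the C+ closure (crediting an instance of C′ ⊑ C as an instance of C) is omitted for brevity; an implementation would first expand each instance's concept to all its superclasses.

# Sketch of the evaluation measures with micro- and macro-averaging.

def evaluate(an, gs):
    """an, gs: concept -> set of annotated instances (spans)."""
    concepts = set(an) | set(gs)
    tp = sum(len(an.get(c, set()) & gs.get(c, set())) for c in concepts)
    micro_p = tp / max(1, sum(len(s) for s in an.values()))   # pool all decisions
    micro_r = tp / max(1, sum(len(s) for s in gs.values()))
    per_p = [len(an[c] & gs.get(c, set())) / len(an[c]) for c in an if an[c]]
    per_r = [len(gs[c] & an.get(c, set())) / len(gs[c]) for c in gs if gs[c]]
    macro_p = sum(per_p) / max(1, len(per_p))                 # average per-concept ratios
    macro_r = sum(per_r) / max(1, len(per_r))
    return micro_p, micro_r, macro_p, macro_r

an = {"Person": {"s1", "s2"}, "Location": {"s3"}}             # annotator output
gs = {"Person": {"s1"}, "Location": {"s3", "s4"}}             # gold standard
print(evaluate(an, gs))   # (0.667, 0.667, 0.75, 0.75) up to rounding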

Page 21: ROSeAnn Presentation

Evaluation: Individual vs Aggregated

(*) Full comparison results at http://diadem.cs.ox.ac.uk/roseann

Page 22: ROSeAnn Presentation

Evaluation: Aggregators

(*) Full comparison results at http://diadem.cs.ox.ac.uk/roseann

Page 23: ROSeAnn Presentation

Evaluation: Performance

WR: runtime ∝ number of annotations overlapping a span

MEMM: runtime ∝ number of concepts in the ontology

[Charts omitted: running times of WR and MEMM]

Page 24: ROSeAnn Presentation

Performance: MEMM Training

MEMM: training time ∝ number of entity types in the ontology

Page 25: ROSeAnn Presentation

ROSeAnn @work

Text

Page 26: ROSeAnn Presentation

ROSeAnn @work

Web

Page 27: ROSeAnn Presentation

ROSeAnn @work

PDF

Page 28: ROSeAnn Presentation

Summary

Not discussed:
- resolution of conflicting spans
- relationship with consistent query answering / argumentation frameworks
- WR with weights / bootstrapping
- Web and PDF structural NERs (SNER)
- MEMM vs CRF

Future work:
- automatic maintenance of the ontology
- probabilistic and ontological querying of annotations
- relation, attribute, and sentiment extraction
- entity disambiguation and linking

Page 29: ROSeAnn Presentation

Get ROSeAnn at: http://diadem.cs.ox.ac.uk/roseann

Try out our REST endpoints:
http://163.1.88.61:9091/roseann/text
http://163.1.88.61:9091/roseann/web