ROSeAnn presentation

Posted on 02-Jul-2015

Category: Technology

DESCRIPTION

Abstract: A growing number of resources are available for enriching documents with semantic annotations. While originally focused on a few standard classes of annotations, the ecosystem of annotators is now becoming increasingly diverse. Although annotators often have very different vocabularies, with both high-level and specialist concepts, they also have many semantic interconnections. We show that both the overlap and the diversity in annotator vocabularies motivate the need for semantic annotation integration: middleware that produces a unified annotation on top of diverse semantic annotators. On the one hand, the diversity of vocabularies allows applications to benefit from the much richer vocabulary available in an integrated system. On the other hand, we present evidence that the most widely-used annotators on the web suffer from serious accuracy deficiencies: the overlap in the vocabularies of individual annotators allows an integrated annotator to boost accuracy by exploiting inter-annotator agreement and disagreement. The integration of semantic annotations leads to new challenges, both compared to usual data integration scenarios and to standard aggregation of machine learning tools. We give an overview of an approach to these challenges that performs ontology-aware aggregation. We introduce an approach that requires no training data, making use of ideas from database repair. We experimentally compare this with a supervised approach, which adapts maximum entropy Markov models to the setting of ontology-based annotations. We further experimentally compare both of these approaches with ontology-unaware supervised approaches and with individual annotators.

TRANSCRIPT

ROSeAnn

Aggregating Semantic Annotators

Luying Chen, Stefano Ortona, Giorgio Orsi, and Michael Benedikt <name.surname@cs.ox.ac.uk>

Department of Computer Science
The University of Oxford

DIADEM: domain-centric intelligent automated data extraction methodology

Plenty of data on the web


But the web is also text: news feeds, posts, tweets

[Example PDF excerpt shown on the slide: a financial-markets report page with numbered paragraphs, Table 3 (European ABCP issuance), Chart 5, and footnotes.]

PDFs

Entity recognition ecosystem

Understanding their behaviour

Collect all original entity types:
- Company, Country, Movie, …
- Person, Organization, Location, …
- City, Company, StateOrCounty, …

Organise them into a taxonomy:

Thing
- Person
- Movie
- Country
- Organization
  - Company
- Location
  - City
  - StateOrCounty

Organization disjointWith Person

Organization disjointWith Location

Movie disjointWith Person

Person disjointWith Location

Add disjointness constraints
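The taxonomy plus disjointness constraints can be sketched as plain data structures with a transitive-closure consistency check. This is a minimal illustration using the class names from the slides; the representation (dicts of direct superclasses, a set of disjoint pairs) is our own, not ROSeAnn's actual implementation.

```python
# Integrated taxonomy: direct superclass edges plus disjointness pairs.
SUBCLASS = {
    "Company": {"Organization"},
    "City": {"Location"},
    "StateOrCounty": {"Location"},
    "Organization": {"Thing"},
    "Location": {"Thing"},
    "Country": {"Thing"},
    "Movie": {"Thing"},
    "Person": {"Thing"},
}

DISJOINT = {
    frozenset({"Organization", "Person"}),
    frozenset({"Organization", "Location"}),
    frozenset({"Movie", "Person"}),
    frozenset({"Person", "Location"}),
}

def superclasses(c):
    """All superclasses of c, including c itself."""
    seen, stack = {c}, [c]
    while stack:
        for s in SUBCLASS.get(stack.pop(), ()):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def consistent(types):
    """True iff no two inherited types are declared disjoint."""
    closure = set().union(*(superclasses(t) for t in types))
    return not any(frozenset({a, b}) in DISJOINT
                   for a in closure for b in closure if a != b)
```

For instance, {Company, Person} is inconsistent because Company inherits Organization, which is disjoint with Person.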

[Bar charts (scale 0–1): precision, recall, and F-score per annotator for Person, Date, Movie and for Location, Sport, Movie.]

Observation 1: low accuracy

Entity extractors: observations

(*) Results obtained on Reuters http://about.reuters.com/researchandstandards/corpus/

Observation 2: vocabulary is limited and overlapping

[Venn diagram of annotator vocabularies (Saplo, Extractiv, AlchemyAPI, Lupedia, Zemanta) with example types: Region, Person, Country, Scientist, Planet, Museum, Brand, Product, Ocean, Company.]

Entity extractors: observations

Analysis of conflicts

Conflicts are frequent → reconciliation is needed

Observation 3: they disagree on concepts and spans

ROSeAnn: Reconciling Opinions of Semantic Annotators

Goals:
- Compute logically consistent annotations
- Maximize the agreement among annotators

Supervised: MEMM

Train a MEMM sequence labeller

Features (token-based):
- entity type
- subclass / disjointness
- span (B/I/O encoding)

Inference: most likely and logically-consistent labelling for the sequence (Viterbi + pruning)
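The inference step can be sketched as Viterbi decoding in which transitions that violate the B/I/O encoding are pruned. A real MEMM would compute per-token log-scores from the features above; this toy sketch takes the scores as given, and all labels and names are illustrative.

```python
import math

LABELS = ["O", "B-Person", "I-Person", "B-Org", "I-Org"]

def allowed(prev, cur):
    """Prune inconsistent transitions: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        t = cur[2:]
        return prev in (f"B-{t}", f"I-{t}")
    return True

def viterbi(scores):
    """scores: one {label: log-score} dict per token.
    Returns the highest-scoring logically-consistent label sequence."""
    # An I- label cannot open a sequence, so prune it at the first token.
    best = {l: ((-math.inf if l.startswith("I-")
                 else scores[0].get(l, -math.inf)), [l])
            for l in LABELS}
    for tok in scores[1:]:
        nxt = {}
        for cur in LABELS:
            # Only extend paths whose last label may precede `cur`.
            cands = [(s + tok.get(cur, -math.inf), path)
                     for prev, (s, path) in best.items()
                     if allowed(prev, cur)]
            nxt[cur] = max(cands)
        best = {l: (s, path + [l]) for l, (s, path) in nxt.items()}
    return max(best.values())[1]
```

Here the pruning guarantees, for example, that a high-scoring I-Org can never be chosen without a preceding B-Org or I-Org.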

Unsupervised: Weighted Repair

Judgement aggregation:
- experts give opinions about a set of (logical) statements
- compute a logically-consistent, aggregated judgement

Database repairs / consistent query answering:
- database instance + constraints (schema, dependencies)
- answers computed on (minimal) repairs

Unsupervised: Weighted Repair

Propositions:
- ontological constraints Σ
- annotations (as facts)

Base support: an annotator A annotates a span S with a concept C (w.r.t. Σ)

AtomicScore(C) =
  +1 for every Ai annotating S with C′ ⊑ C
  −1 for every Ai annotating S with C′ such that C′ ⊓ C ⊑ ⊥,
     or failing to annotate S with any C′ (in Ai's vocabulary) with C ⊑ C′
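To make the scoring concrete, here is a minimal Python sketch of AtomicScore over the toy ontology used later in the example (Chef ⊑ Person ⊑ LegalEntity, Organisation ⊑ LegalEntity, Person disjoint with Organisation). All function and variable names are ours, not ROSeAnn's API.

```python
SUB = {"Chef": {"Person"}, "Person": {"LegalEntity"},
       "Organisation": {"LegalEntity"}}
DISJOINT = {frozenset({"Person", "Organisation"})}

def ups(c):
    """c together with all of its superclasses under SUB."""
    out, stack = {c}, [c]
    while stack:
        for s in SUB.get(stack.pop(), ()):
            if s not in out:
                out.add(s)
                stack.append(s)
    return out

def disjoint(a, b):
    """True iff some superclass of a is declared disjoint with one of b."""
    return any(frozenset({x, y}) in DISJOINT for x in ups(a) for y in ups(b))

def atomic_score(c, opinions, vocabularies):
    """Base support for concept c on one span.
    opinions: annotator -> concept assigned to the span, or None if silent.
    vocabularies: annotator -> set of concepts it can produce."""
    score = 0
    for ann, label in opinions.items():
        if label is not None and c in ups(label):        # C' subsumed by C: agree
            score += 1
        elif label is not None and disjoint(label, c):   # C' disjoint with C: conflict
            score -= 1
        elif label is None and ups(c) & vocabularies[ann]:
            score -= 1                                   # silent despite a fitting C'
    return score
```

With A1 → Person, A2 → Organisation, A3 → Chef, this reproduces the example's scores: Person +1, Organisation −1, Chef 0, LegalEntity +3.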

Unsupervised: Weighted Repair

Initial solution: conjunction of all types:

φ: C1 ∧ C2 ∧ … ∧ Cn

Repair operations (op):

- ins(Ci): insertion of a Ci not already in φ

- del(Ci): deletion of a Ci from φ (and all its subclasses) and ins(¬Ci)

Solution (S):
- non-conflicting: operations in S do not "override" each other
- non-redundant: insertion/deletion only of types not already implied
- consistent with Σ
- maximally agreed: max( Σ_{ins(C)∈S} AtomicScore(C) − Σ_{del(C)∈S} AtomicScore(C) )
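On a handful of types the search for a maximally-agreed solution can be done by brute force: enumerate the type sets that are consistent with Σ and closed under subclassing, score each as (kept/inserted scores) minus (deleted scores), and break ties by the number of repair operations. This sketch hard-codes the toy scores from the Jamie Oliver example; a real repair engine would avoid full enumeration.

```python
from itertools import chain, combinations

# AtomicScores from the worked example.
SCORE = {"LegalEntity": 3, "Person": 1, "Organisation": -1, "Chef": 0}
INITIAL = {"Person", "Organisation", "Chef"}   # the annotators' raw types
DISJOINT = [{"Person", "Organisation"}]
SUPERS = {"Chef": {"Person"}, "Person": {"LegalEntity"},
          "Organisation": {"LegalEntity"}}     # direct superclasses

def admissible(s):
    """Consistent with the constraints and closed under subclassing."""
    closed = all(SUPERS.get(c, set()) <= s for c in s)
    return closed and not any(set(d) <= s for d in DISJOINT)

def agreement(s):
    """Kept/inserted scores minus deleted scores."""
    return (sum(SCORE[c] for c in s)
            - sum(SCORE[c] for c in INITIAL - s))

def operations(s):
    """Insertions plus deletions relative to the initial annotation."""
    return len(s - INITIAL) + len(INITIAL - s)

def best_repair():
    universe = list(SCORE)
    subsets = chain.from_iterable(combinations(universe, r)
                                  for r in range(len(universe) + 1))
    cands = [set(s) for s in subsets if admissible(set(s))]
    return max(cands, key=lambda s: (agreement(s), -operations(s)))
```

On the example this selects {LegalEntity, Person, Chef} (agreement +5, two operations: ins(LegalEntity) and del(Organisation)).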

Weighted Repair: Example

Person ⊓ Organisation ⊑ ⊥!

Chef ⊑ Person!

Person ⊑ LegalEntity!

Organisation ⊑ LegalEntity

Σ:

AtomicScore(Person) = +2 {A1,A3} -1 {A2} = +1

AtomicScore(Organisation) = +1 {A2} -2 {A1,A3} = -1

AtomicScore(Chef) = +1 {A3} -1 {A2} = 0

AtomicScore(LegalEntity) = +3 {A1,A2,A3} = +3

φ: Person ∧ Organisation ∧ Chef

text text text, Jamie Oliver and some text here

Annotations on "Jamie Oliver": A1 → Person, A2 → Organisation, A3 → Chef

Weighted Repair: Example

φ: Person ∧ Organisation ∧ Chef

S1: ins(LegalEntity), del(Person), del(Organisation), del(Chef)
φ1: LegalEntity ∧ ¬Person ∧ ¬Organisation ∧ ¬Chef
Agr(S1) = +3 − 1 + 1 − 0 = +3

S2: ins(LegalEntity), del(Organisation)
φ2: LegalEntity ∧ Person ∧ ¬Organisation ∧ Chef
Agr(S2) = +3 + 1 + 1 + 0 = +5

Weighted Repair: Breaking ties

φ: Person ∧ Organisation ∧ Chef

S2: ins(LegalEntity), del(Organisation)
φ2: LegalEntity ∧ Person ∧ ¬Organisation ∧ Chef → most specific type: Chef

S3: ins(LegalEntity), del(Organisation), del(Chef)
φ3: LegalEntity ∧ Person ∧ ¬Organisation ∧ ¬Chef → most specific type: Person

Tie-break: same agreement, prefer fewer operations

Entity extractors: evaluation

Corpora:
- MUC7 NER task [300 docs, 7 types, ~18k entities]
- Reuters (sample) [250 docs, 215 types, ~50k entities]
- FOX (Leipzig) [100 docs, 3 types, 395 entities]
- Web [20 docs, 5 types, 624 entities]

Evaluation:

Precision_Ω = |Inst_AN(C+) ∩ Inst_GS(C+)| / |Inst_AN(C+)|

Recall_Ω = |Inst_AN(C+) ∩ Inst_GS(C+)| / |Inst_GS(C+)|

micro- and macro-averages

10-fold cross-validation
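The metrics above translate directly into set operations once Inst_AN and Inst_GS are represented as sets of (span, type) instances. This is a minimal sketch, with micro- and macro-averaging shown for precision only; all data values in the usage are illustrative, not the paper's results.

```python
def prf(predicted, gold):
    """predicted, gold: sets of (span, type) instances. Returns (P, R, F)."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def macro_precision(by_type):
    """by_type: type -> (predicted, gold). Macro = unweighted mean per type."""
    ps = [prf(pred, gold)[0] for pred, gold in by_type.values()]
    return sum(ps) / len(ps)

def micro_precision(by_type):
    """Micro = true positives and predictions pooled across all types."""
    tp = sum(len(pred & gold) for pred, gold in by_type.values())
    total = sum(len(pred) for pred, _ in by_type.values())
    return tp / total if total else 0.0
```

Micro-averaging favours frequent types, macro-averaging weights every type equally, which is why both are reported.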

Evaluation: Individual vs Aggregated

(*) Full comparison results at http://diadem.cs.ox.ac.uk/roseann

Evaluation: Aggregators


Evaluation: Performance

WR: runtime ∝ number of annotations overlapping a span

MEMM: runtime ∝ number of concepts in the ontology

[Runtime comparison chart: WR vs MEMM]

Performance: MEMM Training

MEMM: ∝ number of entity types in the ontology

ROSeAnn @work

Text

ROSeAnn @work

Web

ROSeAnn @work

PDF

Summary

Not discussed:
- Resolution of conflicting spans
- Relationships with consistent QA / argumentation frameworks
- WR with weights / bootstrapping
- Web and PDF structural NERs (SNER)
- MEMM vs CRF

Future:
- Automatic maintenance of the ontology
- Probabilistic and ontological querying of annotations
- Relation, attribute, sentiment extraction
- Entity disambiguation and linking

Get ROSeAnn at: http://diadem.cs.ox.ac.uk/roseann

Try out our REST endpoints:
http://163.1.88.61:9091/roseann/text
http://163.1.88.61:9091/roseann/web
