Technical Overview and Challenges

Achim Klein, University of Hohenheim
1st Review Meeting, Luxembourg, 30 November 2011



2

Major Expected Outcomes

Financial market information system providing new insights for improved decision making with respect to three challenging use cases

Real-time and scalable pipeline for unstructured financial data acquisition, information extraction, sentiment analysis, information integration, visualization and decision-support models

3

Main Innovations and Challenges

Innovations

1. From structured to unstructured data: noise and uncertainty
2. From offline processing to real-time streams: ontology evolution, extraction and analysis; online decision-support models and visualization
3. From small to vast amounts of data
4. Financial decision-support models based on high-level features

Challenges

• Accuracy
• Time efficiency
• Throughput
• Usefulness
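Innovation 2 implies updating results incrementally as items arrive instead of recomputing them over a stored batch. As a minimal illustration (an assumption, not the project's implementation), Welford's online algorithm keeps a running mean and variance in constant memory per stream; `RunningStats` is a hypothetical name.

```python
class RunningStats:
    """Welford's online algorithm: O(1) update per item, no stored history.

    Hypothetical helper for illustration, not a project component.
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # combines old and new mean

    @property
    def variance(self) -> float:
        """Population variance of the items seen so far (0.0 before any update)."""
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # e.g. per-minute document counts arriving one by one
    stats.update(value)
print(stats.mean, stats.variance)
```

The same update-per-item pattern applies to any statistic that has an incremental form; statistics without one (e.g. exact medians) need windowing or approximation in a streaming setting.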

4

Architecture, Integration & Scaling Strategy

[Figure: "compass" overview of the architecture. Pipeline components: Data Acquisition; Ontology Infrastructure; Information Extraction; Sentiment Analysis; Information Integration; Data, Information & Knowledge Base; Decision Support Infrastructure; Domain-independent GUI (open source). Use cases: UC#1 Market Surveillance; UC#2 Reputational Risk Management; UC#3 Online Retail Brokerage. Work packages: WP1 & WP8; WP2 & WP7; WP3, WP4, WP6; WP5; WP9 (Dissemination & Exploitation); WP10 (Management).]

5

Architecture, Integration & Scaling Strategy

Focus: Data Acquisition

Main objectives
• Large-scale acquisition of unstructured data
• Uniform access to streams
• Initial noise handling

Main challenges
• Web data clean-up and duplicate detection
• Scalability
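The slides do not say how duplicates are detected. A minimal sketch, assuming word 3-gram shingles compared by Jaccard similarity with an arbitrary 0.8 threshold; all function names are hypothetical.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles after lower-casing and dropping punctuation."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs: dict, threshold: float = 0.8, k: int = 3) -> list:
    """Return (id_a, id_b, similarity) for every pair at or above threshold.

    Compares all pairs, O(n^2): acceptable for a sketch, not for millions of documents.
    """
    sigs = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    ids = sorted(sigs)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = jaccard(sigs[a], sigs[b])
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs


docs = {
    "a": "Shares of ACME rose 5% after earnings beat estimates.",
    "b": "Shares of ACME rose 5% after earnings beat estimates!",  # same story, different punctuation
    "c": "Central bank leaves interest rates unchanged.",
}
print(find_near_duplicates(docs))  # [('a', 'b', 1.0)]
```

Because the pairwise loop is quadratic, at the document volumes reported for year 1 it would typically be replaced by an approximate technique such as MinHash with locality-sensitive hashing.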

6

Architecture, Integration & Scaling Strategy

Focus: Ontology Infrastructure

Main objectives
• Provide a financial domain ontology for information extraction tasks

Main challenges
• (Semi-)automatic construction and evolution of the ontology and word lists
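To make the role of the ontology in extraction concrete, here is a minimal in-memory sketch under stated assumptions: a class hierarchy, instances with synonym word lists, and a "semi-automatic" step that only proposes unknown capitalized tokens for human review. The `Ontology` class and `suggest_new_terms` are hypothetical and do not reflect the project's ontology format or tooling.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    parent: dict = field(default_factory=dict)       # class -> superclass (None for roots)
    instance_of: dict = field(default_factory=dict)  # instance -> class
    labels: dict = field(default_factory=dict)       # lower-cased surface form -> instance

    def add_class(self, name: str, superclass: Optional[str] = None) -> None:
        self.parent[name] = superclass

    def add_instance(self, name: str, cls: str, synonyms: tuple = ()) -> None:
        if cls not in self.parent:
            raise KeyError(f"unknown class {cls!r}")
        self.instance_of[name] = cls
        for label in (name, *synonyms):  # the instance's word list
            self.labels[label.lower()] = name

    def is_a(self, instance: str, cls: str) -> bool:
        """Follow superclass links upwards from the instance's class."""
        current = self.instance_of.get(instance)
        while current is not None:
            if current == cls:
                return True
            current = self.parent.get(current)
        return False

    def lookup(self, surface: str) -> Optional[str]:
        return self.labels.get(surface.lower())


def suggest_new_terms(ont: Ontology, text: str) -> list:
    """Semi-automatic step: capitalized tokens not yet in any word list, for a human to review."""
    candidates = re.findall(r"\b[A-Z][A-Za-z]+\b", text)
    return sorted({c for c in candidates if ont.lookup(c) is None})


ont = Ontology()
ont.add_class("Organization")
ont.add_class("Company", superclass="Organization")
ont.add_instance("Deutsche Bank", "Company", synonyms=("DB",))
print(ont.lookup("db"))                                            # Deutsche Bank
print(ont.is_a("Deutsche Bank", "Organization"))                   # True
print(suggest_new_terms(ont, "Siemens and DB reported results."))  # ['Siemens']
```

Keeping suggestions separate from insertion reflects the "(semi-)" in the challenge: the system proposes, a curator decides.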

7

Architecture, Integration & Scaling Strategy

Focus: Information Extraction

Main objectives
• Natural language preprocessing
• Extraction of named entities
• Topic classification

Main challenges
• Training data for topic classification
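The training-data challenge arises because a supervised topic classifier needs labeled examples for every topic. As an illustration only (the project's classifier is not specified), a multinomial naive Bayes classifier with add-one smoothing; the class name and toy data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopics:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list, labels: list) -> "NaiveBayesTopics":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


docs = [
    "profit rose on strong earnings",
    "quarterly earnings beat estimates",
    "ceo resigns amid scandal",
    "board appoints new ceo",
]
labels = ["earnings", "earnings", "management", "management"]
clf = NaiveBayesTopics().fit(docs, labels)
print(clf.predict("earnings beat"))      # earnings
print(clf.predict("new ceo appointed"))  # management
```

With only a handful of examples per topic the estimates are fragile, which is exactly why labeled training data is listed as the main challenge.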

8

Architecture, Integration & Scaling Strategy

Focus: Sentiment Analysis

Main objectives
• Extract sentiment with respect to use-case-specific sentiment objects and their features

Main challenges
• Accuracy
• Time efficiency
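A lexicon-based scorer is one simple, fast baseline for object-level sentiment; it is a swapped-in illustration, not the project's extraction method. The word lists below are hypothetical toy examples, the negation handling is a one-word window, and the score is the mean over sentences that mention the object.

```python
import re

# Hypothetical toy word lists; a real system would use curated, finance-specific lists.
POSITIVE = {"beat", "growth", "strong", "upgrade", "gain"}
NEGATIVE = {"loss", "weak", "downgrade", "lawsuit", "miss"}
NEGATORS = {"not", "no", "never"}


def sentence_sentiment(sentence: str) -> int:
    tokens = re.findall(r"[a-z]+", sentence.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = int(tok in POSITIVE) - int(tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity  # "not miss" flips a negative word to positive
        score += polarity
    return score


def object_sentiment(text: str, aliases: set) -> float:
    """Mean sentence score over sentences mentioning any alias; 0.0 if none do."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    scores = [sentence_sentiment(s) for s in sentences
              if any(alias.lower() in s.lower() for alias in aliases)]
    return sum(scores) / len(scores) if scores else 0.0


text = "ACME reported strong growth. Rival Corp faces a lawsuit. ACME did not miss estimates."
print(object_sentiment(text, {"ACME"}))        # 1.5
print(object_sentiment(text, {"Rival Corp"}))  # -1.0
```

Restricting scoring to sentences that mention the object is what turns document sentiment into object sentiment; feature-level sentiment would further restrict to sentences mentioning a feature of that object.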

9

Architecture, Integration & Scaling Strategy

Focus: Decision Support Infrastructure

Main objectives
• Provide event detection and prediction
• Machine learning and qualitative models based on high-level features
• Advanced real-time visualization

Main challenges
• Usefulness for decision makers
• Time efficiency
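A common baseline for event detection on a stream of high-level features (for example, hourly news mentions of a company) is a trailing z-score. The sketch below is that baseline under assumed parameters (window of 5, threshold of 3); it is not the project's event-detection model, and `detect_events` is a hypothetical name.

```python
import statistics


def detect_events(series: list, window: int = 5, threshold: float = 3.0) -> list:
    """Indices t where series[t] lies more than `threshold` standard deviations
    from the mean of the preceding `window` values (a trailing z-score)."""
    events = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        mean = statistics.fmean(history)
        std = statistics.pstdev(history)
        if std == 0:
            # Flat history: any change at all counts as a deviation.
            if series[t] != mean:
                events.append(t)
        elif abs(series[t] - mean) / std > threshold:
            events.append(t)
    return events


hourly_mentions = [10, 12, 11, 9, 10, 11, 10, 45, 12, 11]  # hypothetical counts for one company
print(detect_events(hourly_mentions))  # [7]
```

Only the trailing window is needed per step, so the check fits the time-efficiency constraint of real-time streams.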

10

Architecture, Integration & Scaling Strategy

Focus: Information Integration

Main objectives
• Storage of acquired and extracted data
• Integration of existing structured data
• Uniform access

Main challenges
• Heterogeneity of information and storage systems
• Throughput
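One way to read "uniform access" over heterogeneous sources is: convert every source into one shared record shape and query only that shape. A minimal sketch, assuming a hypothetical `Record` type and made-up input formats (a news item dict and a price CSV); the field names are illustrative, not the project's schema.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    """Hypothetical uniform record shared by all sources."""
    source: str
    timestamp: datetime
    entity: str
    kind: str      # e.g. "news" or "price"
    payload: dict


def from_news_item(item: dict) -> Record:
    # Assumed news fields: 'published' (ISO 8601 with offset), 'company', 'title'
    return Record("news", datetime.fromisoformat(item["published"]),
                  item["company"], "news", {"title": item["title"]})


def from_price_csv(text: str) -> list:
    # Assumed CSV columns: epoch_seconds,symbol,close
    rows = csv.DictReader(io.StringIO(text))
    return [Record("prices", datetime.fromtimestamp(int(r["epoch_seconds"]), tz=timezone.utc),
                   r["symbol"], "price", {"close": float(r["close"])}) for r in rows]


def query(records: list, entity: str) -> list:
    """One query interface regardless of where a record came from, ordered by time."""
    return sorted((r for r in records if r.entity == entity), key=lambda r: r.timestamp)


records = [from_news_item({"published": "2011-11-30T09:00:00+00:00",
                           "company": "ACME", "title": "ACME beats estimates"})]
records += from_price_csv("epoch_seconds,symbol,close\n1322640000,ACME,10.5\n")
print([r.kind for r in query(records, "ACME")])
```

Adding a new source then means writing one adapter; consumers of `query` are unaffected. All timestamps are kept timezone-aware so records from different sources can be compared.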

11

Architecture, Integration & Scaling Strategy

Focus: Architecture and Integration

Main objectives
• Scalable architecture
• Integration of pipeline components
• Integrated financial market information system

Main challenges
• Real-time streams
• Massive data volume
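Integrating pipeline components over streams can be pictured as stages connected by bounded queues. The year-1 summary mentions ZeroMQ for integration; this sketch instead uses in-process `queue.Queue` and threads from the Python standard library as a stand-in, so the stage functions and helper names are hypothetical. With ZeroMQ, each stage would typically be a separate process connected by sockets (e.g. the PUSH/PULL pattern).

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream


def stage(func, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Apply func to every item from inbox and forward results to outbox, in its own thread."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown downstream
                break
            outbox.put(func(item))
    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_pipeline(items, funcs) -> list:
    # Bounded queues give back-pressure: a fast stage blocks instead of exhausting memory.
    queues = [queue.Queue(maxsize=100) for _ in range(len(funcs) + 1)]
    for i, func in enumerate(funcs):
        stage(func, queues[i], queues[i + 1])

    def feed() -> None:
        for item in items:
            queues[0].put(item)
        queues[0].put(STOP)

    threading.Thread(target=feed, daemon=True).start()
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    return results


# Hypothetical stages: clean, normalize, tag documents that mention "acme".
print(run_pipeline(["  ACME up ", "Rates flat"],
                   [str.strip, str.lower, lambda s: (s, "acme" in s)]))
```

Because each stage has exactly one worker thread and FIFO queues, output order matches input order; scaling a slow stage out to several workers would trade that guarantee for throughput.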

12

Implementation Roadmap

Timeline
• Month 1
• Month 6 (Milestone 1)
• Month 12 (Milestone 2)
• Month 18 (Milestone 3)
• Month 24 (Milestone 4)
• Month 33 (Milestone 5)
• Month 36 (Milestone 6)

Phases
• Phase 1: Requirements analysis
• Phase 2: Design
• Phase 3: Development, integration, evaluation
• Phase 4: Finalization

Iterations (repeated): prototypes, large-scale, live feeds

Scaling strategy implementation

Outputs
• Requirements report
• State-of-the-art analysis
• Set up infrastructure
• Start data acquisition
• Architecture
• Scaling plan
• Corpus
• Preliminary prototypes
• Integrated Financial Market Information System
• Improved prototypes
• Real-time streaming
• Decision-support models and visualization
• End-user prototypes
• Final demonstration
• Evaluation reports
• Scale data volume
• Functionally complete

13

Summary of Y1 Tech. Achievements

Infrastructure
• Collecting documents since April 2011 (~8 million documents, 200 GB/month)
• Corpus of sentence-level annotated documents (~900 and growing)
• Financial ontology (~4,000 instances)
• First knowledge base

Prototypes
• Data acquisition
• Sentiment extraction

Technology evaluation and experiments
• Integration (ZeroMQ)
• Scaling (storage, messaging)
• Portfolio selection experiment
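The slides do not describe the design of the portfolio selection experiment. Purely as an illustration of how sentiment can feed a portfolio decision, a hypothetical rule: hold the k assets with the highest positive sentiment, equally weighted, and evaluate the realized return. Tickers, scores and returns below are made up.

```python
def select_portfolio(sentiment: dict, k: int = 2, min_score: float = 0.0) -> dict:
    """Equal weights over the k assets with the highest sentiment above min_score."""
    # Sort by descending score, breaking ties by ticker so the result is deterministic.
    ranked = sorted((a for a, s in sentiment.items() if s > min_score),
                    key=lambda a: (-sentiment[a], a))
    chosen = ranked[:k]
    return {a: 1.0 / len(chosen) for a in chosen} if chosen else {}


def portfolio_return(weights: dict, returns: dict) -> float:
    """Weighted sum of the next-period returns of the held assets."""
    return sum(w * returns[a] for a, w in weights.items())


sentiment = {"AAA": 0.6, "BBB": -0.2, "CCC": 0.3, "DDD": 0.1}  # hypothetical daily scores
returns = {"AAA": 0.02, "BBB": -0.01, "CCC": -0.01, "DDD": 0.0}  # hypothetical next-day returns
weights = select_portfolio(sentiment)
print(weights)                               # {'AAA': 0.5, 'CCC': 0.5}
print(portfolio_return(weights, returns))
```

A real experiment would compare such a rule against a benchmark over many periods and account for transaction costs; this sketch only shows the mechanics of turning scores into weights.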


Thank you

14

15

Summary of Y1 Tech. Achievements

Multi-core project server up and running

Collecting millions of documents since April 2011

Data storage experiments and first knowledge base

First version of financial ontology available

Sentiment extraction for all use cases

Initial decision making experiment (portfolio selection)

Scaling experiments (storage, messaging)

Initial (more advanced) integration tests

Additionally (not foreseen): annotated document corpus for sentiment analysis (gold standard)


16

Main Innovations and Challenges

1. From structured to unstructured data: address noise and uncertainty

2. From offline processing to online streams (near real time): ontology infrastructure, machine learning, sentiment extraction, qualitative modeling, visualization

3. From small to vast amounts of data

4. Financial decision support based on high-level semantic features; glass-box models
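"Glass-box" refers to models whose reasoning a decision maker can inspect. As one simple instance (an assumption, not the project's qualitative models), a linear score over high-level features that reports each feature's contribution alongside the total; `GlassBoxScore` and the feature names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GlassBoxScore:
    """Linear score whose per-feature contributions are exposed, not hidden."""
    weights: dict = field(default_factory=dict)  # feature name -> weight
    bias: float = 0.0

    def explain(self, features: dict) -> dict:
        """Contribution weight * value per feature; missing features contribute 0."""
        return {name: w * features.get(name, 0.0) for name, w in self.weights.items()}

    def score(self, features: dict) -> float:
        return self.bias + sum(self.explain(features).values())


model = GlassBoxScore({"sentiment": 2.0, "news_volume_z": 0.5}, bias=-1.0)
observation = {"sentiment": 0.4, "news_volume_z": 2.0}
print(model.explain(observation))  # {'sentiment': 0.8, 'news_volume_z': 1.0}
print(model.score(observation))
```

The explanation sums exactly to the score minus the bias, so a user can see which feature drove a signal; that property is what separates this from a black-box model with post-hoc explanations.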