Achim Klein, University of Hohenheim
1st Review Meeting, Luxembourg, 30 November 2011
Technical Overview and Challenges

Major Expected Outcomes
• Financial market information system providing new insights for improved decision making with respect to three challenging use cases
• Real-time and scalable pipeline for unstructured financial data: acquisition, information extraction, sentiment analysis, information integration, visualization, and decision-support models

Main Innovations and Challenges
Innovations
1. From structured to unstructured data
   • Noise and uncertainty
2. From offline processing to real-time streams
   • Ontology evolution, extraction, analysis
   • Online decision-support models, visualization
3. From small to vast amounts of data
4. Financial decision-support models based on high-level features
Challenges
• Accuracy
• Time efficiency
• Throughput
• Usefulness

Architecture, Integration & Scaling Strategy
Compass slide
[Architecture diagram. Labels shown:]
• Management (WP10)
• Dissemination & Exploitation (WP9)
• Work packages: WP1 & WP8, WP2 & WP7, WP3, WP4, WP5, WP6
• Components: Data Acquisition, Ontology Infrastructure, Information Extraction, Sentiment Analysis, Decision Support Infrastructure, Domain-independent GUI (Open Source), Information Integration, Data, Information & Knowledge Base
• Use cases: UC#1 Market Surveillance, UC#2 Reputational Risk Management, UC#3 Online Retail Brokerage

Architecture, Integration & Scaling Strategy
Data Acquisition
Main objectives
• Large-scale acquisition of unstructured data
• Uniform access to streams
• Initial noise handling
Main challenges
• Web data clean-up and duplicate detection
• Scalability
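
The duplicate-detection challenge named above can be illustrated with a minimal sketch. The slides do not say which method the project uses; this example assumes a common baseline, word shingles compared by Jaccard similarity. The function names, the shingle size `k`, and the `threshold` value are illustrative assumptions.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased, punctuation-free text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def deduplicate(docs, threshold=0.8):
    """Keep each document unless it is a near-duplicate of one already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue  # near-duplicate of an earlier document
        kept.append(doc)
        kept_shingles.append(s)
    return kept

if __name__ == "__main__":
    sample = [
        "Shares of ACME rose 5% after strong earnings.",
        "Shares of ACME rose 5% after strong earnings!",
        "The central bank kept interest rates unchanged.",
    ]
    print(deduplicate(sample))
```

This pairwise comparison is quadratic in the number of kept documents, so it only illustrates the idea. At the data volumes mentioned in these slides, an approximate scheme such as MinHash with locality-sensitive hashing would be a more plausible choice.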

Architecture, Integration & Scaling Strategy
Ontology Infrastructure
Main objectives
• Provide financial domain ontology for information extraction tasks
Main challenges
• (Semi-)automatic construction and evolution of ontology and word lists

Architecture, Integration & Scaling Strategy
Information Extraction
Main objectives
• Natural language preprocessing
• Extraction of named entities
• Topic classification
Main challenges
• Training data for topics

Architecture, Integration & Scaling Strategy
Sentiment Analysis
Main objectives
• Extract sentiments with respect to the features of use-case-specific sentiment objects
Main challenges
• Accuracy
• Time efficiency
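
To make "sentiment with respect to an object's features" concrete, here is a toy sketch. It assumes a simple lexicon-based approach, which the slides do not confirm. The `POLARITY` and `FEATURES` dictionaries and the function name are hypothetical and not taken from the project.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons for illustration only.
POLARITY = {"strong": 1, "growth": 1, "beat": 1, "weak": -1, "loss": -1, "fraud": -1}
FEATURES = {
    "earnings": "earnings",
    "profit": "earnings",
    "management": "management",
    "ceo": "management",
}

def feature_sentiment(text):
    """Sum word polarities per sentence and credit them to each feature mentioned there."""
    scores = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        polarity = sum(POLARITY.get(w, 0) for w in words)
        # Each feature mentioned in the sentence receives the sentence polarity once.
        for feature in {FEATURES[w] for w in words if w in FEATURES}:
            scores[feature] += polarity
    return dict(scores)

if __name__ == "__main__":
    text = "ACME reported strong earnings growth. The CEO is under investigation for fraud."
    print(feature_sentiment(text))
```

Scoring per sentence keeps the positive earnings statement from cancelling the negative management statement, which is the point of feature-level rather than document-level sentiment.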

Architecture, Integration & Scaling Strategy
Decision Support Infrastructure
Main objectives
• Provide event detection and prediction
• Machine learning and qualitative models based on high-level features
• Advanced real-time visualization
Main challenges
• Usefulness for decision makers
• Time efficiency
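
As one illustration of event detection on a stream of high-level features, the sketch below flags time steps where a count (for example, daily mentions of a company) spikes relative to a trailing window. This is an assumed baseline, not the project's model. The names `detect_events`, `window`, `z_threshold`, and `min_sigma` are illustrative.

```python
from statistics import mean, stdev

def detect_events(counts, window=5, z_threshold=3.0, min_sigma=1.0):
    """Return indices whose value exceeds the trailing-window mean by z_threshold deviations."""
    events = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation so a flat history does not cause division by zero.
        sigma = max(stdev(history), min_sigma)
        z = (counts[i] - mu) / sigma
        if z >= z_threshold:
            events.append(i)
    return events

if __name__ == "__main__":
    daily_mentions = [10, 11, 9, 10, 10, 50, 10]
    print(detect_events(daily_mentions))
```

Because only the trailing window is needed, the check can run incrementally as new values arrive, which fits the time-efficiency challenge listed above.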

Architecture, Integration & Scaling Strategy
Information Integration
Main objectives
• Storage of acquired and extracted data
• Integration of existing structured data
• Uniform access
Main challenges
• Heterogeneity of information and storage systems
• Throughput

Architecture, Integration & Scaling Strategy
Architecture and Integration
Main objectives
• Scalable architecture
• Integration of pipeline components
• Integrated financial market information system
Main challenges
• Real-time streams
• Massive data volume
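
The integration of pipeline components over a stream can be sketched with chained Python generators, so that each document flows through every stage one at a time instead of in batches. The stage names mirror the components in the slides, but the stage logic, the function names, and the in-memory `store` are simplified assumptions, not the project's implementation.

```python
def acquire(raw_items):
    """Data acquisition: wrap each raw text in a document record."""
    for i, text in enumerate(raw_items):
        yield {"id": i, "text": text}

def extract_entities(docs, known_entities):
    """Information extraction: tag documents with the known entities they mention."""
    for doc in docs:
        lowered = doc["text"].lower()
        doc["entities"] = sorted(e for e in known_entities if e.lower() in lowered)
        yield doc

def score_sentiment(docs, positive, negative):
    """Sentiment analysis: count positive minus negative words (toy scoring)."""
    for doc in docs:
        words = doc["text"].lower().split()
        doc["sentiment"] = sum(w in positive for w in words) - sum(w in negative for w in words)
        yield doc

def integrate(docs, store):
    """Information integration: append each sentiment score under its entities."""
    for doc in docs:
        for entity in doc["entities"]:
            store.setdefault(entity, []).append(doc["sentiment"])
    return store

def run_pipeline(raw_items, known_entities, positive, negative):
    """Chain the stages lazily; nothing runs until integrate() pulls documents through."""
    docs = acquire(raw_items)
    docs = extract_entities(docs, known_entities)
    docs = score_sentiment(docs, positive, negative)
    return integrate(docs, {})

if __name__ == "__main__":
    feed = ["ACME shares surge on strong demand", "Globex faces weak sales"]
    print(run_pipeline(feed, {"ACME", "Globex"}, {"surge", "strong"}, {"weak"}))
```

In a distributed setting each stage would run as its own process connected by a messaging layer rather than by in-process generators, but the stage boundaries stay the same.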

Implementation Roadmap
Timeline
• Month 1
• Month 6 (Milestone 1)
• Month 12 (Milestone 2)
• Month 18 (Milestone 3)
• Month 24 (Milestone 4)
• Month 33 (Milestone 5)
• Month 36 (Milestone 6)
Phases
• Phase 1: Requirements analysis
• Phase 2: Design
• Phase 3: Development, integration, evaluation
• Phase 4: Finalization
Scaling strategy implementation
• Prototypes
• Large-scale
• Live feeds
Activities and results
• Requirements report
• State-of-the-art analysis
• Set up infrastructure
• Start data acquisition
• Architecture
• Scaling plan
• Corpus
• Preliminary prototypes
• Integrated Financial Market Information System
• Improved prototypes
• Real-time streaming
• Decision-support models and visualization
• End-user prototypes
• Final demonstration
• Evaluation reports
• Scale data volume
• Function complete

Summary of Y1 Tech. Achievements
Infrastructure
• Collecting documents since April 2011 (~8 million documents, 200 GB/month)
• Corpus of sentence-level annotated documents (~900 and growing)
• Financial ontology (~4,000 instances)
• First knowledge base
Prototypes
• Data acquisition
• Sentiment extraction
Technology Evaluation and Experiments
• Integration (ZeroMQ)
• Scaling (storage, messaging)
• Portfolio selection experiment
1st Year Achievements

Summary of Y1 Tech. Achievements
• Multi-core project server up and running
• Collecting millions of documents since April 2011
• Data storage experiments and first knowledge base
• First version of financial ontology available
• Sentiment extraction for all use cases
• Initial decision-making experiment (portfolio selection)
• Scaling experiments (storage, messaging)
• Initial (more advanced) integration tests
• Additional (not foreseen): annotated document corpus for sentiment analysis (gold standard)

Main Innovations and Challenges
1. From structured to unstructured data
   • Address noise and uncertainty
2. From offline to online streams (near real time)
   • Ontology infrastructure
   • Machine learning
   • Sentiment extraction
   • Qualitative modeling
   • Visualization
3. From small to vast amounts of data
4. Financial decision support
   • Based on high-level semantic features
   • Glass-box models