-
Text-To-Query
Dynamically building structured analytics to illustrate textual content
BEWEB 2010, Lausanne, Switzerland
2010 March 22nd
Raphaël Thollot
SAP BusinessObjects ARC
Ecole Centrale Paris
Falk Brauer
SAP Research CEC Dresden
Wojciech Barczynski
SAP Research CEC Dresden
Marie-Aude Aufaure
Ecole Centrale Paris
SAP BusinessObjects Chair
raphael.thollot@sap.com falk.brauer@sap.com wojciech.barczynski@sap.com marie-aude.aufaure@ecp.fr
-
© 2010 SAP AG. All rights reserved. / Page 2
Introduction – Motivation and objectives
Background
A majority of information lies in unstructured form
Data management often requires a first
structuring phase
Most users are not database or BI experts
Navigating between structured and unstructured data is still difficult
Prior work did not use metadata of a warehouse
to suggest aggregated analytics
Use cases
Augmented web browsing
Supported data acquisition
Goal: suggest relevant structured data with an appropriate visualization, based on the user's unstructured input
-
Introduction – State-of-the-art
EROCS [4, 10]: Entity RecOgnition in Context of Structured data
Link an unstructured document to external structured data
Text-entity associations enable consolidated BI
Focus on the EROCS system
EROCS links a document to additional information on extracted
entities, but does not consider aggregated analytics
-
Our approach
Enable named entity recognition based on
structured data
Combine entities extracted from a text to build relevant queries
Propose well-adapted visualizations for query results
Challenges
Building and maintaining entity dictionaries is
costly
Assembling entities into meaningful queries
Explicit and implicit mentions
Influence the visualization to better illustrate
analysis intentions in the text
[Figure: Text-To-Query solution overview, linking text analysis, structured data, and visualization]
-
Solution overview – Workflow
The solution we propose works in two phases
Pre-processing phase
Produce necessary metadata from an existing universe to enhance text analysis
Runtime phase
Extract the context of a piece of text to understand intentions and expectations
Generate a query to produce appropriate charts with relevant data
[Figure: workflow diagram. Pre-processing phase: SL, ThingFinder. Runtime phase: ThingFinder, SL, CVOM.]
Legend
SL: Semantic Layer
ThingFinder: entity extraction technology
CVOM: visualization framework
-
Solution overview – Technical components
Data warehouses
OLAP cubes
Measures, dimensions
Hierarchies
Analysis operations
Drill-down
Filter
Semantic Layer
Meaningful naming of an underlying SQL structure
Enables queries from non-expert users
Example query terms: « Revenue », « Country », « 2010 »
[Figure: OLAP cube with axes Product category and Country; aggregated measure: Sales revenue]
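The role of the Semantic Layer can be illustrated with a small sketch that maps business names to SQL fragments. This is only an illustration under stated assumptions: the table and column names (`sales`, `amount`, `country`, `year`) and the helper `build_sql` are hypothetical and are not taken from the SAP BusinessObjects Semantic Layer.

```python
# Hypothetical semantic layer: business names mapped to SQL fragments.
SEMANTIC_LAYER = {
    "Revenue": {"kind": "measure", "sql": "SUM(sales.amount)"},
    "Country": {"kind": "dimension", "sql": "sales.country"},
    "Year": {"kind": "dimension", "sql": "sales.year"},
}


def build_sql(objects, filters=None):
    """Translate business object names (and optional filters) into one SQL string."""
    filters = filters or {}
    dims = [SEMANTIC_LAYER[o]["sql"] for o in objects
            if SEMANTIC_LAYER[o]["kind"] == "dimension"]
    measures = [SEMANTIC_LAYER[o]["sql"] for o in objects
                if SEMANTIC_LAYER[o]["kind"] == "measure"]
    sql = "SELECT " + ", ".join(dims + measures) + " FROM sales"
    if filters:
        # Values are inlined for readability only; real code should bind parameters.
        conds = [f"{SEMANTIC_LAYER[name]['sql']} = {value!r}"
                 for name, value in filters.items()]
        sql += " WHERE " + " AND ".join(conds)
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql


# « Revenue » « Country » « 2010 »
query = build_sql(["Revenue", "Country"], filters={"Year": 2010})
print(query)
```

The user only names business objects; the SQL structure stays hidden behind the mapping, which is what makes such queries accessible to non-experts.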
-
Solution overview – Pre-processing phase
Automatic generation of an entity dictionary
Category >> Entity >> Variant
Entities described in a warehouse
Measures
Dimensions and instances
An entity may appear in different forms
Stemming
Variants dictionary
Typing warehouse objects
Measures and dimensions can belong to standard analysis categories
E.g., the dimension Country is in the standard category Geography
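A minimal sketch of how a Category >> Entity >> Variant dictionary might be generated from warehouse metadata. The metadata layout (`name`, `category`, `synonyms`, `instances`), the category label `Domain-specific`, and the functions `naive_stem` and `build_dictionary` are assumptions made for illustration; the stemmer is a deliberately crude stand-in for a real one.

```python
def naive_stem(word):
    """Very rough stand-in for a real stemmer: lowercase and strip a plural 's'."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def build_dictionary(universe):
    """Return {category: {entity: sorted list of variants}} from universe metadata."""
    dictionary = {}
    for obj in universe:
        variants = {obj["name"].lower(), naive_stem(obj["name"])}
        variants.update(v.lower() for v in obj.get("synonyms", []))
        dictionary.setdefault(obj["category"], {})[obj["name"]] = sorted(variants)
        # Instances of a dimension are filed under a category named after it.
        for instance in obj.get("instances", []):
            dictionary.setdefault(obj["name"], {})[instance] = [instance.lower()]
    return dictionary


# Hypothetical universe metadata; "Geography" follows the example on the slide.
UNIVERSE = [
    {"name": "Country", "category": "Geography", "synonyms": ["countries"]},
    {"name": "Sales revenue", "category": "Sales", "synonyms": ["revenue", "sales"]},
    {"name": "Resort", "category": "Domain-specific", "synonyms": ["resorts"],
     "instances": ["French Riviera", "Bahamas Beach"]},
]

entity_dictionary = build_dictionary(UNIVERSE)
```

Generating the dictionary from metadata avoids maintaining it by hand, which is the cost named in the challenges.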
-
Solution overview – Architecture
[Figure: architecture diagram]
Tiers: client tier, web tier, server tier
Client front-ends: Outlook add-in, PowerPoint add-in, browser extension
Runtime components: runtime web service, runtime back-end server
Pre-processor
Platforms: Text analysis platform (Text analysis SDK), Business Intelligence platform (BI SDK)
Data: universe-specific dictionary, standard analysis categories dictionary, functional dependencies, typing metadata
-
Runtime analysis – Illustration of categories in the generated dictionary
A category is described by keywords referring to the concept
Standard analysis categories (SAC)
Standard subjects
- Sales
- Finance
- Etc.
Standard analysis dimensions
- Time
- Geography
- Etc.
Business entities (BE)
Domain-specific dimension
Vocabulary (partly) defined in the Semantic Layer (universe).
The pre-processing phase extends the standard dictionary with custom entities defined in a data warehouse
Example: « Our sales are growing in all resorts »
- « sales »: talking about sales
- « growing »: interested in an evolution analysis
- « resorts »: interested in resorts
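Keyword-based category recognition on the example sentence could look like the sketch below. The keyword lists and the function `extract_categories` are hypothetical; the slides do not give the actual dictionary content. Mapping "growing" to the Time category is also an assumption, chosen because an evolution analysis implies a time dimension.

```python
import re

# Hypothetical keyword lists; the real dictionaries are not given in the slides.
CATEGORIES = {
    "Sales": ["sales", "revenue"],
    "Time": ["growing", "evolution", "trend"],
    "Resort": ["resort", "resorts"],
}


def extract_categories(sentence):
    """Return the sorted list of categories whose keywords occur in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return sorted(cat for cat, keywords in CATEGORIES.items()
                  if tokens & set(keywords))


found = extract_categories("Our sales are growing in all resorts")
print(found)
```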
-
Runtime analysis – Capturing the Data
Analysis Context
Data Analysis Context (DAC)
Set of extracted entities (SAC + BE)
Sentence-by-sentence analysis
Maintain units of sense
Continuous DAC update: using successive sentences to propagate key concepts
The warehouse is not accessed when assembling queries
Repeated for each text unit
1. Segment text
2. Stem text unit
3. Extract entities (SAC and BE)
4. Augment with previous DAC
5. Group into queries
6. Build adapted charts
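The sentence-by-sentence analysis with a continuously updated DAC can be sketched as follows. This is a simplified illustration, not the authors' implementation: the keyword dictionary and the functions `segment`, `extract_entities` and `analyse` are hypothetical, stemming is skipped, and entities are never dropped from the context, since the slides do not say how older concepts expire.

```python
import re

# Hypothetical keyword-to-entity dictionary (SAC and BE mixed for brevity).
KEYWORDS = {"sales": "Sales", "revenue": "Sales", "growing": "Time",
            "resort": "Resort", "resorts": "Resort"}


def segment(text):
    """Split text into sentences on '.', '!' or '?' (a deliberately simple segmenter)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_entities(sentence):
    """Look up each lowercase token in the keyword dictionary."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {KEYWORDS[t] for t in tokens if t in KEYWORDS}


def analyse(text):
    """Return (sentence, DAC) pairs, carrying entities over from earlier sentences."""
    dac = set()
    pairs = []
    for sentence in segment(text):
        dac = dac | extract_entities(sentence)  # augment with the previous DAC
        pairs.append((sentence, sorted(dac)))
    return pairs


results = analyse("Our sales are growing. Some resorts are doing well.")
```

In this sketch the second sentence mentions only resorts, yet its DAC still contains Sales and Time, propagated from the first sentence. No warehouse access is needed at this stage.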
-
Runtime analysis – Building query suggestions
from a Data Analysis Context
1. SAC representation
Ensure all extracted SAC are represented
– E.g., “our revenue increases in some countries”
– Revenue + Country + Time SAC
Choose highest level object from the warehouse
Aggregate at the highest level in undetermined cases
– Time dimension: Year, Quarter, Month, Week, etc.
2. Functional dependencies
Group compatible measures & dimensions
– Revenue is not compatible with Reservation year
3. Filters
Filter on extracted instances of dimensions
– E.g., “French Riviera generated more revenue than Bahamas Beach”
– Add the Resort dimension
Particular case
– Remove dimensions with a filter on a single instance
4. Analysis type
Influence the generated visualization / chart
Analysis types
– Trending
– Contribution
– Comparison
– Ranking
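The SAC representation, functional dependency, and filter rules can be combined in a short sketch. All metadata here is hypothetical (`TOP_LEVEL`, `COMPATIBLE`, `INSTANCES`), as is the function `build_query`; the slides do not describe the actual data structures, so this shows one possible reading of the rules.

```python
# Hypothetical metadata: highest-level object per SAC, compatibility, instances.
TOP_LEVEL = {"Sales": "Revenue", "Geography": "Country", "Time": "Year"}
COMPATIBLE = {"Revenue": {"Country", "Year", "Resort"}}  # measure -> dimensions
INSTANCES = {"French Riviera": "Resort", "Bahamas Beach": "Resort"}


def build_query(sacs, mentioned_instances):
    """Assemble one query suggestion from extracted SAC and dimension instances."""
    # SAC representation: one highest-level object per extracted category.
    objects = [TOP_LEVEL[s] for s in sacs]
    measures = [o for o in objects if o in COMPATIBLE]
    dimensions = [o for o in objects if o not in COMPATIBLE]
    # Filters: each mentioned instance filters (and adds) its dimension.
    filters = {}
    for inst in mentioned_instances:
        filters.setdefault(INSTANCES[inst], []).append(inst)
    for dim in filters:
        if dim not in dimensions:
            dimensions.append(dim)
    # Particular case: drop a dimension filtered on a single instance.
    dimensions = [d for d in dimensions if len(filters.get(d, [])) != 1]
    # Functional dependencies: keep dimensions compatible with every measure.
    dimensions = [d for d in dimensions
                  if all(d in COMPATIBLE[m] for m in measures)]
    return {"measures": measures, "dimensions": dimensions, "filters": filters}


# "our revenue increases in some countries"
trend = build_query(["Sales", "Geography", "Time"], [])
# "French Riviera generated more revenue than Bahamas Beach"
compare = build_query(["Sales"], ["French Riviera", "Bahamas Beach"])
# Only one resort mentioned: the Resort dimension is dropped, the filter stays.
single = build_query(["Sales"], ["French Riviera"])
```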
-
Preliminary results – Sample query suggestions
Text: Sales are growing everywhere.
Query: Revenue per Year per Country
Analysis: Trending
Text: The relative importance of each resort to the revenue is satisfying.
Query: Revenue per Resort
Analysis: Contribution
Text: French Riviera is doing very good.
Query: Total Revenue, filter on Resort = French Riviera
Analysis: Undetermined
[Chart previews not reproduced]
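How an analysis type might be detected and turned into a chart choice is sketched below, using the three sample sentences from this slide. The cue words and the chart assigned to each analysis type are assumptions; the slides name the four analysis types but not the detection rules or the chart mapping.

```python
import re

# Hypothetical cue words and chart choices; only the type names come from the slides.
CUES = {
    "Trending": {"growing", "increases", "evolution"},
    "Contribution": {"importance", "share", "contribution"},
    "Comparison": {"than", "versus"},
    "Ranking": {"best", "top", "worst"},
}
CHARTS = {"Trending": "line", "Contribution": "pie", "Comparison": "bar",
          "Ranking": "sorted bar", "Undetermined": "table"}


def analysis_type(sentence):
    """Return the first analysis type whose cue words appear, else 'Undetermined'."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    for name, cues in CUES.items():
        if tokens & cues:
            return name
    return "Undetermined"


samples = [
    "Sales are growing everywhere.",
    "The relative importance of each resort to the revenue is satisfying.",
    "French Riviera is doing very good.",
]
types = [analysis_type(s) for s in samples]
charts = [CHARTS[t] for t in types]
```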
-
Conclusion
Achievements
We leverage metadata from a warehouse
Build dictionary for entity recognition
We illustrate a text with corporate data
Dynamically generated and meaningful queries
Appropriate visualization
Text analysis is kept simple
Method is easy to apply to another language
Restricted business domain
We developed two prototype front-ends
Office and web environments
PowerPoint add-in
12Sprints method
-
Identified key issues and future work
Key issues
Coverage and extensibility
Automatically generate variants for custom entities
Increase the coverage of the standard dictionary
Suggestions evaluation
Estimate relevance
Personalize suggestions
Ongoing and future work
Evaluation method for suggestion relevance
Refine and extend Standard analysis categories
Handle query suggestions on multiple business contexts / warehouses
-
Thank you!
Raphaël Thollot
SAP BusinessObjects
Academic Research Center (ARC)
Levallois-Perret, France
T +33 1 41 25 30 40
raphael.thollot@sap.com
www.sap.com