data is dead without “what-if” · pdf filedata is dead without...
TRANSCRIPT
© 2011 IBM Corporation
IBM Research
DATA IS DEAD … WITHOUT “WHAT-IF” MODELS
Peter J. Haas, Paul P. Maglio, Patricia G. Selinger, and Wang-Chiew TanIBM Almaden Research Center
© 2011 IBM Corporation
IBM Research
Congratulations, Database Community!
© 2011 IBM Corporation
OLAPTransactions &Reports, IMS
Semi-structured &Unstructured text
Data mining
Web dataRelational model &
SQL
Text analytics
Semantic data
Massive Data/Cloud DB
Statisticalanalysis Machine learning
Streaming data
Uncertain data
© 2011 IBM Corporation
IBM Research
Congratulations, Database Community!
© 2011 IBM Corporation
OLAPTransactions &Reports, IMS
Semi-structured &Unstructured text
Data mining
Web dataRelational model &
SQL
Text analytics
Semantic data
Massive Data/Cloud DB
Statisticalanalysis Machine learning
Streaming data
Uncertain data
BUT: Why do enterprises care about data in the first place?
© 2011 IBM Corporation
IBM Research
Because enterprises need to make DECISIONS
Allocation of scarce resources
© 2011 IBM Corporation
IBM Research
Because enterprises need to make DECISIONS
“Analytics is…a complete business problem solvingand decision making process”
Allocation of scarce resources
© 2011 IBM Corporation
IBM Research
Because enterprises need to make DECISIONS
“Analytics is…a complete business problem solvingand decision making process”
Descriptive Analytics: Finding patterns andrelationships in historical and existing data
Allocation of scarce resources
© 2011 IBM Corporation
IBM Research
Because enterprises need to make DECISIONS
“Analytics is…a complete business problem solvingand decision making process”
Descriptive Analytics: Finding patterns andrelationships in historical and existing data
Predictive analytics: predict future probabilitiesand trends to allow what-if analysis
Allocation of scarce resources
© 2011 IBM Corporation
IBM Research
Because enterprises need to make DECISIONS
“Analytics is…a complete business problem solvingand decision making process”
Descriptive Analytics: Finding patterns andrelationships in historical and existing data
Predictive analytics: predict future probabilitiesand trends to allow what-if analysis
Prescriptive analytics: deterministic and stochastic optimizationto support better decision making
Allocation of scarce resources
© 2011 IBM Corporation
IBM Research
Shallow versus deep predictive analytics
United States House Prices
$0
$25,000
$50,000
$75,000
$100,000
$125,000
$150,000
$175,000
$200,000
$225,000
$250,000
$275,000
1970
1972
1974
1976
1978
1980
1982
1984
1986
1988
1990
1992
1994
1996
1998
2000
2002
2004
2006
2008
2010
Year
Pric
e
Extrapolation of 1970-2006median U.S. housing prices
NCAR Community Atmosphere Model (CAM)
Actualprices
Extrapolation
© 2011 IBM Corporation
IBM Research
Is the DB community truly helping decision makers?
Some realizations…
© 2011 IBM Corporation
IBM Research
Data is dead
Name Item Price Date
Pat Red shoes $50 1/23/11
=
…a record of history that says nothingabout future or hypothetical worlds
© 2011 IBM Corporation
IBM Research
Descriptive analytics& shallow predictive analyticsare last resorts for decision making
(When you can’t find the domain experts)
…but are the main focus of mostdatabase and IM technology
© 2011 IBM Corporation
IBM Research
We can understand much moreby moving to deep predictive analytics
based on models and data
© 2011 IBM Corporation
IBM Research
We can understand much moreby moving to deep predictive analytics
based on models and data
Data-centrism is WRONG:Exploit expert knowledge of fundamental structure, causal relationships,
and dynamics of system constituents to create first-principlessimulation models
© 2011 IBM Corporation
IBM Research
Especially true for complex systems-of-systems
Huang, T. T, Drewnowski, A., Kumanyika, S. K., & Glass, T. A., 2009, “A Systems-Oriented Multilevel Framework for Addressing Obesity in the 21st Century, ” Preventing Chronic Disease, 6(3).
Challenge: Facilitating integration of existing simulation models, statistical models, optimization models, and datasets for what-if analysis
© 2011 IBM Corporation
IBM Research
Especially true for complex systems-of-systems
Huang, T. T, Drewnowski, A., Kumanyika, S. K., & Glass, T. A., 2009, “A Systems-Oriented Multilevel Framework for Addressing Obesity in the 21st Century, ” Preventing Chronic Disease, 6(3).
Challenge: Facilitating integration of existing simulation models, statistical models, optimization models, and datasets for what-if analysis
Making such integration feasible, practical, flexible,attractive, cost-effective, and usable
© 2011 IBM Corporation
IBM Research
An example of a models-and-data approach: Splash
© 2011 IBM Corporation
SPLASH REPOSITORYSPLASH REPOSITORY
Model and Data RegistrationModel and Data Registration
Model, Data, and Mapping Discovery
Model, Data, and Mapping Discovery
Model and Data CompositionModel and Data Composition
Model ExecutionModel Execution
Experiment ManagerExperiment Manager
SPLASH MODULES
Collaborative Reporting and Visualization
Collaborative Reporting and Visualization
Provide models and data
Use models and data
Multi-disciplinary users
DataData
Composite Model
Composite Model
DataData
ModelModel
ModelModel
MappingMapping
contains
describes
Loose model coupling through data exchangeMetadata - Model inputs and outputs - Access and execution - Data schemas - Model and data locations - Model and data semantics
© 2011 IBM Corporation
IBM Research
Splash composite obesity model (proof of concept)
Buying and eating(Agent-based simulation model)
Transportation(VISUM simulation model)
Demographic data
GIS data
BMI Model(Differential equation model)
Time alignmentand data merging
Geospatial alignment
Facility data
Exercise(Stochastic discrete-event simulation)
Flow of data
Data source
Simulation model
MappingResults
Clio++JAQL + Hadoop
© 2011 IBM Corporation
IBM Research
Database research++
Data search model-and-data search– Find compatible models, data, and mappings (using metadata)– Involves semantic search technologies, repository management, privacy and security
Data integration model integration– Simulation-oriented data mapping– Time, space, unit alignment [e.g., Howe & Maier 2005]– Hierarchical models with different resolutions– Complex data transformations (e.g., raw simulation output to histogram)
Query optimization simulation-experiment optimization– Optimally configure workflow among distributed data and models– Factoring common operations across different mappings in the workflow– Avoiding redundant computations across experiments– Statistical issues: managing pseudorandom numbers and Monte Carlo replications
Model A Model B
© 2011 IBM Corporation
IBM Research
Database research++ (continued)
Causality approximation– Fixed-point + perturbation approaches– System support– Theoretical support
Deep collaborative analytics– Visualizing and mining the results– Understanding and explaining results:
• Dashboarding of parameters• Provenance [e.g., J. Friere et al.]• Root-cause analysis• Sensitivity analysis
– Trusting results• Model validation• ManyEyes++, Swivel++
1 1
2 1
( ) ( ), ( )
( ) ( ), ( )n n n
n n n
f t f t g t
g t f t g t
Transportation Model
Buying & Eating Model
1
2
( ) ( ), ( ) for ,( 1)
( ) ( ), ( )
f t f t g n tt n t n t
g t f n t g t
Other models-and-data research : MCDB [Jermaine et al.], BRASIL [Gehrke et al.]
© 2011 IBM Corporation
IBM Research
Conclusion
DB community has focused on descriptive analytics, but enterprises need deep predictive analytics for what-if analysis, based on expert understanding of underlying mechanisms
Models and data need to be brought together on an equal footing
Requires significant extensions of database technology (exciting research opportunities!)
Opportunity to redefine ourselves as the model-and-data community
In short:
Data
Models
Models
Data
generateshypotheses for
guidescollection of
parameterizes
generates
© 2011 IBM Corporation
IBM Research
DATA IS DEAD … WITHOUT “WHAT-IF” MODELS
Peter J. Haas, Paul P. Maglio, Patricia G. Selinger, and Wang-Chiew TanIBM Almaden Research Center
www.almaden.ibm.com/asr/projects/splash
Thanks to the Splash team: Melissa Cefkin, Susanne M. Glissmann, Cheryl A. Kieliszewski, Yinan Li, and Ronald Mak