data is dead without “what-if” · pdf filedata is dead without...

22
© 2011 IBM Corporation IBM Research DATA IS DEAD … WITHOUT “WHAT-IF” MODELS Peter J. Haas, Paul P. Maglio, Patricia G. Selinger, and Wang-Chiew Tan IBM Almaden Research Center

Upload: dinhbao

Post on 05-Feb-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

DATA IS DEAD … WITHOUT “WHAT-IF” MODELS

Peter J. Haas, Paul P. Maglio, Patricia G. Selinger, and Wang-Chiew TanIBM Almaden Research Center

Page 2: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Congratulations, Database Community!

© 2011 IBM Corporation

OLAPTransactions &Reports, IMS

Semi-structured &Unstructured text

Data mining

Web dataRelational model &

SQL

Text analytics

Semantic data

Massive Data/Cloud DB

Statisticalanalysis Machine learning

Streaming data

Uncertain data

Page 3: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Congratulations, Database Community!

© 2011 IBM Corporation

OLAPTransactions &Reports, IMS

Semi-structured &Unstructured text

Data mining

Web dataRelational model &

SQL

Text analytics

Semantic data

Massive Data/Cloud DB

Statisticalanalysis Machine learning

Streaming data

Uncertain data

BUT: Why do enterprises care about data in the first place?

Page 4: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Because enterprises need to make DECISIONS

Allocation of scarce resources

Page 5: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Because enterprises need to make DECISIONS

“Analytics is…a complete business problem solvingand decision making process”

Allocation of scarce resources

Page 6: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Because enterprises need to make DECISIONS

“Analytics is…a complete business problem solvingand decision making process”

Descriptive Analytics: Finding patterns andrelationships in historical and existing data

Allocation of scarce resources

Page 7: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Because enterprises need to make DECISIONS

“Analytics is…a complete business problem solvingand decision making process”

Descriptive Analytics: Finding patterns andrelationships in historical and existing data

Predictive analytics: predict future probabilitiesand trends to allow what-if analysis

Allocation of scarce resources

Page 8: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Because enterprises need to make DECISIONS

“Analytics is…a complete business problem solvingand decision making process”

Descriptive Analytics: Finding patterns andrelationships in historical and existing data

Predictive analytics: predict future probabilitiesand trends to allow what-if analysis

Prescriptive analytics: deterministic and stochastic optimizationto support better decision making

Allocation of scarce resources

Page 9: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Shallow versus deep predictive analytics

United States House Prices

$0

$25,000

$50,000

$75,000

$100,000

$125,000

$150,000

$175,000

$200,000

$225,000

$250,000

$275,000

1970

1972

1974

1976

1978

1980

1982

1984

1986

1988

1990

1992

1994

1996

1998

2000

2002

2004

2006

2008

2010

Year

Pric

e

Extrapolation of 1970-2006median U.S. housing prices

NCAR Community Atmosphere Model (CAM)

Actualprices

Extrapolation

Page 10: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Is the DB community truly helping decision makers?

Some realizations…

Page 11: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Data is dead

Name Item Price Date

Pat Red shoes $50 1/23/11

=

…a record of history that says nothingabout future or hypothetical worlds

Page 12: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Descriptive analytics& shallow predictive analyticsare last resorts for decision making

(When you can’t find the domain experts)

…but are the main focus of mostdatabase and IM technology

Page 13: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

We can understand much moreby moving to deep predictive analytics

based on models and data

Page 14: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

We can understand much moreby moving to deep predictive analytics

based on models and data

Data-centrism is WRONG:Exploit expert knowledge of fundamental structure, causal relationships,

and dynamics of system constituents to create first-principlessimulation models

Page 15: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Especially true for complex systems-of-systems

Huang, T. T, Drewnowski, A., Kumanyika, S. K., & Glass, T. A., 2009, “A Systems-Oriented Multilevel Framework for Addressing Obesity in the 21st Century, ” Preventing Chronic Disease, 6(3).

Challenge: Facilitating integration of existing simulation models, statistical models, optimization models, and datasets for what-if analysis

Page 16: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Especially true for complex systems-of-systems

Huang, T. T, Drewnowski, A., Kumanyika, S. K., & Glass, T. A., 2009, “A Systems-Oriented Multilevel Framework for Addressing Obesity in the 21st Century, ” Preventing Chronic Disease, 6(3).

Challenge: Facilitating integration of existing simulation models, statistical models, optimization models, and datasets for what-if analysis

Making such integration feasible, practical, flexible,attractive, cost-effective, and usable

Page 17: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

An example of a models-and-data approach: Splash

© 2011 IBM Corporation

SPLASH REPOSITORYSPLASH REPOSITORY

Model and Data RegistrationModel and Data Registration

Model, Data, and Mapping Discovery

Model, Data, and Mapping Discovery

Model and Data CompositionModel and Data Composition

Model ExecutionModel Execution

Experiment ManagerExperiment Manager

SPLASH MODULES

Collaborative Reporting and Visualization

Collaborative Reporting and Visualization

Provide models and data

Use models and data

Multi-disciplinary users

DataData

Composite Model

Composite Model

DataData

ModelModel

ModelModel

MappingMapping

contains

describes

Loose model coupling through data exchangeMetadata - Model inputs and outputs - Access and execution - Data schemas - Model and data locations - Model and data semantics

Page 18: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Splash composite obesity model (proof of concept)

Buying and eating(Agent-based simulation model)

Transportation(VISUM simulation model)

Demographic data

GIS data

BMI Model(Differential equation model)

Time alignmentand data merging

Geospatial alignment

Facility data

Exercise(Stochastic discrete-event simulation)

Flow of data

Data source

Simulation model

MappingResults

Clio++JAQL + Hadoop

Page 19: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Database research++

Data search model-and-data search– Find compatible models, data, and mappings (using metadata)– Involves semantic search technologies, repository management, privacy and security

Data integration model integration– Simulation-oriented data mapping– Time, space, unit alignment [e.g., Howe & Maier 2005]– Hierarchical models with different resolutions– Complex data transformations (e.g., raw simulation output to histogram)

Query optimization simulation-experiment optimization– Optimally configure workflow among distributed data and models– Factoring common operations across different mappings in the workflow– Avoiding redundant computations across experiments– Statistical issues: managing pseudorandom numbers and Monte Carlo replications

Model A Model B

Page 20: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Database research++ (continued)

Causality approximation– Fixed-point + perturbation approaches– System support– Theoretical support

Deep collaborative analytics– Visualizing and mining the results– Understanding and explaining results:

• Dashboarding of parameters• Provenance [e.g., J. Friere et al.]• Root-cause analysis• Sensitivity analysis

– Trusting results• Model validation• ManyEyes++, Swivel++

1 1

2 1

( ) ( ), ( )

( ) ( ), ( )n n n

n n n

f t f t g t

g t f t g t

Transportation Model

Buying & Eating Model

1

2

( ) ( ), ( ) for ,( 1)

( ) ( ), ( )

f t f t g n tt n t n t

g t f n t g t

Other models-and-data research : MCDB [Jermaine et al.], BRASIL [Gehrke et al.]

Page 21: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

Conclusion

DB community has focused on descriptive analytics, but enterprises need deep predictive analytics for what-if analysis, based on expert understanding of underlying mechanisms

Models and data need to be brought together on an equal footing

Requires significant extensions of database technology (exciting research opportunities!)

Opportunity to redefine ourselves as the model-and-data community

In short:

Data

Models

Models

Data

generateshypotheses for

guidescollection of

parameterizes

generates

Page 22: DATA IS DEAD  WITHOUT “WHAT-IF”  · PDF fileDATA IS DEAD  WITHOUT “WHAT-IF” MODELS Peter J. Haas, ... (proof of concept) ... – Involves semantic search technologies,

© 2011 IBM Corporation

IBM Research

DATA IS DEAD … WITHOUT “WHAT-IF” MODELS

Peter J. Haas, Paul P. Maglio, Patricia G. Selinger, and Wang-Chiew TanIBM Almaden Research Center

www.almaden.ibm.com/asr/projects/splash

Thanks to the Splash team: Melissa Cefkin, Susanne M. Glissmann, Cheryl A. Kieliszewski, Yinan Li, and Ronald Mak