Page 1: So What To Do Next?

Michael Stonebraker
Adjunct Professor
Massachusetts Institute of Technology
(stonebraker@lcs.mit.edu)

Page 2: Where To Find Problems

State of affairs
Interesting industrial problems
Mike’s picks
My whine on XML
Grand challenges

Page 3: State of Affairs

IT failure rate
Software half-life
No knobs

Page 4: State of Affairs

~50-75% of IT projects fail
  if we built bridges, our profession would be fired
  and the same mistakes are repeated over and over (excessive ambition, rolling specs, bad design, failure to load a large data set early)

Page 5: What To Do?

We typically don’t teach this stuff
  probably because we don’t (can’t) spend any time in industry to figure it out

Action item: at the very least read a couple of Robert L. Glass’s books

Page 6: State of Affairs

Hardware “half-life” is 18 months
Software half-life is 18 years (or more)!

Page 7: What To Do?

Much higher level design environments
  we are stuck at the general purpose programming level (conceivable benefit limited)
  workflow and other higher level graphical notations probably a good idea

Page 8: What To Do?

special purpose languages nice (why are report writers shunned?)
higher level versions of SQL and XQuery

See Informix Visionary for a cool example

Page 9: State of Affairs

Commercial products are way too hard to use
  takes people in white lab coats to get them up and keep them up
Full employment act for DBAs forever

Page 10: What To Do?

“No knobs”
  only buttons are “go” and “stop”
  all tuning automatic
  index selection is one of the minor ones (buffer pool size, partitioning, log buffer pool size, …)
Error reporting stinks

Page 11: Interesting Industrial Problems Should Focus Research

BBC
OZ entertainment
Cisco
Akamai
Fidelity

My suggestion: NSF should require a letter of support from a CIO with each grant proposal.

Page 12: Interesting Problems -- BBC

Digitize 50 years of British television creativity
  want to serve it up on demand
  especially British soccer games
  media is wearing out
Random access to 1 Petabyte (or so)
  By the unwashed internet 200 million

Page 13: CNN Variation

On-line digital news editing by 300 news directors
  who want to find Monica Lewinsky
  and 30 seconds of footage on suffering in Bosnia

Page 14: What To Do?

Content outlives support for the content format
Automatic content indexing
  cannot afford a librarian
Global scale distributed system
Staging and caching
  high locality of reference

Page 15: What To Do?

Query model meets visualization systems
  unwashed will not learn XQuery
Rights management
  incredibly sticky issue in whole area

Page 16: Interesting Problem - OZ Entertainment

New theme park near Kansas City
  “no lines”
  no lost kids
  virtual theme park as teaser

Page 17: What To Do?

Large scale GIS
  update intensive!
Large scale triggering problem
  alert me if there is a cancellation at X and I am within 300 yards
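
The triggering bullet above can be made concrete with a small Python sketch. Everything named here (`Subscription`, `fire_triggers`, flat x/y coordinates in meters) is a hypothetical illustration, not something from the talk. It checks each standing alert with a linear scan, which is exactly what stops working at theme-park scale with constantly updated positions; a real system would presumably need a spatial index.

```python
import math
from dataclasses import dataclass

YARDS_PER_METER = 1.0936133


@dataclass
class Subscription:
    """A standing alert: tell `visitor` about cancellations at `attraction`."""
    visitor: str
    attraction: str
    radius_yards: float


def distance_yards(a, b):
    """Straight-line distance between two (x, y) points given in meters."""
    return math.dist(a, b) * YARDS_PER_METER


def fire_triggers(subs, positions, attraction, attraction_xy):
    """Return the visitors to alert when a cancellation occurs at `attraction`."""
    alerts = []
    for sub in subs:
        if sub.attraction != attraction:
            continue
        pos = positions.get(sub.visitor)  # latest known position, if any
        if pos is not None and distance_yards(pos, attraction_xy) <= sub.radius_yards:
            alerts.append(sub.visitor)
    return alerts


# Assumed sample data: ann is 100 m from X, bob is 400 m away.
subs = [Subscription("ann", "X", 300), Subscription("bob", "X", 300)]
positions = {"ann": (100.0, 0.0), "bob": (400.0, 0.0)}
alerts = fire_triggers(subs, positions, "X", (0.0, 0.0))
```

Only `ann` is within 300 yards, so only she is alerted.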

Page 18: Interesting Problem - Cisco Systems

Supply chain of 60K suppliers for custom goods
Want to query the transitive closure of this supply chain
  can I make 10 more routers next week?
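
As a rough illustration of what "transitive closure of the supply chain" means, the Python sketch below walks a toy supplier graph breadth-first. The graph, the company names, and the function `upstream_suppliers` are assumptions made for the example. It runs in one process over one dictionary, whereas the problem on the slide is spread over thousands of independent systems.

```python
from collections import deque


def upstream_suppliers(supplies, root):
    """Return every direct or indirect supplier of `root`.

    `supplies` maps a part or company to the list of its direct suppliers.
    The `seen` set keeps the walk finite even if the graph has cycles.
    """
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for supplier in supplies.get(node, ()):
            if supplier not in seen:
                seen.add(supplier)
                queue.append(supplier)
    return seen


# Hypothetical three-level supply chain.
supplies = {
    "router": ["chassis_co", "board_co"],
    "board_co": ["chip_co", "capacitor_co"],
    "chip_co": ["wafer_co"],
}
closure = upstream_suppliers(supplies, "router")
```

Answering "can I make 10 more routers next week?" would then mean asking every company in `closure` about its spare capacity, which is where the federation issues come in.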

Page 19: What To Do?

Huge federated system
  central metadata a non-starter
  no single DBA
  global query optimizer a non-starter
Adapters for 1M (or so) legacy systems
  how to write them semi-automatically?

Page 20: Interesting Problem - Akamai

Billing is 95/5
  5 minute intervals
  pay for bandwidth of 95th percentile
300 Gbytes a day (compressed) of click stream data

Biggest warehouses on the planet will soon be click stream data!
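
The billing rule above (sample bandwidth at 5 minute intervals, pay for the 95th percentile) can be sketched in Python as follows. The function name `billable_rate` and the nearest-rank percentile convention are assumptions for illustration; the slide does not say which percentile definition is used.

```python
def billable_rate(samples_mbps):
    """Return the 95th percentile of bandwidth samples (nearest-rank method).

    Each sample is assumed to be the average rate over one 5 minute interval.
    The top 5% of samples are ignored, so short bursts are not billed.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[rank - 1]


# Assumed sample data: 19 quiet intervals and one large burst.
samples = [10] * 19 + [500]
rate = billable_rate(samples)
```

With 20 samples the rank is 19, so the single 500 Mbps burst falls in the discarded top 5% and `rate` is 10.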

Page 21: Click Stream Data

Customers want to mine their click stream
  And Akamai only has a portion of it
  i.e. huge distributed data base
Query is “tell me something interesting”
  i.e. why are 95% of the shopping carts abandoned?
  and not a pile of statistics

Page 22: Interesting Problem - Fidelity

Financial portal for high net worth individuals
  must connect to several hundred Fidelity systems
Customers want to know fairly complex things
  i.e. rank my money manager against all value managers for 1, 3 and 5 years

Page 23: What to Do?

Voice to NL to structured data
  voice to NL works in focused verticals (weather, airline schedules)
  but this is a pretty broad app
NL to structured data requires some work
  put in the joins
  look up vocabulary in the DBMS

Page 24: What to Do?

How to join unstructured data to structured data
  tell me the news stories about all stocks which have increased in value more than 10% today
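
A toy Python version of the example query above shows where the difficulty lies. The data shapes, the function `stories_for_gainers`, and the trick of treating any upper-case token of 2 to 5 letters as a ticker symbol are all assumptions for illustration. The structured half (the 10% filter) is easy; the naive text matching stands in for the hard part, which is recognizing which stock a story is actually about.

```python
import re


def stories_for_gainers(quotes, stories, threshold=0.10):
    """Pair each story with the gaining stocks it appears to mention.

    `quotes` maps a ticker symbol to (opening price, latest price).
    A stock is a gainer if it has risen by more than `threshold` today.
    """
    gainers = {
        symbol
        for symbol, (open_price, last_price) in quotes.items()
        if open_price > 0 and (last_price - open_price) / open_price > threshold
    }
    matches = []
    for story in stories:
        # Naive stand-in for entity extraction: upper-case tokens only.
        mentioned = set(re.findall(r"\b[A-Z]{2,5}\b", story))
        for symbol in sorted(mentioned & gainers):
            matches.append((symbol, story))
    return matches


# Hypothetical quotes and headlines.
quotes = {"ACME": (100.0, 115.0), "INIT": (50.0, 51.0)}
stories = [
    "ACME shares jump after earnings",
    "INIT holds steady",
    "Weather looks fine",
]
result = stories_for_gainers(quotes, stories)
```

Only ACME is up more than 10%, so only its story is returned.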

Page 25: Mike’s Picks

Too much middleware
Akamai for structured data

Page 26: Interesting Problem - Middleware

Average enterprise has
  one (or more) app servers
  one (or more) EAI packages
  one (or more) ETL packages
  one (or more) portal products
  one (or more) application packages
  and maybe someday a federated DBMS

Page 27: All of these systems

Contain transformation engines
And often do function activation (app service)
And often have adapters to legacy systems

Huge overlap in functionality!!

Page 28: What to Do?

Consolidate weaker paradigms under stronger ones
  e.g. federated DBMS subsumes ETL
  OR DBMS subsumes app service

Middleware becomes DBMS-centric!

Page 29: Interesting Problem - Caching

Akamai et al. cache HTML
  closer to the browser that wants it
Would be nice to cache structured data
  need to cache application that uses the data
  and the data

Page 30: What to Do?

Materialized views are a predefined solution
Nice to have a more dynamic one
Cache (query, answer) pairs?
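
One way to read "cache (query, answer) pairs" is sketched below in Python. The class `QueryCache`, the `run_query` callable, and the per-table invalidation are hypothetical choices for the example, not a design from the talk. Unlike a materialized view, nothing is declared in advance: whatever query arrives gets cached, and an update to a table drops every cached answer that depends on it.

```python
class QueryCache:
    """Cache answers keyed by query text, invalidated per table."""

    def __init__(self, run_query):
        self._run_query = run_query  # hypothetical callable: sql -> rows
        self._answers = {}           # normalized query text -> cached rows
        self._by_table = {}          # table name -> keys of dependent queries

    @staticmethod
    def _normalize(sql):
        # Simplification: collapse whitespace only. This would also alter
        # string literals that contain runs of spaces.
        return " ".join(sql.split())

    def get(self, sql, tables):
        """Return the cached answer, running the query on a miss."""
        key = self._normalize(sql)
        if key not in self._answers:
            self._answers[key] = self._run_query(sql)
            for table in tables:
                self._by_table.setdefault(table, set()).add(key)
        return self._answers[key]

    def invalidate(self, table):
        """Drop every cached answer that read from `table`."""
        for key in self._by_table.pop(table, set()):
            self._answers.pop(key, None)


# Stand-in backend that records how often it is actually called.
calls = []


def fake_backend(sql):
    calls.append(sql)
    return [("ACME", 12.5)]


cache = QueryCache(fake_backend)
first = cache.get("SELECT symbol, price FROM quotes", tables=["quotes"])
second = cache.get("SELECT symbol,   price FROM quotes", tables=["quotes"])
cache.invalidate("quotes")
third = cache.get("SELECT symbol, price FROM quotes", tables=["quotes"])
```

The second lookup is a hit, so the backend runs only twice: once for the first query and once after the invalidation. Matching on query text is the weak point; two queries that are equivalent but worded differently will not share an entry.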

Page 31: History Lesson (Codd)

Putting semantics into data order is bad
  restricts storage options
  hidden meaning bad
Hierarchical representations for data are bad
  rewrite the queries when representation changes (data independence)
Complexity is bad

Page 32: My Spin on XML (XMLSchema)

As a storage format, XML is good for documents not data
  Codd’s thinking has not been repealed (order, hierarchy, complexity)
  no binary format
  in line tags are inefficient
SGML run amok….

Page 33: My Spin on XML

As an “on the wire” notation, XML is ok for data
  but don’t try to move too much stuff
  and don’t try to move it too fast
Remember why client-server put in binary movement!

Page 34: XQuery For Data

Won’t store data in XML
Necessary to design something that is easy to translate into SQL
Alternate syntax for OR SQL
  which is much cleaner (// is a user defined function in Informix)

Page 35: XML Summary

Focus attention on XMLSchema as a document description system not a data description system
Focus XQuery on documents not data

W3C use cases do not do this!

Page 36: OR DBMS

XML is merely this year’s data type
Next year it will be WML or ...
OR is still not finished
  query optimization
  data base design
  physical storage layout

Page 37: Grand Challenge #1

Preponderance of web accessible data is structured
  much more than “facts and figures”
Construct a system to access “the rest of” the web

Page 38: What To Do

GUI problem (NL or Vis)
Query notation problem
Discovery problem
  how do you “scrape” a structured data web site to figure out the meaning of its data?
Federation problem

Page 39: Grand Challenge #2

Everything of material importance is geo-positioned (lojacked)
Construct the mother of all GIS systems
  complete automation of supply chains
  “where is my wife” (or the closest restroom)

Page 40: What To Do

Most of the issues in GC #1
The mother of all triggering problems
The mother of all security/privacy problems