Post on 14-Jan-2016

1

So What To Do Next?

Michael Stonebraker

Adjunct Professor

Massachusetts Institute of Technology

(stonebraker@lcs.mit.edu)

2

M.I.T.

Where To Find Problems

- State of affairs
- Interesting industrial problems
- Mike’s picks
- My whine on XML
- Grand challenges

3

State of Affairs

- IT failure rate
- Software half-life
- No knobs

4

State of Affairs

- ~50-75% of IT projects fail
  - if we built bridges, our profession would be fired
  - and the same mistakes are repeated over and over (excessive ambition, rolling specs, bad design, failure to load a large data set early)

5

What To Do?

- We typically don’t teach this stuff
  - probably because we don’t (can’t) spend any time in industry to figure it out
- Action item: at the very least read a couple of Robert L. Glass’s books

6

State of Affairs

- Hardware “half-life” is 18 months
- Software half-life is 18 years (or more)!

7

What To Do?

- Much higher level design environments
  - we are stuck at the general purpose programming level (conceivable benefit limited)
  - workflow and other higher level graphical notations probably a good idea

8

What To Do?

- special purpose languages nice (why are report writers shunned?)
- higher level versions of SQL and Xquery
- See Informix Visionary for a cool example

9

State of Affairs

- Commercial products are way too hard to use
  - takes people in white lab coats to get them up and keep them up
- Full employment act for DBAs forever

10

What To Do?

- “No knobs”
  - only buttons are “go” and “stop”
  - all tuning automatic
  - index selection is one of the minor ones (buffer pool size, partitioning, log buffer pool size, …)
- Error reporting stinks

11

Interesting Industrial Problems Should Focus Research

- BBC
- OZ entertainment
- Cisco
- Akamai
- Fidelity

My suggestion: NSF should require a letter of support from a CIO with each grant proposal.

12

Interesting Problems -- BBC

- Digitize 50 years of British television creativity
  - want to serve it up on demand
  - especially British soccer games
  - media is wearing out
- Random access to 1 Petabyte (or so)
- By the unwashed internet 200 million

13

CNN Variation

- On-line digital news editing by 300 news directors
  - who want to find Monica Lewinsky
  - and 30 seconds of footage on suffering in Bosnia

14

What To Do?

- Content outlives support for the content format
- Automatic content indexing
  - cannot afford a librarian
- Global scale distributed system
- Staging and caching
  - high locality of reference

15

What To Do?

- Query model meets visualization systems
  - unwashed will not learn Xquery
- Rights management
  - incredibly sticky issue in whole area

16

Interesting Problem - OZ Entertainment

- New theme park near Kansas City
  - “no lines”
  - no lost kids
  - virtual theme park as teaser

17

What To Do?

- Large scale GIS
  - update intensive!
- Large scale triggering problem
  - alert me if there is a cancellation at X and I am within 300 yards
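
The triggering bullet can be made concrete with a minimal sketch. It assumes visitor and attraction positions are (x, y) pairs in yards on a flat park map and that subscriptions fit in memory; `Subscription`, `on_cancellation` and the sample names are hypothetical, not part of the talk. The linear scan is exactly what would not scale in the update-intensive setting the slide describes, so a real system would presumably need a spatial index.

```python
import math
from dataclasses import dataclass


@dataclass
class Subscription:
    visitor_id: str
    attraction: str       # the "X" the visitor wants a slot at
    radius_yards: float   # alert only inside this distance, e.g. 300


def on_cancellation(attraction, attraction_xy, subscriptions, visitor_xy):
    """Return ids of visitors to alert when a slot opens at `attraction`.

    Positions are (x, y) pairs in yards on a flat park map (an assumption).
    """
    alerts = []
    for sub in subscriptions:
        # Skip subscriptions for other attractions or visitors with no known position.
        if sub.attraction != attraction or sub.visitor_id not in visitor_xy:
            continue
        vx, vy = visitor_xy[sub.visitor_id]
        distance = math.hypot(vx - attraction_xy[0], vy - attraction_xy[1])
        if distance <= sub.radius_yards:
            alerts.append(sub.visitor_id)
    return alerts


subs = [Subscription("ann", "coaster", 300), Subscription("bob", "coaster", 300)]
where = {"ann": (100.0, 200.0), "bob": (900.0, 50.0)}
print(on_cancellation("coaster", (0.0, 0.0), subs, where))
```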

18

Interesting Problem - Cisco Systems

- Supply chain of 60K suppliers for custom goods
- Want to query the transitive closure of this supply chain
  - can I make 10 more routers next week?
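
As an illustration of what "transitive closure of the supply chain" means, here is a minimal sketch that assumes the whole supplier graph is available as one in-memory dictionary (the opposite of the federated reality). `transitive_suppliers` and the part names are hypothetical.

```python
from collections import deque


def transitive_suppliers(direct_suppliers, root):
    """Return every direct or indirect supplier of `root`.

    `direct_suppliers` maps an item to the items it is built from.
    Breadth-first search; the `seen` set also guards against cycles.
    """
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for supplier in direct_suppliers.get(node, ()):
            if supplier != root and supplier not in seen:
                seen.add(supplier)
                queue.append(supplier)
    return seen


supply = {
    "router": ["board", "case"],
    "board": ["chip"],
    "chip": ["wafer"],
}
print(sorted(transitive_suppliers(supply, "router")))
```

Answering "can I make 10 more routers next week?" would additionally need each supplier's capacity and lead time, which this sketch does not model.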

19

What To Do?

- Huge federated system
  - central metadata a non-starter
  - no single DBA
  - global query optimizer a non-starter
- Adapters for 1M (or so) legacy systems
  - how to write them semi-automatically?

20

Interesting Problem - Akamai

- Billing is 95/5
  - 5 minute intervals
  - pay for bandwidth of 95th percentile
- 300 Gbytes a day (compressed) of click stream data

Biggest warehouses on the planet will soon be click stream data!
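
A minimal sketch of 95th-percentile billing over 5-minute samples, assuming the common nearest-rank convention (sort the samples, discard the top 5%, bill at the highest remaining one). The function name and the sample values are hypothetical, and the exact convention used in practice may differ.

```python
def billable_rate(samples_mbps):
    """Return the 95th-percentile sample using the nearest-rank method.

    `samples_mbps` holds one bandwidth measurement per 5-minute interval.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) in integer arithmetic, to avoid floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# 95 quiet intervals and 5 bursts: the bursts fall in the discarded top 5%.
samples = [10] * 95 + [1000] * 5
print(billable_rate(samples))
```

A 30-day month has 30 × 24 × 12 = 8640 such intervals, so under this convention the 432 busiest intervals would not affect the bill.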

21

Click Stream Data

- Customers want to mine their click stream
  - And Akamai only has a portion of it
  - i.e. huge distributed data base
- Query is “tell me something interesting”
  - i.e. why are 95% of the shopping carts abandoned?
  - and not a pile of statistics

22

Interesting Problem - Fidelity

- Financial portal for high net worth individuals
  - must connect to several hundred Fidelity systems
- Customers want to know fairly complex things
  - i.e. rank my money manager against all value managers for 1, 3 and 5 years

23

What to Do?

- Voice to NL to structured data
  - voice to NL works in focused verticals (weather, airline schedules)
  - but this is a pretty broad app
- NL to structured data requires some work
  - put in the joins
  - look up vocabulary in the DBMS

24

What to Do?

- How to join unstructured data to structured data
  - tell me the news stories about all stocks which have increased in value more than 10% today
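
To make the example query concrete, here is a deliberately naive sketch. It assumes quotes arrive as (open, last) price pairs keyed by ticker symbol and that a story "is about" a stock when the ticker appears as a word in its text. All names and data are hypothetical, and the word match stands in for the hard part of the problem, which is deciding what an unstructured story refers to.

```python
import re


def stories_for_movers(quotes, stories, threshold=0.10):
    """Return stories that mention a ticker whose price rose more than `threshold` today."""
    # Structured side: select the stocks that moved.
    movers = {
        symbol
        for symbol, (open_price, last_price) in quotes.items()
        if open_price > 0 and (last_price - open_price) / open_price > threshold
    }
    # Unstructured side: crude word match of ticker symbols against story text.
    matches = []
    for story in stories:
        words = set(re.findall(r"[A-Z]+", story["text"].upper()))
        if movers & words:
            matches.append(story)
    return matches


quotes = {"ACME": (100.0, 115.0), "INIT": (50.0, 51.0)}
stories = [
    {"id": 1, "text": "Acme ships a new router."},
    {"id": 2, "text": "INIT flat in quiet trading."},
]
print([s["id"] for s in stories_for_movers(quotes, stories)])
```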

25

Mike’s Picks

- Too much middleware
- Akamai for structured data

26

Interesting Problem - Middleware

- Average enterprise has
  - one (or more) app servers
  - one (or more) EAI packages
  - one (or more) ETL packages
  - one (or more) portal products
  - one (or more) application packages
  - and maybe someday a federated DBMS

27

All of these systems

- Contain transformation engines
- And often do function activation (app service)
- And often have adapters to legacy systems

Huge overlap in functionality!!

28

What to Do?

- Consolidate weaker paradigms under stronger ones
  - e.g. federated DBMS subsumes ETL
  - OR DBMS subsumes app service

Middleware becomes DBMS-centric!

29

Interesting Problem - Caching

- Akamai et al. cache HTML
  - closer to the browser that wants it
- Would be nice to cache structured data
  - need to cache application that uses the data
  - and the data

30

What to Do?

- Materialized views are a predefined solution
- Nice to have a more dynamic one
- Cache (query, answer) pairs?
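
One way to read "cache (query, answer) pairs" is sketched below, under stated assumptions: queries are parameterized SQL strings, the caller names the tables each query reads, and any write to a table invalidates every cached answer that depends on it. `QueryCache` and its methods are hypothetical, and the whitespace normalization is a crude stand-in for real query equivalence.

```python
class QueryCache:
    """Cache answers keyed by (normalized query text, parameters)."""

    def __init__(self, run_query):
        self._run_query = run_query   # callable(sql, params) -> answer
        self._answers = {}            # key -> cached answer
        self._keys_by_table = {}      # table name -> keys that read it

    def get(self, sql, params=(), tables=()):
        # Collapse whitespace so trivially different spellings share an entry.
        key = (" ".join(sql.split()), tuple(params))
        if key not in self._answers:
            self._answers[key] = self._run_query(sql, params)
            for table in tables:
                self._keys_by_table.setdefault(table, set()).add(key)
        return self._answers[key]

    def invalidate(self, table):
        """Drop every cached answer that read `table` (call after a write)."""
        for key in self._keys_by_table.pop(table, set()):
            self._answers.pop(key, None)


calls = []


def fake_backend(sql, params):
    calls.append(sql)
    return [("widget", 3)]


cache = QueryCache(fake_backend)
first = cache.get("SELECT name, qty  FROM parts", tables=["parts"])
second = cache.get("SELECT name, qty FROM parts", tables=["parts"])
```

Unlike a materialized view, nothing here is defined in advance: whatever queries actually arrive are what gets cached, which is the "more dynamic" behavior the slide asks for.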

31

History Lesson (Codd)

- Putting semantics into data order is bad
  - restricts storage options
  - hidden meaning bad
- Hierarchical representations for data are bad
  - rewrite the queries when representation changes (data independence)
- Complexity is bad

32

My Spin on XML (XMLSchema)

- As a storage format, XML is good for documents not data
  - Codd’s thinking has not been repealed (order, hierarchy, complexity)
  - no binary format
  - in line tags are inefficient

SGML run amok….

33

My Spin on XML

- As an “on the wire” notation, XML is ok for data
  - but don’t try to move too much stuff
  - and don’t try to move it too fast

Remember why client-server put in binary movement!

34

Xquery For Data

- Won’t store data in XML
- Necessary to design something that is easy to translate into SQL
- Alternate syntax for OR SQL
  - which is much cleaner (// is a user defined function in Informix)

35

XML Summary

- Focus attention on XMLSchema as a document description system not a data description system
- Focus Xquery on documents not data

W3C use cases do not do this!

36

OR DBMS

- XML is merely this year’s data type
- Next year it will be WML or ...
- OR is still not finished
  - query optimization
  - data base design
  - physical storage layout

37

Grand Challenge #1

- Preponderance of web accessible data is structured
  - much more than “facts and figures”
- Construct a system to access “the rest of” the web

38

What To Do

- GUI problem (NL or Vis)
- Query notation problem
- Discovery problem
  - how do you “scrape” a structured data web site to figure out the meaning of its data?
- Federation problem

39

Grand Challenge #2

- Everything of material importance is geo-positioned (lojacked)
- Construct the mother of all GIS systems
  - complete automation of supply chains
  - “where is my wife” (or the closest restroom)

40

What To Do

- Most of the issues in GC #1
- The mother of all triggering problems
- The mother of all security/privacy problems
