TRANSCRIPT
1
So What To Do Next?
Michael Stonebraker
Adjunct Professor
Massachusetts Institute of Technology
2
M.I.T.
Where To Find Problems
State of affairs
Interesting industrial problems
Mike’s picks
My whine on XML
Grand challenges
3
State of Affairs
IT failure rate
Software half-life
No knobs
4
State of Affairs
~50-75% of IT projects fail
    if we built bridges, our profession would be fired
    and the same mistakes are repeated over and over (excessive ambition, rolling specs, bad design, failure to load a large data set early)
5
What To Do?
We typically don’t teach this stuff
    probably because we don’t (can’t) spend any time in industry to figure it out
Action item: at the very least read a couple of Robert L. Glass’s books
6
State of Affairs
Hardware “half-life” is 18 months
Software half-life is 18 years (or more)!
7
What To Do?
Much higher level design environments
    we are stuck at the general purpose programming level (conceivable benefit limited)
    workflow and other higher level graphical notations probably a good idea
8
What To Do?
special purpose languages nice (why are report writers shunned?)
higher level versions of SQL and XQuery
See Informix Visionary for a cool example
9
State of Affairs
Commercial products are way too hard to use
    takes people in white lab coats to get them up and keep them up
Full employment act for DBAs forever
10
What To Do?
“No knobs”
    only buttons are “go” and “stop”
    all tuning automatic
    index selection is one of the minor ones (buffer pool size, partitioning, log buffer pool size, …)
Error reporting stinks
11
Interesting Industrial Problems Should Focus Research
BBC
OZ entertainment
Cisco
Akamai
Fidelity
My suggestion: NSF should require a letter of
support from a CIO with each grant proposal.
12
Interesting Problems -- BBC
Digitize 50 years of British television creativity
    want to serve it up on demand
    especially British soccer games
    media is wearing out
Random access to 1 Petabyte (or so)
    By the unwashed internet 200 million
13
CNN Variation
On-line digital news editing by 300 news directors
    who want to find Monica Lewinsky
    and 30 seconds of footage on suffering in Bosnia
14
What To Do?
Content outlives support for the content format
Automatic content indexing
    cannot afford a librarian
Global scale distributed system
Staging and caching
    high locality of reference
15
What To Do?
Query model meets visualization systems
    unwashed will not learn XQuery
Rights management
    incredibly sticky issue in whole area
16
Interesting Problem - OZ Entertainment
New theme park near Kansas City
    “no lines”
    no lost kids
    virtual theme park as teaser
17
What To Do?
Large scale GIS
    update intensive!
Large scale triggering problem
    alert me if there is a cancellation at X and I am within 300 yards
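
As a rough illustration of the triggering example (a cancellation at X while I am within 300 yards), here is a minimal Python sketch. The names Subscription and alerts_for_cancellation, the flat (x, y) map measured in yards, and the linear scan over subscriptions are all assumptions made for illustration; nothing here comes from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class Subscription:
    """A standing request: alert `visitor` about `attraction` within `radius_yards`."""
    visitor: str
    attraction: str
    radius_yards: float


def alerts_for_cancellation(attraction, attraction_pos, visitor_pos, subscriptions):
    """Return the visitors to alert when a cancellation occurs at `attraction`.

    Positions are (x, y) pairs in yards on a flat park map (an assumption).
    """
    ax, ay = attraction_pos[attraction]
    alerts = []
    for sub in subscriptions:
        if sub.attraction != attraction:
            continue
        pos = visitor_pos.get(sub.visitor)
        if pos is None:  # no current location known for this visitor
            continue
        if math.hypot(pos[0] - ax, pos[1] - ay) <= sub.radius_yards:
            alerts.append(sub.visitor)
    return alerts
```

The scan touches every subscription on every event, which is exactly what will not scale when positions are updated constantly; a real system would presumably need a spatial index over both the triggers and the moving visitors.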
18
Interesting Problem - Cisco Systems
Supply chain of 60K suppliers for custom goods
Want to query the transitive closure of this supply chain
    can I make 10 more routers next week?
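
To make "transitive closure of the supply chain" concrete, here is a small Python sketch that walks from one product to everything it directly or indirectly depends on. The function name transitive_suppliers, the dictionary layout, and the toy data are hypothetical; the sketch also assumes the whole graph sits in one process, which is precisely what the federated setting rules out.

```python
from collections import deque


def transitive_suppliers(direct, root):
    """Return every supplier reachable from `root`, directly or indirectly.

    `direct` maps an item to the items it is built from. Breadth-first
    search with a visited set, so cycles in the graph do not loop forever.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for supplier in direct.get(node, ()):
            if supplier not in seen:
                seen.add(supplier)
                queue.append(supplier)
    seen.discard(root)  # the root is not its own supplier
    return seen


direct = {
    "router": ["board", "case"],
    "board": ["chip"],
    "chip": ["wafer"],
}
print(sorted(transitive_suppliers(direct, "router")))
```

Answering "can I make 10 more routers next week?" would additionally need inventory and capacity data at each node, fetched from each supplier's own system.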
19
What To Do?
Huge federated system
    central metadata a non-starter
    no single DBA
    global query optimizer a non-starter
Adapters for 1M (or so) legacy systems
    how to write them semi-automatically?
20
Interesting Problem - Akamai
Billing is 95/5
    5 minute intervals
    pay for bandwidth of 95th percentile
300 Gbytes a day (compressed) of click stream data
Biggest warehouses on the planet will
soon be click stream data!
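
For readers unfamiliar with 95th percentile billing, the arithmetic can be sketched as follows. This assumes the common nearest-rank convention (sort the 5 minute samples, ignore the top 5%, bill at the highest remaining sample); the slide does not say which percentile definition Akamai uses, and the function name billable_bandwidth is made up for illustration.

```python
def billable_bandwidth(samples_mbps):
    """Return the 95th percentile of 5 minute bandwidth samples (nearest rank).

    Integer arithmetic avoids floating point surprises when computing the rank.
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


# 19 quiet intervals and one burst: the burst falls in the discarded top 5%.
print(billable_bandwidth([10] * 19 + [1000]))
```

A 30 day month has 8,640 five minute intervals, so under this convention roughly the 432 busiest intervals (about 36 hours) do not affect the bill.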
21
Click Stream Data
Customers want to mine their click stream
And Akamai only has a portion of it
    i.e. huge distributed data base
Query is “tell me something interesting”
    i.e. why are 95% of the shopping carts abandoned?
    and not a pile of statistics
22
Interesting Problem - Fidelity
Financial portal for high net worth individuals
    must connect to several hundred Fidelity systems
Customers want to know fairly complex things
    i.e. rank my money manager against all value managers for 1, 3 and 5 years
23
What to Do?
Voice to NL to structured data
    voice to NL works in focused verticals (weather, airline schedules)
    but this is a pretty broad app
NL to structured data requires some work
    put in the joins
    look up vocabulary in the DBMS
24
What to Do?
How to join unstructured data to structured data
    tell me the news stories about all stocks which have increased in value more than 10% today
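
The stock and news query can be sketched in two steps: a structured filter (stocks up more than 10% today) followed by a text match against the stories. The Python below is only a toy; the function names, the (open, last) quote layout, and the naive case-insensitive substring match on company names are all assumptions, and the substring match is exactly the weak spot a real system would have to replace.

```python
def gainers(quotes, threshold=0.10):
    """Structured side: symbols whose price rose by more than `threshold` today.

    `quotes` maps symbol -> (open_price, last_price).
    """
    return {
        symbol
        for symbol, (open_price, last_price) in quotes.items()
        if open_price > 0 and (last_price - open_price) / open_price > threshold
    }


def stories_about(stories, names_by_symbol, symbols):
    """Unstructured side: (symbol, story) pairs where the company name appears."""
    hits = []
    for story in stories:
        text = story.lower()
        for symbol in sorted(symbols):
            if names_by_symbol[symbol].lower() in text:
                hits.append((symbol, story))
    return hits


quotes = {"AAA": (100.0, 112.0), "BBB": (50.0, 51.0)}
names = {"AAA": "Acme", "BBB": "Bolt"}
stories = ["Acme shares surge on router orders", "Bolt trades flat"]
print(stories_about(stories, names, gainers(quotes)))
```

Company names are ambiguous and stories rarely use ticker symbols consistently, so the join predicate itself is the hard part.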
25
Mike’s Picks
Too much middleware
Akamai for structured data
26
Interesting Problem - Middleware
Average enterprise has
    one (or more) app servers
    one (or more) EAI packages
    one (or more) ETL packages
    one (or more) portal products
    one (or more) application packages
    and maybe someday a federated DBMS
27
All of these systems
Contain transformation engines
And often do function activation (app service)
And often have adapters to legacy systems
Huge overlap in functionality!!
28
What to Do?
Consolidate weaker paradigms under stronger ones
    e.g. federated DBMS subsumes ETL
    OR DBMS subsumes app service
Middleware becomes DBMS-centric!
29
Interesting Problem - Caching
Akamai et al. cache HTML
    closer to the browser that wants it
Would be nice to cache structured data
    need to cache application that uses the data
    and the data
30
What to Do?
Materialized views are a predefined solution
Nice to have a more dynamic one
    Cache (query, answer) pairs?
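
One way to picture caching (query, answer) pairs is a lookup table keyed by query text, filled on demand rather than defined ahead of time like a materialized view. The Python sketch below is an assumption-laden toy: the class QueryAnswerCache, the run_query callback, the whitespace-only normalization, and the per-table invalidation are all invented for illustration and say nothing about how such a cache should really decide that two queries are equivalent.

```python
class QueryAnswerCache:
    """Cache answers by query text; drop them when an underlying table changes."""

    def __init__(self, run_query):
        self._run_query = run_query      # callback that actually executes a query
        self._answers = {}               # normalized query text -> answer
        self._queries_by_table = {}      # table name -> set of cached query keys

    @staticmethod
    def _key(query):
        # Collapse whitespace so trivially different spellings share an entry.
        return " ".join(query.split())

    def get(self, query, tables):
        """Return the cached answer, running the query only on a miss."""
        key = self._key(query)
        if key not in self._answers:
            self._answers[key] = self._run_query(query)
            for table in tables:
                self._queries_by_table.setdefault(table, set()).add(key)
        return self._answers[key]

    def invalidate(self, table):
        """Forget every cached answer that read from `table`."""
        for key in self._queries_by_table.pop(table, set()):
            self._answers.pop(key, None)
```

The caller has to say which tables a query reads, and any update to a table throws away all answers that touched it. Both are crude; doing better (reusing a cached answer for a different but subsumed query, or invalidating only affected rows) is where the open problem lies.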
31
History Lesson (Codd)
Putting semantics into data order is bad
    restricts storage options
    hidden meaning bad
Hierarchical representations for data are bad
    rewrite the queries when representation changes (data independence)
Complexity is bad
32
My Spin on XML (XMLSchema)
As a storage format, XML is good for documents not data
    Codd’s thinking has not been repealed (order, hierarchy, complexity)
    no binary format
    in line tags are inefficient
SGML run amok….
33
My Spin on XML
As an “on the wire” notation, XML is ok for data
    but don’t try to move too much stuff
    and don’t try to move it too fast
Remember why client-server put in binary movement!
34
XQuery For Data
Won’t store data in XML
Necessary to design something that is easy to translate into SQL
Alternate syntax for OR SQL
    which is much cleaner (// is a user defined function in Informix)
35
XML Summary
Focus attention on XMLSchema as a document description system not a data description system
Focus XQuery on documents not data
W3C use cases do not do this!
36
OR DBMS
XML is merely this year’s data type
    Next year it will be WML or ...
OR is still not finished
    query optimization
    data base design
    physical storage layout
37
Grand Challenge #1
Preponderance of web accessible data is structured
    much more than “facts and figures”
Construct a system to access “the rest of” the web
38
What To Do
GUI problem (NL or Vis)
Query notation problem
Discovery problem
    how do you “scrape” a structured data web site to figure out the meaning of its data?
Federation problem
39
Grand Challenge #2
Everything of material importance is geo-positioned (lojacked)
Construct the mother of all GIS systems
    complete automation of supply chains
    “where is my wife” (or the closest restroom)
40
What To Do
Most of the issues in GC #1
The mother of all triggering problems
The mother of all security/privacy problems