Web Data Management Raghu Ramakrishnan

Post on 19-Dec-2015


Web Data Management

Raghu Ramakrishnan

- 2 - Research

QUIQ Lessons

• Structured data management powers scalable collaboration environments

• ASP

• Multi-tenancy

• Massively distributed

• Fine-grained permissions, hierarchical ACLs

• RDBMSs were a lousy fit

- 3 -

Cloud Computing: Computing as a Service

[Figure: taxonomy of cloud computing. Top-level split: CPU Intensive, Data Intensive. Categories shown:]

• Analytic (e.g., SSDS, Hadoop)
• Packaged Software
• High-throughput (e.g., Condor)
• “Transactional” Storage & Serving (e.g., PNUTS, S3, SSDS, UDB)

- 4 -

Implications

• Data management as a service
  – Scientists and others who’ve resisted (installing, maintaining, and) using DBMSs will find it much easier to reap the benefits
  – “Data centers” and “Computing Centers” will come into vogue again

• Hosted back-ends and RAD tools will make Web application development accessible to all
  – The Web is becoming open
    • E.g., OpenSocial, OpenID
    • Ideas will be the most valuable currency, not the wherewithal to build complex systems

• Paradigm shifts possible for how we do research in many fields
  – Build applications that embed your algorithms and test them directly in the field: Computer Scientists can interact directly with users (ironically, this would still be a breakthrough of sorts after four decades!)
  – Many other disciplines (e.g., Sociology, microeconomics) can design and conduct online experiments involving unprecedented numbers of participants

- 5 -

PNUTS: DB in the Cloud

[Figure: the same table of records, drawn three times]

E 75656 C
A 42342 E
B 42521 W
C 66354 W
D 12352 E
F 15677 E

CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR
  …
)

• Parallel database
• Geographic replication
• Indexes and views
• Structured, flexible schema
• Hosted, managed infrastructure

- 6 -

Basic Consistency Model

Goal:
• Make it easier for applications to reason about updates and cope with asynchrony, as an alternative to “transactions” in an asynchronous world
• What happens to a record with primary key “Brian”?

Guarantees:
• Every reader will always see some consistent, but possibly stale version
• Readers can request a more up-to-date version, but may pay extra latency
  – Special case: Critical read (writer/readers see their own writes)
• Writers can verify that the record is still at the version they expect

[Figure: timeline (axis: Time) of one record's history. Events: Record inserted, Update, Update, Delete. Version labels: v. 1 through v. 4. Generation labels: Generation 1 through Generation 3.]
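The guarantees above can be illustrated with a small in-memory sketch. This is not PNUTS code: the class and method names (`RecordStore`, `read_any`, `read_critical`, `read_latest`, `test_and_set_write`) are hypothetical, and the design assumes that a single master copy orders all writes to a record and that replicas catch up asynchronously. Deletes and generations are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the table, mapping key -> (version, value)."""
    data: dict = field(default_factory=dict)

    def apply(self, key, version, value):
        # Only move forward in version order; stale or duplicate updates are ignored.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


class RecordStore:
    """Toy store: one master copy orders writes, replicas lag behind (assumption)."""

    def __init__(self, n_replicas=2):
        self.master = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # updates accepted by the master but not yet propagated

    def write(self, key, value):
        # The master assigns the next version number for this record.
        version = self.master.data.get(key, (0, None))[0] + 1
        self.master.apply(key, version, value)
        self.pending.append((key, version, value))
        return version

    def test_and_set_write(self, key, expected_version, value):
        # Writer checks that the record is still at the version it expects.
        current = self.master.data.get(key, (0, None))[0]
        if current != expected_version:
            return None  # someone else wrote first; caller must re-read and retry
        return self.write(key, value)

    def propagate(self):
        # Stand-in for asynchronous replication: deliver all pending updates.
        for update in self.pending:
            for replica in self.replicas:
                replica.apply(*update)
        self.pending.clear()

    def read_any(self, key, replica=0):
        # Cheap read: a consistent but possibly stale version.
        return self.replicas[replica].data.get(key)

    def read_critical(self, key, required_version, replica=0):
        # Return a version at least as new as required_version.
        got = self.replicas[replica].data.get(key)
        if got is not None and got[0] >= required_version:
            return got
        return self.master.data.get(key)  # models the extra latency of going to the master

    def read_latest(self, key):
        # Most up-to-date version, always served by the master in this sketch.
        return self.master.data.get(key)


store = RecordStore()
v1 = store.write("Brian", "offline")
store.propagate()
v2 = store.write("Brian", "online")              # not yet propagated to replicas
stale = store.read_any("Brian")                  # replica still holds v1
own_write = store.read_critical("Brian", v2)     # writer sees its own write
rejected = store.test_and_set_write("Brian", v1, "away")   # expected version is outdated
accepted = store.test_and_set_write("Brian", v2, "away")   # expected version matches
```

In this sketch a reader that tolerates staleness stays on the local replica, while a critical read or a latest read may fall back to the master, which is where the extra latency mentioned above would come from.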

- 7 -

Lots of Issues to Re-think

• Massive distribution & replication
  – Asynchrony
  – Availability
  – Consistency

• DBA to the world
  – Auto-tuning
  – Multi-tenancy
  – Access control (granularity, online IDs)
  – Encryption

• App-support
  – Caching

- 8 -

Querying the Web

• Search will become more semantic, with best-effort match-making between:
  – Query intent (NLP, query logs …)
  – Interpreted web content

• Deep web has a lot of structured data
  – How we get a handle on it is an interesting problem
  – But this is only part of the problem … lots of data not here

• Semantic web isn’t working

• Site-wrapping doesn’t scale

• Solutions?
  – Domain-wrapping
  – Mass collaboration
  – ??