review of claremont report on database research jiaheng lu renmin university of china

21
Review of Claremont Report on Database Research Jiaheng Lu Renmin University of China

Upload: eustacia-baldwin

Post on 23-Dec-2015

220 views

Category:

Documents


1 download

TRANSCRIPT

Review of Claremont Report on Database Research

Jiaheng Lu Renmin University of China

Outline Five challenges on database research

Database engine revisitingDeclarative programmingStructured and unstructured dataCloud data managementMobile application

Our research to meet those challenges

数据库的挑战 :Senior database researcher Meeting Senior database researchers have

gathered every few years to assess the state of database research and to recommend problems and problem areas deserve additional focus. Laguna Beach, Calif. in 1989 Palo Alto, Calif. (“Lagunita”) in 1990 and 1995 Cambridge, Mass. in 1996 Asilomar, Calif. in 1998 Lowell, Mass . In 2003

Claremont Meeting

About 20 Database researchers

Claremont Resort, Berkeley, CA May 29-30, 2008

Revisiting database engines(1)

Traditional data engine NOT work wellOLTP System: data provenance, schema

evolution and versioningText indexingMedia delivery……

Revisiting database engines(2)

Research topicsRemote RAM and flash as persistent mediaTreat query optimization and physical data a a

unified, adaptive, self-tuning taskCompressing and encrypting data with query

optimizationDesigning systems that embrace non-

relational data models

Declarative programming for Emerging platforms (1) Data-centric approach for emerging

platformsManycore chipsDistributed servicesCloud computing platforms…..

Declarative programming for Emerging platforms (2) Good examples

Map-reduce:

data-parallelism

Ruby, Rails

query-like logic

XQuery

The interplay of structured and unstructured data(1) Witnessing a growing amount of structured

dataMillions of database hidden (Deep Web)Millions of HTML tables and MashupsWeb 2.0 Service photo video websites

The interplay of structured and unstructured data(2) Research challenge:

Extract structured meaning for unstructured data (IR, ML)

Querying and deriving insight from heterogeneous data

Keyword queries Pay-as-you-go fashion

Cloud data management (1)

Cloud service: shared commodity hardware for computing and storageApplication service (salesforce.com)Storage service (Amazon Web service)Computing service (Google App Engine)Data service (Microsoft SQLServer data

center)

Cloud data management (2)

Research challengeSelf-management database: limited human

invention, various workloadsLarge scale query processing and optimizationData security and privacy with sharing

Mobile applications

“On the go” interaction

Location based service

Our research to meet challenges

XML search Approximate string search Cloud data management Mobile data privacy DataSpace,……

XML search (1) XML twig query processing (SIGMOD’05,

VLDB’05) Problem Statement

Given an XML twig pattern Q, and an XML database D, we need to find ALL the matches of Q on D.

An XML tree:

s1

s2

f1

p1

t1

t2

Section

Title Figure

Twig pattern: Query answers:

(s1, t1, f1) (s2, t2, f1) (s1, t2, f1)

XML search (2) XML keyword search (ICDE’09)

Problem Statement How to efficiently rank the results of XML keyword

query

Contribution: Extend TF/IDF by incorporating the structure of

XML data

Approximate string search Approximate string queries (ICDE’08,09)

Problem Statement Given a collection of string data, how to efficiently

perform approximate search

Schwarzenger

Samuel Jackson

Keanu ReevesStar

Search

Output: strings s that satisfy Sim(q,s)≤δOutput: strings s that satisfy Sim(q,s)≤δ

SchwarrzengerSchwarrzenger

18

Main Example

Query

1,2,3,4

0,1,2,4

Merge

Final answers

DataGrams

stick (st,ti,ic,ck)

Candidate string ids {1,2,3,4}

{1,2,3}

Double check for the real edit distance

st

ti

ic

ckcount >=2

Performance bottleneck!

id strings

0 rich

1 stick

2 stich

3 stuck

4 static

ck

ic

st

ta

ti…

1,3

0,1,2,4

1,2,3,4

4

1,2,4

1,2,4

1,3

ed(s,q)≤1

Cloud data management WAMDM实验室的分布式存储系统实验平台

Web-desktop1

Web-desktop2 Web-desktop3

Master

HRegion (Tablet) Server

HRegion (Tablet) Server

Web-desktop1

Web-desktop2 Web-desktop3

Master(NameNode)

Slave(DataNode)

Slave(DataNode)

Hbase

HDFS

Research topics about cloud data Self management and self tuning

Query optimization on thousands of nodes

Thank you

Q & A

WAMDM lab website: http://idke.ruc.edu.cn/