CRIUS: User-Friendly Database Design Li (Eric) Qian, Kristen LeFevre, H. V. Jagadish University of Michigan, Ann Arbor

Page 1: CRIUS: User-Friendly Database Design Li (Eric) Qian, Kristen LeFevre, H. V. Jagadish University of Michigan, Ann Arbor

CRIUS: User-Friendly Database Design

Li (Eric) Qian, Kristen LeFevre, H. V. Jagadish
University of Michigan, Ann Arbor

Page 2

Outline: Motivation, Interface, Algebra, Guidance Feature, Storage, Evaluation

Page 3

Motivation

Non-technical people are directly exposed to data.

It is hard to design a schema in advance. Start with a simple structure and grow it as needed. We call this process organic schema evolution.

Page 4

Motivation Cont’d

While users have the freedom of organically growing their schema, the data is now subject to denormalization.

Consequently, users have to explicitly deal with duplicated data entries, which may produce errors that violate integrity constraints.

Therefore, an organic database system must:
(1) Make it easy for the end user to make schema changes
(2) Guarantee efficient and safe data entry
(3) Implement these features with low cost

Page 5

Challenges: Schema Update Specification, Data Migration, Data Entry, Schema Evolution Performance

Page 6

Outline: Motivation, Interface, Algebra, Guidance Feature, Storage, Evaluation

Page 7

Flat spreadsheets

Name   City       Address
Mary   Chicago    2364 Bishop
Keith  Ann Arbor  101 Plymouth
Keith  Ann Arbor  202 Main

Spreadsheet?

vs. Hierarchical semantics

Person:

ID  Name   City
1   Mary   Chicago
2   Keith  Ann Arbor

Address:

ID  Address
1   2364 Bishop
2   101 Plymouth
2   202 Main

Page 8

How to support hierarchical semantics? We permit nesting!

Name   City       [Address]
                  Address
Mary   Chicago    2364 Bishop
Keith  Ann Arbor  101 Plymouth
                  202 Main
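As a rough illustration of the nested layout on this slide, the sketch below keeps each person's addresses in a list and flattens them back into spreadsheet rows. The dictionary layout and the `unnest` helper are assumptions made for this example; the slides do not show how CRIUS represents nesting internally.

```python
# Hypothetical in-memory form of the nested table: one entry per person,
# with the nested [Address] relation held as a list of values.
people = [
    {"Name": "Mary", "City": "Chicago", "Address": ["2364 Bishop"]},
    {"Name": "Keith", "City": "Ann Arbor", "Address": ["101 Plymouth", "202 Main"]},
]


def unnest(rows, nested_attr):
    """Flatten a nested attribute into one flat row per nested value."""
    flat = []
    for row in rows:
        for value in row[nested_attr]:
            flat_row = dict(row)           # copy the atomic attributes
            flat_row[nested_attr] = value  # replace the list with one value
            flat.append(flat_row)
    return flat


flat_rows = unnest(people, "Address")
for r in flat_rows:
    print(r["Name"], r["City"], r["Address"], sep=" | ")
```

Flattening repeats Keith and Ann Arbor once per address, which is exactly the duplication the nested form avoids.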

Page 9

Span Table

Span Table: a next-generation spreadsheet that nests data in a single representation.

Specify an evolution by dragging StateName inside Address.

Specify an evolution by dragging Person upward.

[Figure: a span table, with the schema and data regions labeled.]

Page 10

Outline: Motivation, Interface, Algebra, Guidance Feature, Storage, Evaluation

Page 11

Data Migration in Schema Evolution

Data needs to be migrated from the old schema to the new one. This may involve data copy/merge. Users need to edit in a cell-by-cell manner.

Old schema:

Name   City       Address
Mary   Chicago    2364 Bishop
Keith  Ann Arbor  101 Plymouth
Keith  Ann Arbor  202 Main

New schema:

Name   City       [Address]
                  Address
Mary   Chicago    2364 Bishop
Keith  Ann Arbor  101 Plymouth
                  202 Main

Page 12

Introducing Operators!

Schema restructuring operators: IMPORT, EXPORT, FLOAT, SINK

Extended spreadsheet operators:
Schema modification: adding/dropping columns
Data manipulation: inserting/deleting/updating tuples

Collectively, we call this set of operators Span Table Algebra.

Page 13

Span Table Algebra: Schema Restructuring Operators

Operator   Description
Import(A)  Move A inward into a descendant relation.
Export(A)  Move A outward into an ancestor relation.
Sink(A)    Push A to create a new leaf relation.
Float(A)   Lift A to create a new intermediate level.

Sink(Address) turns

Name   City       Address
Mary   Chicago    2364 Bishop
Keith  Ann Arbor  101 Plymouth
Keith  Ann Arbor  202 Main

into

Name   City       [Address]
                  Address
Mary   Chicago    2364 Bishop
Keith  Ann Arbor  101 Plymouth
                  202 Main

Import(City) turns this nested table into the one below, and Export(City) turns it back:

Name   [Address]
       City       Address
Mary   Chicago    2364 Bishop
Keith  Ann Arbor  101 Plymouth
       Ann Arbor  202 Main
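The three transformations pictured on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the CRIUS implementation: the `NESTED` key, the list-of-dicts layout, and the function names `sink`, `import_attr`, and `export_attr` are all invented for this example, and the code handles only a single level of nesting.

```python
NESTED = "nested"  # assumed key holding a row's child relation


def sink(rows, attr):
    """Sink(attr): push attr into a new leaf relation, grouping on the rest."""
    groups = {}
    for row in rows:
        key = tuple((k, v) for k, v in row.items() if k != attr)
        groups.setdefault(key, []).append({attr: row[attr]})
    return [{**dict(key), NESTED: children} for key, children in groups.items()]


def import_attr(rows, attr):
    """Import(attr): move attr inward, copying it onto every child row."""
    result = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k not in (attr, NESTED)}
        children = [{attr: row[attr], **child} for child in row[NESTED]]
        result.append({**parent, NESTED: children})
    return result


def export_attr(rows, attr):
    """Export(attr): move attr outward, one parent row per distinct value."""
    result = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != NESTED}
        by_value = {}
        for child in row[NESTED]:
            rest = {k: v for k, v in child.items() if k != attr}
            by_value.setdefault(child[attr], []).append(rest)
        for value, children in by_value.items():
            result.append({**parent, attr: value, NESTED: children})
    return result


flat = [
    {"Name": "Mary", "City": "Chicago", "Address": "2364 Bishop"},
    {"Name": "Keith", "City": "Ann Arbor", "Address": "101 Plymouth"},
    {"Name": "Keith", "City": "Ann Arbor", "Address": "202 Main"},
]
nested = sink(flat, "Address")            # Name City [Address]
imported = import_attr(nested, "City")    # Name [City Address]
restored = export_attr(imported, "City")  # back to Name City [Address]
```

On this data, Export(City) undoes Import(City) because every child of a parent carries the same City; with differing values the sketch would split the parent into several rows.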

Page 14

Span Table Algebra: Expressive Power Analysis

Import, Export, etc. can be expressed in terms of Nest and Unnest.

Nest and Unnest can be expressed as a sequence of Span Table operators.

Detailed proofs are in the paper appendix.

Page 15

Outline: Motivation, Interface, Algebra, Guidance Feature, Storage, Evaluation

Page 16

Inevitable Denormalization

Traditional design uses data integrity constraints. We cannot do this since we have no pre-defined constraints. The result is denormalization.

FD: A → B

The FD holds here, with (a0, b0) stored twice:

A   B   C
a0  b0  c0
a0  b0  c1

The FD is violated here:

A   B   C
a0  b0  c0
a0  b1  c1
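To make the two small A/B/C tables on this slide concrete, here is a minimal sketch that checks whether a functional dependency holds in a list of rows. The function name `fd_holds` and the row layout are assumptions for illustration only.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this key fixes the value; later rows must agree.
        if seen.setdefault(key, value) != value:
            return False
    return True


redundant = [
    {"A": "a0", "B": "b0", "C": "c0"},
    {"A": "a0", "B": "b0", "C": "c1"},
]
inconsistent = [
    {"A": "a0", "B": "b0", "C": "c0"},
    {"A": "a0", "B": "b1", "C": "c1"},
]
print(fd_holds(redundant, ["A"], ["B"]))     # True
print(fd_holds(inconsistent, ["A"], ["B"]))  # False
```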

Page 17

Guide User Data Entry

We maintain a set of “soft” functional dependencies (FDs) to guide user data entry: inductive completion and error prevention.

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A
3   Leo    Math     B

FD: Name → Grade

The user starts a new row with Name = Leo:

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A
3   Leo    Math     B
    Leo

Inductive completion fills in Grade = B:

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A
3   Leo    Math     B
    Leo             B

Error prevention: the user enters Grade = C instead, which conflicts with the FD:

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A
3   Leo    Math     B
    Leo             C

The user can then choose to:
(1) roll back;
(2) also update relevant entries to preserve data integrity;
(3) force the entry and update the soft FDs.

Result of (2):

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A
3   Leo    Math     C
    Leo             C

Result of (3):

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A
3   Leo    Math     B
    Leo             C

FD: Name, Course → Grade
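The two guidance behaviors on this slide, inductive completion and error prevention, can be sketched as lookups against one soft FD. This is only an illustration under assumptions: the helper names `suggest` and `conflicting_rows` are hypothetical, and the slides do not describe how CRIUS implements either step.

```python
def suggest(rows, lhs, rhs, partial):
    """Inductive completion: the rhs value implied by lhs -> rhs, or None."""
    key = tuple(partial[a] for a in lhs)
    for row in rows:
        if tuple(row[a] for a in lhs) == key:
            return row[rhs]
    return None


def conflicting_rows(rows, lhs, rhs, new_row):
    """Error prevention: existing rows that the new entry would contradict."""
    key = tuple(new_row[a] for a in lhs)
    return [
        row for row in rows
        if tuple(row[a] for a in lhs) == key and row[rhs] != new_row[rhs]
    ]


grades = [
    {"ID": 1, "Name": "Peter", "Course": "Math", "Grade": "A"},
    {"ID": 2, "Name": "Peter", "Course": "Physics", "Grade": "A"},
    {"ID": 3, "Name": "Leo", "Course": "Math", "Grade": "B"},
]

# Soft FD Name -> Grade: typing "Leo" suggests grade B.
suggested = suggest(grades, ["Name"], "Grade", {"Name": "Leo"})

# Entering grade C for Leo clashes with row 3, so the user must pick one of
# the three resolutions listed on the slide.
clashes = conflicting_rows(grades, ["Name"], "Grade", {"Name": "Leo", "Grade": "C"})
```

A non-empty `clashes` list is the point at which the interface would offer rollback, propagating the change, or forcing the entry and weakening the FD.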

Page 18

How to Manage FDs?

Frequent data entry means frequent FD re-induction. Past solutions are too expensive to be applied.

Incremental FD Induction (IFDI):
(1) Induce the initial FDs and build the important data structures.
(2) Maintain these structures and incrementally re-induce FDs.

We optimize the way these structures are updated so that the algorithm is able to respond in real time.

Page 19

Outline: Motivation, Interface, Algebra, Guidance Feature, Storage, Evaluation

Page 20

Vertical Partitioning

Span tables are vertically partitioned and stored in relational databases.

Connecting span table to underlying storage:
Upward mapping
Downward mapping

Page 21

Outline: Motivation, Interface, Algebra, Guidance Feature, Storage, Evaluation

Page 22

Evaluation

Our experiments are designed to answer four questions:
(1) Span Table usability
(2) Guidance feature usability
(3) IFDI efficiency
(4) Storage performance

Page 23

Evaluation: User Study on Schema Operations

Tasks:
Schema Design: Create the schema for an address book.
Schema Update: Move an attribute from one relation to another in a gene database.

Measure: Time to complete each task.

Compared against SSMS (MS SQL Server Management Studio 2008).

All users failed in this task using SSMS since they were unable to migrate the data manually. In contrast, all of them were able to complete the task within seconds with CRIUS.

[Figure panels: Schema Design, Schema Update]

Page 24

Evaluation: User Study on Integrity-Based Guidance

The three tasks:
(1) Insert a new contact and his address into the address book.
(2) Update the cell phone number of one contact.
(3) Update the address of one contact to the address of another contact.

Measure: time to complete each task, and overall count of keystrokes/mouse clicks.

Compare with and without the guidance feature on.

Page 25

Conclusion

The design and implementation of CRIUS
Span table algebra
Integrity-based guidance based on IFDI
Storage
Evaluation

Page 26

Questions

??

Page 27

IFDI: Inducing Initial FDs

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A
3   Leo    Math     B
4   Leo    Physics  B
5   Jack   Math     A

Attribute Partitions:
P_N = {(1,2), (3,4), (5)}
P_C = {(1,3,5), (2,4)}
P_G = {(1,2,5), (3,4)}
P_NC = {(1), (2), (3), (4), (5)}
P_NG = {(1,2), (3,4), (5)}
P_CG = {(1,5), (2), (3), (4)}
P_NCG = {(1), (2), (3), (4), (5)}

X → Y iff P_X = P_{X∪Y}

P_{X∪Y} = P_X · P_Y

Attribute Lattice (singleton groups omitted):
NCG: {}
NC: {}    NG: {(1,2), (3,4)}    CG: {(1,5)}
N: {(1,2), (3,4)}    C: {(1,3,5), (2,4)}    G: {(1,2,5), (3,4)}

N → G since P_N = P_NG.
NC → G since P_NC = P_NCG (dominated by the above).
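The partition test on this slide can be reproduced with a short sketch. The function names `partition`, `product`, and `implies`, and the single-letter column names, are assumptions for this example; it recomputes everything from scratch and is not the IFDI algorithm itself.

```python
def partition(rows, attrs):
    """Group tuple ids by their values on attrs (the attribute partition)."""
    groups = {}
    for tid, row in rows.items():
        groups.setdefault(tuple(row[a] for a in attrs), []).append(tid)
    return sorted(tuple(g) for g in groups.values())


def product(p, q):
    """Partition product: intersect every group of p with every group of q."""
    cells = [tuple(sorted(set(g) & set(h))) for g in p for h in q]
    return sorted(c for c in cells if c)


def implies(rows, lhs, rhs):
    """X -> Y holds iff the partition on X equals the partition on X u Y."""
    both = list(lhs) + [a for a in rhs if a not in lhs]
    return partition(rows, lhs) == partition(rows, both)


# N = Name, C = Course, G = Grade, keyed by tuple id.
rows = {
    1: {"N": "Peter", "C": "Math", "G": "A"},
    2: {"N": "Peter", "C": "Physics", "G": "A"},
    3: {"N": "Leo", "C": "Math", "G": "B"},
    4: {"N": "Leo", "C": "Physics", "G": "B"},
    5: {"N": "Jack", "C": "Math", "G": "A"},
}

p_c = partition(rows, ["C"])
p_g = partition(rows, ["G"])
p_cg = product(p_c, p_g)  # same groups as partition(rows, ["C", "G"])

# Single-attribute FDs that hold in this table.
single_fds = [
    (x, y) for x in "NCG" for y in "NCG"
    if x != y and implies(rows, [x], [y])
]
```

Here `single_fds` contains only the pair for N → G, matching the slide; NC → G also holds but is dominated by it.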

Page 28

IFDI: Maintaining FDs on Value Update

The Grade of tuple 2 is updated from A to B:

ID  Name   Course   Grade
1   Peter  Math     A
2   Peter  Physics  A → B
3   Leo    Math     B
4   Leo    Physics  B
5   Jack   Math     A

Attribute Partitions (old → new):
P_N = {(1,2), (3,4), (5)}
P_C = {(1,3,5), (2,4)}
P_G = {(1,2,5), (3,4)} → {(1,5), (2,3,4)}
P_NC = {(1), (2), (3), (4), (5)}
P_NG = {(1,2), (3,4), (5)} → {(1), (2), (3,4), (5)}
P_CG = {(1,5), (2), (3), (4)} → {(1,5), (2,4), (3)}
P_NCG = {(1), (2), (3), (4), (5)} → {(1), (2), (3), (4), (5)}

X → Y iff P_X = P_{X∪Y}

Attribute Lattice (singleton groups omitted; old → new):
NCG: {} → {}
NC: {}    NG: {(1,2), (3,4)} → {(3,4)}    CG: {(1,5)} → {(1,5), (2,4)}
N: {(1,2), (3,4)}    C: {(1,3,5), (2,4)}    G: {(1,2,5), (3,4)} → {(1,5), (2,3,4)}

N → G no longer holds since P_N ≠ P_NG.
NC → G holds since P_NC = P_NCG.

Only visit half of the lattice nodes!

Page 29

IFDI: Maintaining FDs on Value Update Cont’d

How do we efficiently update attribute partitions? P_CG = {(1,5), (2), (3), (4)} becomes P'_CG = {(1,5), (2,4), (3)} when tuple 2 is updated.

P_CG = P_C · P_G

P'_CG = P_C · P'_G

Naively re-computing the product:
P_C = {(1,3,5), (2,4)}
P_G = {(1,2,5), (3,4)}
Group (1,2,5) of P_G: S1 = {1,5}, S2 = {2}
Group (3,4) of P_G: S1 = {3}, S2 = {4}
P_CG = {(1,5), (2), (3), (4)}

Incrementally updating the product:
P_C = {(1,3,5), (2,4)}
P'_G = {(1,5), (2,3,4)}
P_CG = {(1,5), (2), (3), (4)}
tid = 2
P'_CG = Update(P_CG, P_C, P'_G, tid)
1) Remove the tuple from the old group: {(1,5), (2), (3), (4)} becomes {(1,5), (3), (4)}
2) Add the tuple to the new group: {(1,5), (3), (4)} becomes {(1,5), (2,4), (3)}
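The two-step incremental update on this slide can be sketched as follows. This is a minimal reading of the slide, not the authors' code: the `update` signature mirrors the slide's Update(P_CG, P_C, P'_G, tid), while `group_of` and the tuple-based partition layout are assumptions for the example.

```python
def group_of(part, tid):
    """Return the group of partition part that contains tuple id tid."""
    return next(g for g in part if tid in g)


def update(p_xy, p_x, p_y_new, tid):
    """Incrementally repair the product partition after tid's Y value changed."""
    # 1) Remove the tuple from its old group, dropping the group if it empties.
    pruned = [tuple(t for t in g if t != tid) for g in p_xy]
    pruned = [g for g in pruned if g]

    # 2) Add the tuple to the new group: the tuples that agree with it
    #    on both X and the updated Y.
    mates = (set(group_of(p_x, tid)) & set(group_of(p_y_new, tid))) - {tid}
    result, placed = [], False
    for g in pruned:
        if mates and set(g) == mates:
            result.append(tuple(sorted(g + (tid,))))
            placed = True
        else:
            result.append(g)
    if not placed:
        result.append((tid,))  # no other tuple shares its values
    return sorted(result)


p_c = [(1, 3, 5), (2, 4)]
p_g_new = [(1, 5), (2, 3, 4)]
p_cg = [(1, 5), (2,), (3,), (4,)]

p_cg_new = update(p_cg, p_c, p_g_new, tid=2)
print(p_cg_new)  # [(1, 5), (2, 4), (3,)]
```

Only the groups touching tuple 2 are inspected, whereas recomputing the product intersects every group of one partition with every group of the other.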

Page 30

Evaluation: User Study on Schema Operations Cont’d

Task: move an attribute across relations in a gene database (the same as before).

Measure: time to complete the task.

Compare CRIUS with a strawman system with only nested relational operators.

Page 31

Evaluation: Performance of IFDI

Task: Re-generate the minimal FDs on value update.

Measure: The time to complete the task.

Compare IFDI with the naive algorithm.

[Figure panels: a five-column table with a varying number of rows; a ten-thousand-row table with a varying number of columns.]

Page 32

Evaluation: Performance of Vertical Storage

Tasks:
Execute a schema update.
Load data from the relational back-end and construct a span table.

Measure: Time to complete each task.

Compare CRIUS with the naive storage.

[Figure panels: time (ms) to move an attribute with varying DB size (MB); time (ms) to display data with varying DB size (MB).]