Hypertable CS227 Li Jin, Yang Lu Brown University

Page 1: CS227 · Li Jin, Yang Lu · Brown University · cs.brown.edu/courses/csci2270/archives/2011/slides/...

Hypertable

CS227 Li Jin, Yang Lu

Brown University

Schedule

Overview

Data Model

Architecture

Case Study

High Performance, Open Source

Scalable Database

Modeled after Bigtable

Implemented in C++

Project Started in March 2007

Runs on top of HDFS

Thrift Interface for all popular languages: Java, PHP, Ruby, Python, Perl, etc.

Hypertable Deployments

Data Model

www.hypertable.org

Table: Visual Representation

Table: Key

Table: Actual Representation

Table: Physical Data Layout

Access Groups

Provides control over physical layout: row oriented, column oriented, or hybrid

Reduces I/O

CREATE TABLE MyTable (
  a, b, c, d,
  ACCESS GROUP first (a),
  ACCESS GROUP second (b, c, d)
);
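
To make the I/O saving concrete, here is a minimal Python sketch (not Hypertable code) that stores the columns of MyTable in the two groups declared above and counts how many stored cells a query has to read. The class AccessGroupTable, its methods, and the sample values are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class AccessGroupTable:
    """Toy model: each access group is stored separately (hypothetical, not Hypertable's API)."""

    def __init__(self, groups):
        # groups maps a group name to its columns, e.g. {"first": ["a"], "second": ["b", "c", "d"]}
        self.groups = groups
        self.group_of = {col: g for g, cols in groups.items() for col in cols}
        # One store per group: row key -> {column: value}
        self.stores = {g: defaultdict(dict) for g in groups}

    def insert(self, row, values):
        # Route every cell to the store of the group that owns its column.
        for col, val in values.items():
            self.stores[self.group_of[col]][row][col] = val

    def scan(self, columns):
        """Return (rows, cells_read); only groups holding a requested column are read."""
        needed = {self.group_of[c] for c in columns}
        cells_read = 0
        result = defaultdict(dict)
        for g in needed:
            for row, cells in self.stores[g].items():
                cells_read += len(cells)  # the whole group is read for each row
                for col in columns:
                    if col in cells:
                        result[row][col] = cells[col]
        return dict(result), cells_read


table = AccessGroupTable({"first": ["a"], "second": ["b", "c", "d"]})
for i in range(3):
    table.insert(f"row{i}", {"a": i, "b": i * 10, "c": i * 100, "d": i * 1000})

rows_a, read_a = table.scan(["a"])          # touches only group "first"
rows_ab, read_ab = table.scan(["a", "b"])   # touches both groups
```

In this toy model a query on column a alone reads a quarter of the cells that a query spanning both groups reads, which is the sense in which grouping columns that are queried together reduces I/O.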

Scaling (part I)

Scaling (part II)

Scaling (part III)

System Components

Query Handling: Write

Hypertable    Bigtable
CellStore     SSTable
CellCache     Memtable

Heap: insert log(n), lookup log(n), merge log(n)
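
The write path behind this naming can be sketched as follows: writes land in an in-memory buffer (the CellCache / Memtable role), and when the buffer fills it is frozen into an immutable sorted run (the CellStore / SSTable role). This is a simplified Python model, not Hypertable's implementation; the class ToyRange, the flush_threshold parameter, and the use of a dict plus sorted lists instead of the heap mentioned above are all assumptions made for brevity.

```python
import bisect


class ToyRange:
    """Simplified write path: mutable in-memory buffer plus immutable sorted runs."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.cell_cache = {}    # mutable buffer (CellCache / Memtable role)
        self.cell_stores = []   # immutable sorted runs, oldest first (CellStore / SSTable role)

    def insert(self, key, value):
        self.cell_cache[key] = value
        if len(self.cell_cache) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        # Freeze the buffer into a sorted, read-only run and start a new buffer.
        keys = sorted(self.cell_cache)
        values = [self.cell_cache[k] for k in keys]
        self.cell_stores.append((keys, values))
        self.cell_cache = {}

    def lookup(self, key):
        # The newest data wins: check the buffer first, then runs from newest to oldest.
        if key in self.cell_cache:
            return self.cell_cache[key]
        for keys, values in reversed(self.cell_stores):
            i = bisect.bisect_left(keys, key)  # binary search inside one sorted run
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def merge(self):
        # Compact all runs into one; runs are applied oldest first so later writes win.
        merged = {}
        for keys, values in self.cell_stores:
            merged.update(zip(keys, values))
        keys = sorted(merged)
        self.cell_stores = [(keys, [merged[k] for k in keys])]


r = ToyRange(flush_threshold=2)
r.insert("b", 1)
r.insert("a", 2)   # buffer is full: flushed as run ["a", "b"]
r.insert("b", 3)
r.insert("c", 4)   # buffer is full: flushed as run ["b", "c"]
stores_before_merge = len(r.cell_stores)
latest_b = r.lookup("b")
r.merge()
```

After the merge a single run remains and the newer value for "b" has replaced the older one.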

Query Handling: Read

Transaction Support

Single-row transactions

Entire row is guaranteed to be in a single range.

Transactions over data scattered across multiple RangeServers are not yet implemented.

Maybe not necessary for an OLAP system.

Hyperspace

Metadata 0 Root

Row key               Location
1:com.yahoo.www       192.168.79.5
2:us-ri-02912.www     192.168.79.6

Metadata 1 Tablet

Row key               Location        Files
1:com.facebook.www    192.168.79.5
1:com.yahoo.www       192.168.79.6

Metadata 1 Tablet

Row key               Location        Files
2:us-ri-02906         192.168.79.5
2:us-ri-02912         192.168.79.6

Access Group          Files
Content; Title        /hypertable/data/CellStore1
Anchor                /hypertable/data/CellStore1

Indexing
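
The lookup implied by these tables can be sketched as two binary searches: one in the root level to pick a metadata tablet, one in that tablet to pick a RangeServer. The sketch assumes, as in Bigtable, that each metadata row key has the form table_id:end_row, so the entry responsible for a key is the first one whose row key is greater than or equal to it. That assumption, the helper names, and the simplification of letting the root point directly at a tablet object (the slide stores a server address there) are not taken from the slides; the addresses are.

```python
import bisect


def locate(entries, lookup_key):
    """entries: list of (end_key, value) sorted by end_key.

    Returns the value of the first entry with end_key >= lookup_key, or None.
    """
    keys = [k for k, _ in entries]
    i = bisect.bisect_left(keys, lookup_key)
    return entries[i][1] if i < len(entries) else None


# Second-level metadata tablets, with the rows shown on the slide.
tablet_1 = [("1:com.facebook.www", "192.168.79.5"),
            ("1:com.yahoo.www", "192.168.79.6")]
tablet_2 = [("2:us-ri-02906", "192.168.79.5"),
            ("2:us-ri-02912", "192.168.79.6")]

# Root level: last row key of each metadata tablet -> that tablet (assumed mapping).
root = [("1:com.yahoo.www", tablet_1),
        ("2:us-ri-02912", tablet_2)]


def find_range_server(table_id, row):
    """Two-step lookup: root level first, then the chosen metadata tablet."""
    key = f"{table_id}:{row}"
    tablet = locate(root, key)
    return locate(tablet, key) if tablet is not None else None
```

For example, find_range_server(1, "com.google.www") sorts after 1:com.facebook.www and before 1:com.yahoo.www, so under the end-row assumption it resolves to the second entry of the first tablet.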

Client Access

Thrift API: supports C++, Java, PHP, Python, Perl, Ruby; get_cells(), set_cells(), …

HQL: select, insert, delete, …

Case Study: Tribalytic

A market research tool using Twitter data.

Example: given the keyword “coffee machine”, it tells you that people usually tweet about coffee machines around 9:00, 14:00, and 21:00.

Case Study: Tribalytic

One single table called “hits”.

Use keywords as row key.

All queries search on row key.
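
A minimal sketch of this design, assuming an in-memory stand-in for the “hits” table: keywords are the row key, each row holds the timestamps of matching tweets, and the coffee machine query becomes one row lookup followed by a count per hour of day. The sample data and the function names record_hit and peak_hours are made up for illustration and are not Tribalytic's code.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Stand-in for the "hits" table: row key (keyword) -> timestamps of matching tweets.
hits = defaultdict(list)


def record_hit(keyword, timestamp):
    hits[keyword].append(timestamp)


def peak_hours(keyword, top=3):
    """Single row-key lookup, then rank hours of the day by tweet count."""
    counts = Counter(ts.hour for ts in hits.get(keyword, []))
    return [hour for hour, _ in counts.most_common(top)]


# Made-up sample: 5 tweets at 9:xx, 4 at 14:xx, 3 at 21:xx, 1 at 3:xx.
for hour, n in [(9, 5), (14, 4), (21, 3), (3, 1)]:
    for minute in range(n):
        record_hit("coffee machine", datetime(2011, 3, 1, hour, minute))

busiest = peak_hours("coffee machine")
```

Because every query starts from the keyword, the table never needs a secondary index; the row key alone locates all the data the query needs.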

Case Study: Tribalytic

Processes 3.5 million tweets a day.

Pushes 50 million records a day to the “hits” table.

Takes less than 250 ms to locate and load 40,000 records into memory (in the coffee machine example).

Case Study: Tribalytic

Row Key           Tweets:Li    Tweets:Yang    Tweets:    Tweets:
Coffee machine    …            …              …          …

Case Study: Tribalytic

Tweet Id | User | Timestamp | Content

Keyword | List(Tweet Id)

OR

Keyword | List(User) | List(Timestamp) | List(Content)

OR

Keyword | List(Tweet Object)

Best Fit Use Cases

Serve massive data sets to live applications.

Good for applications that need to scan over data ranges.
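
Range scans suit this kind of store because rows are kept sorted by row key, so a scan is a binary search for the start of the range followed by a sequential read. A minimal sketch, with a sorted in-memory list standing in for the rows of one range; the keys and values are placeholders and the scan function is hypothetical.

```python
import bisect

# Rows kept sorted by row key; values are placeholders.
rows = sorted([
    ("com.facebook.www", "page-1"),
    ("com.yahoo.mail", "page-2"),
    ("com.yahoo.www", "page-3"),
    ("org.hypertable.www", "page-4"),
])
keys = [k for k, _ in rows]


def scan(start, end):
    """Return all rows with start <= key < end using two binary searches."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return rows[lo:hi]


# Prefix scan over "com.yahoo.": "/" is the character right after "." in ASCII,
# so "com.yahoo/" is the smallest key that sorts after every "com.yahoo." key.
yahoo_rows = scan("com.yahoo.", "com.yahoo/")
```

Storing domain names reversed, as in the metadata example, keeps all pages of one site adjacent, so a whole site can be read with a single scan like this one.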

Q&A