Lecture 8: Google BigTable


1

Google’s BigTable

Ghulam Yasin

2

BigTable Introduction

• Development began in 2004 at Google (published 2006)

• A need to store/handle large amounts of (semi-)structured data

• Many Google projects store data in BigTable

3

Goals of BigTable

• Asynchronous processing across continuously evolving data
• Petabytes in size
• High volume of concurrent reading/writing spanning many CPUs
• Need ability to conduct analysis across many subsets of data
• Temporal analysis
• Can work well with many clients, but not too specific to clients’ needs

4

BigTable in a Nutshell

• Distributed multi-level map
• Fault-tolerant
• Scalable
  – Thousands of servers
  – Terabytes of memory-based data
  – Petabytes of disk-based data
  – Millions of reads/writes per second
• Self-managing
  – Dynamic server management

5

Building Blocks

• Google File System (GFS) is used for BigTable’s storage
• Scheduler assigns jobs across many CPUs and watches for failures
• Lock service: distributed lock manager
• MapReduce is often used to read/write data to BigTable
  – BigTable can be an input or output

6

Data Model

• A “semi” three-dimensional data cube
  – Input: (row, column, timestamp) → Output: cell contents
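
As a rough illustration of this model (not Google's implementation; the class and method names below are invented for the sketch), the map can be seen as a nested dictionary keyed by row, then column, then timestamp:

```python
from collections import defaultdict

# Toy model of BigTable's logical map: a sparse, sorted mapping from
# (row, column, timestamp) to an uninterpreted string value.
# All names here are invented for the sketch.
class ToyBigTable:
    def __init__(self):
        # row -> column ("family:qualifier") -> {timestamp: value}
        self.cells = defaultdict(lambda: defaultdict(dict))

    def set(self, row, column, timestamp, value):
        self.cells[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self.cells[row][column]
        if not versions:
            return None
        # By default, return the most recent version of the cell.
        ts = timestamp if timestamp is not None else max(versions)
        return versions.get(ts)

table = ToyBigTable()
table.set("com.cnn.www", "contents:", 1, "<html>…</html>")
print(table.get("com.cnn.www", "contents:"))   # latest version of the cell
```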

7

(Figure: example row “com.cnn.www” with column “contents:” holding versioned cells along the time dimension, e.g. “html… at t1”)

More on Rows and Columns

Rows
• The name is an arbitrary string
• Rows are created as needed when new data is written that has no preexisting row
• Rows are ordered lexicographically, so related data is stored on one or a small number of machines (see the sketch below)

Columns
• Columns have a two-level name structure
  – family:optional_qualifier (e.g. anchor:cnnsi.com | anchor:stanford.edu)
• More flexibility in dimensions
  – Columns can be grouped into locality groups that are relevant to the client
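
For example, crawled pages are often keyed by reversed hostname so that pages from the same domain sort next to each other and land in the same or neighbouring tablets; a small illustration, with the helper name row_key invented here:

```python
# Row keys are plain strings sorted lexicographically, so storing pages
# under a reversed hostname ("maps.google.com" -> "com.google.maps")
# keeps pages from the same domain on adjacent rows, and therefore in
# the same or neighbouring tablets.
def row_key(hostname, path=""):
    return ".".join(reversed(hostname.split("."))) + path

rows = sorted(row_key(h) for h in
              ["maps.google.com", "www.cnn.com", "mail.google.com"])
print(rows)   # ['com.cnn.www', 'com.google.mail', 'com.google.maps']
```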

8

Tablets

• The entire BigTable is split into tablets of contiguous ranges of rows
  – Approximately 100MB to 200MB each
• One machine serves about 100 tablets
  – Fast recovery in the event of failure
  – Fine-grained load balancing
  – The 100 tablets are assigned non-deterministically, to avoid hot spots where related data is all located on one machine
• Tablets are split as their size grows (see the sketch below)
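
A minimal sketch of the idea, assuming tablets are kept as a sorted list of [start_row, end_row) ranges and split around a middle row once they pass a size threshold; the names, structure, and use of the 200MB figure are all illustrative:

```python
import bisect

# Illustrative only: tablets partition the sorted row space into
# contiguous ranges [start_row, end_row); a tablet that grows past a
# threshold is split at a middle row.
SPLIT_THRESHOLD_BYTES = 200 * 1024 * 1024   # ~200MB, per the slide

class Tablet:
    def __init__(self, start_row, end_row):
        self.start_row, self.end_row = start_row, end_row
        self.size_bytes = 0

def find_tablet(tablets, row):
    # Tablets are kept sorted by start_row; binary-search the covering range.
    starts = [t.start_row for t in tablets]
    return tablets[bisect.bisect_right(starts, row) - 1]

def maybe_split(tablets, tablet, middle_row):
    # Replace an oversized tablet with two halves covering the same range.
    if tablet.size_bytes > SPLIT_THRESHOLD_BYTES:
        i = tablets.index(tablet)
        tablets[i:i + 1] = [Tablet(tablet.start_row, middle_row),
                            Tablet(middle_row, tablet.end_row)]

tablets = [Tablet("", "m"), Tablet("m", "\uffff")]
print(find_tablet(tablets, "com.cnn.www").end_row)   # 'm'
```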

9

(Figure: a table being split into Tablet1 and Tablet2 as it grows)

Implementation Structure

10

(Diagram: system components and their roles)

• Master
  – Metadata operations
  – Load balancing
• Tablet servers (many)
  – Serve data
• Cluster scheduling system
  – Handles failover and monitoring
• GFS
  – Holds tablet data
• Lock service
  – Holds metadata
  – Master election
• Client/API
  – Lock service: Open()
  – Tablet servers: Read() and Write()
  – Master: CreateTable() and DeleteTable()
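
A toy routing sketch of the diagram above, only to show which component each kind of call goes to: the master sees metadata operations, while reads and writes go straight from the client to tablet servers. Every name below is invented; the real client library is far richer.

```python
# Toy routing sketch: the master handles only metadata operations;
# the data path (reads/writes) bypasses it entirely.
class ToyClient:
    def __init__(self, lock_service, master, tablet_servers):
        self.lock_service = lock_service        # Open()
        self.master = master                    # CreateTable()/DeleteTable()
        self.tablet_servers = tablet_servers    # Read()/Write()

    def create_table(self, name):
        return self.master.create_table(name)   # metadata operation only

    def open(self, name):
        return self.lock_service.open(name)     # locate the table

    def write(self, row, column, value):
        server = self._locate(row)              # data path bypasses the master
        return server.write(row, column, value)

    def read(self, row, column):
        return self._locate(row).read(row, column)

    def _locate(self, row):
        # Placeholder: the real lookup walks the metadata hierarchy
        # shown on the next slide.
        return self.tablet_servers[hash(row) % len(self.tablet_servers)]
```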

Locating Tablets

• Metadata for tablet locations and start/end rows are stored in a special BigTable cell

11

(Diagram: three-level lookup hierarchy)

– Stored in the lock service; a pointer to the root metadata tablet
– The root tablet: a map of rows pointing to the second level of metadata
– Second-level metadata: metadata for the actual tablets, with pointers to each tablet
– The tablets themselves
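
A sketch of walking this hierarchy, assuming each level is represented as a plain map from a range's end row to the next level down; chubby_file, locate_tablet, and lookup_range are invented stand-ins:

```python
# Walking the three-level hierarchy above with plain dicts.
def locate_tablet(chubby_file, row_key):
    root_tablet = chubby_file["root_tablet"]             # level 1: lock service
    meta_tablet = lookup_range(root_tablet, row_key)      # level 2: root metadata
    return lookup_range(meta_tablet, row_key)             # level 3: actual tablet

def lookup_range(metadata, row_key):
    # Pick the first range whose end row is >= the requested key.
    for end_row, target in sorted(metadata.items()):
        if row_key <= end_row:
            return target
    raise KeyError(row_key)

# Toy data: two second-level metadata tablets, each covering half the rows.
chubby = {"root_tablet": {"m": {"m": "tablet-A"},
                          "\uffff": {"\uffff": "tablet-B"}}}
print(locate_tablet(chubby, "com.cnn.www"))   # -> 'tablet-A'
```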

Reading/Writing to Tablets

• Write commands
  – The write command is first put into a queue/log of commands for that tablet
  – The data is written to GFS, and when the write command is committed the queue is updated
  – The write is mirrored in the tablet’s memory buffer
• Read commands
  – Must combine the buffered commands not yet committed with the data in GFS (see the sketch below)
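
A minimal sketch of this write/read path for a single tablet, assuming a commit log, an in-memory buffer, and a plain dict standing in for data already committed to GFS; all names are invented:

```python
# Single-tablet sketch: writes go to a commit log and a memory buffer;
# reads merge that buffer with data already committed to GFS.
class ToyTablet:
    def __init__(self):
        self.log = []         # queue/log of write commands for this tablet
        self.memtable = {}    # buffered writes mirrored in memory
        self.gfs = {}         # stand-in for committed data stored in GFS

    def write(self, key, value):
        self.log.append((key, value))   # 1. record the command in the log
        self.memtable[key] = value      # 2. mirror it in the memory buffer

    def read(self, key):
        # Combine not-yet-committed buffered writes with data in GFS,
        # preferring the newer buffered value.
        if key in self.memtable:
            return self.memtable[key]
        return self.gfs.get(key)

    def commit(self):
        # When the buffered commands are committed, the data moves to
        # GFS and the queue/buffer are cleared.
        self.gfs.update(self.memtable)
        self.memtable.clear()
        self.log.clear()
```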

12

API

• Metadata operations
  – Create and delete tables and column families, change metadata
• Writes (atomic per row; see the sketch below)
  – Set(): write cells in a row
  – DeleteCells(): delete cells in a row
  – DeleteRow(): delete all cells in a row
• Reads
  – Scanner: read arbitrary cells in BigTable
    • Each row read is atomic
    • Can restrict returned rows to a particular range
    • Can ask for data from one row, all rows, or a subset of rows
    • Can ask for all columns, just certain column families, or specific columns
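
A hedged sketch of the per-row atomic write calls named above; RowMutation only mimics the spirit of the paper's C++ API, and the toy table is a plain dict so the example runs:

```python
# Toy version of the per-row atomic write calls.
class RowMutation:
    def __init__(self, table, row):
        self.table, self.row, self.ops = table, row, []

    def set(self, column, value):               # Set(): write a cell
        self.ops.append(("set", column, value))

    def delete_cells(self, column):             # DeleteCells(): delete cells
        self.ops.append(("del", column, None))

    def apply(self):
        # All buffered changes to this row commit together (atomic per row).
        cells = self.table.setdefault(self.row, {})
        for op, column, value in self.ops:
            if op == "set":
                cells[column] = value
            else:
                cells.pop(column, None)

table = {}
m = RowMutation(table, "com.cnn.www")
m.set("anchor:www.c-span.org", "CNN")
m.delete_cells("anchor:www.abc.com")
m.apply()
print(table)   # {'com.cnn.www': {'anchor:www.c-span.org': 'CNN'}}
```

Reads would go through a Scanner object in the same spirit, restricted to a row range and a set of column families as listed above.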

13

Shared Logging

• Logs are kept at the tablet-server level, not per tablet
  – It would be inefficient to keep a separate log file for each tablet (~100 tablets per server)
  – Logs are kept in 64MB chunks
• Problem: recovery from a machine failure becomes complicated because many new machines all read the killed machine’s logs
  – Many I/O operations
• Solved by the master chunking the killed machine’s log file so each new machine reads only its own portion (sketched below)
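
A sketch of that recovery step, assuming each log entry is tagged with its tablet and a sequence number (the function name and entry layout are invented): the dead server's shared log is partitioned by tablet so each new server replays only the tablets assigned to it.

```python
from collections import defaultdict

# The dead server's shared log holds interleaved entries for ~100
# tablets, so it is first partitioned and sorted by tablet; each new
# server then replays only the chunks for its assigned tablets.
def partition_shared_log(log_entries):
    # log_entries: iterable of (tablet_id, sequence_no, mutation)
    chunks = defaultdict(list)
    for tablet_id, seq, mutation in log_entries:
        chunks[tablet_id].append((seq, mutation))
    for entries in chunks.values():
        entries.sort()                 # replay mutations in order
    return chunks                      # tablet_id -> ordered mutations

log = [("t2", 2, "set b"), ("t1", 1, "set a"), ("t2", 1, "set x")]
print(dict(partition_shared_log(log)))
```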

14

Compression

• Low-CPU-cost compression techniques are adopted
• Compression is applied across each SSTable for a locality group
  – BMDiff and Zippy are used as the building blocks of compression
• Keys: sorted strings of (Row, Column, Timestamp)
• Values
  – Grouped by type/column family name
  – BMDiff across all values in one family
• Zippy as a final pass over the whole block
  – Catches more localized repetitions
  – Also catches cross-column-family repetition
• Empirical results show compression by roughly a factor of 10
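
A structure-only sketch of the two-pass idea, assuming values arrive grouped by column family; BMDiff and Zippy are Google-internal (Zippy was later open-sourced as Snappy), so zlib stands in below just to make the example run.

```python
import zlib

# Structure-only sketch of the two-pass scheme; zlib is a stand-in.
def compress_locality_group(values_by_family):
    block = b""
    for family, values in sorted(values_by_family.items()):
        # Values are grouped by column family; in BigTable, BMDiff runs
        # here to exploit long repetitions across one family's values.
        block += b"".join(values)
    # Fast final pass over the whole block (Zippy's role): catches more
    # local repetitions, including those that cross column families.
    return zlib.compress(block, level=1)

values = {"contents:": [b"<html>cnn story a</html>", b"<html>cnn story b</html>"],
          "anchor:":   [b"CNN", b"CNN Sports Illustrated"]}
print(len(compress_locality_group(values)))
```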

