google inc. - university of waterloo · 2010-02-01 · bigtable is a distributed storage system for...
![Page 1: Google Inc. - University of Waterloo · 2010-02-01 · Bigtable is a distributed storage system for managing structured data It is extremely scalable (it can work with petabytes worth](https://reader033.vdocuments.site/reader033/viewer/2022041521/5e2e86c01bc72436f02d631a/html5/thumbnails/1.jpg)
Google, Inc.
Presented by: Cătălin-Alexandru Avram
February 1st 2010
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber
Introduction
The Data Model
Building Blocks
Implementation
Performance Evaluation
Conclusion
Questions & Discussions
Bigtable is a distributed storage system for managing structured data
It is extremely scalable (it can work with petabytes worth of data on thousands of machines)
It is actively used by over 60 Google products with workloads ranging from batch processing to live data serving
Bigtable is a sparse, distributed, persistent, multidimensional sorted map
(row:string, column:string, time:int64) -> string
Rows are ordered lexicographically and grouped together in tablets
Columns are grouped in column families
Each cell may contain multiple versions of the same data, timestamped with either the real time or client-generated timestamps
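The data model above can be sketched as a small in-memory map. This is only an illustration under stated assumptions: `TinyTable` and its methods are hypothetical names, not part of Bigtable, and the sketch ignores distribution and persistence entirely.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy model of the (row, column, timestamp) -> value sorted map."""

    def __init__(self):
        self._rows = []    # row keys, kept in lexicographic order
        self._cells = {}   # (row, column) -> [(-timestamp, value)], newest first

    def put(self, row, column, timestamp, value):
        if (row, column) not in self._cells and row not in self._rows:
            insort(self._rows, row)  # keep rows sorted on insertion
        versions = self._cells.setdefault((row, column), [])
        insort(versions, (-timestamp, value))  # negated so newest sorts first

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        for neg_ts, value in self._cells.get((row, column), []):
            if timestamp is None or -neg_ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Return row keys in [start_row, end_row), in lexicographic order."""
        lo = bisect_left(self._rows, start_row)
        hi = bisect_left(self._rows, end_row)
        return self._rows[lo:hi]


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.politics", "contents:", 4, "<html>p4")
print(table.get("com.cnn.www", "contents:"))      # newest version
print(table.get("com.cnn.www", "contents:", 4))   # newest version at or before t=4
print(table.scan("com.cnn.", "com.cnn.x"))        # contiguous range of row keys
```

Because row keys are sorted, reversed host names such as `com.cnn.politics` and `com.cnn.www` land next to each other, which is what makes range scans over one domain cheap.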
Row com.cnn.politics
  “contents:” → “<html>…” at t4
  “anchor:politics.net” → “CNN news” at t3
Row com.cnn.www
  “contents:” → “<html>…” at t3, “<html>…” at t5, “<html>…” at t6
  “anchor:cnnsi.com” → “CNN” at t9
  “anchor:my.look.ca” → “CNN.com” at t8
Atomic row operations
Column family level access control and garbage collection settings
[Figure: example row keys com.cnn.politics, com.cnn.www, …]
An API is provided to handle relevant Bigtable functions
Regular expressions for lookups
Single-row transactions are supported
Execution of client Sawzall scripts in the server’s address space
Wrappers are provided to allow Bigtable to act as an input source or output target to MapReduce
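Two of the API features listed above, single-row transactions and regular-expression lookups, can be illustrated with a toy store. This is a sketch under assumptions: `RowStore`, `mutate_row`, and `columns_matching` are hypothetical names chosen for this example and do not reproduce the actual Bigtable client API.

```python
import re
import threading
from collections import defaultdict


class RowStore:
    """Toy store: mutations to one row are atomic; nothing spans multiple rows."""

    def __init__(self):
        self._rows = defaultdict(dict)              # row -> {column: value}
        self._locks = defaultdict(threading.Lock)   # one lock per row
        self._guard = threading.Lock()              # protects the lock table itself

    def _lock_for(self, row):
        with self._guard:
            return self._locks[row]

    def mutate_row(self, row, fn):
        """Run fn(columns) as an atomic read-modify-write on a single row."""
        with self._lock_for(row):
            fn(self._rows[row])

    def columns_matching(self, row, pattern):
        """Return the columns of `row` whose full name matches the regular expression."""
        regex = re.compile(pattern)
        with self._lock_for(row):
            return {c: v for c, v in self._rows[row].items() if regex.fullmatch(c)}


store = RowStore()
store.mutate_row("com.cnn.www", lambda cols: cols.update(
    {"contents:": "<html>", "anchor:cnnsi.com": "CNN", "anchor:my.look.ca": "CNN.com"}))
print(store.columns_matching("com.cnn.www", r"anchor:.*"))
```

The per-row lock is the design point: a read-modify-write on one row never interleaves with another on the same row, while operations on different rows proceed independently.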
An SSTable provides a persistent ordered immutable map from keys to values (strings)
Split into blocks (typically 64KB in size)
A block index is stored at the end of the SSTable
The block index is loaded into memory when the SSTable is opened
Optionally the entire SSTable may be loaded in memory
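The role of the block index can be sketched as follows. This is a simplified illustration, not the real file format: `TinySSTable` is a hypothetical name, blocks are counted in entries rather than bytes, and `blocks_read` merely stands in for disk reads.

```python
from bisect import bisect_right

ENTRIES_PER_BLOCK = 4  # assumption for the sketch; real blocks are sized in bytes


class TinySSTable:
    """Immutable sorted map split into blocks, with an in-memory block index."""

    def __init__(self, items):
        entries = sorted(items.items())
        self._blocks = tuple(
            tuple(entries[i:i + ENTRIES_PER_BLOCK])
            for i in range(0, len(entries), ENTRIES_PER_BLOCK)
        )
        # Index: first key of every block, searched in memory.
        self._index = [block[0][0] for block in self._blocks]
        self.blocks_read = 0

    def get(self, key):
        pos = bisect_right(self._index, key) - 1  # last block whose first key <= key
        if pos < 0:
            return None                           # key sorts before the first block
        self.blocks_read += 1                     # one block fetched per lookup
        for k, v in self._blocks[pos]:
            if k == key:
                return v
        return None


sstable = TinySSTable({f"key{i:02d}": f"val{i}" for i in range(10)})
print(sstable.get("key07"), sstable.blocks_read)
```

With the index held in memory, a lookup binary-searches the index and then touches at most one block.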
Client
Direct client access
Client library
Master Dynamically added or removed
Assign tablets to tablet servers
Detect addition/expiration of tablet servers
Tablet server load balancing
GFS file garbage collection
Schema changes
The root tablet is never split
All metadata tablets are stored in memory
128 MB per METADATA tablet is sufficient to address 2^34 tablets
The client library caches tablet locations
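The 2^34 figure follows from simple arithmetic. The sketch below assumes roughly 1 KB of location data per METADATA row, a number taken from the Bigtable paper rather than from this slide, and uses hypothetical constant names.

```python
METADATA_TABLET_BYTES = 128 * 2**20   # 128 MB limit per METADATA tablet
BYTES_PER_LOCATION_ROW = 2**10        # assumption: about 1 KB per METADATA row

# Each METADATA tablet can hold this many tablet locations.
rows_per_tablet = METADATA_TABLET_BYTES // BYTES_PER_LOCATION_ROW

# The root tablet locates METADATA tablets, and each of those locates user tablets.
addressable_tablets = rows_per_tablet ** 2

print(rows_per_tablet == 2**17, addressable_tablets == 2**34)
```

Since the root tablet is never split, the hierarchy stays at a fixed depth, so the two index levels multiply to 2^17 × 2^17 = 2^34.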
Tablet servers each acquire a lock on a Chubby file, allowing the master to keep track of them
If the file no longer exists the server kills itself
The master assigns tablets to tablet servers (the list of all tablets is kept in the METADATA table)
The master handles tablet creation, deletion and merging
The tablet servers handle tablet splitting
When tablets become too large (typically 100-200 MB), tablet servers split them into two parts
The process involves creating a new tablet and committing the operation by adding the new information in the METADATA table
The master is notified after the commit
If the notification is lost, the master finds out when it asks a tablet server to load the original tablet: the entry the tablet server reads from the METADATA table covers only part of the tablet it was asked to load
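The split-and-detect behaviour can be sketched with a dictionary standing in for the METADATA table. Everything here is hypothetical (`split_tablet`, `load_tablet`, the tablet names, and the range representation); it only illustrates why a lost notification is harmless.

```python
def split_tablet(metadata, tablet, split_key, new_tablet):
    """Commit a split by recording both halves in the METADATA stand-in."""
    start, end = metadata[tablet]
    if not start < split_key < end:
        raise ValueError("split key must fall strictly inside the tablet's row range")
    metadata[tablet] = (start, split_key)
    metadata[new_tablet] = (split_key, end)


def load_tablet(metadata, tablet, expected_range):
    """Return the committed range and whether the caller's view of it is stale."""
    actual = metadata[tablet]
    return actual, actual != expected_range


metadata = {"tablet-A": ("a", "z")}
master_view = dict(metadata)            # what the master believes before the split

split_tablet(metadata, "tablet-A", "m", "tablet-B")   # notification is "lost"

actual, stale = load_tablet(metadata, "tablet-A", master_view["tablet-A"])
print(actual, stale)   # the mismatch tells the master that a split happened
```

The METADATA table is the source of truth, so the master can always reconcile its view the next time it touches the tablet.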
A tablet consists of a list of immutable SSTables stored in GFS
Recently committed operations are stored in memory in so-called memtables
Commit logs are kept to ensure recovery from failure
“Redo points” stored in the METADATA table are just pointers to entries in these commit logs
Memtables are compacted into SSTables once they reach a certain size (minor compaction)
Multiple SSTables are compacted together to speed up read operations (major/merging compaction)
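The write path, read path, and the two kinds of compaction can be sketched in a few lines. This is a toy model under assumptions: `TinyTablet` is a hypothetical name, SSTables are plain dictionaries, the memtable limit counts entries rather than bytes, and recovery from the commit log is not shown.

```python
class TinyTablet:
    """Toy tablet: commit log + memtable + a list of immutable SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []          # append-only record of every write
        self.memtable = {}            # recent writes, held in memory
        self.sstables = []            # flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # log first, then apply
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Freeze the memtable into a new SSTable and start a fresh memtable."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def major_compaction(self):
        """Merge all SSTables into one; newer tables overwrite older entries."""
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [dict(sorted(merged.items()))]

    def read(self, key):
        """Check the memtable, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None


tablet = TinyTablet(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("a", 10), ("e", 5)]:
    tablet.write(key, value)
print(len(tablet.sstables), tablet.read("a"))
tablet.major_compaction()
print(len(tablet.sstables), tablet.read("a"))
```

A read may have to consult every SSTable, which is why merging them speeds reads up.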
[Figure: tablet representation, showing write ops, read ops, the memtable (in memory), and the tablet log and SSTable files (in GFS)]
Locality groups (for column families)
Compression (block level)
Caching (Scan Cache and Block Cache)
Bloom filters
Tablet server level commit-log
Minor compactions before tablet movement
Exploiting immutability
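Of the refinements listed above, the Bloom filter is the easiest to show in isolation. The sketch below is a generic Bloom filter with hypothetical sizing, not Bigtable's implementation; it shows how a small bit array can answer "definitely not present" without reading an SSTable from disk.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means the key was never added; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("com.cnn.www/contents:")
if sstable_filter.might_contain("com.cnn.www/contents:"):
    print("possibly present: read the SSTable")
```

A negative answer lets a read skip that SSTable entirely, which matters most for lookups of rows or columns that do not exist.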
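Values read or written per second, per tablet server, by number of tablet servers:

| Experiment | 1 | 50 | 250 | 500 |
|---|---|---|---|---|
| Random reads | 1,212 | 593 | 479 | 241 |
| Random reads (mem) | 10,811 | 8,511 | 8,000 | 6,250 |
| Random writes | 8,850 | 3,745 | 3,425 | 2,000 |
| Sequential reads | 4,425 | 2,463 | 2,625 | 2,469 |
| Sequential writes | 8,547 | 3,623 | 2,451 | 1,905 |
| Scans | 15,385 | 10,526 | 9,524 | 7,843 |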
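The per-server rates in the table can be turned into aggregate throughput by multiplying each rate by the number of tablet servers. The short calculation below does this; the function names are hypothetical and the numbers are copied from the table.

```python
SERVERS = (1, 50, 250, 500)

# Per-tablet-server rates from the table, one entry per server count.
PER_SERVER = {
    "Random reads":       (1212, 593, 479, 241),
    "Random reads (mem)": (10811, 8511, 8000, 6250),
    "Random writes":      (8850, 3745, 3425, 2000),
    "Sequential reads":   (4425, 2463, 2625, 2469),
    "Sequential writes":  (8547, 3623, 2451, 1905),
    "Scans":              (15385, 10526, 9524, 7843),
}


def aggregate(experiment):
    """Total operations per second across all tablet servers."""
    return [rate * n for rate, n in zip(PER_SERVER[experiment], SERVERS)]


def speedup(experiment):
    """Aggregate throughput at 500 servers relative to 1 server."""
    totals = aggregate(experiment)
    return totals[-1] / totals[0]


for name in PER_SERVER:
    print(f"{name:20s} {aggregate(name)[-1]:>10,d}  x{speedup(name):.0f}")
```

Even though the per-server rate drops as servers are added, aggregate throughput still grows: random reads reach about 120,500 per second at 500 servers (roughly a 99-fold increase), while scans reach about 3.9 million per second.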
[Figure: throughput (y-axis, 0 to 4,000,000) versus number of tablet servers (x-axis, 0 to 500) for random reads, random reads (mem), random writes, sequential reads, sequential writes, and scans]
Bigtable provides an unconventional alternative to distributed databases
It offers great scalability and performance
Users have increased flexibility, but intelligent schema designs are required in order to maintain performance at high levels
It plays a pivotal role in Google’s infrastructure, being used by over 60 deployed products
How big a problem is the lack of general transaction support?
Is the performance of read operations too low (especially if the data accessed is relatively new)?
The system performs well when faced with Google’s application needs; will it fare as well in other types of applications?
Comparison with standard distributed database systems.