Accumulo design
APACHE ACCUMULO
From a design perspective
SCALABLE KEY-VALUE STORE BASED ON GOOGLE'S BIGTABLE
BIGTABLE FEATURES
• Distributes data across many commodity servers
• Sorts data by key for fast lookup of values by key
• Scans across multiple key-value pairs
• Highly consistent writes to a single row
• Support for MapReduce jobs
DATA MODEL
• Key: Row ID, Column (Family, Qualifier), Timestamp
• Value

| Row ID | Col Fam | Col Qual | Timestamp | Value |
|---|---|---|---|---|
| Bob | Email | id0023 | 20120301 | Hey joe, can you send ... |
| Bob | Email | id0024 | 20120302 | Re: next Thursday ... |
| Bob | UserPrefs | Background | 20130101 | Grey |
| Fred | Email | id0001 | 20080302 | Welcome to gmail ... |
| Sarah | Email | id0004 | 20130201 | Hi again ... |
| Sara | Videos | ytid009 | 20100303 | nsu736:)jdudjdk$:)378;'$$) |
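A table like the one above is stored as a single sorted sequence of keys. The sketch below models that ordering under stated assumptions: `CellKey` and `SortedTable` are hypothetical names (this is not Accumulo's `Key` class). Rows, families and qualifiers sort ascending and timestamps sort descending, following BigTable's newest-version-first convention, so all cells of a row form one contiguous run and the newest version of a cell is read first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy key: row ID, column family, column qualifier, timestamp (hypothetical type).
record CellKey(String row, String family, String qualifier, long timestamp) {}

final class SortedTable {
    // Ascending by row, family, qualifier; descending by timestamp (newest first).
    static final Comparator<CellKey> ORDER = Comparator
            .comparing(CellKey::row)
            .thenComparing(CellKey::family)
            .thenComparing(CellKey::qualifier)
            .thenComparing(Comparator.comparingLong(CellKey::timestamp).reversed());

    private final NavigableMap<CellKey, String> cells = new TreeMap<>(ORDER);

    void put(String row, String family, String qualifier, long timestamp, String value) {
        cells.put(new CellKey(row, family, qualifier, timestamp), value);
    }

    // Newest value of one cell: the first key at or after (row, fam, qual, MAX timestamp).
    String latest(String row, String family, String qualifier) {
        Map.Entry<CellKey, String> e =
                cells.ceilingEntry(new CellKey(row, family, qualifier, Long.MAX_VALUE));
        if (e == null) return null;
        CellKey k = e.getKey();
        boolean sameCell = k.row().equals(row) && k.family().equals(family)
                && k.qualifier().equals(qualifier);
        return sameCell ? e.getValue() : null;
    }

    // Every cell of one row: start at the smallest possible key for that row and
    // stop at the first key that belongs to a different row.
    List<Map.Entry<CellKey, String>> scanRow(String row) {
        List<Map.Entry<CellKey, String>> out = new ArrayList<>();
        for (Map.Entry<CellKey, String> e
                : cells.tailMap(new CellKey(row, "", "", Long.MAX_VALUE), true).entrySet()) {
            if (!e.getKey().row().equals(row)) break;
            out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedTable t = new SortedTable();
        t.put("Bob", "Email", "id0024", 20120302L, "Re: next Thursday ...");
        t.put("Bob", "Email", "id0023", 20120301L, "Hey joe, can you send ...");
        t.put("Bob", "UserPrefs", "Background", 20130101L, "Grey");
        t.put("Fred", "Email", "id0001", 20080302L, "Welcome to gmail ...");
        for (Map.Entry<CellKey, String> e : t.scanRow("Bob")) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Insertion order does not matter: the scan of row `Bob` always returns its cells in key order, which is what makes both point lookups and row scans cheap.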
Figure: Tablet servers on top of HDFS DataNodes (labels: Commit Layer, Replication Layer)
SINCE 2006
• Several BigTable implementations
• Apache HBase
• Apache Cassandra
• Apache Accumulo
• others …
BIGTABLE IS BIGTABLE, RIGHT?
HBASE
HBASE
• Open source Apache project started by developers at Powerset, which was later bought by Microsoft
• Now used at Facebook, StumbleUpon and other large web sites
• Fast reads
• Row-oriented API
• Each column family has its own set of files
CASSANDRA
CASSANDRA
• Apache project started at Facebook
• Combines elements of BigTable and Amazon's Dynamo into one system
• Used at Netflix and other web sites
• Fast writes
• Tunable consistency
Figure: Tablet servers (label: Commit and Replication Layer)
CONSISTENCY
• Highly consistent: writes go to one place
• Eventually consistent: writes go to more than one place
• Writing to more than one place gives network partition tolerance
• Partition tolerance allows geographically distributed servers
• Google uses Spanner to synchronize multiple databases
Figure (two slides): Tablet servers in Data Center A and Data Center B
OVERVIEW
• Both HBase and Cassandra are highly scalable
• Used to build web applications that can serve millions of users at once
• Serve as a low-latency persistence layer for real-time service of requests
• Available in single-data-center or cross-data-center configurations
USE CASE
• Most data comes from users
• Schema defined by the application
• Data builds up over time
Figure: Many Users → Web application → Db
ACCUMULO
ACCUMULO
• Can support the web application use case
• But what are its extra features for?
ACCUMULO ‘EXTRAS’
• Dynamic Column Families
• Column Visibility
• Key-value oriented API
• Iterators
• Batch Scanners
BIG ORGANIZATIONS
• Missions other than internet services
• Various disparate operational systems that generate data
• Desire to look across and analyze that data
• Desire to deliver results to their own population
USE CASE IS DISCOVERING AND ANALYZING ALL DATA
ISSUES
• Scale
• Unknown or multiple schemas
• Support for analysis without data movement
• Varying levels of sensitivity in the same system
• Support a high number of low-latency user requests
Figure: Many Users, Analyze, Db, Data sets
SCALE?
CHECK (IT’S BIGTABLE)
NO CONTROL OVER THE SCHEMA, OR MANY DIFFERENT SCHEMAS?
MAP EXISTING FIELDS TO COLUMNS DYNAMICALLY
INCLUDING COLUMN FAMILIES
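Because Accumulo does not require column families to be declared up front, fields from existing data can be turned into columns as the data arrives. A minimal sketch of one such mapping, assuming a hypothetical `Cell` record and `DynamicColumns` class: each field name becomes a column qualifier and, purely as an illustrative choice, the name of the source system (hypothetical `sourceA`/`sourceB`) becomes the column family.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified cell: row ID, column family, column qualifier, value (hypothetical type).
record Cell(String row, String family, String qualifier, String value) {}

final class DynamicColumns {
    // Each field of the incoming record becomes one cell. The column family is the
    // (hypothetical) source name and the qualifier is the field name, both taken
    // from the data itself rather than from a predeclared schema.
    static List<Cell> toCells(String rowId, String source, Map<String, String> record) {
        List<Cell> cells = new ArrayList<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            cells.add(new Cell(rowId, source, field.getKey(), field.getValue()));
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, String> fromA = new LinkedHashMap<>();
        fromA.put("name", "harry");
        fromA.put("age", "43");
        Map<String, String> fromB = new LinkedHashMap<>();
        fromB.put("height", "5'9\"");

        // Two sources with different fields end up in the same row.
        List<Cell> cells = new ArrayList<>(toCells("RID00003", "sourceA", fromA));
        cells.addAll(toCells("RID00003", "sourceB", fromB));
        cells.forEach(System.out::println);
    }
}
```

A new field or a new source simply produces new qualifiers or families; nothing about the table has to change first.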
VARYING LEVELS OF DATA SENSITIVITY?
COLUMN VISIBILITY
DATA MODEL
• Key: Row ID, Column (Family, Qualifier, Visibility), Timestamp
• Value

| Row ID | Col Fam | Col Qual | Col Vis | Timestamp | Value |
|---|---|---|---|---|---|
| Bob | Email | id0023 | personal comms | 20120301 | Hey joe, can you send ... |
| Bob | Email | id0024 | personal comms | 20120302 | Re: next Thursday ... |
| Bob | UserPrefs | Background | prefs | 20130101 | Grey |
| Fred | Email | id0001 | personal comms | 20080302 | Welcome to gmail ... |
| Sarah | Email | id0004 | personal comms | 20130201 | Hi again ... |
| Sara | Videos | ytid009 | public post | 20100303 | nsu736:)jdudjdk$:)378;'$$) |
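A visibility label is a boolean expression over authorization tokens, and a reader sees a cell only if its authorizations satisfy that expression. The sketch below is a toy evaluator modeled loosely on Accumulo's label syntax (`&`, `|`, parentheses, `&` and `|` not mixed without parentheses, empty label visible to everyone); the `Visibility` class is hypothetical and is not Accumulo's `ColumnVisibility` or `VisibilityEvaluator`. The examples use expressions such as `personal&comms` rather than the table's illustrative labels.

```java
import java.util.Set;

// Toy evaluator for labels such as "personal&comms" or "(prefs|personal)&comms".
final class Visibility {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private Visibility(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    // An empty label means the cell is visible to everyone.
    static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) return true;
        Visibility v = new Visibility(expr, auths);
        boolean result = v.parseExpr();
        if (v.pos != expr.length()) throw new IllegalArgumentException("unexpected input at " + v.pos);
        return result;
    }

    // expr := term ('&' term)* | term ('|' term)*   (no mixing without parentheses)
    private boolean parseExpr() {
        boolean value = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && c != op) throw new IllegalArgumentException("mixed & and | without parentheses");
            op = c;
            boolean rhs = parseTerm();   // always parse, so the position keeps advancing
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := token | '(' expr ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean value = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') throw new IllegalArgumentException("missing )");
            pos++;
            return value;
        }
        int start = pos;
        while (pos < expr.length() && isTokenChar(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected a token at " + pos);
        return auths.contains(expr.substring(start, pos));
    }

    private static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }

    public static void main(String[] args) {
        Set<String> auths = Set.of("personal", "comms");
        System.out.println(canSee("personal&comms", auths));         // true
        System.out.println(canSee("prefs", auths));                  // false
        System.out.println(canSee("(prefs|personal)&comms", auths)); // true
    }
}
```

In this sketch's terms, a scan would evaluate each cell's label against the caller's authorizations and drop the cells that fail, which is what lets data of different sensitivity share one table.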
DATA OF VARYING SENSITIVITY LEVELS CAN BE PHYSICALLY CO-LOCATED
FRAMEWORKS LIKE HADOOP MAPREDUCE LOVE IT WHEN DATA IS ALL TOGETHER
LOOK ACROSS DATASETS?
SECONDARY INDICES
SECONDARY INDICES
• Application-created data: known
• Pre-existing data: unknown
DATA DISCOVERY!
SECONDARY INDICES

| RowID | Col Qual | Value |
|---|---|---|
| RID00001 | age | 54 |
| RID00001 | name | bob |
| RID00002 | name | fred |
| RID00003 | age | 43 |
| RID00003 | height | 5’9” |
| RID00003 | name | harry |
| RID00004 | name | carl |
| RID00005 | name | evan |

| RowID | Col Fam | Col Qual |
|---|---|---|
| 43 | age | RID00003 |
| 54 | age | RID00001 |
| 5’9” | height | RID00003 |
| bob | name | RID00001 |
| carl | name | RID00004 |
| evan | name | RID00005 |
| fred | name | RID00002 |
| harry | name | RID00003 |
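The index table above is the record table turned around: the value becomes the row ID, the field name becomes the column family, and the original row ID becomes the column qualifier. A minimal sketch of building that index and looking a value up, with plain Java collections standing in for Accumulo tables (`SecondaryIndex`, `Fwd` and `Entry` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class SecondaryIndex {
    // Record-table cell: (row ID, column qualifier = field name, value).
    record Fwd(String rowId, String field, String value) {}

    // Index-table key: (row ID = value, column family = field, column qualifier = original row ID).
    record Entry(String row, String family, String qualifier) {}

    static final Comparator<Entry> ORDER = Comparator
            .comparing(Entry::row)
            .thenComparing(Entry::family)
            .thenComparing(Entry::qualifier);

    // Invert every record cell into an index entry; the TreeSet keeps them sorted.
    static TreeSet<Entry> buildIndex(List<Fwd> forward) {
        TreeSet<Entry> index = new TreeSet<>(ORDER);
        for (Fwd f : forward) {
            index.add(new Entry(f.value(), f.field(), f.rowId()));
        }
        return index;
    }

    // Row IDs whose `field` equals `value`: one contiguous run of the sorted index.
    static List<String> lookup(TreeSet<Entry> index, String value, String field) {
        List<String> rowIds = new ArrayList<>();
        for (Entry e : index.tailSet(new Entry(value, field, ""), true)) {
            if (!e.row().equals(value) || !e.family().equals(field)) break;
            rowIds.add(e.qualifier());
        }
        return rowIds;
    }

    public static void main(String[] args) {
        List<Fwd> forward = List.of(
                new Fwd("RID00001", "age", "54"),
                new Fwd("RID00001", "name", "bob"),
                new Fwd("RID00002", "name", "fred"),
                new Fwd("RID00003", "age", "43"),
                new Fwd("RID00003", "name", "harry"));
        TreeSet<Entry> index = buildIndex(forward);
        index.forEach(System.out::println);
        System.out.println(lookup(index, "43", "age")); // [RID00003]
    }
}
```

For pre-existing data, the same inversion can be run over every field without knowing the schema in advance, which is what turns indexing into data discovery.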
PARTIAL ROW SCANS
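Assuming "partial row scan" here means reading only one slice of a row (for example, just the `name` family of index row `bob`) instead of the whole row, sorted storage turns that into a single range read. A sketch with composite string keys in a `TreeMap`; the `PartialRowScan` class and the `\0` separator scheme are assumptions, and row and family names are assumed not to contain `\0`.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

final class PartialRowScan {
    // Composite key "row\0family\0qualifier". '\0' sorts before every other char,
    // so all cells of one (row, family) pair are contiguous in the sorted map.
    static String key(String row, String family, String qualifier) {
        return row + '\0' + family + '\0' + qualifier;
    }

    // Only the cells of `family` within `row`: the half-open range
    // ["row\0family\0", "row\0family\1"). Other families of the row are never read.
    static NavigableMap<String, String> scanFamily(TreeMap<String, String> table,
                                                   String row, String family) {
        String from = row + '\0' + family + '\0';
        String to = row + '\0' + family + '\1';
        return table.subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(key("Bob", "Email", "id0023"), "Hey joe, can you send ...");
        table.put(key("Bob", "Email", "id0024"), "Re: next Thursday ...");
        table.put(key("Bob", "UserPrefs", "Background"), "Grey");
        // Reads the two Email cells and skips UserPrefs.
        scanFamily(table, "Bob", "Email").values().forEach(System.out::println);
    }
}
```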
BATCH SCANNERS
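| RowID | Col Qual | Value |
|---|---|---|
| RID00001 | age | 54 |
| RID00001 | name | bob |
| RID00002 | name | fred |
| RID00003 | age | 43 |
| RID00003 | height | 5’9” |
| RID00003 | name | harry |
| RID00004 | name | carl |
| RID00005 | name | evan |

Figure: a Batch Scanner over this table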
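A batch scanner takes many ranges at once, such as the row IDs returned by an index lookup, reads them in parallel, and hands results back in whatever order they arrive. A sketch of that idea with a thread pool over a sorted map; `BatchLookup` and its key scheme are hypothetical, and Accumulo's own `BatchScanner` is not used. Like a batch scanner, this sketch makes no promise about result order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BatchLookup {
    // Key scheme "rowId\0qualifier": each row's cells form one contiguous range.
    static String cellKey(String rowId, String qualifier) {
        return rowId + '\0' + qualifier;
    }

    // Looks up every requested row as its own range, in parallel. Results are
    // collected in completion order, so callers must not rely on any ordering.
    static List<Map.Entry<String, String>> fetchRows(NavigableMap<String, String> table,
                                                     List<String> rowIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<List<Map.Entry<String, String>>> done =
                    new ExecutorCompletionService<>(pool);
            for (String rowId : rowIds) {
                // Concurrent reads are safe here because nothing writes to the map
                // while the lookups run.
                done.submit(() -> new ArrayList<>(
                        table.subMap(rowId + '\0', true, rowId + '\1', false).entrySet()));
            }
            List<Map.Entry<String, String>> results = new ArrayList<>();
            for (int i = 0; i < rowIds.size(); i++) {
                results.addAll(done.take().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put(cellKey("RID00001", "age"), "54");
        table.put(cellKey("RID00001", "name"), "bob");
        table.put(cellKey("RID00002", "name"), "fred");
        table.put(cellKey("RID00003", "name"), "harry");
        // Rows that an index lookup might have returned.
        for (Map.Entry<String, String> e : fetchRows(table, List.of("RID00001", "RID00003"), 2)) {
            System.out.println(e.getKey().replace('\0', '/') + " = " + e.getValue());
        }
    }
}
```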
COLUMN VISIBILITY APPLIES TO INDEXES TOO
ANALYSIS?
MAPREDUCE: CHECK
SHUFFLE-SORTED?
• Between the Map and Reduce phases comes the shuffle-sort
• Sorting by key is necessary so all the values for a given key end up next to each other …
• BigTable also sorts keys …
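Since both MapReduce's shuffle-sort and BigTable's storage leave equal keys next to each other, grouping values by key takes a single pass that starts a new group whenever the key changes. A small plain-Java illustration (the `SortedGrouping` class is a hypothetical name, not part of either system):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SortedGrouping {
    // Sorts (key, value) pairs by key, then walks them once: because equal keys are
    // adjacent after sorting, each group ends exactly when the key changes.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.comparingByKey());   // stable: values keep input order
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        String current = null;
        List<Integer> run = null;
        for (Map.Entry<String, Integer> e : sorted) {
            if (!e.getKey().equals(current)) {      // key changed: start a new run
                current = e.getKey();
                run = new ArrayList<>();
                groups.put(current, run);
            }
            run.add(e.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("fred", 1), Map.entry("bob", 1), Map.entry("fred", 1));
        System.out.println(group(pairs)); // {bob=[1], fred=[1, 1]}
    }
}
```

In a BigTable-style store the sort step is already done by the storage layer, so only the linear pass remains.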
ITERATORS
`Value combine(Iterator<Value> values)`
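The `combine` signature describes folding every value stored under one key into a single value. A minimal sketch, assuming a hypothetical `Value` record that holds a count and a hypothetical `SumCombiner` class; Accumulo's real `Value` wraps bytes, and its combiners (for example `SummingCombiner`) run server-side during scans and compactions rather than as standalone code like this.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical value type holding a count.
record Value(long count) {}

final class SumCombiner {
    // Same shape as the slide's combine(): fold all values for one key into one.
    static Value combine(Iterator<Value> values) {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next().count();
        }
        return new Value(sum);
    }

    // Groups writes by key (in sorted key order) and combines each group,
    // the way a combining iterator collapses many entries into one per key.
    static Map<String, Value> collapse(List<Map.Entry<String, Value>> writes) {
        Map<String, List<Value>> byKey = new TreeMap<>();
        for (Map.Entry<String, Value> w : writes) {
            byKey.computeIfAbsent(w.getKey(), k -> new ArrayList<>()).add(w.getValue());
        }
        Map<String, Value> combined = new TreeMap<>();
        byKey.forEach((key, values) -> combined.put(key, combine(values.iterator())));
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Value>> writes = List.of(
                Map.entry("bob", new Value(1)),
                Map.entry("fred", new Value(2)),
                Map.entry("bob", new Value(4)));
        System.out.println(collapse(writes)); // {bob=Value[count=5], fred=Value[count=2]}
    }
}
```

When a combiner like this is applied as data is compacted, a reader asking for a count gets one stored total instead of re-aggregating every individual write, which is the pre-computation idea.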
PRE-COMPUTATION
Figure: Many Users, Analyze, Db, Data sets
ACCUMULO