Dremel: Interactive Analysis of Web-Scale Datasets
Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis
Presenter: MoHan Zhang
*Some images in the presentation are taken from slides made by the original authors.
Outline
• Introduction
• Nested Columnar Storage
• Query Processing
• Experiments and Observations
What is Dremel?
A brand of rotary tools used in the metalworking industry, primarily relying on their speed as opposed to torque…
Dremel is a scalable, interactive ad-hoc query system for analysis of large-scale, read-only nested data
• Developed and used by Google since 2006
Key Ideas
• Focuses on achieving interactive speed for very large datasets
• Multi-terabyte data, scales to thousands of nodes
• Uses nested data model with SQL-like language
• Columnar storage format
• Employs tree architecture for query processing
Uses inside Google
• Analysis of crawled web documents.
• Tracking install data for applications on Android Market.
• Crash reporting for Google products.
• OCR results from Google Books.
• Spam analysis.
• Debugging of map tiles on Google Maps.
• Tablet migrations in managed Bigtable instances.
• Results of tests run on Google’s distributed build system.
• Disk I/O statistics for hundreds of thousands of disks.
• Resource monitoring for jobs run in Google’s data centers.
• Symbols and dependencies in Google’s codebase.
Sample Workflow
• A data engineer runs a Map Reduce job to find signals from web pages, returning billions of records
• The engineer launches Dremel and runs interactive commands
    DEFINE TABLE t AS /path/to/data/*
    SELECT TOP(signal1, 100), COUNT(*) FROM t
• More MR-based processing of the data
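As a rough illustration of what the interactive query asks for, here is a minimal in-memory Python sketch. It assumes that TOP(signal1, 100) paired with COUNT(*) returns the most frequent values of signal1 together with their row counts, as the TOP function does in BigQuery legacy SQL; the slides do not spell out the semantics. The function name top_with_counts and the tiny signal1 list are hypothetical, and the sketch is exact, whereas a distributed engine may only approximate the answer.

```python
from collections import Counter

def top_with_counts(values, k):
    """Return the k most frequent values with their row counts."""
    return Counter(values).most_common(k)

# Hypothetical stand-in for the signal1 column of table t.
signal1 = ["spam", "news", "spam", "blog", "news", "spam"]

# Assumed analogue of: SELECT TOP(signal1, 2), COUNT(*) FROM t
top = top_with_counts(signal1, 2)
print(top)
```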
Record vs. Columnar Representation
[Figure: a nested schema with fields A, B, C, D, E; records r1 and r2 shown in record-oriented layout and in column-oriented layout]
Challenges:
• Lossless representation of nested record structure
• Reconstruct original structure from a subset of fields
Sample Nested Data Model
Schema (bracketed annotations show field multiplicity):
message Document {
  required int64 DocId;          [1,1]
  optional group Links {
    repeated int64 Backward;     [0,*]
    repeated int64 Forward;
  }
  repeated group Name {
    repeated group Language {
      required string Code;
      optional string Country;   [0,1]
    }
    optional string Url;
  }
}
Record r1:
DocId: 10
Links
  Forward: 20
  Forward: 40
  Forward: 60
Name
  Language
    Code: 'en-us'
    Country: 'us'
  Language
    Code: 'en'
  Url: 'http://A'
Name
  Url: 'http://B'
Name
  Language
    Code: 'en-gb'
    Country: 'gb'
Record r2:
DocId: 20
Links
  Backward: 10
  Backward: 30
  Forward: 80
Name
  Url: 'http://C'
Column-Striped Representation
DocId
  value      r  d
  10         0  0
  20         0  0
Name.Url
  value      r  d
  http://A   0  2
  http://B   1  2
  NULL       1  1
  http://C   0  2
Links.Forward
  value      r  d
  20         0  2
  40         1  2
  60         1  2
  80         0  2
Links.Backward
  value      r  d
  NULL       0  1
  10         0  2
  30         1  2
Name.Language.Code
  value      r  d
  en-us      0  2
  en         2  2
  NULL       1  1
  en-gb      1  2
  NULL       0  1
Name.Language.Country
  value      r  d
  us         0  3
  NULL       2  2
  NULL       1  1
  gb         1  3
  NULL       0  1
Each column is stored as a set of blocks
Repetition & Definition Levels
• Repetition level (r): at what repeated field in the field’s path the value has repeated
• Definition level (d): how many fields in the path that could be undefined (optional or repeated) are actually present in the record
Repetition & Definition Levels
(Example on records r1 and r2 from the Sample Nested Data Model slide.)
r: At what repeated field in the field’s path the value has repeated
d: How many fields that could be undefined (opt. or rep.) are actually present
Path levels: Name (r=1), Language (r=2), Code (non-repeating)
Name.Language.Code
  value   r  d
  en-us   0  2   record (r=0) has repeated
  en      2  2   Language (r=2) has repeated
  NULL    1  1   no value: Name (r=1) has repeated, Name (d=1) is defined
  en-gb   1  2   Name (r=1) has repeated
  NULL    0  1   no value: record (r=0) has repeated, Name is defined (d=1)
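The level rules can be checked against the Name.Language.Code column with a minimal Python sketch. The dict layout used for records r1 and r2 and the function name stripe_code are assumptions made for illustration. The function is hard-wired to this one path (repeated Name, repeated Language, required Code); it is not a general column-striping algorithm.

```python
# Records r1 and r2 from the sample data, written as plain dicts
# (this in-memory layout is an assumption for illustration).
r1 = {
    "DocId": 10,
    "Links": {"Forward": [20, 40, 60]},
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {
    "DocId": 20,
    "Links": {"Backward": [10, 30], "Forward": [80]},
    "Name": [{"Url": "http://C"}],
}

def stripe_code(record):
    """Return (value, r, d) triples for Name.Language.Code in one record."""
    names = record.get("Name", [])
    if not names:
        # Nothing on the path is defined: NULL with d=0.
        return [(None, 0, 0)]
    out = []
    for i, name in enumerate(names):
        # The first Name starts the record (r=0); later ones repeat at Name (r=1).
        r_name = 0 if i == 0 else 1
        langs = name.get("Language", [])
        if not langs:
            # Name is present but Language is missing: NULL with d=1.
            out.append((None, r_name, 1))
            continue
        for j, lang in enumerate(langs):
            # A second or later Language inside the same Name repeats at level 2.
            r = r_name if j == 0 else 2
            # Code is required, so only Name and Language count toward d.
            out.append((lang["Code"], r, 2))
    return out

column = stripe_code(r1) + stripe_code(r2)
for row in column:
    print(row)
```

The five triples produced are the same rows as in the Name.Language.Code table, with None standing in for NULL.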
Record Assembly
• Goal: given a subset of fields, reconstruct the original records as if they contained only the selected fields
• A finite state machine reads the field values and levels for each field and appends the values sequentially to the output records
[Figure: finite state machine with one state per field (DocId, Links.Backward, Links.Forward, Name.Language.Code, Name.Language.Country, Name.Url); transitions labeled with repetition levels]
Record Assembly from Two Fields
[Figure: finite state machine over DocId and Name.Language.Country; DocId moves to Name.Language.Country on repetition level 0, Name.Language.Country loops on levels 1 and 2 and exits on level 0]
Record s1:
DocId: 10
Name
  Language
    Country: 'us'
  Language
Name
Name
  Language
    Country: 'gb'
Record s2:
DocId: 20
Name
Preserves structure of the parent fields
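The two-field assembly can be sketched in Python as a single pass over the Name.Language.Country column. This is a simplified sketch, not the finite state machine construction itself: the function name assemble and the dict output layout are assumptions, and the level checks are hard-wired to this path (Name at level 1, Language at level 2, Country at definition level 3).

```python
# Column data taken from the column-striped tables; None stands for NULL.
doc_ids = [10, 20]  # DocId values (r and d are always 0 for this column)
country = [("us", 0, 3), (None, 2, 2), (None, 1, 1), ("gb", 1, 3), (None, 0, 1)]

def assemble(doc_ids, country):
    """Rebuild records that contain only DocId and Name.Language.Country."""
    records = []
    ids = iter(doc_ids)
    for value, r, d in country:
        if r == 0:
            # Repetition level 0 marks the start of a new record.
            records.append({"DocId": next(ids), "Name": []})
        names = records[-1]["Name"]
        if d >= 1 and r <= 1:
            # Name is defined and the repetition is at Name or above: open a new Name.
            names.append({"Language": []})
        if d >= 2:
            # Language is defined: every entry with d >= 2 opens a new Language.
            names[-1]["Language"].append({})
        if d == 3:
            # Country itself is defined: store the value.
            names[-1]["Language"][-1]["Country"] = value
    return records

assembled = assemble(doc_ids, country)
for rec in assembled:
    print(rec)
```

The empty Language and Name groups in the result correspond to the value-less Language and Name entries in s1 and s2, which is how the parent structure is preserved.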
Sample Query
SELECT DocId AS Id,
  COUNT(Name.Language.Code) WITHIN Name AS Cnt,
  Name.Url + ',' + Name.Language.Code AS Str
FROM t
WHERE REGEXP(Name.Url, '^http') AND DocId < 20;
Output table t1:
Id: 10
Name
  Cnt: 2
  Language
    Str: 'http://A,en-us'
    Str: 'http://A,en'
Name
  Cnt: 0
Output schema:
message QueryResult {
  required int64 Id;
  repeated group Name {
    optional uint64 Cnt;
    repeated group Language {
      optional string Str;
    }
  }
}
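To make the query semantics concrete, the following Python sketch evaluates the same selection, within-Name count, and string concatenation on the sample records. The dict layout and the function name run_query are assumptions for illustration. The sketch follows the output schema, so each Str is placed in its own Language group, and a Name whose Url is missing or does not match the pattern is pruned.

```python
import re

# Sample records as plain dicts (layout assumed for illustration).
r1 = {
    "DocId": 10,
    "Name": [
        {"Language": [{"Code": "en-us", "Country": "us"}, {"Code": "en"}],
         "Url": "http://A"},
        {"Url": "http://B"},
        {"Language": [{"Code": "en-gb", "Country": "gb"}]},
    ],
}
r2 = {"DocId": 20, "Name": [{"Url": "http://C"}]}

def run_query(record):
    """Evaluate the sample query on one record; return None if it is filtered out."""
    if not record["DocId"] < 20:
        return None
    out = {"Id": record["DocId"], "Name": []}
    for name in record.get("Name", []):
        url = name.get("Url")
        # WHERE REGEXP(Name.Url, '^http'): drop Names without a matching Url.
        if url is None or not re.search(r"^http", url):
            continue
        langs = name.get("Language", [])
        out["Name"].append({
            # COUNT(Name.Language.Code) WITHIN Name
            "Cnt": sum(1 for lang in langs if "Code" in lang),
            # Name.Url + ',' + Name.Language.Code, one Str per Language
            "Language": [{"Str": url + "," + lang["Code"]} for lang in langs],
        })
    return out

result = run_query(r1)
print(result)
print(run_query(r2))
```

Record r2 is rejected by DocId < 20, and the third Name of r1 is dropped because it has no Url, which leaves the two Name groups with Cnt 2 and Cnt 0 seen in t1.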
Serving Tree Architecture
[Figure: client → root server → intermediate servers → leaf servers (with local storage) → storage layer]
• Root server: receives incoming queries, reads metadata from tables, and routes queries to the next level
• Intermediate servers: aggregate partial results in parallel
• Leaf servers: communicate with the storage layer or access the data on local disk
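The division of labor in the tree can be sketched with a toy grouped count in Python. Everything here is an illustrative assumption: the query (count non-NULL B per value of A), the function names, and the nested-list stand-in for the tree, in which each intermediate server owns a list of tablets and each tablet is a list of (A, B) rows. Real servers run in parallel over a network; this sketch is sequential.

```python
from collections import Counter

def leaf_query(tablet):
    """Leaf server: scan one tablet and return a partial COUNT(B) per value of A."""
    counts = Counter()
    for a, b in tablet:
        if b is not None:  # COUNT ignores NULL values
            counts[a] += 1
    return counts

def merge(partials):
    """Intermediate or root server: sum the partial counts from the level below."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return total

def root_query(tree):
    """Root server: fan out to intermediate servers and combine their results."""
    return merge(merge(leaf_query(t) for t in tablets) for tablets in tree)

# Hypothetical data: two intermediate servers with two and one tablets.
tree = [
    [[("x", 1), ("y", None)], [("x", 2)]],
    [[("y", 3), ("x", None)]],
]
totals = root_query(tree)
print(dict(totals))
```

Because each level only sums small partial results, the amount of data passed upward stays small even when the leaves scan large tablets.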
Serving Tree
• Designed for aggregate queries returning small to medium results (< 1M); larger aggregations rely on parallel DBMS and Map Reduce
• Query dispatcher provides scheduling and fault tolerance
  • Schedules queries based on their priorities and balances the load
  • If one node becomes much slower than the others, its work is rescheduled
• Some Dremel queries return approximate results (e.g. top-k, count-distinct)
Record vs. Columns
[Figure: time (sec) versus number of fields read. From columns: (a) read + decompress, (b) assemble records, (c) parse as C++ objects. From records: (d) read + decompress, (e) parse as C++ objects.]
Tablet: 375 MB (compressed), 300K rows, 125 columns
Record vs. Columns: Takeaways
• For columnar storage, the most significant performance gain occurs when few fields (columns) are read
• Record assembly and parsing are expensive
• Even when we need records, it is still better to store data in columnar format
• Record-based storage gradually starts to outperform columnar storage as more fields are read
Map Reduce vs. Dremel
Execution time (sec) on 3000 nodes, 85 billion records
Map Reduce vs. Dremel: Sidenote
• Dremel is not designed to replace Map Reduce. Rather, it is used in conjunction with Map Reduce.
• Map Reduce is a generic software framework designed to tackle distributed computational problems on large data
• Dremel is a data analysis tool that runs in near real time
• The two were designed for different purposes.
Map Reduce vs. Dremel: Sidenote
• Why do we need Dremel? Why not just Map Reduce?
• Map Reduce and the frameworks built on top of it (e.g. Hive, Pig) have high latency between submitting a job and getting the answer. In other words, they are not real-time.
• Dremel compensates for that weakness.
Scalability
[Figure: execution time (sec) versus number of leaf servers, from 1000 to 4000]
Observations
• Dremel scans quadrillions of records per month
• Most queries are processed in under 10 seconds
• Map Reduce can benefit from columnar storage just like a DBMS
• Parallel DBMSs can benefit from a serving tree architecture just like search engines
• Possible to analyze large disk-resident datasets interactively on commodity hardware
  • 1T records, thousands of nodes
Recap
Dremel
• A distributed system for interactive analysis of large datasets
  • Thousands of nodes, petabytes of data
  • Returns answers in seconds
  • Read-only data
• Nested data model
  • Thousands of fields, deeply nested
• Columnar storage
  • Much faster than record-oriented storage in reading time
  • Lossless representation of record structure
• Serving tree architecture
  • Aggregation of results and query scheduling in parallel
Thank you!
Q&A