Low Latency “OLAP” with HBase - HBaseCon 2012
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Cosmin Lehene | Adobe
Low Latency “OLAP” with HBase
What we needed … and built
OLAP semantics
Low-latency ingestion
High throughput
Real-time query API
Not hardcoded to web analytics or x-, y-, z- analytics, but extensible
Building Blocks
Dimensions, Metrics
Aggregations
Roll-up, drill-down, slicing and dicing, sorting
OLAP 101 – Queries example
Date        Country  City     OS       Browser  Sale
2012-05-21  USA      NY       Windows  FF        0.0
2012-05-21  USA      NY       Windows  FF       10.0
2012-05-22  USA      SF       OSX      Chrome   25.0
2012-05-22  Canada   Ontario  Linux    Chrome    0.0
2012-05-23  USA      Chicago  OSX      Safari   15.0

5 visits, 3 days
2 countries (USA: 4, Canada: 1)
4 cities (NY: 2, SF: 1, Ontario: 1, Chicago: 1)
3 OSes (Windows: 2, OSX: 2, Linux: 1)
3 browsers (FF: 2, Chrome: 2, Safari: 1)
3 sales, 50.0 in total
OLAP 101 – Queries example
Rolling up to country level:
SELECT COUNT(visits), SUM(sales)
GROUP BY country

Country  Visits  Sales
USA      4       $50
Canada   1       $0

“Slicing” by browser:
SELECT COUNT(visits), SUM(sales)
WHERE browser = 'FF'
GROUP BY country

Country  Visits  Sales
USA      2       $10
Canada   0       $0

Top browsers by sales:
SELECT SUM(sales), COUNT(visits)
GROUP BY browser
ORDER BY sales DESC

Browser  Sales  Visits
Chrome   $25    2
Safari   $15    1
FF       $10    2
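
The three queries can be checked against the five sample visits with a few lines of Python. This is only an illustrative sketch: the `group_by` helper and the tuple layout are assumptions made for this example and are not part of SaasBase.

```python
from collections import defaultdict

# The five sample visits: (date, country, city, os, browser, sale)
COLUMNS = ("date", "country", "city", "os", "browser", "sale")
VISITS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]

def group_by(rows, dimension, where=None):
    """Return {dimension value: [visits, sales]} for rows that pass `where`."""
    idx = COLUMNS.index(dimension)
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        if where is not None and not where(row):
            continue
        totals[row[idx]][0] += 1       # COUNT(visits)
        totals[row[idx]][1] += row[5]  # SUM(sales)
    return dict(totals)

# Roll-up to country level
rollup = group_by(VISITS, "country")
# Slice: only Firefox visits, grouped by country
ff_slice = group_by(VISITS, "country", where=lambda r: r[4] == "FF")
# Top browsers by sales, highest first
top_browsers = sorted(group_by(VISITS, "browser").items(),
                      key=lambda kv: kv[1][1], reverse=True)
```

Because the filter runs before grouping, a country with no matching visits (Canada here) is simply absent from `ff_slice` rather than reported with zeros.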
OLAP – Runtime Aggregation vs. Pre-aggregation
Aggregate at runtime:
Most flexible
Fast – scatter gather
Space efficient
But I/O, CPU intensive
Slow for larger data
Low throughput
Pre-aggregate:
Fast
Efficient – O(1)
High throughput
But more effort to process (latency)
Combinatorial explosion (space)
No flexibility
Pre-aggregation
Data needs to be summarized
Can’t visualize 1B data points (no, not even with Retina display)
Difficult to comprehend correlations among more than 3 dimensions
Not all dimension groups are relevant
Index on a needed basis (view selection problem)
Runtime aggregation == TeraSort for every query?
Pre-aggregate to reduce cardinality
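
To make the combinatorial explosion and the view selection idea concrete, here is a small sketch that enumerates every possible dimension group and then picks a subset. The dimension names come from the sample table; the `selected` list is a hypothetical choice of "relevant" groups, not something stated in the talk.

```python
from itertools import combinations

DIMENSIONS = ["date", "country", "city", "os", "browser"]

def all_groups(dims):
    """Every non-empty subset of dimensions, i.e. every candidate GROUP BY."""
    return [combo
            for size in range(1, len(dims) + 1)
            for combo in combinations(dims, size)]

# 2^5 - 1 = 31 candidate views for only five dimensions
candidates = all_groups(DIMENSIONS)

# Hypothetical selection: pre-aggregate only the groups reports actually need
selected = [("date", "country"), ("date", "browser")]
```

Each extra dimension doubles the number of candidate groups, which is why only the needed ones are indexed.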
SaasBase
We tune both:
pre-aggregation level vs. runtime post-aggregation
(ingestion speed + space) vs. (query speed)
Think materialized views from RDBMS
SaasBase Domain Model Mapping
SaasBase - Ingestion, Processing, Indexing, Querying
Ingestion
Ingestion throughput vs. latency
Historical data (large batches): optimize for throughput
Increments (latest data, smaller): optimize for latency
Large, granular input strategies
Slow listing in HDFS:
Archive processed files
Filtering input:
FileDateFilter (log name patterns: log-YYYY-MM-dd-HH.log)
TableInputFormat start/stop row
File Index in HBase (track processed/new files)
Map task overhead (stitching input splits):
400K files => 400K map tasks => overhead, slow reduce copy
CombineFileInputFormat – 2GB splits => 500 splits for 1TB
FixedMappersTableInputFormat (e.g. 5-region splits)
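
The two ideas above, filtering input files by the date in their name and stitching many small files into a few large splits, can be sketched in plain Python. This is not the Hadoop implementation: `filter_by_date` and `combine_splits` are hypothetical helpers that only mimic what a FileDateFilter and CombineFileInputFormat are described as doing here.

```python
import re
from datetime import datetime

# Matches the log name pattern log-YYYY-MM-dd-HH.log
LOG_NAME = re.compile(r"^log-(\d{4}-\d{2}-\d{2}-\d{2})\.log$")

def filter_by_date(names, start, end):
    """Keep log files whose name timestamp falls in [start, end)."""
    kept = []
    for name in names:
        match = LOG_NAME.match(name)
        if match is None:
            continue  # not a log file we recognize
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H")
        if start <= stamp < end:
            kept.append(name)
    return kept

def combine_splits(file_sizes, max_split_bytes):
    """Greedily pack (name, size) pairs into splits of at most max_split_bytes."""
    splits, current, current_size = [], [], 0
    for name, size in file_sizes:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits
```

With 400K files of 2.5 MiB each (about 1TB in total) and a 2 GiB limit, this packing yields a few hundred splits instead of 400K map tasks, in line with the "500 splits for 1TB" figure.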
Ingestion – Bulk Import
HFileOutputFormat (HFOF)
100s of times faster than the HBase API
No need to recover from failed jobs
No unnecessary load on machines
* No shuffle - global reduce order required!
e.g. first reduce key needs to be in the first region, last one in the last region
Watch for uneven partitions
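
The global ordering requirement means each reducer must receive exactly the key range of one region. A minimal sketch of that mapping, assuming the region boundaries are known up front (`region_for` and the sample split points are hypothetical, not HBase API):

```python
from bisect import bisect_right

def region_for(row_key, split_points):
    """Index of the region (and reducer) whose key range contains row_key.

    split_points holds the sorted start keys of every region except the
    first, so a key equal to a split point belongs to the region it starts.
    """
    return bisect_right(split_points, row_key)

# Hypothetical layout: three regions split at year boundaries
SPLIT_POINTS = ["2011-01-01", "2012-01-01"]
```

Because the mapping is monotonic in the row key, the lowest keys land in the first reducer and the highest in the last, which is the order bulk loading needs.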
HFOF – FileSizeDatePartitioner
1 partition (reduce) / day for initial import
Uneven reduce (partitions) due to data growth over time:
Reduce k: 2010-12-04 = 500MB
Reduce n: 2012-05-22 = 5GB => slow and will result in a 5GB region
Balance reduce buckets based on input file sizes and the reduce key
Generate sub-partitions based on predefined size (e.g. 1GB)
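
A rough sketch of the size-based balancing: given the input size per day, work out how many reducers each day should get so that no reducer handles much more than the target size. `plan_reducers` is a hypothetical helper written for this illustration; the actual FileSizeDatePartitioner is not shown in the slides. Within a day, records would still have to be split by key range (not by hash) to preserve the global order that HFileOutputFormat needs.

```python
import math

def plan_reducers(day_sizes, target_bytes):
    """Return {day: (first_reducer_index, reducer_count)} in date order.

    Each day gets enough reducers to keep every one near target_bytes.
    """
    plan, next_reducer = {}, 0
    for day in sorted(day_sizes):
        count = max(1, math.ceil(day_sizes[day] / target_bytes))
        plan[day] = (next_reducer, count)
        next_reducer += count
    return plan

MB, GB = 1024 ** 2, 1024 ** 3
plan = plan_reducers({"2010-12-04": 500 * MB, "2012-05-22": 5 * GB}, GB)
```

With a 1GB target, the 500MB day keeps a single reducer while the 5GB day is spread over five.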
Processing
Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality), and generating tables that can be queried in real time
1 year: 1B events => 100B data points indexed
Query => scan 365 data points (e.g. daily page views)
Processing could be either MR or real-time (e.g. Storm)
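
As a toy illustration of why a query only needs to scan one data point per day, the sketch below collapses raw events into daily counts and answers a date-range query from those. The function names and the in-memory `Counter` are assumptions for the example; in the system described, the pre-aggregated values live in HBase tables.

```python
from collections import Counter

def pre_aggregate(events):
    """Collapse raw (date, page) events into one page-view count per day."""
    daily = Counter()
    for date, _page in events:
        daily[date] += 1
    return daily

def page_views(daily, start, end):
    """Answer a range query by reading one pre-aggregated point per day."""
    return sum(count for date, count in daily.items() if start <= date <= end)

events = [
    ("2012-05-21", "/home"),
    ("2012-05-21", "/pricing"),
    ("2012-05-22", "/home"),
]
daily = pre_aggregate(events)
```

However many events arrive in a day, a one-year query over `daily` touches at most 365 entries.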
Processing for OLAP semantics
GROUP BY (process, query)
COUNT, SUM, AVG, etc. (process, query)
SORT (process, query)
HAVING (mostly query, can define pre-process constraints)
SaasBase vs. SQL Views Comparison
reports.json entities definition
Processing Performance
read, map, partition, combine, copy, sort, reduce, write
Read:
Scan.setCaching() (I/O ~ buffer)
Scan.setBatch() (avoid timeouts for abnormal input, e.g. 1M hits/visit)
Even region distribution across cluster (distributes CPU, I/O)
Map:
No unnecessary transformations: Bytes.toString(bytes) + Bytes.toBytes(string) (CPU)
Avoid GC: new X() (CPU, memory)
Avoid system calls (context switching)
Stripping unnecessary data (I/O)
Hot (in memory) vs. Cold (on disk, on network) data
Minimize I/O from disk/network
Single shot MR job: SuperProcessor
Emit all groups from one map() call
Incremental processing
Data format YYYY-MM-DD prefixed rowkey (HH:mm for more granularity)
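
Emitting all groups from one map() call with date-prefixed row keys might look roughly like the sketch below. The `GROUPS` list, the key layout, and the function names are all hypothetical, chosen only to show the single-pass idea; they are not taken from SuperProcessor.

```python
from collections import Counter

# Hypothetical report definitions: each tuple is one GROUP BY
GROUPS = [("country",), ("country", "city"), ("browser",)]

def map_event(event):
    """Yield one (row key, (visits, sales)) pair per group for a single event."""
    for dims in GROUPS:
        parts = [event["date"], "+".join(dims)] + [event[d] for d in dims]
        yield "/".join(parts), (1, event["sale"])

def reduce_all(events):
    """Sum visits and sales per row key in one pass over the input."""
    visits, sales = Counter(), Counter()
    for event in events:
        for key, (v, s) in map_event(event):
            visits[key] += v
            sales[key] += s
    return visits, sales

event = {"date": "2012-05-21", "country": "USA", "city": "NY",
         "browser": "FF", "sale": 10.0}
emitted_keys = [key for key, _ in map_event(event)]
```

Reading each event once and emitting every group from it avoids re-reading cold data per report, and the date prefix keeps each day's rows contiguous for incremental processing.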
Indexing
HBase natural order: hierarchical representation
Indexing - Why
Example: top 10 cities
~50K [country, city] combinations per day
Top 10 cities for 1 year =>
365 (days) x 50K ≈ 18M data points scanned
If you add gender => ~36M
If you add Device, OS, Browser …
Might compress well, but think about the environment
How much energy would you spend for just top 10 cities?
* Image from: http://my.neutralexistence.com/images/Green-Earth.jpg
Indexing with HBase: “10” < “2”
GROUP BY year, month, country, city ORDER BY visits DESC LIMIT 10
Lexicographic sorting
2012/05/USA/0000000000/
2012/05/USA/4294966296/San Francisco = 1000 visits*
2012/05/USA/4294966396/New York = 900 visits*
. . .
2012/05/USA/9999999999/
scan 't', {STARTROW => '2012/05/USA/', LIMIT => 10}
* Inverting and zero-padding numbers for lexicographic sorting (the 10-digit keys shown here use MAX = 2^32; Long.MAX_VALUE works the same way with wider padding):
1000 -> MAX – 1000 = 4294967296 – 1000 = 4294966296
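
The inverted, zero-padded key trick can be tried out with ordinary string sorting. In this sketch `MAX = 2**32` and the 10-digit width are assumptions that match the 10-digit keys on the slide; `index_key` and `top_cities` are hypothetical helpers, and the sorted list stands in for HBase's row-key order.

```python
MAX = 2 ** 32  # assumed constant; must exceed the largest count
WIDTH = 10     # zero-pad so string order matches numeric order

def index_key(year, month, country, visits, city):
    """Build a row key in which larger visit counts sort first."""
    inverted = str(MAX - visits).zfill(WIDTH)
    return f"{year}/{month:02d}/{country}/{inverted}/{city}"

def top_cities(keys, prefix, limit):
    """Emulate a prefix scan with a limit over lexicographically sorted keys."""
    hits = [k for k in sorted(keys) if k.startswith(prefix)]
    return [k.rsplit("/", 1)[1] for k in hits[:limit]]

keys = [
    index_key(2012, 5, "USA", 900, "New York"),
    index_key(2012, 5, "USA", 1000, "San Francisco"),
    index_key(2012, 5, "USA", 2, "Fresno"),
    index_key(2012, 5, "USA", 10, "Reno"),
]
```

Subtracting from a constant turns "largest first" into plain ascending order, so the top 10 is just the first 10 rows of a prefix scan.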
Query Engine
Always reads indexed, compact data
Query parsing
Scan strategy
Single vs. multiple scans
Start/stop rows (prefixes, index positions, etc.)
Index selection (volatile indexes with incremental processing)
Deserialization
Post-aggregation, sorting, fuzzy-sorting etc.
Paging
Custom dimension/metric class loading
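
The post-aggregation, sorting and paging steps can be pictured as merging the rows returned by several scans. This is a simplified sketch under the assumption that each scan yields (dimension value, metric) pairs; `post_aggregate` and `page` are hypothetical names, not the query engine's API.

```python
from collections import Counter

def post_aggregate(scans):
    """Merge (dimension value, metric) rows from several scans into totals."""
    totals = Counter()
    for rows in scans:
        for value, metric in rows:
            totals[value] += metric
    return totals

def page(totals, page_number, page_size):
    """Sort by metric descending (ties broken by name) and return one page."""
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    start = page_number * page_size
    return ordered[start:start + page_size]

# Hypothetical results of two monthly index scans
may = [("SF", 1000), ("NY", 900)]
june = [("NY", 300), ("SF", 100)]
totals = post_aggregate([may, june])
```

The work done here is proportional to the number of pre-aggregated rows scanned, not to the number of raw events behind them.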
Conclusions
OLAP semantics on a simple data model
Data as first class citizen
Domain Specific “Language” for Dimensions, Metrics, Aggregations
Tunable performance, resource allocation
Framework for vertical analytics systems
Thank you!
Cosmin Lehene @clehene
http://hstack.org
Credits:
Andrei Dragomir
Adrian Muraru
Andrei Dulvac
Raluca Podiuc
Tudor Scurtu
Bogdan Dragu
Bogdan Drutu
OLAP 101 - Rollup
Rollup: SELECT COUNT(visits), SUM(sales) GROUP BY country

Country  Visits  Sales
USA      4       $50
Canada   1       $0
OLAP 101 - Slicing
Filter or Segment or Slice (WHERE or HAVING)
Date        Country  City     OS       Browser  Sale
2012-03-02  USA      NY       Windows  FF        0.0
2012-03-02  USA      NY       Windows  FF       10.0
2012-03-03  USA      SF       OSX      Chrome   25.0
2012-03-03  Canada   Ontario  Linux    Chrome    0.0
2012-03-04  USA      Chicago  OSX      Safari   15.0

5 visits, 3 days
2 countries (USA: 4, Canada: 1)
4 cities (NY: 2, SF: 1, Ontario: 1, Chicago: 1)
3 OSes (Windows: 2, OSX: 2, Linux: 1)
3 browsers (FF: 2, Chrome: 2, Safari: 1)
3 sales, 50.0 in total
OLAP 101 – Sorting, TOP n
SELECT SUM(sales) AS total GROUP BY browser ORDER BY total DESC

Browser  Sale
Chrome   $25
Safari   $15
Firefox  $10