real-time analytics at facebook

33
Real-time Analytics at Facebook Zheng Shao 10/18/2011

Upload: rasul

Post on 24-Feb-2016

101 views

Category:

Documents


0 download

DESCRIPTION

Real-time Analytics at Facebook. Zheng Shao 10/18/2011. Agenda. Analytics and Real-time what and why. Facebook Insights. Use cases Websites/Ads/Apps/Pages Time series Demographic break-downs Unique counts/heavy hitters Major challenges Scalability Latency. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Real-time Analytics at Facebook

Real-time Analytics at Facebook

Zheng Shao10/18/2011

Page 2: Real-time Analytics at Facebook

1 Analytics and Real-time

2 Data Freeway

3 Puma

4 Future Works

Agenda

Page 3: Real-time Analytics at Facebook

Analytics and Real-timewhat and why

Page 4: Real-time Analytics at Facebook

Facebook Insights• Use cases▪ Websites/Ads/Apps/Pages▪ Time series▪ Demographic break-downs▪ Unique counts/heavy hitters

• Major challenges▪ Scalability▪ Latency

Page 5: Real-time Analytics at Facebook

Analytics based on Hadoop/Hive

• 3000-node Hadoop cluster• Copier/Loader: Map-Reduce hides machine failures• Pipeline Jobs: Hive allows SQL-like syntax• Good scalability, but poor latency! 24 – 48 hours.

Scribe NFSHTTP HiveHadoop

MySQL

seconds secondsHourly

Copier/LoaderDaily

Pipeline Jobs

Page 6: Real-time Analytics at Facebook

How to Get Lower Latency?

• Small-batch Processing▪ Run Map-reduce/Hive every hour,

every 15 min, every 5 min, …▪ How do we reduce per-batch

overhead?

• Stream Processing▪ Aggregate the data as soon as it

arrives▪ How to solve the reliability

problem?

Page 7: Real-time Analytics at Facebook

Decisions• Stream Processing wins!

• Data Freeway▪ Scalable Data Stream Framework

• Puma▪ Reliable Stream Aggregation Engine

Page 8: Real-time Analytics at Facebook

Data Freewayscalable data stream

Page 9: Real-time Analytics at Facebook

Scribe

• Simple push/RPC-based logging system• Open-sourced in 2008. 100 log categories at that

time.• Routing driven by static configuration.

ScribeClients

ScribeMid-Tier

ScribeWriters NFS

HDFS

Log Consumer

Batch Copier

tail/fopen

Page 10: Real-time Analytics at Facebook

• 9GB/sec at peak, 10 sec latency, 2500 log categories

Data Freeway

ScribeClients Calligraphus

Mid-tierCalligraphus

WritersHDFS

HDFS

C1

C1

C2

C2

DataNode

DataNode

PTail

Zookeeper

Log Consumer

Continuous Copier

PTail(in the plan)

Page 11: Real-time Analytics at Facebook

Calligraphus• RPC File System▪ Each log category is represented by 1 or more FS directories▪ Each directory is an ordered list of files

• Bucketing support▪ Application buckets are application-defined shards.▪ Infrastructure buckets allows log streams from x B/s to x GB/s

• Performance▪ Latency: Call sync every 7 seconds▪ Throughput: Easily saturate 1Gbit NIC

Page 12: Real-time Analytics at Facebook

Continuous Copier• File System File System• Low latency and smooth network usage• Deployment▪ Implemented as long-running map-only job▪ Can move to any simple job scheduler

• Coordination▪ Use lock files on HDFS for now▪ Plan to move to Zookeeper

Page 13: Real-time Analytics at Facebook

PTail

• File System Stream ( RPC )• Reliability▪ Checkpoints inserted into the data stream▪ Can roll back to tail from any data checkpoints▪ No data loss/duplicates

directory

files

directory

directory

checkpoint

Page 14: Real-time Analytics at Facebook

Channel ComparisonPush / RPC

Pull / FS

Latency 1-2 sec 10 secLoss/Dups Few NoneRobustness

Low High

Complexity

Low HighPush / RPC

Pull / FS

Scribe

CalligraphusPTail + ScribeSend

Continuous Copier

Page 15: Real-time Analytics at Facebook

Pumareal-time aggregation/storage

Page 16: Real-time Analytics at Facebook

Overview

• ~ 1M log lines per second, but light read• Multiple Group-By operations per log line• The first key in Group By is always time/date-

related• Complex aggregations: Unique user count, most

frequent elements

Log Stream AggregationsStorage Serving

Page 17: Real-time Analytics at Facebook

MySQL and HBase: one pageMySQL HBase

Parallel Manual sharding Automaticload balancing

Fail-over Manual master/slave switch

Automatic

Read efficiency High LowWrite efficiency Medium HighColumnar support

No Yes

Page 18: Real-time Analytics at Facebook

Puma2 Architecture

• PTail provide parallel data streams• For each log line, Puma2 issue “increment”

operations to HBase. Puma2 is symmetric (no sharding).

• HBase: single increment on multiple columns

PTail Puma2 HBase Serving

Page 19: Real-time Analytics at Facebook

Puma2: Pros and Cons• Pros▪ Puma2 code is very simple.▪ Puma2 service is very easy to maintain.

• Cons▪ “Increment” operation is expensive.▪ Do not support complex aggregations.▪ Hacky implementation of “most frequent elements”.▪ Can cause small data duplicates.

Page 20: Real-time Analytics at Facebook

Improvements in Puma2• Puma2▪ Batching of requests. Didn’t work well because of long-tail distribution.

• HBase▪ “Increment” operation optimized by reducing locks.▪ HBase region/HDFS file locality; short-circuited read.▪ Reliability improvements under high load.

• Still not good enough!

Page 21: Real-time Analytics at Facebook

Puma3 Architecture

• Puma3 is sharded by aggregation key.• Each shard is a hashmap in memory.• Each entry in hashmap is a pair of

an aggregation key and a user-defined aggregation.

• HBase as persistent key-value storage.

PTail Puma3 HBase

Serving

Page 22: Real-time Analytics at Facebook

Puma3 Architecture

• Write workflow▪ For each log line, extract the columns for key and value.▪ Look up in the hashmap and call user-defined aggregation

PTail Puma3 HBase

Serving

Page 23: Real-time Analytics at Facebook

Puma3 Architecture

• Checkpoint workflow▪ Every 5 min, save modified hashmap entries, PTail checkpoint to HBase

▪ On startup (after node failure), load from HBase▪ Get rid of items in memory once the time window has passed

PTail Puma3 HBase

Serving

Page 24: Real-time Analytics at Facebook

Puma3 Architecture

• Read workflow▪ Read uncommitted: directly serve from the in-memory hashmap; load from Hbase on miss.

▪ Read committed: read from HBase and serve.

PTail Puma3 HBase

Serving

Page 25: Real-time Analytics at Facebook

Puma3 Architecture

• Join▪ Static join table in HBase.▪ Distributed hash lookup in user-defined function (udf).▪ Local cache improves the throughput of the udf a lot.

PTail Puma3 HBase

Serving

Page 26: Real-time Analytics at Facebook

Puma2 / Puma3 comparison• Puma3 is much better in write throughput▪ Use 25% of the boxes to handle the same load.▪ HBase is really good at write throughput.

• Puma3 needs a lot of memory▪ Use 60GB of memory per box for the hashmap▪ SSD can scale to 10x per box.

Page 27: Real-time Analytics at Facebook

Puma3 Special Aggregations• Unique Counts Calculation▪ Adaptive sampling▪ Bloom filter (in the plan)

• Most frequent item (in the plan)▪ Lossy counting▪ Probabilistic lossy counting

Page 28: Real-time Analytics at Facebook

PQL – Puma Query Language• CREATE INPUT TABLE t

(‘time', ‘adid’, ‘userid’);

• CREATE VIEW v ASSELECT *, udf.age(userid)FROM tWHERE udf.age(userid) > 21

• CREATE HBASE TABLE h …

• CREATE LOGICAL TABLE l …

• CREATE AGGREGATION ‘abc’INSERT INTO l (a, b, c)SELECT udf.hour(time), adid, age, count(1), udf.count_distinc(userid)FROM vGROUP BY udf.hour(time), adid, age;

Page 29: Real-time Analytics at Facebook

Future Workschallenges and opportunities

Page 30: Real-time Analytics at Facebook

Future Works• Scheduler Support▪ Just need simple scheduling because the work load is continuous

• Mass adoption▪ Migrate most daily reporting queries from Hive

• Open Source▪ Biggest bottleneck: Java Thrift dependency▪ Will come one by one

Page 31: Real-time Analytics at Facebook

Similar Systems• STREAM from Stanford• Flume from Cloudera• S4 from Yahoo• Rainbird/Storm from Twitter• Kafka from Linkedin

Page 32: Real-time Analytics at Facebook

Key differences• Scalable Data Streams▪ 9 GB/sec with < 10 sec of latency▪ Both Push/RPC-based and Pull/File System-based▪ Components to support arbitrary combination of channels

• Reliable Stream Aggregations▪ Good support for Time-based Group By, Table-Stream Lookup Join

▪ Query Language: Puma : Realtime-MR = Hive : MR▪ No support for sliding window, stream joins

Page 33: Real-time Analytics at Facebook

(c) 2009 Facebook, Inc. or its licensors.  "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0