streaming architecture zx_dec2015
TRANSCRIPT
![Page 1: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/1.jpg)
A Brief Look at "Streaming" Your Application Architecture: From Zero to Hundreds of Billions of Events Processed in Real Time
![Page 2: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/2.jpg)
Links
– https://netflix.github.io/
– http://www.oschina.net/project/netflix
![Page 3: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/3.jpg)
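Topics
• Netflix's data platform, which processes hundreds of billions of events per day
• A brief history of big data technology
• A deep dive into streaming architecture and its technical foundations
• Why your app should go "streaming" too!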
![Page 4: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/4.jpg)
Before we start...
Let's recall:
How do you implement a hash table using the most basic data structures?
![Page 5: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/5.jpg)
Before we start...
Let's think about it:
Why do "some people" always tell you not to use global variables?
![Page 6: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/6.jpg)
Netflix Keystone Pipeline
● 700 billion events / 1+ PB of data processed per day
● 10 million events / 20+ GB of data per second at peak
● 3000+ Kafka brokers, 12 clusters in 3 regions
● 10,000+ Docker containers deployed
We help Produce, Store, Process, Move Events @ Cloud scale
![Page 7: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/7.jpg)
Keystone architecture
[Diagram components: Event Producer, KS Proxy, Fronting Kafka, Samza Router, Consumer Kafka, Stream Consumers, EMR, Control Plane]
![Page 8: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/8.jpg)
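Netflix Keystone Pipeline
● Horizontally scalable architecture
● Built entirely on AWS cloud infrastructure
● At-least-once delivery guarantee
● Tolerates back pressure and unreliable underlying cloud services
● Sink level isolation
● Supports failover both within a data center and across intercontinental data centers
● High availability, scalability & durability
● Streaming Architecture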
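The at-least-once guarantee listed for Keystone can be illustrated with a small sketch. This is not Keystone's or Kafka's actual code: the `consume_at_least_once` function, the `IdempotentSink` class, and the `(event_id, payload)` event shape are all hypothetical names made up for illustration. The idea shown is the usual one: commit the consumer offset only after an event has been handled, so a crash can cause a redelivery but never a loss, and let the sink deduplicate.

```python
from typing import Callable, List, Set, Tuple

Event = Tuple[int, str]  # (event_id, payload); hypothetical event shape


def consume_at_least_once(log: List[Event], committed: int,
                          handler: Callable[[Event], None],
                          crash_before_commit_at: int = -1) -> int:
    """Handle events from `committed` onward and return the new committed offset.

    The offset is advanced only AFTER the handler has run, so a crash between
    "handle" and "commit" leads to a redelivery instead of a lost event.
    """
    offset = committed
    while offset < len(log):
        handler(log[offset])
        if offset == crash_before_commit_at:
            # Simulated crash: the event was handled but its offset was never committed.
            return committed
        offset += 1
        committed = offset
    return committed


class IdempotentSink:
    """A sink that tolerates duplicates by remembering the event ids it has stored."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()
        self.rows: List[str] = []
        self.deliveries = 0

    def write(self, event: Event) -> None:
        self.deliveries += 1
        event_id, payload = event
        if event_id in self.seen:
            return  # duplicate delivery, already stored
        self.seen.add(event_id)
        self.rows.append(payload)


log = [(0, "play"), (1, "pause"), (2, "stop")]
sink = IdempotentSink()
offset = consume_at_least_once(log, 0, sink.write, crash_before_commit_at=1)
offset = consume_at_least_once(log, offset, sink.write)  # restart after the crash
```

After the restart the sink has received four deliveries for three events: event 1 arrived twice, and the deduplication keeps the stored rows correct.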
![Page 9: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/9.jpg)
Big Data History
Why use a Streaming Architecture?
![Page 10: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/10.jpg)
Big Data History
![Page 11: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/11.jpg)
Big Data History
![Page 12: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/12.jpg)
Big Data History
![Page 13: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/13.jpg)
The real-world need for streaming data
● Explosive data growth
![Page 14: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/14.jpg)
The real-world need for streaming data
● Explosive data growth
● Changing requirements for data-processing patterns
![Page 15: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/15.jpg)
How do you implement a hash table?
![Page 16: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/16.jpg)
How do you implement a hash table?
The textbook says:
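The slide's figure is not preserved in this transcript. One common textbook answer, shown here only as an assumed illustration, is an array of buckets with separate chaining. The class name `ChainedHashTable` is hypothetical, and resizing is left out to keep the sketch short.

```python
class ChainedHashTable:
    """Textbook-style hash table: a fixed array of buckets, each a list of (key, value)."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Map the key's hash onto one of the buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite in place
                return
        bucket.append((key, value))       # new key: chain it onto the bucket

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(num_buckets=2)  # tiny table, so collisions are guaranteed
table.put("a", 1)
table.put("b", 2)
table.put("c", 3)
table.put("a", 10)  # the update overwrites the earlier value
```

Note that `put` mutates a bucket in place, so the previous value for a key is gone after an update.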
![Page 17: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/17.jpg)
How do you implement a hash table?
![Page 18: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/18.jpg)
How do you implement a hash table?
![Page 19: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/19.jpg)
How do you implement a hash table?
![Page 20: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/20.jpg)
Commit log
The commit log is at the core of many distributed systems:
● Database Replication
● Paxos Consensus
● Kafka
● … …
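To make the idea concrete, here is a minimal in-memory sketch of a commit log. It is an illustration under stated assumptions, not the design of any system named on the slide: `CommitLog` and `replay` are hypothetical names, and a value of `None` is assumed to mean "delete". Writes are only ever appended, and any reader can rebuild the current key/value state by replaying the records in order.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); a value of None marks a delete


class CommitLog:
    """Append-only sequence of records; a record's offset is its position in the log."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, key: str, value: Optional[str]) -> int:
        self._records.append((key, value))
        return len(self._records) - 1  # offset of the record just written

    def read_from(self, offset: int) -> List[Record]:
        return self._records[offset:]


def replay(log: CommitLog, from_offset: int = 0) -> Dict[str, str]:
    """Rebuild key/value state by applying the log's records in order."""
    state: Dict[str, str] = {}
    for key, value in log.read_from(from_offset):
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


log = CommitLog()
log.append("user:1", "alice")
log.append("user:2", "bob")
log.append("user:1", "alicia")  # an update is just another appended record
log.append("user:2", None)      # so is a delete

primary = replay(log)
replica = replay(log)  # any reader of the same log arrives at the same state
```

Because the log is the only thing that is written, two independent readers that replay it cannot disagree about the final state, which is the property replication schemes build on.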
![Page 21: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/21.jpg)
1. Traditional application architecture: starting from zero
![Page 22: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/22.jpg)
1. Traditional application architecture: starting from zero
![Page 23: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/23.jpg)
2. Traditional application architecture - Scale up DB!
![Page 24: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/24.jpg)
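3. Traditional application architecture - Caching!
// Cache-aside read: try the cache first, fall back to the database on a miss
res = cache.get(key)
if (!res) {
    res = db.get(key)
    cache.put(key, res)  // populate the cache for later reads
}
return res;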
![Page 25: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/25.jpg)
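3. Traditional application architecture - Caching!
Hard problems with caching in distributed systems:
● Cache coherence
● Cache invalidation
● Consistency issues
● Cold start / bootstrapping
Why?
● Network latency in a distributed system is always greater than zero
● Race conditions
● The source of truth and what the client sees can always be inconsistent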
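The race condition can be reproduced deterministically by writing out one unlucky interleaving by hand. The sketch below uses plain dictionaries as stand-ins for the database and the cache (an assumption made for illustration; no real cache client is involved) and walks through a cache-aside read that overlaps with a write.

```python
db = {"title": "v1"}  # source of truth
cache = {}            # cache-aside cache, initially cold

# Reader, step 1: cache miss, so the value is fetched from the database.
assert cache.get("title") is None
value_read = db["title"]  # the reader now holds "v1"

# Writer, interleaved: update the database, then invalidate the cache entry.
db["title"] = "v2"
cache.pop("title", None)

# Reader, step 2: resumes and stores the value it fetched earlier.
cache["title"] = value_read

print(cache["title"], db["title"])  # v1 v2
```

The cache now serves `v1` until the entry expires or is invalidated again, even though both the reader and the writer followed the usual protocol correctly.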
![Page 26: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/26.jpg)
3. Traditional application architecture - Caching!
![Page 27: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/27.jpg)
4. Traditional application platform architecture - multi-layered!
![Page 28: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/28.jpg)
4. Traditional application architecture - multi-layered!
Reconciliation protocols between layered components
● Eventual consistency
● Polling protocol
● Materialized view
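As a rough sketch of how these three ideas fit together, the example below has a downstream layer keep a materialized view that it refreshes by polling an upstream layer for changes. `Upstream`, `PollingView`, and `changes_since` are hypothetical names chosen for illustration, not an API from the talk. Between polls the view is stale, which is what eventual consistency looks like from the client's side.

```python
class Upstream:
    """Upstream layer: holds the data and a versioned list of changes."""

    def __init__(self):
        self.version = 0
        self.changes = []  # list of (version, key, value)

    def write(self, key, value):
        self.version += 1
        self.changes.append((self.version, key, value))

    def changes_since(self, version):
        return [change for change in self.changes if change[0] > version]


class PollingView:
    """Downstream layer: a materialized view refreshed only when poll() is called."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.last_seen = 0
        self.view = {}

    def poll(self):
        for version, key, value in self.upstream.changes_since(self.last_seen):
            self.view[key] = value
            self.last_seen = version


upstream = Upstream()
view = PollingView(upstream)

upstream.write("stock", 5)
view.poll()
upstream.write("stock", 4)        # the view has not polled yet
stale_read = view.view["stock"]   # still the old value
view.poll()
fresh_read = view.view["stock"]   # caught up after the next poll
```

The staleness window is bounded by the polling interval, which is the trade-off a polling protocol makes against upstream load.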
![Page 29: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/29.jpg)
Why must the database that holds the state always sit at the very bottom of the architecture?
![Page 30: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/30.jpg)
Introducing streaming architecture
![Page 31: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/31.jpg)
Introducing streaming architecture
![Page 32: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/32.jpg)
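Characteristics of streaming architecture
● Data immutability
● Ordering can matter
● Real time & Reactive
● Request / Response ⇒ Subscribe / Notify
● Precomputed caches
● Streaming architectures can be composed iteratively
● The same data stream can produce different materialized views
● Delivery guarantee
● Stream everywhere!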
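Several of these characteristics can be seen together in a small in-memory sketch. It is illustrative only: `Stream`, `PlayEvent`, and the two view functions are hypothetical names, and a real deployment would use a broker such as Kafka rather than a Python list. Events are immutable, consumers subscribe and are notified instead of issuing requests, and two different materialized views are derived from the same stream.

```python
from collections import defaultdict
from typing import Callable, List, NamedTuple


class PlayEvent(NamedTuple):
    """Immutable event: once published it is never modified."""
    user: str
    title: str


class Stream:
    """Ordered, append-only event stream with subscribe / notify."""

    def __init__(self) -> None:
        self._events: List[PlayEvent] = []
        self._subscribers: List[Callable[[PlayEvent], None]] = []

    def subscribe(self, callback: Callable[[PlayEvent], None]) -> None:
        for event in self._events:  # a late subscriber catches up by replay
            callback(event)
        self._subscribers.append(callback)

    def publish(self, event: PlayEvent) -> None:
        self._events.append(event)
        for callback in self._subscribers:
            callback(event)


# Two materialized views, precomputed from the same stream.
plays_per_title = defaultdict(int)
last_title_per_user = {}


def count_plays(event: PlayEvent) -> None:
    plays_per_title[event.title] += 1


def track_last_title(event: PlayEvent) -> None:
    last_title_per_user[event.user] = event.title


stream = Stream()
stream.subscribe(count_plays)
stream.publish(PlayEvent("ann", "Narcos"))
stream.publish(PlayEvent("bob", "Narcos"))
stream.subscribe(track_last_title)  # added later, rebuilt from history
stream.publish(PlayEvent("ann", "Daredevil"))
```

Reads against either view are plain dictionary lookups; the work was done when the events arrived, which is the sense in which the views act as precomputed caches.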
![Page 33: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/33.jpg)
Core implementation details
+
* Samza can be replaced by another stream processing framework.
![Page 34: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/34.jpg)
Core implementation details
Why are Docker and stream processing a natural match?
![Page 35: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/35.jpg)
Core implementation details
![Page 36: Streaming architecture zx_dec2015](https://reader035.vdocuments.site/reader035/viewer/2022062412/5873bb651a28abbc788b5627/html5/thumbnails/36.jpg)
Streaming Architecture
Questions?