The most professional mobile app analytics and developer services platform
王春国, email: [email protected], WeChat / QQ: 715356603
Agenda
• Mobile Big Data
• Tech stack
• Real-time dataflow
• Hadoop architecture
• Data warehouse
• Solutions
• Q&A
Mobile Data Features
• Diversity
• Fragmentation
• Multi-dimensional
• Frequent
• High-speed growth
• Low quality
• 10+ billion installations
• ~3+ billion requests, max 60,000/s
• ~5 TB+ / day
• ~1,000 nodes
• 2 to 2.5 billion messages
• 500+ jobs
• 16,000+ apps
• 65,000+ developers
• Java, Scala, Python, Shell, C …
• Kafka, Storm
• Hive, Pig
• MapReduce
• Redis, MongoDB, HBase
• Excel, R
• Finagle
• Git
Protobuf
• Serializes structured data (think XML)
• Flexible, efficient, simple
• Independent of development language
• Smaller
• Faster
• Simpler format
• Less ambiguous
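To make the "smaller" point concrete, here is a minimal sketch that hand-encodes two fields in the Protocol Buffers wire format and compares the result with an equivalent XML string. The event layout (field 1 as a numeric app id, field 2 as an event name) and the helper names are assumptions for illustration only; real code would use classes generated by `protoc` rather than encoding bytes by hand.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Varint field: key is (field_number << 3) | wire type 0."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Length-delimited field: key is (field_number << 3) | wire type 2."""
    payload = text.encode("utf-8")
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


# Hypothetical event: field 1 = app id, field 2 = event name.
binary = encode_uint_field(1, 150) + encode_string_field(2, "launch")
xml = "<event><app_id>150</app_id><name>launch</name></event>"

print(binary.hex(), len(binary), "bytes vs", len(xml.encode("utf-8")), "bytes of XML")
```

Field names never appear on the wire, only small numeric tags, which is where most of the size saving over XML comes from.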
Hive ORCFile Features
• Reduces the NameNode's load
• Lightweight indexes
  - skip row groups
  - seek to a given row
• Block-mode compression
• Bounds the amount of memory needed for reading or writing
• Metadata stored using Protocol Buffers
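The "skip row groups" idea can be sketched with per-group min/max statistics: a reader checks the statistics first and only scans groups whose range could contain the value it is looking for. This is a toy in-memory model, not the ORC reader; the `RowGroup` class, the function names, and the group size of 10 are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    rows: List[int]
    min_value: int  # lightweight index entry: smallest value in the group
    max_value: int  # lightweight index entry: largest value in the group


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split values into fixed-size groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return matching values and how many groups were actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Skip the group when the index proves the target cannot be inside.
        if target < group.min_value or target > group.max_value:
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v == target)
    return matches, groups_read


groups = build_row_groups(list(range(100)), group_size=10)
matches, groups_read = scan_equal(groups, 42)
print(matches, "found after reading", groups_read, "of", len(groups), "groups")
```

The benefit depends on how the data is laid out: with sorted or clustered values most groups are skipped, while with randomly distributed values the min/max ranges overlap and few groups can be ruled out.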
HQL: SELECT COUNT(1) FROM TABLE (ORCFile vs TextFile)
[Bar chart: ORCFile vs TextFile, query time in seconds, axis from 0 to 400]
LZMA Compression
• Faster compression speed
• Faster decompression speed
• Smaller memory requirements for decompression
• Smaller code size for decompression
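As a quick way to try LZMA on log-like data, the sketch below uses Python's standard `lzma` module to compress and restore a block of repeated records. The sample record and the preset value are assumptions for illustration; the slides do not say which LZMA implementation or settings were used in production.

```python
import lzma

# Hypothetical log record, repeated to mimic highly redundant event data.
record = b'{"app_id": 150, "event": "launch", "os": "android"}\n'
raw = record * 1000

# preset ranges from 0 (fastest) to 9 (smallest output); 6 is the library default.
compressed = lzma.compress(raw, preset=6)
restored = lzma.decompress(compressed)

ratio = len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes (ratio {ratio:.4f})")
```

Lower presets trade compression ratio for speed, so the right setting depends on whether the data is written once and read many times or compressed on a hot path.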
Blend Scheduler
• Fair Scheduler
• Map Slot <-> Reduce Slot
• More efficient
• Full use of cluster resources
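The slides do not describe how the Blend Scheduler is implemented. Assuming "Map Slot <-> Reduce Slot" means that idle slots of one kind can be lent to tasks of the other kind, the toy model below compares how many tasks can run at once with fixed slot pools versus a shared pool. Both function names and the numbers are hypothetical.

```python
def running_tasks_fixed(pending_map: int, pending_reduce: int,
                        map_slots: int, reduce_slots: int) -> int:
    """Fixed pools: map tasks may only use map slots, reduce tasks only reduce slots."""
    return min(pending_map, map_slots) + min(pending_reduce, reduce_slots)


def running_tasks_blended(pending_map: int, pending_reduce: int,
                          map_slots: int, reduce_slots: int) -> int:
    """Shared pool (assumed behavior): any free slot can run either task type."""
    total_slots = map_slots + reduce_slots
    return min(pending_map + pending_reduce, total_slots)


# A map-heavy phase: many map tasks waiting, few reduce tasks.
fixed = running_tasks_fixed(100, 2, map_slots=8, reduce_slots=4)
blended = running_tasks_blended(100, 2, map_slots=8, reduce_slots=4)
print("fixed pools:", fixed, "running; shared pool:", blended, "running")
```

In this model the fixed configuration leaves two reduce slots idle during the map-heavy phase, which is the kind of waste the "full use of cluster resources" bullet appears to target.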