jazoon'13 - benoit perroud - realtime queries
DESCRIPTION
http://guide13.jazoon.com/#/submissions/133
TRANSCRIPT
Enabling Real-time Queries to End Users
Benoit Perroud
About me
• Benoit Perroud
• Software Engineer @Verisign
• Leading Hadoop Team
• Apache Committer
• @killerwhile
|
Agenda
• What’s going on
• Batch and Realtime
• Hadoop Deployments
• Next steps
|
What’s going on
• Mainframes are obsolete, replaced by clusters of commodity hardware
• TenG (10 Gb/s) links are the new standard
• RESTful APIs are everywhere
• Everybody wants to visit Paxos island
• Firehoses do not only carry water
• Asynchronous non-blocking functional programming is taught at primary school
• NoSQL is the new way to store data at scale
• API management startups are rising (and raising)
• Hadoop keywords boost your LinkedIn profile by 2000%
• Public clouds are responsible for more than 50% of the global Internet traffic
• … and counting …
|
A Possible Deployment
Source: http://dev.datasift.com/blog/high-scalability
Note: the diagram dates from 2009; it is probably partially or even completely outdated today
Batch and Realtime
|
Batch Processing
Timeline diagram (axis: Time, marks t1 to t5):
• t1: Batch 1 starts processing
• t2: Batch 1 ready to be served
• t3: Batch 2 starts processing
• t4: Batch 2 ready to be served
• t5: Batch 3 starts processing
• While Batch 1 is served: query data from t1 (data gap)
• While Batch 2 is served: query data from t3 (data gap)
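The timeline above can be sketched in a few lines of Python. This is only an illustration, not the speaker's implementation: the `Batch` class, the function names, and the numeric values standing in for t1 to t4 are assumptions. It shows which batch answers a query at a given time and how large the data gap is at that moment.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Batch:
    name: str
    data_until: float  # processing start: the batch holds data up to here (t1, t3)
    ready_at: float    # moment the batch becomes queryable (t2, t4)


def served_batch(batches, now):
    """Return the most recent batch that is ready at `now`, or None.

    `batches` is assumed to be sorted by `ready_at`.
    """
    ready_times = [b.ready_at for b in batches]
    i = bisect_right(ready_times, now)
    return batches[i - 1] if i else None


def data_gap(batches, now):
    """Age of the freshest data a query can see at `now` (None if nothing is served)."""
    batch = served_batch(batches, now)
    return None if batch is None else now - batch.data_until


# Hypothetical values for t1..t4 on the slide's time axis.
batches = [Batch("Batch 1", data_until=1.0, ready_at=2.0),
           Batch("Batch 2", data_until=3.0, ready_at=4.0)]

print(served_batch(batches, 3.5).name, data_gap(batches, 3.5))
```

With these values a query issued at 3.5 is still answered from Batch 1, so everything that arrived after t1 is invisible to it.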
|
Batch Processing in Detail
Timeline diagram (axis: Time), for a batch with data from yesterday:
• New batch granularity period
• Allow some time for data to finish uploading
• Processing time
• Load results in a data store
• Notify the retrieval system that a new batch is ready to be served
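The arithmetic behind these stages can be made concrete with a small sketch. The daily granularity follows the slide ("data from yesterday"); the three delay values and all names are hypothetical, chosen only to make the example runnable.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(days=1)         # batch granularity: one day
UPLOAD_GRACE = timedelta(hours=1)  # hypothetical: wait for uploads to finish
PROCESSING = timedelta(hours=3)    # hypothetical: processing time
LOAD = timedelta(hours=1)          # hypothetical: load results and notify

DELAY = UPLOAD_GRACE + PROCESSING + LOAD


def latest_served_day(now):
    """Return the calendar day covered by the newest batch ready at `now`."""
    midnight = datetime.combine(now.date(), datetime.min.time())
    # Yesterday's batch closes at today's midnight and is ready DELAY later.
    if now >= midnight + DELAY:
        return (midnight - PERIOD).date()
    # Until then, queries are still answered from the previous batch.
    return (midnight - 2 * PERIOD).date()


print(latest_served_day(datetime(2013, 10, 23, 4, 0)))  # before the batch is ready
print(latest_served_day(datetime(2013, 10, 23, 6, 0)))  # after the batch is ready
```

Under these assumptions, any query issued in the first five hours of the day is answered with data from the day before yesterday.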
|
Query data from the day before yesterday?
Realtime Query
• Interactive query
• REST-like request/response query type
And
• Query the latest version of the data
• "Latest" meaning n seconds ago, with n known and fixed
|
Hybrid Approach
Timeline diagram (axis: Time, marks t1 to t4):
• t1: Batch 1 starts processing
• t2: Batch 1 ready to be served
• t3: Batch 2 starts processing
• t4: Batch 2 ready to be served
• While Batch 1 is served: query data from t1 snapshot AND complementary data for batch 1
• While Batch 2 is served: query data from t2 snapshot AND complementary data for batch 2
|
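One way to picture the hybrid approach is a view that answers each query from the latest batch snapshot plus the events that arrived after it. The sketch below assumes a simple per-key counting workload; the `HybridView` class and its methods are hypothetical names and do not come from the talk.

```python
from collections import Counter


class HybridView:
    """Serve counts from a batch snapshot plus newer, complementary events."""

    def __init__(self):
        self.snapshot = Counter()  # result of the last completed batch
        self.snapshot_time = 0.0   # the snapshot covers events up to this time
        self.recent = []           # (timestamp, key, count) complementary events

    def record(self, timestamp, key, count=1):
        """Realtime path: remember an event not yet covered by any batch."""
        self.recent.append((timestamp, key, count))

    def load_batch(self, snapshot, snapshot_time):
        """Batch path: swap in a new snapshot and drop events it already covers."""
        self.snapshot = Counter(snapshot)
        self.snapshot_time = snapshot_time
        self.recent = [e for e in self.recent if e[0] > snapshot_time]

    def query(self, key):
        """Snapshot value AND complementary data newer than the snapshot."""
        extra = sum(c for t, k, c in self.recent
                    if k == key and t > self.snapshot_time)
        return self.snapshot[key] + extra


view = HybridView()
view.record(1.5, "a")
view.record(2.5, "a")
view.load_batch({"a": 1}, snapshot_time=2.0)  # the batch saw the event at 1.5
print(view.query("a"))
```

Dropping the covered events in `load_batch` keeps the complementary store small and avoids counting an event twice once a batch has absorbed it.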
Hadoop
Deployments
|
Naïve Hadoop Deployment
Architecture diagram, components as labeled:
• Gateway: hdfs dfs -put, mapred job …jar, hdfs dfs -get
• NameNode
• JobTracker
• DataNodes (Processing)
|
Industry Hadoop Deployment
Architecture diagram, components as labeled:
• Data In GW
• Data Out GW
• Metadata Store
• Monitoring
• Gateway
• Several clusters, each with NameNode, JobTracker, and DataNodes, including Processing and Research, Data Science
|
Realtime Hadoop Deployment
Architecture diagram, components as labeled:
• Data In GW
• Gateway
• NameNode, JobTracker, DataNodes (Processing)
• NameNode, JobTracker
• RT processing
• RT Data Out GW
|
Realtime Search with Hadoop
Architecture diagram, components as labeled:
• Data In GW
• Gateway
• NameNode, JobTracker, DataNodes (Generate Indexes)
• NameNode, JobTracker
• Update indexes
• RT Data Out GW
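The split between "Generate Indexes" and "Update indexes" can be illustrated with a toy inverted index: a batch step builds the index from all known documents, and a realtime step adds documents as they arrive. This is a sketch under that assumption only; the function names are hypothetical, and a real deployment would rely on a search engine rather than an in-memory dictionary.

```python
from collections import defaultdict


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def generate_index(docs):
    """Batch step: build a term -> set of document ids index from scratch."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def update_index(index, doc_id, text):
    """Realtime step: add one newly arrived document to an existing index."""
    for term in tokenize(text):
        index[term].add(doc_id)


def search(index, term):
    """Return the sorted ids of documents containing `term`."""
    return sorted(index.get(term.lower(), set()))


index = generate_index({1: "Hadoop batch processing", 2: "realtime queries"})
update_index(index, 3, "Realtime search with Hadoop")
print(search(index, "hadoop"))
```

A document added through `update_index` becomes searchable immediately, without waiting for the next full index generation.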
|
Coordinator
Next Steps
|
Hadoop Ecosystem
… is moving … really fast
• Interactive Queries: Cloudera Impala, Apache Drill, Tez, …
• Search: SolrCloud, Elasticsearch, Cloudera Search
• Hybrid layer: Twitter Summingbird
• … and counting …
|
Thanks for the attention!
Follow @killerwhile
“Copyright © 2013 VeriSign, Inc. All rights reserved. The VERISIGN word mark, the Verisign logo, and other Verisign trademarks, service marks, and designs that may appear herein are registered or unregistered trademarks or service marks of VeriSign, Inc., and its subsidiaries in the United States and foreign countries. All other trademarks, service marks, and designs are property of their respective owners. Verisign has made efforts to ensure the accuracy and completeness of the information in this document. However, Verisign makes no warranties of any kind (whether express, implied or statutory) with respect to the information contained herein. Verisign assumes no liability to any party for any loss or damage (whether direct or indirect) caused by any errors, omissions, or statements of any kind contained in this document. Further, Verisign assumes no liability arising from the application or use of the products, services, or materials described or referenced herein and specifically disclaims any representation that any such products, services, or materials do not infringe upon any existing or future intellectual property rights.”