![Page 1: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/1.jpg)
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
Speaker: LIN Qian
http://www.comp.nus.edu.sg/~linqian
![Page 2: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/2.jpg)
Problem
• Stream applications are often time-critical
• Enabling stream support for MapReduce jobs
  – Simple for the Map operations
  – Hard for the Reduce operations
• Continuously executing MapReduce workflows requires a great deal of coordination
![Page 3: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/3.jpg)
C-MR Workflow
• Windows: temporal subdivisions of a stream, described by
  – size (the span of the stream that one window covers)
  – slide (the interval between the starts of consecutive windows)
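The size and slide parameters can be illustrated with a small sketch. This is not C-MR code: the function name `sliding_windows`, the `(timestamp, value)` event layout, the half-open interval `[start, start + size)`, and the choice to start the first window at timestamp 0 are all assumptions made for the example.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, value); layout assumed for this sketch


def sliding_windows(events: List[Event], size: int, slide: int):
    """Return (start, end, values) for each window over timestamp-sorted events.

    Assumptions: windows are half-open [start, end), the first window
    starts at timestamp 0, and windows are produced until the start
    passes the last event's timestamp.
    """
    if not events or size <= 0 or slide <= 0:
        return []
    last_ts = events[-1][0]
    windows = []
    start = 0
    while start <= last_ts:
        end = start + size
        values = [v for ts, v in events if start <= ts < end]
        windows.append((start, end, values))
        start += slide  # the slide is the gap between consecutive window starts
    return windows


events = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")]
for window in sliding_windows(events, size=3, slide=2):
    print(window)
```

With a slide smaller than the size, consecutive windows overlap (here the value at timestamp 2 lands in two windows); with slide equal to size, the windows tile the stream without overlap.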
![Page 4: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/4.jpg)
C-MR Programming Interface
• Map/Reduce operations
![Page 5: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/5.jpg)
C-MR Programming Interface (cont.1)
• Input/Output streams
![Page 6: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/6.jpg)
C-MR Programming Interface (cont.2)
• Create workflows of continuous MapReduce jobs
![Page 7: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/7.jpg)
C-MR vs. MapReduce
• MapReduce computing nodes receive a set of Map or Reduce tasks and each node must wait for all other nodes to complete their tasks before being allocated additional tasks.
• C-MR uses pull-based data acquisition allowing computing nodes to execute any Map or Reduce workload as they are able. Thus, straggling nodes will not hinder the progress of the other nodes if there is data available to process elsewhere in the workflow.
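The pull-based idea can be sketched with a shared work queue from which threads take tasks whenever they are free. This is only an illustration under stated assumptions: `run_pull_based`, the `(function, argument)` task layout, and the use of Python threads are hypothetical choices for the example, not C-MR's implementation.

```python
import queue
import threading


def run_pull_based(tasks, num_workers=4):
    """Run (func, arg) tasks with workers that pull work as they become free.

    No worker is assigned a fixed batch, so a slow task on one worker
    does not stop the others from draining the rest of the queue.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                func, arg = work.get_nowait()  # pull the next available task
            except queue.Empty:
                return  # nothing left to do
            out = func(arg)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # completion order, not submission order


squares = run_pull_based([(lambda x: x * x, i) for i in range(10)], num_workers=3)
print(sorted(squares))
```

Results arrive in completion order, which mirrors the point made later in the slides that merged output streams do not retain their original ordering.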
![Page 8: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/8.jpg)
C-MR Architecture
![Page 9: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/9.jpg)
Stream and Window Management
• The merged output streams are not guaranteed to retain their original orderings.
• Solution: Replicating window-bounding punctuations
![Page 10: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/10.jpg)
Stream and Window Management (cont.1)
A node consumes the punctuation from the sorted input stream-buffer
![Page 11: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/11.jpg)
Stream and Window Management (cont.2)
Replicate that punctuation to the other nodes
![Page 12: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/12.jpg)
Stream and Window Management (cont.3)
After all replicas are received at the intermediate buffer, collect data whose timestamps fall into the applicable interval and materialize them as a window
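The three steps above can be sketched as a buffer that counts punctuation replicas and materializes a window only once every node has reported. The class `IntermediateBuffer`, its method names, the fixed window size, and the half-open interval are assumptions made for this sketch and do not come from the C-MR source.

```python
from collections import defaultdict


class IntermediateBuffer:
    """Collects unordered items and materializes a window per punctuation."""

    def __init__(self, num_nodes, window_size):
        self.num_nodes = num_nodes
        self.window_size = window_size
        self.items = []                   # (timestamp, value) in arrival order
        self.replicas = defaultdict(set)  # punctuation timestamp -> node ids seen

    def add_item(self, timestamp, value):
        self.items.append((timestamp, value))

    def add_punctuation(self, node_id, end_ts):
        """Record one replica; return the window once all nodes have sent it."""
        self.replicas[end_ts].add(node_id)
        if len(self.replicas[end_ts]) < self.num_nodes:
            return None  # some node may still emit data for this window
        del self.replicas[end_ts]
        start_ts = end_ts - self.window_size
        # Items are kept afterwards because overlapping windows may reuse them.
        in_window = [(ts, v) for ts, v in self.items if start_ts <= ts < end_ts]
        return sorted(in_window, key=lambda pair: pair[0])


buf = IntermediateBuffer(num_nodes=2, window_size=10)
buf.add_item(7, "c")
buf.add_item(3, "a")
buf.add_item(12, "d")
print(buf.add_punctuation(node_id=0, end_ts=10))  # one replica is not enough
buf.add_item(5, "b")
print(buf.add_punctuation(node_id=1, end_ts=10))  # all replicas received
```

Waiting for every replica is what makes the window safe to materialize: each node has signalled that it will produce no more data before that boundary, even though the merged items themselves arrived out of order.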
![Page 13: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/13.jpg)
Operator Scheduling
• Scheduling framework
  – Execute multiple policies simultaneously
  – Transition between policies based on resource availability
• Scheduling policies
![Page 14: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/14.jpg)
Incremental Computation
Output_1 = d_1 + d_2 + d_3 + ... + d_n
Output_2 = d_2 + d_3 + d_4 + ... + d_{n+1}
Output_3 = d_3 + d_4 + d_5 + ... + d_{n+2}
Output_4 = d_4 + d_5 + d_6 + ... + d_{n+3}
Consecutive outputs share the computation over their common data subset
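One simple way to reuse the shared part of consecutive windows, for a sum with a slide of one element, is to subtract the element that leaves and add the one that enters. This sketch assumes an invertible aggregate (addition) and is not necessarily the mechanism C-MR itself uses; the function name `sliding_sums` is hypothetical.

```python
def sliding_sums(data, n):
    """Return the sum of every length-n window, reusing the previous output.

    Each step does O(1) work instead of re-adding n elements:
    next = previous - (element leaving) + (element entering).
    """
    if n <= 0 or len(data) < n:
        return []
    current = sum(data[:n])  # first output, computed from scratch
    outputs = [current]
    for k in range(n, len(data)):
        current += data[k] - data[k - n]
        outputs.append(current)
    return outputs


print(sliding_sums([1, 2, 3, 4, 5, 6], 3))  # [6, 9, 12, 15]
```

Aggregates without an inverse (for example max) cannot use the subtraction trick directly and would need a different form of sharing, such as combining partial results over sub-windows.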
![Page 15: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/15.jpg)
Evaluation
• Continuously executing a MapReduce job
  – Compare with Phoenix++
![Page 16: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/16.jpg)
Evaluation (cont.1)
• Operator scheduling
  – Oldest data first (ODF)
  – Best memory trade-off (MEM)
  – Hybrid utilization of both policies
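The slide names the policies without defining them. As one plausible reading of "oldest data first", the sketch below picks the operator whose input queue holds the oldest pending item. The function name, the queue layout, and this interpretation are assumptions; the MEM policy is not sketched because the slide gives no detail on how the memory trade-off is measured.

```python
def pick_oldest_data_first(queues):
    """Return the operator whose head-of-queue item has the smallest timestamp.

    `queues` maps an operator name to a list of (timestamp, item) pairs
    ordered oldest first. Returns None when every queue is empty.
    """
    heads = {name: q[0][0] for name, q in queues.items() if q}
    if not heads:
        return None
    return min(heads, key=heads.get)


pending = {
    "map": [(5, "x")],
    "reduce": [(2, "y"), (9, "z")],
    "idle": [],
}
print(pick_oldest_data_first(pending))  # reduce
```

A policy like this favors latency, since the longest-waiting data is always served next, which is one reason a scheduler might switch to a memory-oriented policy when buffers grow.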
![Page 17: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/17.jpg)
Evaluation (cont.2)
• Workflow optimization
![Page 18: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/18.jpg)
Evaluation (cont.3)
• Workflow optimization
  – Latency and throughput
![Page 19: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/19.jpg)
Thank you
![Page 20: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/20.jpg)
Two Properties of Streams
• Unbounded
• Accessed sequentially
Hard to handle with a traditional DBMS
![Page 21: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/21.jpg)
Query Operators
• Unbounded stateful operators
  – maintain state with no upper bound in size
  – consequence: may run out of memory
• Blocking operators
  – read an entire input before emitting a single output
  – consequence: might never produce a result
Options for handling such operators:
• Never use them, or
• Use them only after refactoring
![Page 22: C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors](https://reader036.vdocuments.site/reader036/viewer/2022081518/54b92a774a795970598b4618/html5/thumbnails/22.jpg)
Punctuations
• Mark the end of substreams
  – allowing us to view an infinite stream as a mixture of finite streams
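A minimal sketch of the idea: a sentinel value plays the role of a punctuation, and a generator cuts the stream into finite substreams at each one. The names `PUNCT` and `split_on_punctuation` are hypothetical, and real punctuations typically carry information such as a timestamp bound rather than being a bare marker.

```python
PUNCT = object()  # stand-in for a punctuation marker (assumed for this sketch)


def split_on_punctuation(stream):
    """Yield one finite list of items each time a punctuation is seen.

    Items after the last punctuation are held back, because nothing has
    yet declared that their substream is complete.
    """
    chunk = []
    for item in stream:
        if item is PUNCT:
            yield chunk
            chunk = []
        else:
            chunk.append(item)


stream = [1, 2, PUNCT, 3, PUNCT, 4]
# A blocking operator such as sum can now run once per finite substream.
print([sum(chunk) for chunk in split_on_punctuation(stream)])  # [3, 3]
```

Because each yielded substream is finite, an operator that must read its whole input before emitting a result can still produce output on an unbounded stream.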