![Page 1: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/1.jpg)
Naiad (Timely Dataflow) & Streaming Systems
CS 848: Models and Applications of Distributed Data SystemsMon, Nov 7th 2016
Amine Mhedhbi
![Page 2: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/2.jpg)
What is ‘Timely’ Dataflow ?!What is its significance?
![Page 3: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/3.jpg)
Dataflow ?!
![Page 4: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/4.jpg)
Dataflow?!
![Page 5: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/5.jpg)
Dataflow?!
![Page 6: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/6.jpg)
Dataflow?!
![Page 7: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/7.jpg)
Dataflow?!
![Page 8: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/8.jpg)
Dataflow?!
![Page 9: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/9.jpg)
Dataflow?!
![Page 10: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/10.jpg)
Dataflow● Batch Processing e.g. MapReduce, Spark● Asynchronous Processing e.g. Storm, MillWheel● Variations for Graph Processing e.g. Pregel, GraphLab
![Page 11: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/11.jpg)
Dataflow: Batch Processing
![Page 12: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/12.jpg)
Dataflow: Batch Processing
![Page 13: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/13.jpg)
Dataflow: Batch Processing
![Page 14: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/14.jpg)
Dataflow: Batch Processing
![Page 15: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/15.jpg)
Dataflow: Batch Processing
![Page 16: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/16.jpg)
Dataflow: Batch Processing● Iterations make use of synchronization.● The cost is latency.
![Page 17: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/17.jpg)
Dataflow: Asynchronous Processing
![Page 18: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/18.jpg)
Dataflow: Asynchronous Processing● Compared with batch:
○ latency is lower.
○ Aggregations are incremental and data changes over time.
● More efficient for distributed systems.○ Stages do not need coordination.
● Correspondence between input & output is lost.
![Page 19: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/19.jpg)
So, what is (Naiad)Timely dataflow ?!
![Page 20: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/20.jpg)
Timely Dataflow?!
![Page 21: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/21.jpg)
Timely Dataflow● Reconcile both models batch and async.● Low-latency and high-throughput.
![Page 22: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/22.jpg)
Where does Naiad fit?!
![Page 23: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/23.jpg)
Naiad?!● It is the prototype built by Microsoft Research
underlying Timely dataflow Computational model.
● Iterative and incremental computations.
● The logical timestamps allow coordination.
● Provides efficiency, maintainability and simplicity.
![Page 24: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/24.jpg)
Let’s look at a computational example
![Page 25: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/25.jpg)
![Page 26: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/26.jpg)
Naiad?!● It is the prototype built by Microsoft Research
underlying Timely dataflow Computational model.
![Page 27: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/27.jpg)
The Timely Dataflow Graph Structure
![Page 28: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/28.jpg)
Graph Structure
![Page 29: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/29.jpg)
Graph Structure
![Page 30: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/30.jpg)
Graph Structure
![Page 31: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/31.jpg)
Graph Structure
● input comes in as (data, 0), (data, 1), (data, 2)○ Within a loop, I adds a loop counter so it is (data, epoch, 0)
F in each iteration increments the loop counter (data, epoch, 1) etc.E removes the loop counter and it is back to (data, epoch)
![Page 32: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/32.jpg)
Programming Model Using the timestamps
![Page 33: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/33.jpg)
Programming Model
![Page 34: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/34.jpg)
Programming Model
![Page 35: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/35.jpg)
Programming Model
![Page 36: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/36.jpg)
Programming Model
![Page 37: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/37.jpg)
Programming Model
![Page 38: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/38.jpg)
Programming Model Summary● SendBy(edge, message, timestamp)● OnRecv(edge, message, timestamp)● NotifyAt(timestamp)● OnNotify(timestamp)
![Page 39: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/39.jpg)
Programming Model In Practice
![Page 40: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/40.jpg)
Notice● Project was discontinued in 2014.
Silicon Valley lab closed.
● The paper uses C#.The latest one is open sourced and is in Rust.
![Page 41: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/41.jpg)
Word Count ExampleClass V<Msg, Time>: Vertex<Time> { ... }
![Page 42: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/42.jpg)
Word Count Example{ Dict<Time, Dict<Msg, int> > counts; ... }
![Page 43: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/43.jpg)
Word Count Example (2 Different Implementations){ void OnRecv (Edge e, Msg m, Time t) { ... } void OnNotify (Time t) { ... } }
![Page 44: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/44.jpg)
Writing Programs in General● It is possible to write programs against the Timely
Dataflow abstraction.
● It is possible to use libraries (MapReduce, Pregel, PowerGraph, LINQ etc.)
● In General:○ Define Input, computational & Output vertices.○ Create a timely dataflow graph using the appropriate interface.○ Supply labeled data to input stages.○ Stages follow a push-based model.
![Page 45: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/45.jpg)
Timely Guarantees
![Page 46: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/46.jpg)
How is timely dataflow achieved
![Page 47: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/47.jpg)
How is timely dataflow achieved ● Key point: timestamps at which future message can occur
depends on: 1. Unprocessed events & 2. Graph Structure.
![Page 48: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/48.jpg)
How is timely dataflow achieved● Pointstamp of an event (timestamp, location: E or V)
○ SendBy -> Msg event of pointstamp (t, e)○ NotifyAt -> Notif event of pointstamp (t, v)
![Page 49: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/49.jpg)
How is timely dataflow achieved ● Pointstamp(t1, l1) could-result-in Pointstamp(t2, l2)
If there is a path between l1 and l2 presented by f() i.e. f(t1) <= t2
![Page 50: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/50.jpg)
How is timely dataflow achieved (Correctness Guarantees)● Path Summary between A and C: “”
![Page 51: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/51.jpg)
How is timely dataflow achieved (Correctness Guarantees)● Path Summary between A and C: “add” or “add-increment(n)”
![Page 52: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/52.jpg)
Single-Threaded Implementation● Scheduler that needs to deliver events.
![Page 53: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/53.jpg)
Single-Threaded Implementation● Scheduler has active pointstamps <-> unprocessed events.
![Page 54: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/54.jpg)
Single-Threaded Implementation● Scheduler has active pointstamps <-> unprocessed events.● Scheduler has two counts:
○ Occurrence count of not resolved event.
○ Precursor count of how many active pointstamps precede it in the could-result-in order.
![Page 55: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/55.jpg)
Single-Threaded Implementation● Pointstamp(t, l) becomes active.
Precursor count to number of existing active pointstamps that could result in it.Increment precursor count of any pointstamp it could-result-in.Becomes not active when occurrence is zero.When not active, decrement the precursor count for any pointstamp that it could-result-in.
![Page 56: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/56.jpg)
The Distributed Environment
![Page 57: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/57.jpg)
Distributed Implementation
![Page 58: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/58.jpg)
Distributed Progress Tracking● Initial protocol: same as single multi-threaded.
○ Broadcast occurrence count updates.
● Do not immediately update local occurrence count.○ Broadcast progress updates to all workers including myself.○ Broadcast from a worker to another delivered in a FIFO manner.
● Use of a projected timestamp.● A technique to buffer and accumulate updates.
![Page 59: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/59.jpg)
Micro-Stragglers● Have a big effect on overall performance.
○ Packet Loss (Networking)○ Contention on concurrent data○ Garbage collection
![Page 60: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/60.jpg)
Performance Evaluation
![Page 61: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/61.jpg)
Performance Evaluation● I invite you to read: “Scalability! BUT at what Cost”
![Page 62: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/62.jpg)
Performance Evaluation● Comparison with:
○ SQL Server Parallel Data Warehouse (RDBMS)
○ Scalable HyperLink Store ( distributed in-memory DB for storing large portions of the web graph)
○ DryadLINQ (data parallel computing using a declarative / high level
programming language)
● Algos i.e. PageRank, SCC etc.
![Page 63: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/63.jpg)
Conclusion: “Our prototype outperforms general-purpose batch processors and often outperforms state-of-the-art async systems which provide few semantic guarantees.”
![Page 64: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/64.jpg)
Conclusion: “Our prototype outperforms general-purpose batch processors and often outperforms state-of-the-art async systems which provide few semantic guarantees.”
![Page 65: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/65.jpg)
Streaming Systemsas of today
![Page 66: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/66.jpg)
Streaming Systems● Systems that have unbounded data in mind.● They are a superset of batch processing systems.
![Page 67: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/67.jpg)
Streaming Systems
Reference: Fig-1: Example of time domain mapping. Streaming 101
![Page 68: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/68.jpg)
Streaming SystemsDesign Questions:
● What results are calculated?The types of transformations within the pipeline.
● Where in event time are results calculated?The use of event-time windowing within the pipeline.
● When in processing time are results materialized? The use of watermarks and triggers.
● How do refinements of results relate?Discard or accumulate or accumulate and retract.
![Page 69: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/69.jpg)
Fin.
Thank you!
![Page 70: Naiad (Timely Dataflow) & Streaming Systemsssalihog/courses/slides/... · PowerGraph, LINQ etc.) In General: Define Input, computational & Output vertices. Create a timely dataflow](https://reader030.vdocuments.site/reader030/viewer/2022040612/5f0517967e708231d4113912/html5/thumbnails/70.jpg)
Resources● Link to transcribed talk in pdf format.
● Timely Dataflow (Rust Implementation)
● Frank blog posts:○ Timely dataflow
○ Differential dataflow
● The world beyond batch: Streaming 101
● The world beyond batch: Streaming 102