Rnotify: A Scalable Distributed File System Notifications Solution for Applications
Ashwin Raghav
www.rnotifications.com
github.com/ashwinraghav/rnotify-c/
Agenda
• Motivation
• Problem Statement / State of the art
• General Overview
• Hypothesis
• Approach
• Evaluation
• Conclusion
Motivation
• Applications need File System Notifications
• Previously applications polled file systems naively
• Now, all major operating systems provide a file system notifications API
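For reference, a minimal sketch of the local inotify API that these OS facilities expose, and whose semantics Rnotify mimics; the watched path is illustrative:

/* Minimal inotify usage: watch one directory, print event names. */
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void) {
    int fd = inotify_init();                  /* create the event queue */
    if (fd < 0) { perror("inotify_init"); return 1; }

    /* Watch /tmp for file creation and modification. */
    int wd = inotify_add_watch(fd, "/tmp", IN_CREATE | IN_MODIFY);
    if (wd < 0) { perror("inotify_add_watch"); return 1; }

    char buf[4096];
    ssize_t len = read(fd, buf, sizeof buf);  /* blocks until events arrive */
    for (char *p = buf; len > 0 && p < buf + len; ) {
        struct inotify_event *ev = (struct inotify_event *)p;
        if (ev->len) printf("event on %s\n", ev->name);
        p += sizeof(struct inotify_event) + ev->len;
    }
    close(fd);
    return 0;
}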
Problem
VFS is an abstraction that treats all file systems uniformly.
All FS reads/writes happen via VFS, making it the ideal place to implement notifications.
This does not work with Distributed File Systems.
Problem / State of the art
Applications use inotify to subscribe to local file systems, but fall back on ad-hoc (polling) implementations for Distributed FS.
Polling creates an unfortunate tension between resource consumption and timeliness.
Any general solution must be location transparent, scalable, and tunable.
Requirements
• Compatibility with existing applications that use Inotify
• Provide Horizontal Scalability, Decomposition of Functionality, Tunable Performance
• Location Transparency
• High-throughput notifications per client
Assumptions
• Relaxing Reliability Guarantees
• Modifying Notification Semantics
• Congestion Control Semantics
• Failure Notification Semantics
Related Work
• FAM (File Alteration Monitor) - does not scale
• Internet-scale systems like Thialfi and ZooKeeper are built for much larger client populations.
• Bayeux, Scribe, Siena, Hermes, SWAG, etc. assume overlay networks to establish multicast trees for message dissemination.
• Inotify was introduced in kernel 2.6.13 for local FS notifications.
Overview
• Multiplexing/Proxying Subscriptions
• Serializing Notifications
• Demultiplexing Notifications
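As a concrete sketch of what one serialized notification record could carry; field names are assumptions, not Rnotify's actual wire format:

/* Hypothetical wire format for a single serialized notification. */
#include <stdint.h>

struct rn_notification {
    uint64_t file_id;       /* hash of the watched path */
    uint32_t mask;          /* inotify-style event mask (IN_CREATE, ...) */
    uint32_t seq;           /* per-dispatcher sequence number */
    uint64_t timestamp_us;  /* event time, microseconds since the epoch */
    uint16_t name_len;      /* length of the trailing file name */
    /* name_len bytes of file name follow the fixed header */
};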
Hypothesis
As a result of clearly decomposing functionality into replicable components, Rnotify can be tuned to fit different notification workloads and consistently deliver notifications at low latency.
Key Properties
• Low Latency Notifications (under 10ms)
• Compatible with applications that use Inotify
• Tuned to fit workloads
• Greedy Applications can use Rnotify by distributing their workloads across nodes.
Approach
• Registration
• Notification
• Replica Configuration Management
Registration
• Inform the Proxy about the newly watched file
• Place Registrations on preferred Publishers
Client Library & API usage
• Client-driven registration
• Registration is transactional from the application's point of view
• Client-driven migration of subscriptions
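A sketch of how the client library could mirror the inotify calls so an existing application ports with minimal change; the rnotify_* names and the registrar address are assumptions, not necessarily the actual API:

/* Hypothetical Rnotify client API mirroring inotify (names assumed). */
#include <stdint.h>
#include <sys/inotify.h>   /* reuse the IN_* event masks */

/* Stubbed here; the real library would connect to the registrar and
 * register the watch transactionally on a preferred publisher. */
int rnotify_init(const char *registrar_addr) { (void)registrar_addr; return 3; }
int rnotify_add_watch(int fd, const char *path, uint32_t mask) {
    (void)fd; (void)path; (void)mask; return 1;
}

int main(void) {
    /* An existing inotify application only swaps the two setup calls
     * and keeps its read() loop over inotify_event records. */
    int fd = rnotify_init("registrar.example.com:4000");
    int wd = rnotify_add_watch(fd, "/gluster/vol1/logs", IN_MODIFY);
    return (fd < 0 || wd < 0);
}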
Notification Pipeline
• Congestion Control
• Opportunistic Batching
• Publisher Selection
Dispatchers
• Serialize notification blocks
• Congestion Control
• Dispatch to Publisher
Congestion Control at Dispatcher
Frequency List:
Subscription Id | Notifications in time window
1               | 1000
2               | 3000
When a subscription's count crosses the window threshold, a NOTIFICATION_BURST is sent to the Publisher.
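A minimal sketch of the per-window accounting above; the window length, threshold, and names are assumptions:

/* Sketch: count notifications per subscription within a time window;
 * past a threshold, coalesce that subscription's events into a single
 * NOTIFICATION_BURST message for the publisher. */
#include <stdint.h>
#include <stdbool.h>

#define WINDOW_MS        1000
#define BURST_THRESHOLD  2000   /* notifications per window */

struct freq_entry {
    uint64_t subscription_id;
    uint32_t count;             /* notifications seen this window */
};

/* Returns true if this subscription should be collapsed into a burst. */
bool record_notification(struct freq_entry *e) {
    e->count++;
    return e->count > BURST_THRESHOLD;
}

/* Called every WINDOW_MS to start a fresh window. */
void reset_window(struct freq_entry *list, int n) {
    for (int i = 0; i < n; i++) list[i].count = 0;
}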
Avoid atomic broadcasts
[Figure: Frequency Lists maintained independently at each dispatcher]
Publishers
• Identify the subscribers for a notification
• Dispatch to the subscribers
Representing State - Publisher
Get all Subscribers for a file:
File Id | IP addresses of Subscribers
1       | 192.168.1.2:3000, 192.168.3.4:3001
2       | 192.168.1.2:3000, 192.168.3.4:3001

Undelivered notifications per subscriber:
Subscriber       | Undelivered Notifications
192.168.1.2:3000 | N1, N2, N3
192.168.3.4:3001 | N4, N5, N6

Get all Notifications for a file; new notifications are appended here:
File Id | Notifications
1       | N1, N2, N3
2       | N4, N5
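A sketch of the three lookup structures in C; arrays stand in for whatever map structure the publisher actually uses, and all names are assumptions:

/* Sketch of publisher-side state. */
#include <stdint.h>

struct subscriber {            /* "IP addresses of Subscribers" column */
    char addr[32];             /* e.g. "192.168.1.2:3000" */
};

struct file_entry {            /* File Id -> subscribers + notifications */
    uint64_t file_id;
    struct subscriber *subs;   /* get all subscribers for a notification */
    int n_subs;
    uint64_t *notifications;   /* appended as dispatchers deliver blocks */
    int n_notifications;
};

struct pending {               /* Subscriber -> undelivered notifications */
    struct subscriber sub;
    uint64_t *undelivered;     /* drained when the subscriber is reachable */
    int n_undelivered;
};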
Publisher Selection
How do the dispatchers and Registrar maintain a shared understanding of ‘preferred’ publishers?
Partition and Placement of Publishers
pos1 = SHA1(Publisher1_IP_ADDR)
pos2 = SHA1(Publisher2_IP_ADDR)
pos3 = SHA1(Publisher3_IP_ADDR)
pos4 = SHA1(Publisher4_IP_ADDR)
Partition and Placement of Subscriptions
file1 = SHA1(File_Path1)
file2 = SHA1(File_Path2)
file3 = SHA1(File_Path3)
file4 = SHA1(File_Path4)
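The two slides above describe consistent hashing: publishers and file paths hash onto the same SHA1 ring, and a subscription is presumably placed on the nearest publisher clockwise from its file's position. A minimal sketch, where sha1_u64 is an assumed helper that folds a SHA1 digest into a 64-bit ring position:

/* Sketch: find the 'preferred' publisher for a file path. */
#include <stdint.h>

uint64_t sha1_u64(const char *s);   /* assumed: SHA1(s) truncated to 64 bits */

struct publisher { char ip[32]; uint64_t pos; };  /* pos = sha1_u64(ip) */

/* First publisher clockwise from the file's ring position. */
struct publisher *preferred_publisher(struct publisher *ring, int n,
                                      const char *file_path) {
    uint64_t file_pos = sha1_u64(file_path);
    struct publisher *succ = 0, *min = &ring[0];
    for (int i = 0; i < n; i++) {
        if (ring[i].pos < min->pos) min = &ring[i];
        if (ring[i].pos >= file_pos && (!succ || ring[i].pos < succ->pos))
            succ = &ring[i];
    }
    return succ ? succ : min;   /* wrap around the ring */
}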
Arrival of Publisher
new_publisher = SHA1(New_Pub_IP_Addr)
Reissue_registrations_between(pos1, pos2)
A lock-free way to make the configuration eventually consistent
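Sketched in the same ring terms: when a publisher joins, only the registrations in the arc between its predecessor's position and its own need to be reissued, which is why no global lock is required. (Function and variable names are assumptions.)

/* Sketch: a registration moves to the new publisher only if its ring
 * position falls in (pred_pos, new_pos]; all other placements stay
 * valid, so the configuration converges without locking. */
#include <stdint.h>
#include <stdbool.h>

bool must_reissue(uint64_t pred_pos, uint64_t new_pos, uint64_t file_pos) {
    if (pred_pos < new_pos)                             /* normal arc */
        return file_pos > pred_pos && file_pos <= new_pos;
    return file_pos > pred_pos || file_pos <= new_pos;  /* arc wraps zero */
}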
Dispatcher Replication
• The Dispatcher is provided the registrar location at startup
• It acquires the publisher list from the registrar transactionally
• It informs the Proxies independently
Evaluation Strategy
Mid-size GlusterFS deployment on EC2
PostMark benchmark to represent FS activity
Chef used to start up the client services
End-to-end latency measured
8xl EC2 instances with 32 cores each simulated many clients per machine
All machines were acquired within a placement group
Evaluation - Scalability
Tune Dispatchers based on FS throughput
Tune Publishers based on number of clients
Scalability - Overactive FileSystems
PostMark threads writing to different directories
Scalability - Overactive FileSystems
PostMark threads writing to the same directory
Scalability - Overactive FileSystems
PostMark threads writing to different files vs. the same files
Workloads representative of applications like web/mail servers and HPC applications
Performance
Demonstrate consistency
Demonstrate footprint in comparison to naive polling
Comparison to naive Polling
• Developed a polling Node.js REST API
• For just 100 clients and 5 files: 50,000 stats per second
• This has an extremely heavy footprint on FS performance
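For contrast, naive polling amounts to stat()ing every watched file in a tight loop; a minimal C sketch of the idea (the evaluation's actual poller was the Node.js REST API above):

/* Sketch of naive polling: stat() each watched file every interval
 * and compare mtimes; each client does this independently, which is
 * what drives the stats-per-second load on the file system. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/stat.h>

void poll_files(const char **paths, time_t *last_mtime, int n) {
    struct stat st;
    for (;;) {
        for (int i = 0; i < n; i++) {
            if (stat(paths[i], &st) == 0 && st.st_mtime != last_mtime[i]) {
                printf("changed: %s\n", paths[i]);
                last_mtime[i] = st.st_mtime;
            }
        }
        usleep(10000);   /* 10ms: the tension between timeliness and load */
    }
}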
Greedy Applications
• Increasing the number of notifications delivered per client
• Linear increase in latency
• Messages spend more time in queues
Greedy Applications
If you need to consume more notifications, distribute yourself across nodes
Summary - Why is this work different?
• FAM does not scale and is obsolete.
• Pub/sub systems do not cater to many notifications per client
• Multicast trees are established for reliability (performance suffers)
• Pub/sub systems provide a richer set of semantics at lower performance
Future Work
• Introduce a security model
• Introduce message ordering
• Provide message delivery reliability
Conclusion
• Rnotify is a solution for receiving notifications from POSIX-compliant Distributed File Systems
• Tuned to fit different notification workloads
• Incrementally scalable, location transparent, and mimics Inotify
• We have tested Rnotify to scale to 2.5 million notifications per second
• Latency under 10ms for 88% of notifications
Subscription Proxy
• Resides on the file host and proxies subscriptions and notifications
• Idempotent API wrappers for subscription
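A sketch of what an idempotent subscription wrapper could look like: repeating the same subscribe call returns the existing watch instead of creating a duplicate. All names here are assumptions.

/* Sketch: idempotent subscribe at the proxy. A repeated request for
 * the same (client, path) pair reuses the existing watch descriptor. */
#include <string.h>

struct watch { char client[32]; char path[256]; int wd; };

static struct watch table[1024];
static int n_watches, next_wd = 1;

int proxy_subscribe(const char *client, const char *path) {
    for (int i = 0; i < n_watches; i++)     /* already registered? */
        if (!strcmp(table[i].client, client) && !strcmp(table[i].path, path))
            return table[i].wd;             /* idempotent: return existing */
    struct watch *w = &table[n_watches++];
    strncpy(w->client, client, sizeof w->client - 1);
    strncpy(w->path, path, sizeof w->path - 1);
    return w->wd = next_wd++;
}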
Design Alternatives
• File System Modification
• VFS Modification
• Modifying Inotify Implementation