Rnotify: A Scalable Distributed Filesystem Notifications Solution for Applications

Ashwin Raghav

www.rnotifications.com

github.com/ashwinraghav/rnotify-c/

Agenda

• Motivation

• Problem Statement / State of the art

• General Overview

• Hypothesis

• Approach

• Evaluation

• Conclusion

Motivation

• Applications need File System Notifications

• Previously, applications polled file systems naively

• Now, all operating systems provide FS notification APIs

Problem

VFS is an abstraction that treats all filesystems uniformly.

All FS reads/writes happen via VFS, making it the ideal place to implement notifications.

But this does not work with Distributed File Systems.

Problems / State of the art

Applications use ad-hoc (polling) implementations for distributed file systems.

Polling creates an unfortunate tension between resource consumption and timeliness.

Any general solution must be location transparent, scalable, and tunable.

For local filesystems, applications use inotify to subscribe to changes, as sketched below.
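A minimal sketch of how an application subscribes to a local filesystem with inotify (Linux-specific; error handling trimmed, watch path chosen for illustration):

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    /* Buffer aligned for struct inotify_event, as recommended by inotify(7). */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));

    /* Create an inotify instance and watch /tmp for file creation and modification. */
    int fd = inotify_init();
    if (fd < 0 || inotify_add_watch(fd, "/tmp", IN_CREATE | IN_MODIFY) < 0) {
        perror("inotify");
        return EXIT_FAILURE;
    }

    /* Block until the kernel delivers a batch of events, then print them. */
    ssize_t len = read(fd, buf, sizeof(buf));
    for (char *p = buf; len > 0 && p < buf + len; ) {
        struct inotify_event *ev = (struct inotify_event *)p;
        printf("mask=0x%x name=%s\n", (unsigned)ev->mask, ev->len ? ev->name : "(watched dir)");
        p += sizeof(struct inotify_event) + ev->len;
    }

    close(fd);
    return EXIT_SUCCESS;
}
```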

Requirements

• Compatibility with existing applications that use Inotify

• Provide Horizontal Scalability, Decomposition of Functionality, Tunable Performance

• Location Transparency

• High notification throughput per client

Assumptions

• Relaxing Reliability Guarantees

• Modifying Notification Semantics

• Congestion Control Semantics

• Failure Notification Semantics

Related Work

• FAM (File Alteration Monitor) - does not scale

• Internet-scale systems like Thialfi and ZooKeeper are built for much larger numbers of clients

• Bayeux, Scribe, Siena, Hermes, Swag, etc. assume overlay networks that establish multicast trees for message dissemination

• Inotify was introduced in Linux kernel 2.6.13 for local FS notifications

Overview

• Multiplexing/Proxying Subscriptions

• Serializing Notifications

• Demultiplexing Notifications

Hypothesis

As a result of clearly decomposing functionality into replicable components, Rnotify can be tuned to fit different notification workloads and consistently deliver notifications at low latency.

Key Properties

• Low Latency Notifications (under 10ms)

• Compatible with applications that use Inotify

• Tuned to fit workloads

• Greedy Applications can use Rnotify by distributing their workloads across nodes.

Approach

• Registration

• Notification

• Replica Configuration Management

Registration

• Inform the Proxy about the newly watched file

• Place Registrations on preferred Publishers

• Client-driven registration

• Registration is transactional from the application's point of view

• Client-driven migration of subscriptions

Client Library & API usage
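A hypothetical sketch of what an Inotify-style client call sequence might look like; the names rn_init, rn_add_watch, rn_read, and the registrar address are assumptions for illustration, not the actual Rnotify API, and the bodies below are placeholder stubs:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical client-library surface, mirroring inotify's init / add_watch /
 * read pattern so existing inotify applications can be ported with minimal changes. */
typedef struct { int connected; } rn_handle;
typedef struct { int watch_id; unsigned mask; char name[256]; } rn_event;

#define RN_CREATE 0x1
#define RN_MODIFY 0x2

static rn_handle *rn_init(const char *registrar_addr)
{
    static rn_handle h;                     /* stub: would contact the Registrar */
    h.connected = registrar_addr != NULL;
    return &h;
}

static int rn_add_watch(rn_handle *h, const char *path, unsigned mask)
{
    (void)h; (void)mask;                    /* stub: would register via the Proxy */
    printf("registered watch on %s\n", path);
    return 1;                               /* watch id */
}

static int rn_read(rn_handle *h, rn_event *ev, int max)
{
    (void)h; (void)max;                     /* stub: would block on the Publisher */
    ev[0].watch_id = 1; ev[0].mask = RN_MODIFY;
    strcpy(ev[0].name, "logs/app.log");
    return 1;
}

int main(void)
{
    rn_event events[64];
    rn_handle *h = rn_init("registrar.example.com:4000");
    rn_add_watch(h, "/gluster/vol1/logs", RN_CREATE | RN_MODIFY);

    int n = rn_read(h, events, 64);
    for (int i = 0; i < n; i++)
        printf("event on %s (mask 0x%x)\n", events[i].name, events[i].mask);
    return 0;
}
```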

Notification Pipeline

• Congestion Control

• Opportunistic Batching

• Publisher Selection

Dispatchers

• Serialize notification blocks

• Congestion Control

• Dispatch to Publisher

Congestion Control at Dispatcher

Frequency list (maintained at each dispatcher):

Subscription Id | Notifications in time window
1               | 1000
2               | 3000

NOTIFICATION_BURST is sent to Publisher
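A minimal sketch of the kind of per-window frequency counting a dispatcher could use to decide when to collapse a subscription's events into a burst; the window length, threshold, and names are assumptions, not the actual Rnotify code:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_SUBS     1024
#define WINDOW_SECS  1        /* assumed length of the counting window */
#define BURST_LIMIT  500      /* assumed per-window threshold */

/* Frequency list: notifications seen per subscription in the current window. */
static unsigned int freq[MAX_SUBS];
static time_t window_start;

/* Called for every incoming notification. Returns 1 once the subscription has
 * crossed the threshold, i.e. its events should be collapsed into a single
 * NOTIFICATION_BURST for the Publisher instead of individual messages. */
int record_notification(int subscription_id)
{
    time_t now = time(NULL);
    if (now - window_start >= WINDOW_SECS) {   /* window expired: reset counts */
        memset(freq, 0, sizeof(freq));
        window_start = now;
    }
    return ++freq[subscription_id] > BURST_LIMIT;
}

int main(void)
{
    window_start = time(NULL);
    for (int i = 0; i < 600; i++) {
        if (record_notification(1)) {
            printf("subscription 1 exceeded %d events/window: send NOTIFICATION_BURST\n",
                   BURST_LIMIT);
            break;
        }
    }
    return 0;
}
```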

Avoid atomic broadcasts

[Diagram: independent frequency lists maintained at each dispatcher]

Publishers

• Identify the subscribers for a notification

• Dispatch to the subscribers

Representing State - Publisher

Get all Subscribers:

File Id | Subscriber addresses
1       | 192.168.1.2:3000, 192.168.3.4:3001
2       | 192.168.1.2:3000, 192.168.3.4:3001

Undelivered notifications per subscriber:

Subscriber       | Undelivered Notifications
192.168.1.2:3000 | N1, N2, N3
192.168.3.4:3001 | N4, N5, N6

Get all Notifications / Append new Notification:

File Id | Notifications
1       | N1, N2, N3
2       | N4, N5
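A rough sketch of how this publisher state could be represented in C; the fixed-size arrays and field names are illustrative assumptions, not the actual Rnotify data structures:

```c
#include <stdio.h>

#define MAX_SUBSCRIBERS_PER_FILE  8
#define MAX_PENDING               32

/* Subscribers interested in a watched file, keyed by file id. */
struct subscription_entry {
    int  file_id;
    char subscribers[MAX_SUBSCRIBERS_PER_FILE][32];  /* "ip:port" strings */
    int  subscriber_count;
};

/* Undelivered notifications queued for one subscriber. */
struct delivery_queue {
    char subscriber[32];                 /* "ip:port" */
    int  pending[MAX_PENDING];           /* notification ids */
    int  pending_count;
};

/* Notifications recorded against a file id. */
struct notification_log {
    int file_id;
    int notifications[MAX_PENDING];
    int notification_count;
};

/* Append a new notification for a file and fan it out to the delivery queues. */
void append_notification(struct notification_log *log,
                         struct delivery_queue *queues, int queue_count,
                         int notification_id)
{
    log->notifications[log->notification_count++] = notification_id;
    for (int i = 0; i < queue_count; i++)
        queues[i].pending[queues[i].pending_count++] = notification_id;
}

int main(void)
{
    struct notification_log log = { .file_id = 1 };
    struct delivery_queue q = { .subscriber = "192.168.1.2:3000" };
    append_notification(&log, &q, 1, 101);
    printf("file %d has %d notification(s); %s has %d pending\n",
           log.file_id, log.notification_count, q.subscriber, q.pending_count);
    return 0;
}
```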

Publisher Selection

How do the dispatchers and Registrar maintain a shared understanding of ‘preferred’ publishers?

Partition and Placement of Publishers

pos1 = SHA1(Publisher1_IP_ADDR)

pos2 = SHA1(Publisher2_IP_ADDR)

pos3 = SHA1(Publisher3_IP_ADDR)

pos4 = SHA1(Publisher4_IP_ADDR)

Partition and Placement of Subscriptions

file1 = SHA1(File_Path1)

file2 = SHA1(File_Path2)

file3 = SHA1(File_Path3)

file4 = SHA1(File_Path4)
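A minimal sketch of this consistent-hashing placement: publishers sit on a ring at SHA1(ip), and a subscription is mapped to the first publisher clockwise from SHA1(path). The clockwise-successor convention, the use of only the first 8 digest bytes, and OpenSSL's SHA1 are assumptions for illustration (build with -lcrypto):

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <openssl/sha.h>

/* Ring position: first 8 bytes of SHA1(key), interpreted as a big-endian integer. */
static uint64_t ring_pos(const char *key)
{
    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA1((const unsigned char *)key, strlen(key), digest);

    uint64_t pos = 0;
    for (int i = 0; i < 8; i++)
        pos = (pos << 8) | digest[i];
    return pos;
}

/* Preferred publisher for a file: the first publisher clockwise from SHA1(path). */
static const char *preferred_publisher(const char *path,
                                       const char *publishers[], int n)
{
    uint64_t target = ring_pos(path);
    const char *best = NULL, *lowest = NULL;
    uint64_t best_pos = 0, lowest_pos = 0;

    for (int i = 0; i < n; i++) {
        uint64_t p = ring_pos(publishers[i]);
        if (lowest == NULL || p < lowest_pos) { lowest = publishers[i]; lowest_pos = p; }
        if (p >= target && (best == NULL || p < best_pos)) { best = publishers[i]; best_pos = p; }
    }
    return best ? best : lowest;   /* wrap around the ring if nothing is clockwise */
}

int main(void)
{
    const char *publishers[] = { "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4" };
    printf("/data/log.txt -> %s\n", preferred_publisher("/data/log.txt", publishers, 4));
    return 0;
}
```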

Arrival of Publisher

new_publisher = SHA1(New_Pub_IP_Addr)

Reissue_registrations_between(pos1, pos2)

A lock-free way to make the configuration eventually consistent.
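A sketch of what the reissue step could look like, under the assumption that only subscriptions whose ring position falls between the new publisher's predecessor and the new publisher are re-registered (names are illustrative; ring wraparound omitted):

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative subscription record: ring position plus the publisher that owns it. */
struct subscription {
    uint64_t    pos;          /* SHA1-derived ring position of the file path */
    const char *publisher;    /* current owner */
};

/* When a publisher joins at new_pos (its predecessor is at pred_pos), only
 * subscriptions in (pred_pos, new_pos] are re-registered with the newcomer;
 * everything else is untouched, so no global lock is required. */
void reissue_registrations_between(struct subscription *subs, int n,
                                   uint64_t pred_pos, uint64_t new_pos,
                                   const char *new_publisher)
{
    for (int i = 0; i < n; i++)
        if (subs[i].pos > pred_pos && subs[i].pos <= new_pos)
            subs[i].publisher = new_publisher;   /* in practice: send a new registration */
}

int main(void)
{
    struct subscription subs[] = { {100, "pub1"}, {250, "pub1"}, {400, "pub2"} };
    reissue_registrations_between(subs, 3, 150, 300, "pub_new");
    for (int i = 0; i < 3; i++)
        printf("sub at %llu -> %s\n", (unsigned long long)subs[i].pos, subs[i].publisher);
    return 0;
}
```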

Dispatcher Replication

• The Dispatcher is provided the Registrar location at startup

• It acquires the Publisher list from the Registrar transactionally

• It informs the Proxies independently

Evaluation Strategy

Mid-size GlusterFS deployment on EC2

PostMark benchmark to represent FS activity

Using Chef to start up the serviced clients

Measure latency end to end

8xl machines with 32 cores each helped simulate several clients apiece

All machines were acquired within a placement group

Evaluation - Scalability

Tune Dispatchers based on FS throughput

Tune Publishers based on number of clients

Scalability - Overactive FileSystems

PostMark threads writing to different directories

Scalability - Overactive FileSystems

PostMark threads writing to the same directory

Scalability - Overactive FileSystems

PostMark threads writing to different files (applications like web/mail servers)

PostMark threads writing to the same files (HPC applications)

Scalability - Servicing many clients

Performance

Demonstrate consistency

Demonstrate footprint in comparison to naive polling

Performance - Consistency

Comparison to naive Polling

• Developed a poller as a Node.js REST API

• For just 100 clients and 5 files, it issues 50,000 stats per second

• This has an extremely heavy footprint on FS performance
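For illustration, a naive stat()-based polling loop of the kind being compared against (a C sketch, not the actual Node.js poller; the paths and interval are made up):

```c
#include <stdio.h>
#include <time.h>
#include <sys/stat.h>
#include <unistd.h>

/* Naive polling: every client repeatedly stats every watched file and compares
 * modification times. Cost grows with clients x files x polling frequency. */
int main(void)
{
    const char *files[] = { "/tmp/a", "/tmp/b", "/tmp/c", "/tmp/d", "/tmp/e" };
    time_t last_mtime[5] = { 0 };

    for (;;) {
        for (int i = 0; i < 5; i++) {
            struct stat st;
            if (stat(files[i], &st) == 0 && st.st_mtime != last_mtime[i]) {
                printf("%s changed\n", files[i]);
                last_mtime[i] = st.st_mtime;
            }
        }
        usleep(10000);   /* 10 ms rounds: 5 files x 100 Hz x 100 clients ~ 50,000 stats/s */
    }
}
```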

Greedy Applications

• Increasing the number of notifications delivered per client

• Linear increase in latency

• Messages spend more time in queues

Inotify - Inefficient Applications

Greedy Applications

If you need to consume more notifications, distribute yourself.

Inefficient Application

Summary - Why is this work different?

• FAM does not scale and is obsolete.

• Pub/sub systems do not cater to many notifications per client

• Multicast trees are established for reliability (performance suffers)

• Pub/sub systems provide a richer set of semantics at lower performance

Future Work

• Introduce a security model

• Introduce message ordering

• Provide message delivery reliability

Conclusion

• Rnotify is a solution for receiving notifications from POSIX-compliant Distributed File Systems

• It can be tuned to fit different notification workloads

• It is incrementally scalable, location transparent, and mimics Inotify

• We have tested Rnotify scaling to 2.5 million notifications per second

• Latency is under 10 ms for 88% of notifications

Questions

Subscription Proxy

• Resides on the File Host and proxies subscriptions and notifications

• Idempotent API wrappers for subscription
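A minimal sketch of what an idempotent subscription wrapper could look like: re-subscribing to a path that is already watched returns the existing watch instead of creating a duplicate (names and structures are illustrative, not the actual proxy code):

```c
#include <stdio.h>
#include <string.h>

#define MAX_WATCHES 256

/* Table of active watches kept by the proxy, keyed by file path. */
static char watched_paths[MAX_WATCHES][256];
static int  watch_count;

/* Idempotent subscribe: adding the same path twice yields the same watch id,
 * so retries from the client library cannot create duplicate subscriptions. */
int proxy_subscribe(const char *path)
{
    for (int i = 0; i < watch_count; i++)
        if (strcmp(watched_paths[i], path) == 0)
            return i;                       /* already watched: reuse */

    snprintf(watched_paths[watch_count], sizeof(watched_paths[0]), "%s", path);
    return watch_count++;                   /* new watch */
}

int main(void)
{
    int a = proxy_subscribe("/data/log.txt");
    int b = proxy_subscribe("/data/log.txt");   /* retry: same id */
    printf("first=%d retry=%d (idempotent: %s)\n", a, b, a == b ? "yes" : "no");
    return 0;
}
```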

Design Alternatives

• File System Modification

• VFS Modification

• Modifying Inotify Implementation

Latency Tests - Zero

Throughput Tests - Zero
