
Page 1: File Server Performance

File Server Performance: AFS vs YFS

Page 2: File Server Performance

Accepted AFS Limitations Alter Deployments

Use large numbers of small file servers

Use many small partitions per file server

Restrict the number of processors to 1 or 2

Limit the network bandwidth to 1 Gbit

Avoid workloads requiring:
• Multiple clients creating / removing entries in a single directory
• Multiple clients writing to or reading from a single file
• More clients than file server worker threads accessing a single volume
• Applications requiring features that AFS does not offer: byte-range locking, extended attributes, per-file ACLs, etc.

Page 3: File Server Performance

Instead of fixing the core problems, organizations have…

Deployed isolation file servers and complex monitoring to detect hot volumes and quarantine them

Developed complex workarounds including vicep-access, OSD, and OOB

Segregated RW and RO access into separate cells and constructed their own volume management systems to “vos release” volumes from the RW cell to the RO cells

Used the AFS name space for some tasks and other “high performance” file systems for others
• NFSv3, NFSv4, Lustre, GPFS, Panasas, others

Page 4: File Server Performance

At what cost?

Additional servers cost money
• US$6,800 per server per year according to Cornell University
• Including hardware depreciation, support contracts, maintenance, power and cooling, and staff time

Increased complexity for end users

Multiple backup strategies

Page 5: File Server Performance

The YFS Premise

Maintain the data and the name space

Fix the performance problems

Enhance the functionality to match Apple/Microsoft first-class file systems

Improve security

Save money

Page 6: File Server Performance

Talk Outline

What are the bottlenecks in AFS and why do they exist?

What can be done to maximize the performance of an AFS file server?

How scalable is a YFS file server?

Page 7: File Server Performance

AFS RX

File server throughput is bound by the amount of data the listener thread can read from the network during any time period

As Simon Wilkinson likes to say:
• “There are only two things wrong with AFS RX, the protocol and the implementation.”

Page 8: File Server Performance

AFS RX: The Protocol Issues

Incorrect round trip time calculations

Incorrect retransmission timeout implementation

Window size vs congested networks
• Broken window management makes congested networks worse

Soft ACKs and Hard ACKs
• Twice as many ACKs as necessary

Page 9: File Server Performance

AFS RX: Implementation Issues

Lock contention
• 20% of runtime spent waiting for locks

UDP context switching
• Every packet processed on a different CPU
• Cache line invalidation

Page 10: File Server Performance

Simon’s RX Performance Talk

For the full details, see:
• http://tinyurl.com/p8c8yqs

Page 11: File Server Performance

The legacy of LWP

Lightweight processes (LWP) is the cooperative threading model used by the original AFS implementation

Only one thread can execute at a time

Threads yield voluntarily or when blocking for I/O

Data access is implicitly protected by single execution

All lock state changes between yields are effectively atomic, because no other thread can run in the interim

In other words:
• Acquire + Release + Yield == Never Acquire
• Acquire A + Acquire B == Acquire B + Acquire A

Page 12: File Server Performance

The pthreads conversion

When converting a cooperatively threaded application to pthreads, it is faster to add global locks protecting the data structures that are accessed across I/O than to redesign the data structures and the work flow

AFS 3.4 added pthread file servers by adding a minimum number of global locks to each package

AFS 3.6 added finer-grained but still global locks

Page 13: File Server Performance

The many locks

AFS file servers must acquire many mutexes during the processing of each RPC (* = global)

• RX: peer_hash*, conn_hash*, peer, conn_call, conn_data, stats*, free_packet_queue*, free_call_queue*, event_queue*, and more

• viced: H* [host table, callbacks], FS* [stats], VOL* [volume metadata], VNODE [file/dir]

Page 14: File Server Performance

Lock Contention

Threads are scheduled onto a processor and must give up their time slice whenever a required lock is unavailable

When there are multiple processors, a thread may be scheduled onto any of them; any data not in that processor’s cache, or that has been invalidated, must be fetched

Locks are represented as data in memory whose state changes when acquired and released, so every lock operation can invalidate cache lines on other cores

Two side effects of global locks:
• Only one thread at a time can make progress
• Multiple processor cores hurt performance

Page 15: File Server Performance

AFS Cache Coherency via Callbacks

An AFS file server promises its clients that, for a fixed period of time, it will notify them if the metadata or data state of an accessed object changes

For read-write volumes, one callback promise per file object

For read-only volumes, one callback promise per volume, regardless of how many file objects are accessed

Today, many file servers are deployed with callback tables containing millions of entries

Page 16: File Server Performance

Host Table Contention

The host table, and the hash tables for looking up host entries by IP address and UUID, are protected by a single global lock

Host entries have their own locks. To avoid hard deadlocks, locking an entry requires dropping the global lock, obtaining the entry lock, and then re-obtaining the global lock

Soft deadlocks occur when multiple threads are blocked on an entry lock while the thread holding it is blocked waiting for the global lock

Lock contention occurs multiple times for each new rx connection and each time a call is scheduled

Page 17: File Server Performance

Callback Table Contention

The Callback Table is protected by the same global lock as the Host Table

Each new or updated callback promise requires exclusive access to the table

Notifying registered clients of state changes (breaking callbacks) requires exclusive access

Garbage collection of expired callbacks (at 5 minute intervals) requires exclusive access

Exceeding the Callback Table limit requires exclusive access for immediate garbage collection and premature callback notification

Page 18: File Server Performance

Impact of Host and Callback Table Contention

The larger the callback table, the longer exclusive access is held for garbage collection and callback breaks

While exclusive access is held, no new calls can be scheduled, nor can existing calls be completed

Page 19: File Server Performance

AFS Worker Thread Pool

Increasing the worker thread pool permits additional calls to be scheduled instead of blocking in the rx wait queue

The primary benefit of scheduling more calls is that locks then act as a filtering mechanism to decide which calls can make progress; calls on the rx wait queue can never make progress if the thread pool is exhausted

The downside of an increased thread pool size is increased lock contention and more CPU time wasted on thread scheduling

Page 20: File Server Performance

Worker Thread Pool

Start with the “large” configuration
• -L

Make the thread pool as large as possible
• For 1.4, -p 128
• For 1.6, -p 256

Set the number of directory buffers to twice the thread count
• -b 512
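For concreteness, a minimal sketch of how these flags could be passed to a bos-managed fs instance; the host name, cell, and binary paths below are placeholders, not values from this talk:

    # Hypothetical 1.6-era fs instance using the tuning above.
    # Replace host, cell, and paths with your site's values.
    bos create fs1.example.com fs fs \
        -cmd "/usr/afs/bin/fileserver -L -p 256 -b 512" \
        -cmd "/usr/afs/bin/volserver" \
        -cmd "/usr/afs/bin/salvager" \
        -cell example.com -localauth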

Page 21: File Server Performance

Volume and Vnode Caches

Volume cache larger than the total volume count
• -vc <number of volumes plus some>

Small vnode cache (files)
• -s <10 x volume count>

Large vnode cache (directories)
• -l <3 x volume count>

If volumes are very large, higher multiples may be required
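As a sketch of these multipliers, assuming a server hosting roughly 5,000 volumes (the count and headroom are illustrative, not from this talk):

    # Derive cache sizes from the volume count per the rules above.
    VOLS=5000                  # approximate number of volumes on this server
    VC=$((VOLS + 500))         # -vc: volume cache, volume count plus headroom
    SMALL=$((VOLS * 10))       # -s: small vnode cache (files)
    LARGE=$((VOLS * 3))        # -l: large vnode cache (directories)
    fileserver -L -p 256 -b 512 -vc "$VC" -s "$SMALL" -l "$LARGE"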

Page 22: File Server Performance

Callback Tables and Thrashing

The callback table must be large enough to avoid thrashing
• -cb <volume-count * 13 * vnode-count>
• That value * 72 bytes should not exceed 10% of the machine’s physical memory

Use “xstat_fs_test -collID 3 -onceonly” to monitor the “GetSomeSpaces” value. If it is non-zero, increase the -cb value
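A sketch of the 10%-of-memory sanity check and the monitoring step, assuming a Linux host; the candidate -cb value and server name are placeholders:

    # Check a candidate -cb value against the ~72 bytes/entry,
    # 10%-of-RAM rule of thumb given above.
    CB=1500000                                              # candidate -cb value
    MEM=$(($(getconf _PHYS_PAGES) * $(getconf PAGE_SIZE)))  # physical memory, bytes
    if [ $((CB * 72)) -le $((MEM / 10)) ]; then
        echo "-cb $CB uses $((CB * 72 / 1048576)) MB; within the 10% budget"
    else
        echo "-cb $CB exceeds 10% of physical memory; reduce it"
    fi

    # Watch for thrashing: non-zero GetSomeSpaces means the table is too small.
    xstat_fs_test -fsname fs1.example.com -collID 3 -onceonly | grep GetSomeSpaces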

Page 23: File Server Performance

UDP Tuning

UDP receive buffer
• Must be large enough to receive all packets for in-process calls
• <thread-count * window size (32) * packet size>
• -udpsize 16777216
• Won’t take effect unless the OS is configured to match

UDP send buffer
• -sendsize 2097152 (2^21), unless the client chunk size is larger
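On Linux, the kernel caps socket buffers well below these values by default, so the matching limits must be raised for -udpsize and -sendsize to take effect; a sketch:

    # Raise kernel limits so the fileserver's requested buffers are honored.
    sysctl -w net.core.rmem_max=16777216   # allow the 16 MB UDP receive buffer
    sysctl -w net.core.wmem_max=2097152    # allow the 2 MB UDP send buffer

    # 256 threads * 32-packet window * ~1,400-byte packets is roughly 11 MB,
    # so a 16 MB receive buffer leaves headroom.
    fileserver -L -p 256 -b 512 -udpsize 16777216 -sendsize 2097152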

Page 24: File Server Performance

Mount vicep* with noatime

The AFS protocol does not expose the last access time to clients

Nor does the AFS file server make use of it

Turn off last access time updates to avoid large amounts of unnecessary disk I/O unrelated to serving the needs of clients
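For example, for a vice partition mounted at /vicepa (the device and filesystem type below are site-specific placeholders):

    # Remount a live partition without access-time updates...
    mount -o remount,noatime /vicepa

    # ...or make it permanent in /etc/fstab:
    # /dev/sdb1  /vicepa  ext4  defaults,noatime  0  2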

Page 25: File Server Performance

Syncing data to disk

Syncing data to disk is very expensive. If you trust your UPS and have a storage adapter with a good battery-backed write cache, we recommend reducing the frequency of sync operations

For 1.6.5, there is a new option
• -sync onclose

Page 26: File Server Performance

YFS File Servers Scale Far Beyond AFS

YFS file servers experience much less contention between threads

RPCs take less time to complete
• Store operations do not block simultaneous Fetch requests

One YFS file server can replace at least 30 AFS file servers
• Max in-flight RPCs per AFS server = 240
• Max in-flight RPCs per YFS server = 16,000 (dynamic)
• 240 * 30 = 7,200, well within a single YFS server’s capacity

Page 27: File Server Performance

How fast can RX/UDP go?

Up to 8.2 Gbit/second per listener thread

Page 28: File Server Performance

SLAC Testing

SLAC has experienced file server meltdowns for years. A large number of file servers were deployed to permit distribution of load and isolation of volume accesses by users

One YFS file server satisfied 500 client nodes for nearly 24 hours without noticeable delays
• 1 Gbit NIC, 8 processor cores, 6 Gbit/sec local RAID disk
• 800 operations per second
• 55 MB/sec FetchData
• 5 MB/sec StoreData

Page 29: File Server Performance

Other Benefits

2038 safe

100ns time granularity

2^64 volumes

2^96 vnodes / volume

2^64 max quota / volume / partition size

Per-file ACLs

Volume security policies
• Max ACL / wire privacy

Servers do not run as “root”

Linux O_DIRECT

Mandatory locking

IPv6 network stack

Page 30: File Server Performance

Security, Security, Security

RXGK
• GSS-API authentication
• AES-256/SHA-1 wire privacy
• File server wire security policies
  File servers cannot serve volumes with stronger required policies
• Combined identity tokens
• Keyed cache managers / machine IDs
• Maximum Volume ACL prevents data leaks