Modularity and Costs Greg Busby Computer Science 614 March 26, 2002

Modularity and Costs

Greg Busby

Computer Science 614

March 26, 2002

Problem 1 – Complexity

Protocols are necessary for network communication: both ends must agree on a format to exchange messages

Communication protocols are complex, and using several protocols together is even more complex

Solution 1 – Layers

Implement each protocol independently, which allows a cleaner implementation

Layer the protocols, which maintains modularity and reduces complexity – there is no need to understand the interactions between protocols
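The layering idea can be illustrated with a minimal Python sketch. It assumes a toy stack of hypothetical `Layer` objects whose headers are just text tags; each layer adds or strips only its own header and treats everything from the layer above as opaque, which is what lets each protocol be implemented and understood on its own.

```python
class Layer:
    """A hypothetical protocol layer that only knows about its own header."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.header = f"[{name}]".encode()  # toy header, not a real format

    def send(self, payload: bytes) -> bytes:
        # Prepend this layer's header; the payload from above is opaque.
        return self.header + payload

    def receive(self, packet: bytes) -> bytes:
        # Check and strip this layer's header, passing the rest upward.
        if not packet.startswith(self.header):
            raise ValueError(f"{self.name}: bad header")
        return packet[len(self.header):]


def stack_send(layers: list, data: bytes) -> bytes:
    # Top layer first: each layer wraps whatever the layer above produced.
    for layer in layers:
        data = layer.send(data)
    return data


def stack_receive(layers: list, packet: bytes) -> bytes:
    # Bottom layer first: each layer unwraps only its own header.
    for layer in reversed(layers):
        packet = layer.receive(packet)
    return packet


stack = [Layer("UDP"), Layer("IP"), Layer("ETH")]
wire = stack_send(stack, b"hello")
print(wire)                        # b'[ETH][IP][UDP]hello'
print(stack_receive(stack, wire))  # b'hello'
```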

Problem 2 – Delays

Messages get larger as additional headers are added at each layer

Switching between layers adds processing overhead

One protocol must finish before the next can start

Multiple writes to memory add I/O overhead, since buffers are stored between layers
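Two of these costs, header growth and repeated copying, can be made concrete with a rough sketch. It assumes a naive implementation (the hypothetical `layered_send`) in which every layer allocates a new buffer and copies the whole message behind its own header; the sizes are the minimum UDP, IPv4, and Ethernet II headers, and real stacks often avoid some of these copies.

```python
# Minimum header sizes in bytes: UDP, IPv4 without options, Ethernet II.
HEADER_SIZES = [("UDP", 8), ("IP", 20), ("ETH", 14)]


def layered_send(payload: bytes) -> tuple:
    """Pass a payload down the layers, copying it once per layer.

    Returns the final frame and the number of bytes copied between layers
    (the initial bytearray(payload) conversion is not counted).
    """
    buf = bytearray(payload)
    copied = 0
    for _name, size in HEADER_SIZES:
        # Each layer allocates a fresh buffer for its header plus the data,
        # then copies the whole message from the layer above into it.
        new_buf = bytearray(size + len(buf))
        new_buf[size:] = buf
        copied += len(buf)
        buf = new_buf  # header bytes are left zeroed for brevity
    return buf, copied


frame, copied = layered_send(b"x" * 100)
print(len(frame), copied)  # 142 336
```

For a 100-byte payload, the frame grows to 142 bytes and 336 bytes are copied between layers before the message reaches the device.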

Solution 2 – Improve Performance

Several approaches are discussed below, including the pros and cons of each:

x-Kernel: puts the entire communication system directly in the kernel, with specific objects and support routines

Integrated Layer Processing (ILP): integrates protocol layers to reduce task switching and memory writes

Protocol Accelerator (PA): reduces the total data sent and shortens the critical path of code between messages

x-Kernel

Defines a uniform set of abstractions for protocols

Structures protocols for efficient interaction in the common case

Supports primitive routines for common protocol tasks

x-Kernel Architecture

Provides objects for protocols, sessions, and messages

Creates a kernel for a specific set of protocols (static)

Instantiates sessions for each protocol as needed (dynamic)

Messages are active objects that move through protocols and sessions

Provides specific support routines

[Figure: example protocol graph with TCP, UDP, IP, and ETH]
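The static/dynamic split can be sketched in Python, assuming hypothetical `Protocol` and `Session` classes rather than the actual x-Kernel interfaces. The protocol graph is wired up once, while sessions are created only when a connection key is first opened, and opening a session also opens the sessions it needs below it.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, int]  # hypothetical connection key: (address, port)


class Session:
    """Hypothetical per-connection state for one protocol."""

    def __init__(self, protocol: "Protocol", key: Key, lower: Optional["Session"]) -> None:
        self.protocol = protocol
        self.key = key
        self.lower = lower

    def push(self, data: bytes) -> bytes:
        # Add this protocol's (toy) header, then hand off to the session below.
        framed = f"[{self.protocol.name}]".encode() + data
        return self.lower.push(framed) if self.lower is not None else framed


class Protocol:
    """Hypothetical protocol object: a node in the static protocol graph."""

    def __init__(self, name: str, lower: Optional["Protocol"] = None) -> None:
        self.name = name
        self.lower = lower
        self.sessions: Dict[Key, Session] = {}

    def open(self, key: Key) -> Session:
        # Sessions are created on demand, opening lower sessions as needed.
        if key not in self.sessions:
            lower = self.lower.open(key) if self.lower is not None else None
            self.sessions[key] = Session(self, key, lower)
        return self.sessions[key]


# The graph is fixed when the kernel is configured (static) ...
eth = Protocol("ETH")
ip = Protocol("IP", eth)
tcp = Protocol("TCP", ip)
udp = Protocol("UDP", ip)

# ... but sessions exist only once a connection needs them (dynamic).
s = udp.open(("10.0.0.2", 53))
print(s.push(b"query"))                     # b'[ETH][IP][UDP]query'
print(len(ip.sessions), len(tcp.sessions))  # 1 0
```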

x-Kernel Objects

Protocols – create sessions and demux received messages

Sessions – represent connections; created and destroyed when connections are made and terminated

Messages – contain the data itself; passed from level to level
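The division of labor among the three objects on the receive side might look like the following sketch. Everything here is a hypothetical illustration, including the 2-byte port header, and is not the x-Kernel API. The protocol demultiplexes an incoming message to the session for its connection, and the session delivers the message, with the header stripped, to the level above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    """The data itself; passed from level to level."""
    data: bytes


@dataclass
class Session:
    """Represents one connection; collects what it delivers upward."""
    port: int
    delivered: List[bytes] = field(default_factory=list)

    def deliver(self, msg: Message) -> None:
        self.delivered.append(msg.data)


class DemuxProtocol:
    """Hypothetical protocol with a 2-byte big-endian port header."""

    def __init__(self) -> None:
        self.sessions: Dict[int, Session] = {}

    def open(self, port: int) -> Session:
        # Created when a connection is made ...
        if port not in self.sessions:
            self.sessions[port] = Session(port)
        return self.sessions[port]

    def close(self, port: int) -> None:
        # ... and destroyed when it is terminated.
        self.sessions.pop(port, None)

    def demux(self, msg: Message) -> bool:
        # Read the port from the header to find the owning session.
        port = int.from_bytes(msg.data[:2], "big")
        session = self.sessions.get(port)
        if session is None:
            return False  # no such connection: drop the message
        session.deliver(Message(msg.data[2:]))  # strip our header, pass up
        return True


proto = DemuxProtocol()
s53 = proto.open(53)
print(proto.demux(Message((53).to_bytes(2, "big") + b"hi")))  # True
print(proto.demux(Message((80).to_bytes(2, "big") + b"x")))   # False
print(s53.delivered)                                          # [b'hi']
```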

x-Kernel Primitives

Buffer managers – allocate, concatenate, split, and truncate buffers; operate in the local process heap

Map managers – add, remove, and map bindings for protocols

Event managers – provide timers to allow timeouts
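A buffer manager in the same spirit can be sketched with a hypothetical `Buffer` built from `memoryview` segments, so that concatenating, splitting, and truncating rearrange references instead of copying bytes. This is an illustration of the idea, not the x-Kernel's own message library; a map manager could be little more than a dictionary of bindings and is omitted.

```python
from typing import List, Optional, Tuple


class Buffer:
    """Hypothetical message buffer built from zero-copy memoryview segments."""

    def __init__(self, segments: Optional[List[memoryview]] = None) -> None:
        self.segments: List[memoryview] = list(segments or [])

    @classmethod
    def allocate(cls, data: bytes) -> "Buffer":
        return cls([memoryview(data)])

    def __len__(self) -> int:
        return sum(len(seg) for seg in self.segments)

    def concatenate(self, other: "Buffer") -> "Buffer":
        # Join segment lists; no payload bytes are copied.
        return Buffer(self.segments + other.segments)

    def split(self, n: int) -> Tuple["Buffer", "Buffer"]:
        # Split after the first n bytes by slicing views, not copying data.
        head: List[memoryview] = []
        tail: List[memoryview] = []
        for seg in self.segments:
            if n >= len(seg):
                head.append(seg)
                n -= len(seg)
            elif n > 0:
                head.append(seg[:n])
                tail.append(seg[n:])
                n = 0
            else:
                tail.append(seg)
        return Buffer(head), Buffer(tail)

    def truncate(self, n: int) -> "Buffer":
        # Keep only the first n bytes.
        return self.split(n)[0]

    def tobytes(self) -> bytes:
        # Copy out only when a contiguous byte string is actually needed.
        return b"".join(seg.tobytes() for seg in self.segments)


hdr = Buffer.allocate(b"HDR")
msg = hdr.concatenate(Buffer.allocate(b"payload"))
head, rest = msg.split(3)
print(head.tobytes(), rest.tobytes())  # b'HDR' b'payload'
print(msg.truncate(5).tobytes())       # b'HDRpa'
print(len(msg))                        # 10
```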

x-Kernel Performance

2-3x faster than Unix overall

Unix's cost is primarily due to sockets

Protocol performance itself is comparable

Conclusion: the architecture accounts for the difference

x-Kernel Conclusions

Pros:

The architecture simplifies the implementation of protocols

Uniformity of the interface between protocols makes protocol performance predictable and reduces overhead between protocols

It is possible to write efficient protocols by tuning the underlying architecture

There is no need to know the exact protocol stack

Cons:

Requires recompiling the kernel for each new set of protocols

Doesn’t reduce message size (headers) or the sequentiality of processing

Primarily useful as a research tool for protocol implementation, not to improve performance per se.

Integrated Layer Processing (ILP)

Reduces protocol layers by integrating their processing

Tunes performance to increase caching and avoid memory I/O

Eliminates redundant copies (similar to U-Net’s shared memory)

ILP Architecture

Combine protocol-specific manipulations in a single loop where possible

Process small pieces to make use of the processor's on-board cache

Put as much processing as possible in-line (macros) rather than in function calls

ILP Loop

Combine marshalling (encoding), encryption, and checksumming

Work in memory and reduce copying

Reduces the steps from 5 to 2 (with increased processing at step 1)

[Figure: Non-ILP send vs. ILP send, moving application data through a TCP buffer to a kernel buffer]

Non-ILP send: 1. Marshalling (r/w), 2. Encryption (r/w), 3. Copying (r/w), 4. Checksum (r), 5. System copy (r/w)

ILP send: 1. Marshalling, encryption, and checksumming (r/w), 2. System copy (r/w)
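The difference between the two send paths can be sketched in Python. The "encryption" is a toy XOR with a fixed key and the checksum is a plain 16-bit byte sum, both stand-ins chosen only for illustration, and the final system copy is omitted. The non-ILP version makes a separate pass over the data for each manipulation, while the ILP version marshals, encrypts, and checksums each 4-byte word in a single loop.

```python
import struct
from typing import List, Tuple

KEY = 0x5A  # toy XOR "cipher" key: for illustration only, not real encryption


def non_ilp_send(values: List[int]) -> Tuple[bytes, int]:
    """One full pass over the data per manipulation."""
    marshalled = b"".join(struct.pack(">I", v) for v in values)  # 1. marshal (r/w)
    encrypted = bytes(b ^ KEY for b in marshalled)                # 2. encrypt (r/w)
    tcp_buffer = bytearray(encrypted)                             # 3. copy (r/w)
    checksum = sum(tcp_buffer) & 0xFFFF                           # 4. checksum (r)
    return bytes(tcp_buffer), checksum


def ilp_send(values: List[int]) -> Tuple[bytes, int]:
    """All three manipulations in one loop, one small piece at a time."""
    tcp_buffer = bytearray(4 * len(values))
    checksum = 0
    for i, v in enumerate(values):
        # Marshal one 4-byte word, then encrypt and checksum each byte while
        # it is still at hand, writing it to the TCP buffer exactly once.
        for j, b in enumerate(struct.pack(">I", v)):
            e = b ^ KEY
            checksum += e
            tcp_buffer[4 * i + j] = e
    return bytes(tcp_buffer), checksum & 0xFFFF


data = [1, 2, 3, 0xDEADBEEF]
print(non_ilp_send(data) == ilp_send(data))  # True
```

Both functions return the same bytes and checksum; only the number of passes over memory differs.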

ILP Processing (send)

Divide the message into small parts

Begin marshalling and encryption on part B, then C…

Process part A once the length is known

Finish protocol-specific processing

Doesn’t work if A must be processed first (ordering-constrained)

[Figure: message layout (TCP header, RPC header with length and alignment, data) divided into Part A, Part B, and Part C, with marshalling/encryption and checksum stages]
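The out-of-order trick can be sketched as follows, again with a toy XOR cipher. A hypothetical 4-byte length field stands in for part A, the data parts are assumed to be already marshalled, and a hypothetical 16-bit byte sum serves as the checksum.

```python
import struct
from typing import List

KEY = 0x5A  # toy XOR "cipher": illustration only


def ilp_send_out_of_order(parts: List[bytes]) -> bytes:
    """Process data parts (B, C, ...) first, then header part A."""
    body = bytearray()
    checksum = 0
    for part in parts:              # parts B, C, ... (assumed already marshalled)
        for b in part:
            e = b ^ KEY             # encrypt and checksum in the same loop
            body.append(e)
            checksum += e
    # Only now is the length known, so part A (here a hypothetical 4-byte
    # length field) can be marshalled, encrypted, and added to the checksum.
    header = bytes(b ^ KEY for b in struct.pack(">I", len(body)))
    checksum += sum(header)
    # Hypothetical 16-bit checksum trailer covering header and body.
    return header + bytes(body) + struct.pack(">H", checksum & 0xFFFF)


frame = ilp_send_out_of_order([b"ab", b"cde"])
```

This works because an additive checksum does not depend on the order in which bytes are summed. A manipulation that is ordering-constrained, such as a cipher that chains each block to the previous one, could not start on part B before part A.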

ILP Performance

Processing reduction of 20-25%

Throughput improvement of 10-15%

Actually reduces cache usage, although designed to optimize it

Performance gains can easily be masked by strong encryption, which drastically increases processing

Conclusion: the performance results were such that its use is “debatable in existing communication systems…”

ILP Conclusions

Pros:

Memory access decreased by up to 30%

Slightly improved performance

Cons:

Only applicable to functions that are not ordering-constrained

Requires macros to increase speed, which reduces flexibility

Protocol stack must be known beforehand

The Protocol Accelerator (PA)

Reduces header overhead by sending non-changing protocol headers only once

Further reduces total bytes by packing other header information across protocols

Reduces layered protocol processing overhead by splitting the processing of header and data (canonical processing)

PA Header Reduction

Four classes of header information:

Connection Identification – doesn’t change during the session

Protocol-specific Information – depends only on protocol state, not on the message

Message-specific Information – depends on the contents of the message but not on protocol state

Gossip – included because the overhead is small, but optional

Connection Cookies – an 8-byte field that replaces the Connection Identification information
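How a cookie can stand in for the full connection identification is shown in the sketch below. It assumes hypothetical `Sender` and `Receiver` classes and a randomly chosen 62-bit cookie. Only the first message carries the full identification, and the receiver remembers the cookie-to-connection binding for later messages.

```python
import secrets
from typing import Dict, Optional, Tuple

Wire = Tuple[int, Optional[bytes], bytes]  # (cookie, connection id or None, data)


class Sender:
    """Hypothetical sender that sends the full connection ID only once."""

    def __init__(self, conn_id: bytes) -> None:
        self.conn_id = conn_id               # e.g. addresses and ports
        self.cookie = secrets.randbits(62)   # assumption: random 62-bit cookie
        self.first = True

    def wrap(self, data: bytes) -> Wire:
        conn_id = self.conn_id if self.first else None
        self.first = False
        return self.cookie, conn_id, data


class Receiver:
    """Hypothetical receiver that learns cookie -> connection bindings."""

    def __init__(self) -> None:
        self.connections: Dict[int, bytes] = {}

    def unwrap(self, msg: Wire) -> Tuple[bytes, bytes]:
        cookie, conn_id, data = msg
        if conn_id is not None:
            self.connections[cookie] = conn_id  # first message: remember it
        return self.connections[cookie], data   # KeyError if cookie is unknown


sender, receiver = Sender(b"10.0.0.1:5000->10.0.0.2:6000"), Receiver()
first, second = sender.wrap(b"hello"), sender.wrap(b"again")
print(second[1])                # None
print(receiver.unwrap(first))   # (b'10.0.0.1:5000->10.0.0.2:6000', b'hello')
print(receiver.unwrap(second))  # (b'10.0.0.1:5000->10.0.0.2:6000', b'again')
```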

PA Message Format

The Connection Cookie suffices as the Connection ID on the 2nd and later messages

Packing information is explained below

Gossip is optional but useful

[Figure: PA message format]

Connection cookie (62-bit number), with the Connection Id Present bit and the Byte order bit (big- or little-endian)

Connection Identification (first message)

Protocol-specific Information

Message-specific Information

Gossip (optional)

Packing Information (if packed)

Application Data
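The 8-byte field can be sketched as two flag bits plus a 62-bit cookie. The bit positions, the meaning of a set byte-order bit (little-endian), and packing the field itself in big-endian order are assumptions for illustration, not the PA's exact layout. A receiver could use the decoded byte-order flag to pick the `struct` prefix (`"<"` or `">"`) for the remaining header fields.

```python
import struct
from typing import Tuple

CONN_ID_PRESENT = 1 << 63    # assumed position of the Connection Id Present bit
LITTLE_ENDIAN = 1 << 62      # assumed position of the byte order bit
COOKIE_MASK = (1 << 62) - 1  # low 62 bits hold the connection cookie


def encode_preamble(cookie: int, conn_id_present: bool, little_endian: bool) -> bytes:
    if not 0 <= cookie <= COOKIE_MASK:
        raise ValueError("cookie must fit in 62 bits")
    word = cookie
    if conn_id_present:
        word |= CONN_ID_PRESENT
    if little_endian:
        word |= LITTLE_ENDIAN
    return struct.pack(">Q", word)  # assumption: the field itself is big-endian


def decode_preamble(raw: bytes) -> Tuple[int, bool, bool]:
    (word,) = struct.unpack(">Q", raw[:8])
    return word & COOKIE_MASK, bool(word & CONN_ID_PRESENT), bool(word & LITTLE_ENDIAN)


raw = encode_preamble(12345, conn_id_present=True, little_endian=False)
print(len(raw), decode_preamble(raw))  # 8 (12345, True, False)
```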

PA Processing Reduction

Canonical Protocol Processing – breaks the processing in a protocol layer into 2 parts:

Pre-processing Phase – build or check the message header without changing protocol state

Post-processing Phase – update protocol state; attempt to do this after the message is sent or delivered

Pre-processing at every layer is done before post-processing at any layer
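Canonical processing can be sketched with a hypothetical `SeqLayer` whose only state is a sequence number. Its `pre_send` builds a header from the current state without changing it, and its `post_send` updates the state only after the message has been handed to the network.

```python
from typing import Callable, List


class SeqLayer:
    """Hypothetical layer whose only protocol state is a sequence number."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.seq = 0

    def pre_send(self, msg: bytes) -> bytes:
        # Pre-processing: build the header from current state, without changing it.
        return f"{self.name}:{self.seq}|".encode() + msg

    def post_send(self) -> None:
        # Post-processing: update protocol state once the message is gone.
        self.seq += 1


def canonical_send(layers: List[SeqLayer], data: bytes,
                   transmit: Callable[[bytes], None]) -> None:
    for layer in layers:  # pre-processing at every layer (top to bottom) ...
        data = layer.pre_send(data)
    transmit(data)        # ... the message goes out ...
    for layer in layers:  # ... and only then post-processing at any layer
        layer.post_send()


sent: List[bytes] = []
layers = [SeqLayer("A"), SeqLayer("B")]
canonical_send(layers, b"x", sent.append)
canonical_send(layers, b"y", sent.append)
print(sent)  # [b'B:0|A:0|x', b'B:1|A:1|y']
```

Because no layer's pre-processing depends on another layer's post-processing, the post-processing loop could run while the message is in transit.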

PA Processing Reduction (cont.)

Header Prediction – use the post-processing phase to predict the formation of the next header

Packet Filters – a pre-pre-processor that checks or ensures header correctness without invoking the protocol where possible; invokes the protocol if necessary

Message Packing – pack backlogged messages together if the application gets ahead; this reduces space and processing, since checksums etc. are calculated only once
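Message packing can be sketched as below. The packing-information layout (a 2-byte count followed by 4-byte lengths) is a hypothetical choice, and CRC-32 from Python's `zlib` is swapped in as the checksum. The point of the sketch is that one checksum covers the whole batch.

```python
import struct
import zlib
from typing import List


def pack(messages: List[bytes]) -> bytes:
    # Packing information: a message count followed by each message's length.
    info = struct.pack(">H", len(messages)) + b"".join(
        struct.pack(">I", len(m)) for m in messages)
    body = info + b"".join(messages)
    # A single checksum for the whole batch instead of one per message.
    return body + struct.pack(">I", zlib.crc32(body))


def unpack(packet: bytes) -> List[bytes]:
    body, (crc,) = packet[:-4], struct.unpack(">I", packet[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    (count,) = struct.unpack_from(">H", body, 0)
    lengths = struct.unpack_from(f">{count}I", body, 2)
    offset = 2 + 4 * count
    messages = []
    for n in lengths:
        messages.append(body[offset:offset + n])
        offset += n
    return messages


backlog = [b"first", b"second", b"third"]
packet = pack(backlog)
print(unpack(packet) == backlog)  # True
```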

PA Processing (send)

Check the backlog; if there is any, queue the message and exit

Create the packing information and predicted header, and add them to the message data

Run the packet filter to create message-specific data (and gossip, if any)

Push to the protocol if necessary

Push the connection cookie onto the front of the message and send

Pass to the protocol stack for post-processing to update protocol state

[Figure: PA send and delivery paths, showing the Application, Packer/Unpacker, PreSend/PreDeliver, Protocol Stack, and Network]
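The send steps can be sketched as a hypothetical, much simplified `PASender` for a single protocol and peer. The one-outstanding-packet rule, the 4-byte sequence-number header, and the length-only "packet filter" are all assumptions, and the fallback of pushing the message through the full protocol stack is omitted.

```python
import struct
from collections import deque
from typing import Callable, List


class PASender:
    """Hypothetical, much simplified PA send path (one protocol, one peer)."""

    def __init__(self, cookie: int, transmit: Callable[[bytes], None]) -> None:
        self.cookie = cookie
        self.transmit = transmit      # callable that puts bytes on the wire
        self.backlog: deque = deque()
        self.in_flight = False        # assumed: one outstanding packet at a time
        self.seq = 0
        self.predicted_header = self._predict_header()

    def _predict_header(self) -> bytes:
        # Header prediction: protocol-specific header for the NEXT message.
        return struct.pack(">I", self.seq)

    def send(self, data: bytes) -> None:
        # Check the backlog; queue and exit if there is any (or a send is pending).
        if self.in_flight or self.backlog:
            self.backlog.append(data)
            return
        self._send_batch([data])

    def _send_batch(self, batch: List[bytes]) -> None:
        # Packing information (count + lengths) plus the predicted header.
        packing = struct.pack(">H", len(batch)) + b"".join(
            struct.pack(">I", len(m)) for m in batch)
        msg = self.predicted_header + packing + b"".join(batch)
        # "Packet filter": message-specific info, here just a total length.
        msg = struct.pack(">I", len(msg)) + msg
        # Push the connection cookie onto the front of the message and send.
        self.in_flight = True
        self.transmit(struct.pack(">Q", self.cookie) + msg)
        # Post-processing: update protocol state and predict the next header.
        self.seq += 1
        self.predicted_header = self._predict_header()

    def on_sent(self) -> None:
        # Wire is free again: flush the backlog as one packed message.
        self.in_flight = False
        if self.backlog:
            batch = list(self.backlog)
            self.backlog.clear()
            self._send_batch(batch)


wire: List[bytes] = []
pa = PASender(cookie=42, transmit=wire.append)
pa.send(b"one")    # sent immediately
pa.send(b"two")    # queued: previous send still in flight
pa.send(b"three")  # queued
pa.on_sent()       # backlog flushed as a single packed message
print(len(wire))   # 2
```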

PA Performance

Can gain an order of magnitude improvement over pure layered protocols

Maximal throughput is achieved by reducing garbage collection and doing post-processing while messages are “on the wire”

Conclusion: useful in improving performance as long as PA is used on both ends of the connection

PA Conclusions

Pros:

Eliminates much of the overhead of layered protocols

Significant speed improvement

Canonical processing is applicable in any case

Cons:

Can’t communicate with a non-PA peer

A specific PA is needed for each set of protocols

No fragmentation of messages, so it only works on small messages

Summary

Protocols are layered to improve modularity and reduce complexity, but this reduces performance

Improving performance reduces modularity and requires foreknowledge of the protocol stack

Approaches:

Increase use of the kernel (x-Kernel)

Integrate the processing of all layers together (ILP)

Reduce message size and speed up the critical path (PA)

All improve performance, but only PA results in significant improvement.