Modularity and Costs
Greg Busby
Computer Science 614
March 26, 2002
Problem 1 – Complexity
Protocols are necessary for network communication
Both ends must agree on a format to exchange messages
Communication protocols are complex
Using several protocols together is even more complex
Solution 1 – Layers
Implement each protocol independently
Allows cleaner implementation
Layer protocols
Maintains modularity
Reduces complexity – no need to understand interactions between protocols
Problem 2 – Delays
Messages get larger as additional headers are added at each layer
Processing overhead for switching between layers
Need to wait for one protocol to finish before starting the next
I/O overhead from multiple writes to memory as buffers are stored between layers
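The header growth above can be sketched in a few lines. This is a hypothetical illustration, not code from any of the papers: the layer names and header sizes are made up, and real stacks would also copy between buffers at each step.

```python
# Hypothetical sketch: each protocol layer prepends its own header on send.
# Layer names and header sizes are illustrative only.

LAYERS = [("RPC", 8), ("TCP", 20), ("IP", 20), ("ETH", 14)]

def layered_send(payload: bytes) -> bytes:
    """Encapsulate payload top-down; each layer runs only after the one above."""
    msg = payload
    for name, hdr_len in LAYERS:
        header = name.encode().ljust(hdr_len, b"\x00")  # stand-in header bytes
        msg = header + msg  # a real stack would also copy between buffers here
    return msg

packet = layered_send(b"hello")
# The message grows by the sum of all header sizes: 8 + 20 + 20 + 14 = 62 bytes.
print(len(packet) - len(b"hello"))
```

The loop also makes the sequentiality problem visible: each layer's work starts only after the layer above has finished.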
Solution 2 – Improve Performance
Will discuss several approaches, including pros and cons of each:
x-Kernel: Puts the entire communication system directly in the kernel, with specific objects and support routines
Integrated Layer Processing (ILP): Integrates protocol layers to reduce task switching and memory writes
Protocol Accelerator (PA): Reduces total data to send and shortens the critical path of code between messages
x-Kernel
Defines a uniform set of abstractions for protocols
Structures protocols for efficient interaction in the common case
Supports primitive routines for common protocol tasks
x-Kernel Architecture
Provides objects for protocols, sessions, and messages
Creates a kernel for a specific set of protocols (static)
Instantiates sessions for each protocol as needed (dynamic)
Messages are active objects that move through protocol/sessions
Provides specific support routines
[Figure: example protocol graph – TCP and UDP over IP over ETH]
x-Kernel Objects
Protocols – create sessions, demux messages received
Sessions – represent connections, created and destroyed when connections are made/terminated
Messages – contain the data itself, passed from level to level
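The object model above can be sketched as follows. This is a hypothetical illustration of the protocol/session/message split, not the actual x-Kernel API; the class and method names are invented.

```python
# Hypothetical sketch of the x-Kernel object model: protocol objects are
# static, session objects are created per connection, and incoming messages
# are demultiplexed to the right session by a key.

class Session:
    """Represents one open connection; destroyed when the connection ends."""
    def __init__(self, key):
        self.key = key
        self.received = []

    def deliver(self, msg):
        self.received.append(msg)

class Protocol:
    """Static object: creates sessions and demuxes incoming messages."""
    def __init__(self, name):
        self.name = name
        self.sessions = {}

    def open(self, key) -> Session:
        # Dynamic part: instantiate a session for this connection as needed.
        return self.sessions.setdefault(key, Session(key))

    def demux(self, key, msg):
        # Hand the message to the session for this connection, if any.
        sess = self.sessions.get(key)
        if sess is not None:
            sess.deliver(msg)

tcp = Protocol("TCP")
s = tcp.open(("10.0.0.1", 80))
tcp.demux(("10.0.0.1", 80), b"data")
print(s.received)  # [b'data']
```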
x-Kernel Primitives
Buffer managers – allocate, concatenate, split, and truncate; operate in the local process heap
Map managers – add, remove, and map bindings for protocols
Event managers – provide timers to allow timeouts
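A map manager can be sketched as a small binding table. This is an illustrative stand-in, not the real x-Kernel interface; the names are invented.

```python
# Hypothetical sketch of an x-Kernel map manager: binds external identifiers
# (e.g. a port number taken from a header) to internal objects such as
# sessions, so demultiplexing is a single lookup.

class MapManager:
    def __init__(self):
        self._bindings = {}

    def bind(self, external_key, obj):
        self._bindings[external_key] = obj

    def unbind(self, external_key):
        self._bindings.pop(external_key, None)

    def resolve(self, external_key):
        return self._bindings.get(external_key)

m = MapManager()
m.bind(("udp", 53), "dns-session")
print(m.resolve(("udp", 53)))  # dns-session
m.unbind(("udp", 53))
print(m.resolve(("udp", 53)))  # None
```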
x-Kernel Performance
2-3x faster than Unix overall
Unix cost is primarily due to sockets
Protocol performance is comparable
Conclusion: the architecture is the difference
x-Kernel Conclusions
Pros:
Architecture simplifies the implementation of protocols
Uniformity of interface between protocols makes protocol performance predictable and reduces overhead between protocols
Possible to write efficient protocols by tuning the underlying architecture
Don't need to know the exact protocol stack
Cons:
Requires recompiling the kernel for each new set of protocols
Doesn't reduce message size (headers) or the sequentiality of processing
Primarily useful as a research tool for protocol implementation, not to improve performance per se
Integrated Layer Processing (ILP)
Reduces protocol layers by integrating processing
Tunes performance to increase caching and avoid memory I/O
Eliminates redundant copies (similar to U-Net's shared memory)
ILP Architecture
Combine protocol-specific manipulations in a single loop where possible
Process small pieces to make use of the processor's on-board cache
Put as much processing as possible in-line (macros) versus function calls
ILP Loop
Combine marshalling (encoding), encryption, and checksumming
Work in memory, reduce copying
Reduces steps from 5 to 2 (increased processing at step 1)
[Figure: send paths compared]
Non-ILP send: 1. marshalling (r/w), 2. encryption (r/w), 3. copying (r/w), 4. checksum (r), 5. system copy (r/w) – application data moves through the TCP buffer and kernel buffer
ILP send: 1. marshalling, encryption, and checksumming (r/w), 2. system copy (r/w)
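The fused loop can be sketched as below. This is a hypothetical illustration: the XOR "cipher" is a toy stand-in for real encryption, and the point is only that marshalling, encryption, and checksumming touch each byte in one pass instead of three.

```python
# Hypothetical sketch of an ILP-style send path: marshalling, a toy XOR
# "encryption", and a checksum are fused into ONE pass over the data, so
# each byte is read and written once instead of once per layer.

KEY = 0x5A  # illustrative single-byte XOR key, not a real cipher

def ilp_send(words):
    """Marshal, encrypt, and checksum in a single integrated loop."""
    out = bytearray()
    checksum = 0
    for w in words:
        for b in w.encode():                    # marshalling: serialize to bytes
            enc = b ^ KEY                       # "encryption" on the same pass
            checksum = (checksum + enc) & 0xFF  # checksum on the same pass
            out.append(enc)
    return bytes(out), checksum

def separate_send(words):
    """Same result computed as three separate passes (the layered way)."""
    data = b"".join(w.encode() for w in words)  # pass 1: marshal
    enc = bytes(b ^ KEY for b in data)          # pass 2: encrypt
    checksum = sum(enc) & 0xFF                  # pass 3: checksum
    return enc, checksum

assert ilp_send(["ab", "c"]) == separate_send(["ab", "c"])
print("integrated and layered paths agree")
```

The integrated version does more work per iteration but walks the data once, which is what keeps it inside the on-board cache.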
ILP Processing (send)
Divide message into small parts
Begin marshalling and encryption on part B, then C…
Process part A once length is known
Finish protocol-specific processing
Doesn't work if A must be processed first (ordering-constrained)
[Figure: message split into Part A (TCP header; RPC header with length, alignment), Part B, and Part C (data); marshalling and encryption run per part, then checksum]
ILP Performance
Processing reduction of 20-25%
Throughput improvement of 10-15%
Actually reduces cache usage, although designed to optimize it
Performance gains can easily be masked by using strong encryption, which drastically increases processing
Conclusion: performance results were such that use is "debatable in existing communication systems…"
ILP Conclusions
Pros:
Decreased memory access by up to 30%
Slightly improved performance
Cons:
Only applicable to non-ordering-constrained functions
Requires macros to increase speed, reducing flexibility
Protocol stack must be known beforehand
The Protocol Accelerator (PA)
Reduces header overhead by sending non-changing protocol headers only once
Further reduces total bytes by packing other header information across protocols
Reduces layered protocol processing overhead by splitting processing of header and data (canonical processing)
PA Header Reduction
Four classes of header information:
Connection Identification – doesn't change during a session
Protocol-specific Information – depends only on protocol state, not on the message
Message-specific Information – depends on the contents of the message but not on protocol state
Gossip – optional, but included because its overhead is small
Connection Cookies – an 8-byte field that replaces the Connection Identification information
PA Message Format
Connection cookie suffices for the Connection ID on 2nd and later messages
Packing information explained below
Gossip is optional but useful
[Figure: PA message format]
Connection cookie (62-bit number, plus a Connection-ID-present bit and a byte-order bit (big- or little-endian))
Connection Identification (first message only)
Protocol-specific Information
Message-specific Information
Gossip (optional)
Packing Information (if packed)
Application Data
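The header-reduction idea can be sketched as follows. This is a hypothetical illustration, not the PA's actual wire format: the connection-id string and cookie value are invented, and the real format also carries the flag bits and other header classes shown above.

```python
# Hypothetical sketch of PA-style header reduction: the full Connection
# Identification goes out only on the first message; later messages carry
# just a small connection cookie that both ends map back to the connection.
import struct

class PASender:
    def __init__(self, conn_id: bytes, cookie: int):
        self.conn_id = conn_id  # full identification (addresses, ports, ...)
        self.cookie = cookie    # small number agreed for this connection
        self.first = True

    def frame(self, payload: bytes) -> bytes:
        header = struct.pack(">Q", self.cookie)  # 8-byte cookie field
        if self.first:                           # full id on first message only
            header += self.conn_id
            self.first = False
        return header + payload

s = PASender(conn_id=b"10.0.0.1:5000->10.0.0.2:80", cookie=42)
first = s.frame(b"req1")
later = s.frame(b"req2")
# Every message after the first is shorter by the full connection-id length.
print(len(first) - len(later) == len(s.conn_id))  # True
```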
PA Processing Reduction
Canonical Protocol Processing – breaks processing in a protocol layer into 2 parts:
Pre-processing Phase – build or check the message header without changing protocol state
Post-processing Phase – update protocol state; attempt to do this after the message is sent or delivered
Pre-processing at every layer is done before post-processing at any layer
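The two-phase split can be sketched as below. This is a hypothetical illustration with invented layer names; the only point it demonstrates is that header construction reads state without changing it, so all state updates can be deferred until after the message is handed off.

```python
# Hypothetical sketch of canonical protocol processing: each layer is split
# into a pre-processing step (build the header, no state change) and a
# post-processing step (update state). Pre-processing at every layer runs
# before post-processing at any layer, so the message can be on the wire
# while state is still being updated.

class Layer:
    def __init__(self, name):
        self.name = name
        self.sent = 0  # protocol state, touched only in post-processing

    def pre(self, msg: bytes) -> bytes:
        # Build the header from current state; do NOT change state here.
        return f"[{self.name}#{self.sent}]".encode() + msg

    def post(self):
        self.sent += 1  # deferred state update

def send(stack, payload: bytes) -> bytes:
    msg = payload
    for layer in stack:   # pre-processing at EVERY layer first...
        msg = layer.pre(msg)
    # (the message could be handed to the network at this point)
    for layer in stack:   # ...then post-processing at any layer
        layer.post()
    return msg

stack = [Layer("RPC"), Layer("TCP")]
print(send(stack, b"hi"))  # b'[TCP#0][RPC#0]hi'
print(send(stack, b"hi"))  # b'[TCP#1][RPC#1]hi'
```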
PA Processing Reduction (cont.)
Header Prediction – use the post-processing phase to predict the formation of the next header
Packet Filters – a pre-pre-processor that checks or ensures header correctness without invoking the protocol where possible; invokes the protocol if necessary
Message Packing – pack backlogged messages together if the application gets ahead; reduces space and processing since checksums etc. are calculated only once
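Message packing can be sketched as below. This is a hypothetical illustration, not the PA's actual packing format: the length-prefixed layout and the simple additive checksum are invented stand-ins, chosen to show that one header and one checksum cover the whole backlog.

```python
# Hypothetical sketch of PA message packing: when the application gets ahead
# of the network, backlogged messages go out as one packet with a single
# checksum, each message prefixed by its length so the receiver can unpack.
import struct

def pack(messages):
    """Pack backlogged messages into one body: [len][bytes][len][bytes]..."""
    body = b"".join(struct.pack(">H", len(m)) + m for m in messages)
    checksum = sum(body) & 0xFFFF  # computed once for the whole pack
    return struct.pack(">H", checksum) + body

def unpack(packet):
    (checksum,), body = struct.unpack(">H", packet[:2]), packet[2:]
    assert checksum == sum(body) & 0xFFFF, "corrupt pack"
    msgs, i = [], 0
    while i < len(body):
        (n,) = struct.unpack(">H", body[i:i + 2])
        msgs.append(body[i + 2:i + 2 + n])
        i += 2 + n
    return msgs

backlog = [b"msg1", b"second message", b"x"]
assert unpack(pack(backlog)) == backlog
print("round trip ok")
```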
PA Processing (send)
Check backlog; queue and exit if any
Create packing and predicted header, add to message data
Run packet filter to create message-specific data (and gossip, if any)
Push to protocol if necessary
Push connection cookie onto front of message and send
Pass to protocol stack for post-processing to update protocol state
[Figure: PA architecture – Application, Packer/Unpacker, PA (PreSend, PreDeliver), Protocol Stack, Network]
PA Performance
Can gain an order-of-magnitude improvement over pure layered protocols
Maximal throughput achieved by reducing garbage collection and doing post-processing while messages are "on the wire"
Conclusion: useful in improving performance as long as the PA is used on both ends of the connection
PA Conclusions
Pros:
Eliminates much of the overhead of layered protocols
Significant speed improvement
Canonical processing applicable in any case
Cons:
Can't communicate with a non-PA peer
A specific PA is needed for each set of protocols
No fragmentation of messages, so only works on small messages
Summary
Protocols are layered to improve modularity and reduce complexity
This reduces performance
Improving performance reduces modularity
Requires foreknowledge of the protocol stack
Approaches:
Increase use of the kernel (x-Kernel)
Integrate processing of all layers together (ILP)
Reduce message size and speed up the critical path (PA)
All improve performance, but only PA results in significant improvement.