Analysis and Implementation of Multiplexing Techniques in Connection-Oriented Communication Networks
Ph.D. Final Examination, August 8, 2006
Tao Li ([email protected])
Department of Electrical and Computer Engineering
SEAS, University of Virginia
References
T. Li, D. Logothetis, M. Veeraraghavan, “Analysis of a polling system for telephony traffic with application to wireless LANs,” IEEE Transactions on Wireless Communications, vol. 5, pp. 1284-1293, June 2006.
T. Li, M. Veeraraghavan, “Resource allocation for a polling system with application to wireless LANs,” to be submitted for journal publication.
H. Wang, M. Veeraraghavan, R. Karri, T. Li, “Design of a High-Performance RSVP-TE Signaling Hardware Accelerator,” IEEE Journal on Selected Areas in Communications (JSAC), vol. 23, no. 8, pp. 1588-1595, August 2005.
H. Wang, M. Veeraraghavan, R. Karri, T. Li, “Hardware-Accelerated Implementation of the RSVP-TE Signaling Protocol,” in Proc. of IEEE ICC2004, June 20-24, 2004, Paris, France.
Outline
Background
Problem statement and contributions
Study of a polling system with vacations
Implementation of a signaling control card
Conclusions
Background
Applications have diverse Quality of Service (QoS) requirements (bandwidth, delay, loss, etc.)
  Deterministic QoS guarantees: mission-critical control
  Statistical QoS guarantees: most audio/video applications
  No specific requirements: best-effort applications
Two types of networking technologies
  Connectionless (CL): Internet, best-effort type of service
  Connection-Oriented (CO): support of QoS
    Circuit-switched networks: SONET, WDM, etc.
    Packet-switched networks: ATM, MPLS, etc.
Background (more)
Chief characteristics of CO networks
  Resources are reserved prior to data transfer in a call admission control (CAC) phase
  Resources are left idle during connection setup phase
  Per-connection state maintenance at control-plane
How to reserve resources? Through signaling protocols
  RSVP-TE, PNNI, SS7, etc.
[Figure: Architecture of a CO switch. Control plane: signaling, routing, and link management engines. Data plane: line cards and switch fabric.]
Background (more)
In circuit-switched networks
  Reserve a dedicated circuit for a connection
In packet-switched networks
  Reserve bandwidth, buffer space, etc., for a connection
  Data plane: packet classification, policing, scheduling, buffer management
  How much of each resource should be reserved?
    Depends on service model (hard QoS or soft QoS), traffic characteristics (burstiness), buffer size, scheduling algorithms
[Figure: output scheduler]
Background (more)
Multiplexing techniques in shared-medium based access
  Connection-Oriented
    Circuit-switched networks: FDMA, TDMA
    Packet-switched networks: Polling, scheduling-based access
  Connectionless
    Random access
Problem statement
Our mission:
  Study a polling system for QoS provisioning
    With application to IEEE 802.11
    Target real-time application: telephony
    A data-plane problem
  Demonstrate that signaling protocols can, in spite of their complexity, be implemented in hardware
    Performance gain in terms of call-handling capacity and message-processing delay
    A control-plane problem
  Supported by NSF, DOE
Contributions
Study of a polling scheme
  CDF of delay in a single queue scenario
    Assume a continuous-time Markov Modulated Fluid (MMF) model
    Can be used to approximate the CDF of delay in certain multiple-queue cases
  Voice capacity and delay bounds (deterministic service)
    For the MMF model or a discrete-time Markov ON/OFF model
    Allow heterogeneity
  Voice capacity (statistical service)
    MMF model: results obtained by simulations
  Resource allocation (statistical service)
    Assume a discrete-time Markov ON/OFF model
    Derive approximations for tradeoff between service degradation measure (overflow probability, or packet loss ratio) and resource allocation
Contributions (more)
Implementation of a signaling control card
  Schematic design at a later stage
    Power regulation module
    Prior work completed by collaborators (Haobo Wang, Liji Wu)
  Collaborated with Appli-CAD Inc. for PCB design
    Provided a reference design for 1.25Gbps signal path
    Examination of placement and routing
  Design or VHDL implementation of some functional modules
    Configuration module, PCI interface module, FIFO interface unit, switch-fabric interface unit
  Software design (device driver; contributed to a message generator)
  Debugging (board and VHDL)
Overview
Background
Problem statement and contributions
Study of a polling system with vacations
  Motivation and related work
  System model
  Analysis with a continuous-time MMF model
  Analysis with a discrete-time Markov model
Implementation of a signaling control card
Conclusions
Motivation
Several communication systems simultaneously support CO and CL modes of operation
  IEEE 802.11
    Polling and random access
  DOCSIS and IEEE 802.16
    Extended Real-Time Variable Rate and Best-Effort services
In CO mode: scheduling-based channel access
[Figure: scheduler with upstream and downstream directions]
Motivation (more)
Problem: queue status info is distributed among stations for the upstream direction
  Instantaneous queue status not available to scheduler
  Cannot directly use scheduling algorithms that need arrival times, queue occupancy, or packet size
  Continuous exchange of queue status info can be expensive
    Wireless bandwidth is scarce
Motivation (more)
Polling emerges as a choice
  Serve all queues in a round-robin order
    Does not require queue status information
  Easy to implement: O(1) time complexity
  Trade efficiency for timeliness (hard)
    Transmission of a poll signal consumes bandwidth
    If interpoll time is bounded, delay is also bounded
      Suitable for delay-sensitive applications, like telephony
Question: how many calls can be admitted? Or how much resource should be allocated for voice calls?
Related work
Papers on general polling systems
  Poisson arrival process; do not consider voice traffic
Papers on QoS provisioning in wired and wireless networks
  Do not specifically address the polling scheme considered in our work
Papers on voice support over MAC protocols
  Do not specifically address the polling scheme
Papers on voice support over IEEE 802.11 polling mode
  Largely simulation-based
System model
Assume a superframe structure
  Polling period: supports voice calls
  Vacation period: other resource sharing schemes
  Partition between polling and vacation: vacation is at least θ×TS
  VS: vacation stretch
[Figure: superframe of length TS, showing frames, the polling period, the vacation, the vacation stretch (VS), and a foreshortened polling period]
System model (more)
Polling order
  Round-robin with a restriction: each queue can be served at most once in a polling period
Walk time: Twalk
  Time needed for the server to move from one queue to another; models physical and MAC layer overheads
Service discipline: gated-service
  Pack all voice packets into one MAC frame when responding to a poll
Overview
Background
Problem statement and contributions
Study of a polling system with vacations
  Motivation and related work
  System model
  Analysis with a continuous-time MMF model
    Source model
    Delay analysis in a single queue case
    Multiple-queue analysis and simulation
  Analysis with a discrete-time Markov model
Implementation of a signaling control card
Conclusions
Source model
Markov Modulated Fluid model
  Continuous in time
  a and b are transition rates
  When ON, a bit stream is created at a constant rate c; when OFF, silence
  Average ON time: 352ms
  Average OFF time: 650ms
    May and Zebo 1968 model
QoS requirements
  Stringent in delay
  Can tolerate a small loss ratio
[Figure: two-state ON/OFF Markov chain with transition rates a and b]
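As a quick check on the source model, the long-run fraction of time a source spends ON follows from the two mean sojourn times. The sketch below is illustrative only: the function names are not from the slides, and the 8.5 Kbps peak rate is borrowed from the codec rate quoted on later slides.

```python
def on_probability(mean_on_ms: float, mean_off_ms: float) -> float:
    """Long-run fraction of time a two-state ON/OFF source spends ON."""
    return mean_on_ms / (mean_on_ms + mean_off_ms)


def mean_rate_bps(peak_rate_bps: float, mean_on_ms: float, mean_off_ms: float) -> float:
    """Average bit rate of a source that emits at peak_rate_bps only while ON."""
    return peak_rate_bps * on_probability(mean_on_ms, mean_off_ms)


# Values from the slide: mean ON 352 ms, mean OFF 650 ms.
P_ON = on_probability(352.0, 650.0)
# Illustrative peak rate (assumption): the 8.5 Kbps codec rate used on later slides.
AVG_RATE = mean_rate_bps(8500.0, 352.0, 650.0)
print(f"P_ON = {P_ON:.3f}, average rate = {AVG_RATE:.0f} bps")
```

With these sojourn times a source is ON roughly 35% of the time, which is the headroom that statistical multiplexing tries to exploit.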
Delay analysis in a single queue case
Delay of interest: DW=DQ+DS
  DQ: queueing delay
  DS: service time, depends on service rate R and data size
    DS=0: empty packet, not of interest
First, compute the PDF of TI, given TI=TS+(stretch2 - stretch1)
  Assume: stretch1 and stretch2 are i.i.d. R.V.s with known PDF
Second, compute P{DQ≤q|TI=t, nonempty packet}, and then obtain P{DQ≤q|nonempty packet} by unconditioning
Delay analysis (more)
Third, compute P{Z≤z|DQ=q} and P{DS≤s|DQ=q}
  Z: total time spent in the ON state during DQ
    Can be solved with a uniformization technique
  Z can be linked to DS by DS=Zc/R
    c: source rate; R: service rate
Finally, combine all together, given DW=DQ+DS
  P{DQ≤q|nonempty packet} obtained in the second step
  P{DS≤s|DQ=q} obtained in the third step
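The relation DS=Zc/R can be illustrated by sampling Z directly. The sketch below is a Monte Carlo stand-in, not the uniformization technique named on the slide, and it does not condition on a nonempty packet. It assumes exponential ON/OFF sojourn times with the means from the source model, a stationary starting state, and an 11 Mbps service rate (the top IEEE 802.11b rate, chosen here only for illustration). All names are hypothetical.

```python
import random


def on_time_in_interval(q_ms, mean_on_ms, mean_off_ms, rng):
    """Sample Z, the total ON time of a two-state source during an interval of q_ms."""
    p_on = mean_on_ms / (mean_on_ms + mean_off_ms)
    on = rng.random() < p_on          # start in the stationary distribution
    remaining = q_ms
    z = 0.0
    while remaining > 0.0:
        mean_stay = mean_on_ms if on else mean_off_ms
        # Exponential sojourn, truncated at the end of the interval.
        stay = min(rng.expovariate(1.0 / mean_stay), remaining)
        if on:
            z += stay
        remaining -= stay
        on = not on
    return z


def service_time_ms(z_ms, c_bps, r_bps):
    """DS = Z * c / R: time to transmit, at rate R, the bits created during Z."""
    return z_ms * c_bps / r_bps


rng = random.Random(1)
samples = [on_time_in_interval(30.0, 352.0, 650.0, rng) for _ in range(20000)]
mean_z = sum(samples) / len(samples)
# Assumed rates: c = 8.5 Kbps codec, R = 11 Mbps channel.
mean_ds = service_time_ms(mean_z, 8500.0, 11e6)
print(f"mean Z = {mean_z:.2f} ms, mean DS = {mean_ds:.5f} ms")
```

Because the sojourn times are much longer than a 30 ms interval, most samples of Z are either 0 or the full interval, which is why the distribution of DS, and not just its mean, matters.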
Delay analysis (more)
[Figure: CDF of DW with TS as a parameter]
Twalk and C are set to 0.23ms and 8.5Kbps, respectively
All numerical results assume IEEE 802.11b PHY
Multiple-queue case
Deterministic service
  Each queue is guaranteed to be polled in a superframe
  Number of queues N ≤ Np (voice capacity)
    Referred to as small-N regime of operation
Statistical service (when N > Np)
  Service degradation: not guaranteed to be polled in each superframe; statistical QoS guarantees
  Statistical multiplexing gain since N > Np
    Referred to as large-N regime of operation
Computation of Np: worst-case analysis
Polls in the kth interval: empty packets
Polls in the (k+1)th interval: maximum-sized packets
Vacation stretch: VSmax
Admission condition
Np can be computed iteratively
Delay bound DWmax,i
Delay in small-N regime of operation
Simulation results: CCDF of DW; θ, codec rate, and Twalk are set to 0.5, 64Kbps, and 0.23ms, respectively
Implication: delay analysis in the single queue case is a fair approximation, given the range of parameter values under consideration
Cost of large-N regime of operation
Simulation: CCDF of delay with N' as a parameter. TS, θ, and codec rate are equal to 30ms, 0.5, and 8.5Kbps, respectively.
Implication of delay spikes: use DWmax as delay threshold, and P{DW>DWmax} as performance measure (Ploss)
Statistical multiplexing
Codec rate, Twalk, and stretch are respectively set to 8.5Kbps, 0.23ms, and VSmax
Capacity increases with TS: payload size vs. Twalk
Multiplexing gain is small: large Twalk, small codec rate
Simulation results
Statistical multiplexing
Simulation results
Codec rate, Twalk, and stretch are respectively set to 64Kbps, 0.13ms, and VSmax
Multiplexing gain is significant: small Twalk, large codec rate
Small Twalk is attainable
Overview
Background
Problem statement and contributions
Study of a polling system with vacations
  Motivation, related work
  System architecture
  Analysis with a continuous-time MMF model
  Analysis with a discrete-time Markov model
Implementation of a signaling control card
Conclusions
Assume a discrete-time Markov model
Motivation
  Voice traffic needs to be packetized for transmission in a packet-switched network
  A discrete-time Markov model is more realistic
  Tractability in analysis
Extend worst-case analysis for small-N regime of operation to discrete-time Markov model
  We derive voice capacity Nl and delay bound Dbound
    Details are omitted
  Delay performance is studied through simulations
Resource allocation for large-N
Tsrv: the total time spent on N queues in a superframe
Performance criterion: overflow probability, P{Tsrv>x} ≤ ε
  The smallest x satisfying the above criterion is the amount of time that should be allocated for the polling period, denoted as Tp(ε)
Difficulty in exact analysis of P{Tsrv>x}: correlation
  Key approximation: correlation between DS,i, i=1,2,…,N, is small. Approximate DS,i, i=1,2,…,N, as i.i.d. R.V.s
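Under the i.i.d. approximation, the distribution of Tsrv is the N-fold convolution of the per-queue service-time distribution, and Tp(ε) is the smallest x whose tail probability is at most ε. The sketch below shows that computation on a discrete time grid. It is a minimal illustration, not the slides' derivation: the function names are hypothetical, and the per-queue PMF is assumed to be given on a uniform grid and to already include any per-poll overhead such as walk time.

```python
def convolve(p, q):
    """Direct convolution of two PMFs defined on the same uniform grid."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        if a == 0.0:
            continue
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def total_service_pmf(per_queue_pmf, n_queues):
    """PMF of the sum of n_queues i.i.d. per-queue service times."""
    total = [1.0]  # point mass at zero
    for _ in range(n_queues):
        total = convolve(total, per_queue_pmf)
    return total


def polling_period(per_queue_pmf, n_queues, eps):
    """Smallest x (in grid units) such that P{Tsrv > x} <= eps."""
    pmf = total_service_pmf(per_queue_pmf, n_queues)
    tail = 1.0
    for x, prob in enumerate(pmf):
        tail -= prob
        if tail <= eps + 1e-12:  # small slack for floating-point round-off
            return x
    return len(pmf) - 1


# Toy per-queue PMF: service takes 0 or 1 grid unit with equal probability.
print(polling_period([0.5, 0.5], n_queues=2, eps=0.3))
```

Tightening ε moves the returned x toward the worst case (every queue taking its maximum service time), which is the deterministic allocation.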
Analytical approach
Consider a reference service discipline
  Does not incur correlation between DS,i
Perform an exact analysis for this reference service discipline
View the results as approximations for the gated-service discipline
Reference service discipline: serve 1, 2, 3, but not 4
Analytical approach
Other assumptions: TS=KL; synchronization
First, compute PK(m) for one queue
  The probability of m arrivals in K time slots
  Using a recursive approach
Then compute the overflow probability
Computational complexity: O(N log N) with FFT
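One way to obtain PK(m) recursively is a slot-by-slot dynamic program over the source state. The sketch below is an assumption-laden illustration and may differ from the recursion used in the thesis: it assumes a two-state discrete-time Markov chain that starts in its stationary distribution and creates exactly one packet per ON slot. The transition-probability names and the example values are hypothetical.

```python
def arrivals_pmf(num_slots, p_on_to_off, p_off_to_on):
    """Return [P_K(0), ..., P_K(K)]: PMF of packets created in K = num_slots slots."""
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    pi_on = p_off_to_on / (p_on_to_off + p_off_to_on)  # stationary ON probability
    # f_on[m] / f_off[m]: probability of m arrivals so far, with the current slot ON / OFF.
    f_on = [0.0] * (num_slots + 1)
    f_off = [0.0] * (num_slots + 1)
    f_on[1] = pi_on          # first slot ON: one packet created
    f_off[0] = 1.0 - pi_on   # first slot OFF: no packet
    for _ in range(num_slots - 1):
        g_on = [0.0] * (num_slots + 1)
        g_off = [0.0] * (num_slots + 1)
        for m in range(num_slots + 1):
            to_on = f_on[m] * (1.0 - p_on_to_off) + f_off[m] * p_off_to_on
            to_off = f_on[m] * p_on_to_off + f_off[m] * (1.0 - p_off_to_on)
            if m + 1 <= num_slots:
                g_on[m + 1] += to_on   # an ON slot adds one packet
            g_off[m] += to_off         # an OFF slot adds none
        f_on, f_off = g_on, g_off
    return [a + b for a, b in zip(f_on, f_off)]


# K = 3 corresponds to TS = 30 ms with a 10 ms packetization interval.
pmf = arrivals_pmf(3, p_on_to_off=0.2, p_off_to_on=0.1)
print([round(p, 4) for p in pmf])
```

The per-queue PMF produced this way can then be convolved N times (directly, or with an FFT for large N) to approximate the distribution of Tsrv.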
Computation of loss ratio
If the waiting time is too long, packet will be dropped
Define loss ratio as Ploss=E{Nloss}/E{Ntotal}
  Nloss: number of lost packets in a superframe
  Ntotal: number of created packets in a superframe
Ploss can be linked to overflow probability
  Ploss ≤ P{Tsrv>x}/PON, where PON is the probability of a voice source being in the ON state
  This approximation of Ploss is not very accurate
Computation of loss ratio (more)
For the reference service discipline, an exact computation of Ploss is possible
For Ω: Computational complexity
  O(N^2) with direct convolution
Numerical results
The approximation of P{Tsrv>x} is satisfactory
Ploss can better approximate the “actual” loss ratio
  Cost: computational complexity
Implication: Use P{Tsrv>x} as the QoS measure if computational complexity is a major concern
Tp: polling period length
TS : 30ms
Dbound is set to TS+L+2ms
L: packetization interval, 10ms
Simulation: assume the gated-service discipline; drop the synchronization assumption; allow clock skew and phase error
Overview
Background
Problem statement and contributions
Study of a polling system with vacations
Implementation of a signaling control card
  Motivation, related work, and solution approach
  System architecture, block diagram, and picture
  Modules of the signaling control card
  Performance
Conclusions
Motivation
Signaling protocols
  Characteristics
    Complex (parameters, timers, data-table lookups, keep state information)
    Requirement for flexibility
  Traditionally implemented in software
    Call-handling capacities: 1K calls/second ~ 10K calls/second
    Call-setup delay: on the order of hundreds of milliseconds
      Sycamore SN16000 switch: per-message processing delay 90ms
Motivation (more)
Problems with software implementation
  Call-setup delay impacts utilization
  Hard to meet the requirement for high call-handling capacities in future CO networks
Objective: demonstrate that signaling protocols can be implemented in hardware in spite of their complexity
  Reduce call-setup delay by at least two-to-three orders of magnitude
  Increase call-handling capacity significantly
Target signaling protocol and switch
  RSVP-TE with extensions for GMPLS
  SONET switch
Related work
TCP offloading engine
  Observation: Overhead of TCP/IP processing overwhelms server’s CPU
  Solution: Moving TCP/IP processing to dedicated hardware
Software implementations of RSVP-TE
  E.g.: Sycamore SN16000 switch with a per-message processing delay of about 90ms
Solution approach
Manage the complexity of signaling protocols
  By only supporting basic and most frequently used messages/parameters in hardware and relegating the rest to software
Define a subset of the signaling protocol for hardware implementation (RSVP-TE with extensions for GMPLS)
  Four messages related to connection setup and release: Path, Resv, PathTear, and ResvTear
  Support all mandatory objects/parameters and optional parameters needed for SONET switch
Solution approach (more)
Meet the flexibility requirement using reconfigurable Field Programmable Gate Array (FPGA)
  FPGA can be reloaded with updated versions
Achieve fast data-table lookups and state maintenance by using Ternary Content Addressable Memory (TCAM)
  TCAM: a special memory device designed for data-table lookups
  Complexity of a lookup operation: one clock cycle
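To make the TCAM idea concrete, the sketch below models ternary matching in software: each entry stores a value and a mask, masked-out bits are "don't care", and the first matching entry wins. This is only a behavioral model with hypothetical names; a real TCAM compares the key against all entries in parallel, which is what gives the single-clock-cycle lookup, whereas this loop is sequential.

```python
from typing import List, Optional, Tuple


class TernaryTable:
    """Behavioral model of a ternary match table (value/mask entries, first match wins)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, int, str]] = []

    def add(self, value: int, mask: int, result: str) -> None:
        # Bits where mask is 0 are "don't care"; store the value pre-masked.
        self.entries.append((value & mask, mask, result))

    def lookup(self, key: int) -> Optional[str]:
        # Hardware checks every entry at once; here we scan in priority order.
        for value, mask, result in self.entries:
            if key & mask == value:
                return result
        return None


table = TernaryTable()
table.add(0xC0A80100, 0xFFFFFF00, "port 3")   # matches 192.168.1.x
table.add(0x00000000, 0x00000000, "default")  # matches anything
print(table.lookup(0xC0A80105), table.lookup(0x0A000001))
```

Entry order acts as priority, so more specific entries are placed before wildcard ones, as in a longest-prefix routing table.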
System architecture
Focus on signaling control card
Backplane: often proprietary. We assume PCI bus.
Switch fabric card: assume Vitesse 64x64 STS-12
Cross-connection rate: STS-1 (51.8Mbps), total bandwidth: 40Gbps
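[Figure: switch chassis with power module, CPU card, signaling control card, switch fabric card, and line cards on a backplane. A second view shows the signaling control card, the CPU card or host computer, and the switch fabric card with SONET line cards on a PCI bus, plus a Gigabit Ethernet link to the signaling processing unit in a peer switch.]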
Block diagram of implementation
[Figure: block diagram with the power regulation module, hardware signaling accelerator, PCI interface module, Gigabit Ethernet module, and configuration module. Voltage labels: 5V, 3.3V, 1.5V, 1.8V, 2.5V. External connections: optical fiber and PCI bus.]
Top view of the card
Gigabit Ethernet module
Optical-fiber transceiver: convert between optical signals and differential PECL signals
SerDes: convert between serial PECL signals and parallel TTL signals
Ethernet controller: 8B/10B encoding/decoding, MAC layer operations
[Figure: Gigabit Ethernet module. Optical fiber connects to the optical-fiber transceiver (Agilent HFCT-53D5EM), which connects over a high-speed serial interface to the transmitter and receiver sections of the SerDes (Agilent HDMP-1636A). The SerDes connects over a 10-bit PHY interface (Tx[9:0], Rx[9:0], 125 MHz clock) to the Gigabit Ethernet controller (LSI Logic 8104). The controller's MAC interface (Tx and Rx data/control signals, SCLK at 50 MHz) goes to the hardware signaling accelerator, and its MAC register interface and timing/control/status signals go to the configuration module.]
Hardware signaling accelerator module
Hardware signaling accelerator core: all major functions such as message parsing, creating commands for route lookup, state maintenance, and switch-fabric programming, etc.
MAC/Switch fabric/FIFO/TCAM_SRAM interface units: data path, control/timing signals
FIFO: temporary storage of unsupported signaling messages
TCAM/SRAM: route lookup operation, state maintenance operations
[Figure: hardware signaling accelerator module. The hardware signaling accelerator core connects to the MAC interface unit (Tx and Rx data/control signals, SCLK at 50 MHz), the FIFO interface unit and FIFO, the TCAM and SRAM interface unit with the TCAM and SRAM, and the switch fabric interface unit, with links to the configuration module (INIT_DONE) and to the PCI interface module. Part labels: IDT72V36110, IDT75P52100, IDT71V2556.]
PCI interface module
CPU card interface unit: move messages from FIFO to host memory space through Direct Memory Access (DMA)
Switch-fabric control unit: transmit programming command using DMA
Access arbiter: give switch-fabric control unit higher priority
Configuration interface unit: facilitate management of the card
PCI core: provide commonly used functions for PCI accessing
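[Figure: PCI interface module. The PCI core (Xilinx LogiCORE PCI32, with master control and target control) sits on the PCI bus and connects through the access arbiter to the CPU card interface unit, the switch fabric control unit, and the configuration interface unit, which in turn connect to the FIFO interface unit, the switch fabric interface unit, and the configuration module.]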
Configuration Module
Enable configuration of MAC address, IP addresses, routing table and other data tables
Initialize the GbE controller, SRAM, and TCAM
Create clock and control signals needed for each device
Performance
Call-handling capacity
  400K calls/second (hardware signaling accelerator module)
    Software-based implementation: 1K~10K calls/second
  250K calls/second, limited by the 1Gbps link rate
  Load on the TCAM: about 6%
Processing delay
  Per-message processing delay ≤ 2.4 microseconds
    Sycamore SN16000 switch: ≈ 90 ms
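The figures above can be turned into ratios with a few lines of arithmetic. The sketch below uses only numbers quoted on this slide; the variable names are illustrative, and the per-call signaling volume is an inference that holds only if the 1 Gbps link is the sole limit on the 250K calls/second figure.

```python
import math

SW_DELAY_S = 90e-3         # Sycamore SN16000 per-message processing delay
HW_DELAY_S = 2.4e-6        # hardware accelerator per-message processing delay (upper bound)
LINK_LIMITED_CALLS = 250e3 # calls/second, limited by the 1 Gbps link
SW_CALLS_LOW, SW_CALLS_HIGH = 1e3, 10e3
LINK_RATE_BPS = 1e9

delay_ratio = SW_DELAY_S / HW_DELAY_S
delay_orders = math.log10(delay_ratio)
capacity_gain_min = LINK_LIMITED_CALLS / SW_CALLS_HIGH
capacity_gain_max = LINK_LIMITED_CALLS / SW_CALLS_LOW
# Inference (assumption): signaling bits per call if the link alone caps the call rate.
implied_bits_per_call = LINK_RATE_BPS / LINK_LIMITED_CALLS

print(f"delay reduction: {delay_ratio:.0f}x ({delay_orders:.1f} orders of magnitude)")
print(f"capacity gain: {capacity_gain_min:.0f}x to {capacity_gain_max:.0f}x")
print(f"implied signaling volume: {implied_bits_per_call / 8:.0f} bytes per call")
```

The per-message delay comparison is the more favorable of the two; the link-limited capacity gain is smaller because the Gigabit Ethernet interface, not the accelerator, becomes the bottleneck.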
Performance (more)
Concurrent connections
  64 ports, each consisting of 12 STS-1 circuits
    768 connections (total data rate: 768 × 51.8Mbps ≈ 40Gbps)
  Maintaining state for 768 connections consumes 1/32 of TCAM’s memory space
Better performance can be obtained in future implementation
  Call-handling capacity, processing delay, number of concurrent connections
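The connection count and aggregate rate can be checked directly. In the sketch below, the STS-1 rate of 51.84 Mbps is the standard SONET value (the slide rounds it to 51.8), and the implied TCAM headroom assumes that state storage scales linearly with the number of connections, which the slides do not state.

```python
PORTS = 64
CIRCUITS_PER_PORT = 12        # STS-12 port = 12 STS-1 circuits
STS1_BPS = 51.84e6            # standard STS-1 line rate
TCAM_FRACTION_USED = 1 / 32   # from the slide

connections = PORTS * CIRCUITS_PER_PORT
total_gbps = connections * STS1_BPS / 1e9
# Assumption: TCAM usage grows linearly with the number of connections.
implied_max_connections = round(connections / TCAM_FRACTION_USED)

print(connections, f"{total_gbps:.2f} Gbps", implied_max_connections)
```

Under the linear-scaling assumption, the TCAM would leave room for a much larger switch fabric than the 64-port one assumed here.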
Performance (more)
Define per-call utilization ratio as: U=Tfile/(Tfile+Tsetup), where Tsetup = Tprocessing+Tpropagation&emission
  Condition: U ≥ x%
  Assume: Tpropagation&emission is fixed
  Assume: software-based Tprocessing >> hardware-based Tprocessing
[Figure: plot with axes file size and circuit rate, showing the operational regions with software implementation, with hardware implementation, and with zero processing delay (ideal), and the average file sizes for s/w and h/w. Average file size: determined by call-handling capacity.]
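Rearranging U ≥ x gives Tfile ≥ x/(1−x) × Tsetup, so the smallest file worth sending over a circuit is that transfer time multiplied by the circuit rate. The sketch below works this through. The 10 ms propagation-and-emission delay is a hypothetical placeholder, the processing delays are the per-message figures quoted earlier (a real call setup involves several messages and hops), and the function names are illustrative.

```python
def min_file_bits(target_util: float, setup_s: float, circuit_rate_bps: float) -> float:
    """Smallest file size (bits) for which U = Tfile / (Tfile + Tsetup) >= target_util."""
    if not 0.0 <= target_util < 1.0:
        raise ValueError("target_util must be in [0, 1)")
    t_file = target_util / (1.0 - target_util) * setup_s
    return t_file * circuit_rate_bps


def utilization(file_bits: float, setup_s: float, circuit_rate_bps: float) -> float:
    """Per-call utilization ratio U for a given file size."""
    t_file = file_bits / circuit_rate_bps
    return t_file / (t_file + setup_s)


PROPAGATION_S = 10e-3   # hypothetical fixed propagation and emission delay
RATE_BPS = 51.84e6      # one STS-1 circuit
sw_bits = min_file_bits(0.9, PROPAGATION_S + 90e-3, RATE_BPS)
hw_bits = min_file_bits(0.9, PROPAGATION_S + 2.4e-6, RATE_BPS)
print(f"software: {sw_bits / 8e6:.2f} MB, hardware: {hw_bits / 8e6:.2f} MB")
```

With processing delay nearly eliminated, the minimum file size is set almost entirely by the fixed propagation term, which is the gap between the hardware and ideal regions in the plot.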
Summary
Developed analytical models for polling-based access scheme
  For delay performance, deterministic service, and statistical service
  Can be used for CAC
  Limitations: simple traffic model; no consideration for channel variations; polling can be inefficient if traffic is extremely bursty (e.g., connection request)
Implemented a subset of RSVP-TE with extensions for GMPLS in hardware
  Performance gain of 2-3 orders of magnitude
  Enable circuit-switched networks to efficiently support a wider range of applications
Questions?
Thank you!