System Software
OptIPuter System Software
Eric Weigle et al. speaking for
Andrew A. Chien, Computer Science and Engineering, UCSD
January 2006
OptIPuter All-Hands Meeting
OptIPuter System Software Architecture
[Figure: layered architecture diagram. Recoverable component labels:]
• Distributed Applications/Web Services: Telescience
• Visualization Applications: Vol-a-Tile, SAGE, JuxtaView
• DVC API, DVC Configuration, DVC Runtime Library
• Data Services: LambdaRAM
• DVC Services / DVC Core Services: DVC Job Scheduling, DVC Communication, Resource Identify/Acquire, Namespace Management, Security Management, High-Speed Communication, Storage Services
• Globus (XIO, GRAM, GSI), PIN/PDC, RobuSTore
• High-Speed Transport Protocols: GTP, XCP, UDT, LambdaStream, CEP, RBUDP
• Optical Signaling, Management
• Photonic Infrastructure
(The DVC layers are bracketed as “Distributed Virtual Computer Middleware”.)
Performance Across Layers
[Figure: performance-evaluation matrix pairing each software-architecture layer with test suites and performance tools. Recoverable entries:]
• Applications: Geophysics, Neuroscience (datasets ranging from 1 MB to 1 GB+)
• Visualization: JuxtaView, Vol-A-Tile, SAGE (varying the number of nodes for rendering/display)
• Distributed Virtual Computer: DVC Communication
• Novel Transport Protocols: CAVEWave; LambdaStream, SAGE, NetLogger, MAGNET
• Optical Network Configuration
• Performance tools: Prophesy, xosview, HP’s netperf, Intel’s IMB, SAGE Graphics Performance Monitor
Year 4 & 5: Integration, Performance Tuning, Tech Transfer (3- & 5-Layer Demos)
Valerie Taylor et al. (TAMU)
Cross Team Integration and Demonstrations
• 2-layer Demo, TeraBIT Juggling [SC2004, Nov 2004]
– DVC middleware, high-speed transport (GTP)
– Move data between OptIPuter network endpoints
– 10 endpoints across UCSD, UvA, UIC, Pittsburgh
– Achieved 17.8 Gbps, a TeraBIT in less than one minute
• 3-layer Demo [AHM2005, Jan 2005]
– Visualization (JuxtaView/LambdaRAM), DVC middleware, high-speed transport
– Remote data visualization (visualization: NCMIR; storage: UIC and UvA)
– Use DVC to establish visualization environments
– Automated Grid resource selection and binding
– Achieved 2.6 Gbps on ~7 streams
• 5-layer Demos [iGrid2005, Sep 2005]
– Applications, visualization, DVC middleware, high-speed transports (GTP), optical network configuration (PIN/PDC)
– Demo #1: Collaborative Data Visualization with Earth Sciences
– Demo #2: Real-time Brain Data Acquisition, Assembly and Analysis
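The TeraBIT Juggling claim is easy to sanity-check. This minimal sketch assumes decimal units (1 Tb = 10^12 bits, 1 Gbps = 10^9 bits/s) and ignores protocol overhead:

```python
# Time to move one terabit at the aggregate rate reported for TeraBIT Juggling.
# Assumes decimal SI units: 1 Tb = 1e12 bits, 1 Gbps = 1e9 bits/s.
TERABIT_BITS = 1e12
AGGREGATE_RATE_BPS = 17.8e9


def transfer_seconds(bits: float, rate_bps: float) -> float:
    """Idealized transfer time, ignoring headers, retransmissions and setup."""
    return bits / rate_bps


seconds = transfer_seconds(TERABIT_BITS, AGGREGATE_RATE_BPS)
print(f"{seconds:.1f} s")  # 56.2 s, i.e. under one minute
```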
OptIPuter High-Performance Transport Protocols
• Bridge the Gap between High-Speed Link Technologies and Growing Demands of Advanced Applications
– TCP has well-documented performance problems on long-haul networks
• Pursue complementary avenues of investigation
– Efficient congestion/flow management, fairness among flows
– High-speed group communication (multipoint-to-point, multipoint-to-multipoint)
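[Figure: transport protocols positioned by network connection (private lambda vs. shared, routed), communication pattern (unicast vs. managed group), and router support (enhanced vs. standard routers): RBUDP/LambdaStream, GTP, SABUL/UDT, XCP, CEP.]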
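The TCP problem on long-haul paths can be made concrete with a back-of-the-envelope estimate. The sketch uses assumed example values (10 Gbps, 100 ms RTT, 1500-byte segments), the same ones used to motivate HighSpeed TCP in RFC 3649; they are not OptIPuter measurements. Standard AIMD needs a window of tens of thousands of segments, and after a single loss it regrows by only one segment per RTT:

```python
# Illustrative estimate of why standard TCP (AIMD) struggles on long-haul,
# high-bandwidth paths. All three parameters are assumed example values.
LINK_RATE_BPS = 10e9   # 10 Gbps path
RTT_S = 0.100          # 100 ms round-trip time
SEGMENT_BYTES = 1500   # Ethernet-sized segment


def window_segments(rate_bps: float, rtt_s: float, seg_bytes: int) -> float:
    """Congestion window (segments) needed to fill the bandwidth-delay product."""
    return rate_bps * rtt_s / (seg_bytes * 8)


def recovery_seconds(window: float, rtt_s: float) -> float:
    """Time for AIMD to regrow from window/2 to window at +1 segment per RTT."""
    return (window / 2) * rtt_s


w = window_segments(LINK_RATE_BPS, RTT_S, SEGMENT_BYTES)
t = recovery_seconds(w, RTT_S)
print(f"window ~ {w:,.0f} segments, recovery after one loss ~ {t / 60:.0f} min")
# window ~ 83,333 segments, recovery after one loss ~ 69 min
```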
Composite Endpoint Protocol (CEP) – Accomplishments & Plans
• OptIPuter “Gold Roll” 1 (CEP v. 1.1)
– Initial release, basic functionality
– 32 Gbps in the LAN
– TCP, some automatic tuning
• Software Summit Release (CEP v. 1.2)
– New file transfer, sockets API
– New internal networking stack
– Preliminary GTP, XIO carrier support
– Support for 64-bit systems, more OSes
• OptIPuter “Gold Roll” 2 (CEP v. 2.0)
– Target: Fall 2006
– Code stabilization
– Documentation
– Improved scalability
– Improved GTP, XIO integration
– Improved performance
– Suitable for public release
Eric Weigle (UCSD)
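CEP's N-to-M scheduling is not described on the slide. As a hedged illustration only, splitting one file transfer among the hosts of a composite endpoint could look like proportional byte-range partitioning; `partition_ranges` and the capacity-proportional policy are hypothetical, not the CEP API:

```python
from typing import List, Tuple


def partition_ranges(total_bytes: int, capacities: List[float]) -> List[Tuple[int, int]]:
    """Split [0, total_bytes) into contiguous half-open ranges proportional
    to each endpoint's (assumed) capacity. Hypothetical helper."""
    if total_bytes < 0 or not capacities or any(c <= 0 for c in capacities):
        raise ValueError("need a non-negative size and positive capacities")
    total_cap = sum(capacities)
    ranges: List[Tuple[int, int]] = []
    start = 0
    cumulative = 0.0
    for i, cap in enumerate(capacities):
        cumulative += cap
        # Boundaries come from the running total, so rounding never drifts;
        # the last endpoint ends exactly at total_bytes.
        if i == len(capacities) - 1:
            end = total_bytes
        else:
            end = round(total_bytes * cumulative / total_cap)
        ranges.append((start, end))
        start = end
    return ranges


# Four hosts in a composite sender; the last is twice as fast as the others.
print(partition_ranges(1_000_000, [10.0, 10.0, 10.0, 20.0]))
# [(0, 200000), (200000, 400000), (400000, 600000), (600000, 1000000)]
```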
Group Transport Protocol (GTP) – Accomplishments & Plans
• Extend GTP to Sender Capacity Management
– End-node-based allocation schemes at both sources and sinks to achieve good global performance and fairness
– Proof of stability and convergence properties of GTP
• Comprehensive comparison studies between GTP and other transport protocols
• Implementation and Demonstrations with OptIPuter System Software
– iGrid2005
– OptIPuter “Gold Roll” 1 (basic functionalities)
• 3 Publications
• Analytical studies
– Finalize convergence proofs in asynchronous cases
– More comparison studies
– Study the interaction between GTP and other TCP traffic
• Implementation: OptIPuter “Gold Roll” 2
– GTP v. 2.0
– Target: Summer 2006
– Goals:
  – Support both source and sink allocation schemes
  – Improved CPU efficiency and scalability
  – Improved CEP, XIO integration
  – Suitable for public release
[Figure: multiple sources and sinks connected over lambda networks, with numbered active sessions (1 to 9).]
Ryan Wu (UCSD)
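GTP allocates capacity among flows that converge on shared end nodes. One standard way to express the fairness goal is a max-min fair (water-filling) allocation; the sketch below assumes that formulation for a single sink, with each flow's demand standing in for its sender's capacity limit. It is an illustration, not GTP's actual rate-control code:

```python
from typing import Dict


def max_min_allocate(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair (water-filling) split of one sink's capacity among flows.

    A flow never gets more than its demand; capacity left over by small
    flows is redistributed evenly among the flows that still want more.
    """
    alloc = {flow: 0.0 for flow in demands}
    remaining = capacity
    active = sorted(demands, key=demands.get)  # smallest demands first
    while active:
        share = remaining / len(active)
        flow = active[0]
        if demands[flow] <= share:
            # The smallest remaining demand fits: satisfy it fully.
            alloc[flow] = demands[flow]
            remaining -= demands[flow]
            active.pop(0)
        else:
            # Every remaining flow wants more than an equal share: split evenly.
            for f in active:
                alloc[f] = share
            break
    return alloc


# A 10 Gbps sink shared by three senders; sender A can supply only 2 Gbps.
print(max_min_allocate(10.0, {"A": 2.0, "B": 8.0, "C": 8.0}))
# {'A': 2.0, 'B': 4.0, 'C': 4.0}
```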
UDP-based Data Transfer Protocol (UDT) – Accomplishments & Plans
• Based upon experience to date, a new version of UDT called Composable-UDT is being developed
• Composable-UDT supports multiple high-speed congestion control algorithms, including
– UDT's AIMD with decreasing increases (DAIMD) congestion control
– Reliable UDP blast
– TCP, HighSpeed TCP, Scalable TCP, BIC, FAST, Vegas, Westwood
– GTP
• Recent Experimental Studies
– iGrid 2005: High-performance mining of data streams over UDT
  – 8 Gb/s computing histograms on web traffic data streams
  – 14 Gb/s transferring data memory-to-memory around the world
– iGrid 2005: Remote exploration of Sloan Digital Sky Survey data
  – 1.2 Gb/s disk-to-disk transfer from San Diego to South Korea
• Plan for UDT 3.0 release
– Composable-UDT (first full release)
– Congestion-controlled unreliable messaging
– Firewall punching
• Further work on the protocol toolkit underlying Composable-UDT
• Experiments to understand how to provide secure transport using UDT
Robert L. Grossman & Yunhong Gu (UI-Chicago)
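The idea behind Composable-UDT, one transport with interchangeable congestion control algorithms, can be sketched as a plug-in interface. The class and hook names (`CongestionControl`, `on_ack`, `on_loss`) are hypothetical and are not UDT's real API; the two plug-ins are a simplified TCP-style AIMD and a fixed-window stand-in for reliable UDP blast:

```python
from abc import ABC, abstractmethod


class CongestionControl(ABC):
    """Hypothetical plug-in interface: the transport calls the hooks,
    then reads `cwnd` (in packets) to decide how much to send."""

    def __init__(self, cwnd: float = 16.0) -> None:
        self.cwnd = cwnd

    @abstractmethod
    def on_ack(self) -> None: ...

    @abstractmethod
    def on_loss(self) -> None: ...


class Aimd(CongestionControl):
    """TCP-style AIMD: about +1 packet per window of ACKs, halve on loss."""

    def on_ack(self) -> None:
        self.cwnd += 1.0 / self.cwnd

    def on_loss(self) -> None:
        self.cwnd = max(2.0, self.cwnd / 2)


class FixedWindow(CongestionControl):
    """Blast-style stand-in: the window ignores ACKs and losses;
    missing packets are simply retransmitted later."""

    def on_ack(self) -> None:
        pass

    def on_loss(self) -> None:
        pass


def run(cc: CongestionControl, trace: str) -> float:
    """Replay a trace of 'a' (ACK) and 'l' (loss) events through any plug-in."""
    for event in trace:
        if event == "a":
            cc.on_ack()
        else:
            cc.on_loss()
    return cc.cwnd


print(run(Aimd(), "l"))            # 8.0
print(run(FixedWindow(), "llaa"))  # 16.0
```

Because the transport only talks to the abstract hooks, swapping algorithms means passing a different object, which is the property a composable design is after.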
UDP Offload Engines for LambdaGrids
A Case for UDP-offload Engines in LambdaGrids, Venkatram Vishwanath et al. (PFLDNet 2006), in collaboration with OSU, LANL, and VT
• Offload UDP-based protocols to the network cards
• Initial Results
– 7.4 Gbps maximum throughput
– 35% improvement over host-based UDP
– Reduced CPU utilization
– 17% improvement in latency
• Future Plans
– Build LambdaStream on the NIC
– Compare partial offload engines with full offload
Venkatram Vishwanath et al. (EVL)
LambdaStream
• An application-level transport protocol for streaming and data transfer for dedicated high-bandwidth networks.
• Current Status
– A single-stream version with a configurable design
– API which supports buffered and unbuffered communications
– Working towards a multi-stream LambdaStream for multipoint-to-multipoint communication
• Recent Results
– 18 Gbps between Chicago and San Diego over TeraWave
– 18 Gbps between Chicago and San Diego over CAVEWave
• Future Plans
– Integrate with SAGE and other applications
– Performance evaluation with MAGNET to identify end-system bottlenecks
Venkatram Vishwanath et al. (EVL)
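On a dedicated lambda, a sender can pace packets at a target rate instead of growing a window. Assuming LambdaStream-style rate-based sending and 9000-byte jumbo frames (neither detail is given on the slide), the departure schedule follows directly from packet size and rate:

```python
from typing import List


def inter_packet_gap_us(rate_bps: float, packet_bytes: int) -> float:
    """Spacing between packet departures, in microseconds, for a paced sender."""
    return packet_bytes * 8 * 1_000_000 / rate_bps


def departure_times_us(rate_bps: float, packet_bytes: int, count: int) -> List[float]:
    """Scheduled send time of each of `count` packets, starting at t = 0."""
    gap = inter_packet_gap_us(rate_bps, packet_bytes)
    return [i * gap for i in range(count)]


# 9 Gbps target with 9000-byte packets (both assumed values).
print(inter_packet_gap_us(9e9, 9000))    # 8.0
print(departure_times_us(9e9, 9000, 3))  # [0.0, 8.0, 16.0]
```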
MAGNET
Joint work with Wu Feng and Mark Gardner (LANL and Virginia Tech).
• A monitoring apparatus for generic kernel event tracing
– Identify end-system performance bottlenecks in a generic Linux kernel and improve next-generation protocols, middleware, and software applications
• Current status:
– Uses the “probes” mechanism in the kernel
– Designed as a kernel module
– Monitors the network stack
• Future Plans:
– More instrumentation points
– Performance analysis of LambdaStream
– Analysis and synthesis for adaptive visualization applications
Venkatram Vishwanath et al. (EVL)
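MAGNET records timestamped kernel events; locating a bottleneck then amounts to differencing timestamps between successive instrumentation points. The record format `(packet_id, point, timestamp_ns)`, the point names, and the trace values below are hypothetical, not MAGNET's actual output:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical trace record: (packet_id, instrumentation_point, timestamp_ns).
Event = Tuple[int, str, int]


def stage_latencies(events: List[Event], path: List[str]) -> Dict[str, float]:
    """Mean time (ns) between consecutive instrumentation points along `path`."""
    by_packet: Dict[int, Dict[str, int]] = defaultdict(dict)
    for pkt, point, ts in events:
        by_packet[pkt][point] = ts
    samples: Dict[str, List[int]] = defaultdict(list)
    for stamps in by_packet.values():
        for a, b in zip(path, path[1:]):
            if a in stamps and b in stamps:  # skip packets missing a probe
                samples[f"{a}->{b}"].append(stamps[b] - stamps[a])
    return {stage: mean(vals) for stage, vals in samples.items()}


trace = [
    (1, "nic_rx", 0), (1, "ip_rcv", 2_000), (1, "udp_rcv", 3_000), (1, "app_read", 40_000),
    (2, "nic_rx", 10_000), (2, "ip_rcv", 12_000), (2, "udp_rcv", 13_500), (2, "app_read", 60_000),
]
path = ["nic_rx", "ip_rcv", "udp_rcv", "app_read"]
lat = stage_latencies(trace, path)
print(max(lat, key=lat.get))  # udp_rcv->app_read, the slowest stage
```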
RobuSTore Storage – Accomplishments & Plans
• RobuSTore Design
– RobuSTore architecture consisting of coding algorithm, metadata service, admission controller, and security schemes
• RobuSTore Evaluation Across a Wide Range of System Configurations
– Evaluate RobuSTore against conventional parallel storage schemes (e.g., RAID)
– Explore five dimensions: # disks, data size, block size, network latency, degree of redundancy
– 5x improvement in robustness; 15x in access bandwidth
– Moderate overhead: 2 to 4x storage capacity, 1.5x network and disk I/O
• Simulation Study with More Complex and Realistic Workloads
• RobuSTore Implementation (based on Lustre) and Deployment
– Experiments on the OptIPuter testbed
– Evaluation using benchmarks and neuroscience and geophysical application workloads
Huaxia Xia, Justin Burke (UCSD)
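A simplified model shows why coding improves robustness against slow disks. The sketch assumes an idealized erasure code where any k of n blocks suffice to rebuild the data; RobuSTore's actual coding algorithm is not specified on the slide, and the latencies are made-up example values:

```python
from typing import List


def striped_read_time(latencies: List[float], k: int) -> float:
    """Plain striping (RAID-0 style): all k data disks must respond."""
    return max(latencies[:k])


def coded_read_time(latencies: List[float], k: int) -> float:
    """Idealized any-k-of-n erasure code: done when the fastest k respond."""
    return sorted(latencies)[k - 1]


# Eight disks, one straggler; data needs k = 6 blocks, coding adds 2 redundant ones.
disk_latency_ms = [12.0, 11.0, 13.0, 95.0, 12.5, 11.5, 14.0, 12.0]
print(striped_read_time(disk_latency_ms, 6))  # 95.0: the straggler holds a data block
print(coded_read_time(disk_latency_ms, 6))    # 13.0: the straggler is simply ignored
```

The price is redundancy (here 8/6, about 1.33x), which is the kind of storage and I/O overhead the evaluation results quantify.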
Vision – Real-Time Tightly Coupled Wide-Area Distributed Computing
[Figure: real-time object network running on a dynamically formed Distributed Virtual Computer.]
Goals
• High-precision timings of critical actions
• Tight bounds on response times
• Ease of programming
– High-level programming
– Top-down design
• Ease of timing analysis
K. Kim (UCI)
Real-Time Progress
• RCIM (RT comm infrastructure mgt)
– Study of TT (time-triggered) Ethernet under way with the help of Hermann Kopetz
– Hope to acquire the first unit sometime in 2006
• IRDRM (Intra-RT-DVC resource mgt)
– TMO (Time-triggered Message-triggered Object) Support Middleware (TMOSM)
– Redesigned TMOSM improves modularity, concurrency, portability, and timing precision; it runs on Linux, WinXP, and WinCE
– Extending the TMOSM to exploit unique capabilities of Jenks’ cluster SPDS2
• Programming model
– API for RT middleware enables high-level RT programming (TMO) without a new compiler
– The notion of the Distance-Aware (DA) TMO, an attractive building block for RT wide-area DC applications, was created, and a study of its realization is under way
• Enhancement of the Network Infrastructure of OptIPuter
– GPS receivers acquired from a German vendor and installed at UCI/UCSD (Calit2 buildings)
– One-way message delay between UCI and UCSD measured
– Jitter was less than 60 microseconds
• Application development experiments
– Preparation of demos; LAN-based feasibility demos are working
– e.g., low-jitter video, fair and efficient distributed online game systems
• Publications in IDPT2003, AINA2004, WORDS2005, ISORC2005, …
Source: Kim, Jenks, et al. at UCI
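GPS receivers at both ends make one-way delay measurable, since send and receive timestamps come from synchronized clocks. The sketch below computes per-message delay and a peak-to-peak jitter (one of several jitter definitions, chosen here as an assumption); the timestamps are synthetic, not the UCI/UCSD measurements:

```python
from typing import List, Tuple


def one_way_delays_us(pairs: List[Tuple[int, int]]) -> List[int]:
    """Delay of each message from (send_ts_us, recv_ts_us) pairs.

    Valid only if sender and receiver clocks are synchronized (e.g. to GPS).
    """
    return [recv - send for send, recv in pairs]


def peak_to_peak_jitter_us(delays: List[int]) -> int:
    """Jitter taken as the spread between the largest and smallest delay."""
    return max(delays) - min(delays)


# Synthetic timestamps for illustration only.
samples = [(0, 1_520), (10_000, 11_540), (20_000, 21_505), (30_000, 31_560)]
delays = one_way_delays_us(samples)
print(delays)                          # [1520, 1540, 1505, 1560]
print(peak_to_peak_jitter_us(delays))  # 55
```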
Year 4 Plan
• RCIM (RT comm infrastructure mgt)
– Development of middleware support for TT Ethernet
• IRDRM (Intra-RT-DVC resource mgt)
– Extending the TMOSM to further exploit unique capabilities of Jenks’ cluster SPDS2
– Full development of support for Distance-Aware TMOs
– Interfacing TMOSM to the basic infrastructure services of OptIPuter
• Demos
– Remote access and control of electron microscopes at UCSD-NCMIR
– Remote control of an electric car
Source: Kim, Jenks, et al. at UCI
[Figure: remote control node and local relay node connected over OptIPuter paths.]
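The TMO programming model combines time-triggered methods with message-triggered ones. A toy, simulated-time sketch of that combination follows; `MiniTmo`, `every`, and `deliver` are hypothetical names, not the TMOSM API, and there is no real-time scheduling or deadline enforcement:

```python
import heapq
from typing import Callable, List, Tuple

Handler = Callable[[int], None]


class MiniTmo:
    """Toy object mixing time-triggered and message-triggered methods,
    run in simulated milliseconds. Hypothetical; not the TMOSM API."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self._events: List[Tuple[int, int, Handler]] = []
        self._seq = 0  # tie-breaker: equal times run in registration order

    def _schedule(self, t_ms: int, handler: Handler) -> None:
        heapq.heappush(self._events, (t_ms, self._seq, handler))
        self._seq += 1

    def every(self, period_ms: int, until_ms: int, handler: Handler) -> None:
        """Time-triggered method: activated at 0, period, 2*period, ... < until."""
        for t in range(0, until_ms, period_ms):
            self._schedule(t, handler)

    def deliver(self, at_ms: int, handler: Handler) -> None:
        """Message-triggered method: activated when a message arrives."""
        self._schedule(at_ms, handler)

    def run(self) -> None:
        """Execute all activations in simulated-time order."""
        while self._events:
            t_ms, _, handler = heapq.heappop(self._events)
            handler(t_ms)


tmo = MiniTmo()
tmo.every(10, 30, lambda t: tmo.log.append(f"{t}: sample sensor"))
tmo.deliver(15, lambda t: tmo.log.append(f"{t}: handle request"))
tmo.run()
print(tmo.log)
# ['0: sample sensor', '10: sample sensor', '15: handle request', '20: sample sensor']
```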
Areas of Security Effort
• Trusted remote computation (UCI)
– Protection from cheating
– Reduce effort on clients and coordinator
– Solution: inject false positives (“chaff”) as a check
• Broadcast keying (UCI)
– Protect messages to large groups
– Minimize number and distribution of keys
– Solution: chromatic leap-frog keys, predeployed keysets
• Secure network protocols (USC/ISI)
– Transport disconnection, spoofing
– Alleviate DOS attack impact
– Solution: layer different algorithms, SPI-spinning
Joe Touch & Mike Goodrich, UCI/ISI
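The chaff check for trusted remote computation can be sketched as follows: the coordinator plants inputs whose answers it already knows among the real work, and a client that skips work is likely to miss some of them. The predicate, values, and function names are hypothetical:

```python
import random
from typing import Iterable, List, Set


def is_hit(x: int) -> bool:
    """Stand-in for an expensive test the client must run on every item."""
    return x % 7 == 0


def make_task(items: List[int], chaff: Set[int], seed: int) -> List[int]:
    """Coordinator: hide known positives ("chaff") among the real items."""
    mixed = list(items) + sorted(chaff)
    random.Random(seed).shuffle(mixed)
    return mixed


def audit(reported: Iterable[int], chaff: Set[int]) -> bool:
    """Coordinator: accept the result only if every planted positive was reported."""
    return chaff.issubset(set(reported))


real_items = list(range(1000, 1040))
chaff = {1043, 1050}  # both divisible by 7, so an honest client must report them
task = make_task(real_items, chaff, seed=1)

honest = [x for x in task if is_hit(x)]
print(audit(honest, chaff))  # True
print(audit([], chaff))      # False: a client that skipped the work is caught
real_hits = sorted(set(honest) - chaff)  # strip chaff before using the answer
```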
IPsec Baseline Performance
[Figure: IPsec baseline throughput (Mbps) vs. packet size for IP, IP in IP, IP + AH MD5, IP + AH SHA1, DES + AH, and 3DES + AH.]
Joe Touch & Mike Goodrich, UCI/ISI
Effect of DOS Traffic
[Figure: two panels of throughput (Mbps) vs. packet size under DOS traffic. DES panel: DES, DES / bad SPI, DES / bad key, DES / DES. MD5 panel: MD5, MD5 / bad key, MD5 / bad SPI, MD5 / MD5.]
• IETF TCPM WG doc
– Need for IP-layer solution
• IETF BTNS WG
– Infrastructure-free security
• IETF Triage session
– Reducing load of spoofed DOS traffic
• Goal: Internet Standard
Joe Touch & Mike Goodrich, UCI/ISI
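The “bad SPI” and “bad key” cases differ in how much work the receiver does before discarding a packet. In the simplified AH-style receiver below (an assumption for illustration; it uses full HMAC-SHA1 rather than the truncated 96-bit ICVs real AH uses), an unknown SPI is rejected after a table lookup, while a known SPI with a wrong key forces a full HMAC computation first. `SA_TABLE` and `classify` are hypothetical:

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical security-association table: SPI -> authentication key.
SA_TABLE: Dict[int, bytes] = {0x1001: b"k" * 20}


def classify(spi: int, payload: bytes, icv: bytes) -> str:
    """Decide the fate of an inbound packet in a simplified AH-style receiver."""
    key = SA_TABLE.get(spi)
    if key is None:
        return "drop: unknown SPI"  # rejected after one table lookup
    # A known SPI forces the receiver to compute the HMAC before it can reject.
    expected = hmac.new(key, payload, hashlib.sha1).digest()
    if not hmac.compare_digest(expected, icv):
        return "drop: bad ICV"
    return "accept"


good_icv = hmac.new(SA_TABLE[0x1001], b"data", hashlib.sha1).digest()
print(classify(0x1001, b"data", good_icv))      # accept
print(classify(0x2002, b"data", good_icv))      # drop: unknown SPI
print(classify(0x1001, b"data", b"\x00" * 20))  # drop: bad ICV
```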
Summary of Accomplishments
• Integration and Demonstration of Capability
– All five layers (application, visualization, DVC, transport protocols, optical network control)
– Across campus, national and international-scale testbeds
• Distributed Virtual Computer
– Integrate with network configuration (PIN/PDC)
– Simulation study of service models of configurable networks
– Simulation study of efficient resource selection algorithms
• Advanced Transport Protocols
– GTP: analytic and simulation study, extension to sender capacity management
– CEP: implement and evaluate N-to-M communication; integrate with XIO
– SABUL/UDT: development of composable congestion control algorithms
– LambdaStream: configurable design, API for buffered/unbuffered communication
• Real-Time Programming/Networking/DVC
– Time-triggered Message-triggered Object support middleware and API
– Enhancement of OptIPuter network infrastructure with GPS
– Evaluation of delay/jitter between UCI and UCSD
• File/Storage Systems
– New RobuSTore architecture with erasure coding, statistical guarantees
– Coding algorithm, metadata service, admission controller, security schemes
– Evaluation across a wide range of system configurations and against conventional schemes
• Performance Analysis/Modelling
– Performance analysis of Vol-a-Tile and other OptIPuter software
• Network Security (Touch & Goodrich)
OptIPuter ROCKS Roll Releases
• OptIPuter System SW Roll (Optigold) v.1 [July 2005]
– Stable, integrated for OptIPuter iGrid2005 demos
– OptIPuter software
  – DVC Middleware v.1.0
    – Core resource, security, namespace and job management services
    – Network binding service (interface with PDC)
    – DVC configuration and communication APIs
  – Advanced Transport Protocols
    – Group Transport Protocol (GTP) v.0.95
    – Composite Endpoint Protocol (CEP) v.1.1
  – Optical Network Configuration
    – Photonic Domain Controller (PDC) v.2.0
– External Software
  – Globus Toolkit 4.0
• OptIPuter Software Summit 2006 (Jan-Feb 2006)
– Build a complete, tested, end-to-end OptIPuter software stack
– Create a site for downloadable OptIPuter software
Summary of Plans
• Lots of progress in 2005: met and exceeded goals!
• Planned Features & Improvements
– Distributed Virtual Computer (DVC): second-generation architecture, prototypes, improved cross-team integration and resource selection
– Real-Time (TMO): RT communication infrastructure, intra-RT DVC resource management, applications (NCMIR)
– Performance Analysis (Prophesy): develop and utilize an archive for performance data
– High-Speed Protocols (CEP, LambdaStream, XCP, GTP, UDT)
  – CEP version 2.0: code stabilization, integration, performance, documentation
  – GTP version 2.0: convergence, comparison studies, integration with other protocols
  – UDT version 3.0: composable, congestion controlled, firewall punching, security
  – LambdaStream: UDP offload, integration with SAGE and apps, bottleneck identification
– Storage (RobuSTore): simulation/evaluation with more complex/realistic workloads; implementation and deployment (Lustre, OptIPuter testbed)
– Security: explore IPsec variants to reduce attacker advantage
• Planned 2006 releases
– ~September 2006: System Software Gold Roll version 2
– ~November 2006: cross-team 5-layer demonstrations (SC2006)