Literature Review
Interconnection Architectures for Petabyte-Scale High-Performance Storage Systems
Andy D. Hospodor, Ethan L. Miller
IEEE/NASA Goddard Conference on Mass Storage Systems and Technologies
April 2004
Henry Chen
September 24, 2010
Introduction
High-performance storage systems
– Petabytes (2^50 bytes) of data storage
– Supply hundreds or thousands of compute nodes
– Aggregate system bandwidth >100GB/s
Performance should scale with capacity
Large individual storage systems
– Require high-speed network interface
– Concentration reduces fault tolerance
Proposal
Follow high-performance computing evolution
– Multi-processor networks
Network of commodity devices
Use disk + 4–12-port 1GbE switch as building block
Explore & simulate interconnect topologies
Commodity Hardware
Network
– 1Gb Ethernet: ~$20 per port
– 10Gb Ethernet: ~$5000 per port (25x per Gb per port; quick check below)
● Aside: now (2010) ~$1000 per port
Disk drive
– ATA/(SATA)
– FibreChannel/SCSI/(SAS)
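As a quick sanity check on the "25x per Gb per port" figure, the 2004 prices above work out as follows (a minimal sketch using only the prices on this slide):

```python
# Per-Gb/s cost comparison using the 2004 per-port prices from this slide.
cost_1g_port, cost_10g_port = 20, 5000      # dollars per port
per_gb_1g = cost_1g_port / 1                # $20 per Gb/s on 1GbE
per_gb_10g = cost_10g_port / 10             # $500 per Gb/s on 10GbE
print(per_gb_10g / per_gb_1g)               # 25.0 -> the "25x per Gb" figure
```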
Setup
Target 100GB/s bandwidth
Build system using 250GB drives (2004)
– 4096 drives to reach 1PB
– Assume each drive has 25MB/s throughput
1Gb link supports 2–3 disks
10Gb link supports ~25 disks
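A rough sizing check of the numbers above (a sketch; the 50% usable-link-rate factor is an assumption added here to reconcile raw Ethernet rates with the disks-per-link figures on the slide):

```python
# Back-of-the-envelope check of the 2004 design point.
drives = 4096
drive_capacity_gb = 250            # GB per drive (2004)
drive_throughput = 25e6            # 25 MB/s sustained per drive

capacity_pb = drives * drive_capacity_gb / 1e6       # ~1.02 PB of raw capacity
aggregate_gbs = drives * drive_throughput / 1e9      # ~102 GB/s aggregate throughput

usable_frac = 0.5                  # assumed protocol/duplex overhead on Ethernet links
disks_per_1g = usable_frac * 1e9 / 8 / drive_throughput     # ~2.5 disks per 1Gb link
disks_per_10g = usable_frac * 10e9 / 8 / drive_throughput   # ~25 disks per 10Gb link

print(capacity_pb, aggregate_gbs, disks_per_1g, disks_per_10g)
```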
Basic Interconnection
32 disks/switch
Replicate system 128x
– 4096 1Gb ports
– 128 10Gb ports
~Networked RAID0
Data local to each server
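The port counts for this basic layout follow directly from the slide's figures (a sketch of the arithmetic):

```python
# Port-count check for the basic interconnection (128 independent switch groups).
disks_per_switch = 32
switches = 128                           # the building block replicated 128x

total_disks = disks_per_switch * switches   # 4096 drives, one 1Gb port each
uplinks_10g = switches                      # one 10Gb uplink per switch

print(total_disks, uplinks_10g)             # 4096 1Gb ports, 128 10Gb ports
```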
Fat Tree
4096 1Gb ports
2418 10Gb ports
– 2048 switch to router (128 Sw × 8 Rt × 2)
– 112 inter-router
– 256 server to router (×2)
Need large, multi-stage routers
~$10M for 10Gb ports
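The "~$10M" figure can be checked against the 2004 port price from the hardware slide (a sketch; it lands somewhat above $10M but in the same ballpark):

```python
# Cost of the fat tree's high-speed ports at ~$5000 per 10Gb port (2004 price).
ports_10g = 2418
price_per_10g_port = 5000
print(ports_10g * price_per_10g_port)   # ~$12.1M, i.e. on the order of $10M
```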
Butterfly Network
Need “concentrator” switch layer
Each network level carries entire traffic load
Only one path between any server and storage device
Mesh
Routers to servers at mesh edges
16384 1Gb links
Routers only at edges; mesh provides path redundancy
Torus
Mesh with edges wrapped around
Reduces average path length
No edges; dedicated connection breakout to servers
Hypercube
Special-case torus
Bandwidth scales better than mesh/torus
Connections per node increase with system size
Can group devices into smaller units and connect with torus
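The "connections per node increase with system size" point is simply the hypercube degree, log2(N); a minimal illustration (node counts are examples, not the paper's):

```python
import math

# In an N-node hypercube every node needs log2(N) links, so the per-node
# switch grows with the system: 4096 storage nodes already need 12 ports each.
for n in (64, 512, 4096, 32768):
    print(f"{n:6d} nodes -> {int(math.log2(n)):2d} links per node")
```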
Bandwidth
Not all topologies actually capable of 100GB/s
Maximum simultaneous bandwidth = link speed × number of links
Average hops
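One way to make the "average hops" metric concrete is to compare the mean shortest-path length of a mesh with and without wrap-around links; the sketch below does this for small 2-D grids (grid sizes are illustrative, not the paper's configurations). Since every extra hop consumes capacity on another link, effective bandwidth is roughly (link speed × number of links) / average hops.

```python
from itertools import product

# Average shortest-path length (in hops) for a k x k mesh vs. the same grid
# with wrap-around links (torus). Wrap-around roughly halves the average,
# which is why the torus delivers more of the fabric's raw bandwidth.
def avg_hops(k, torus=False):
    nodes = list(product(range(k), repeat=2))
    total = pairs = 0
    for a, b in product(nodes, nodes):
        if a == b:
            continue
        dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
        if torus:
            dx, dy = min(dx, k - dx), min(dy, k - dy)
        total += dx + dy
        pairs += 1
    return total / pairs

for k in (8, 16):
    print(f"{k}x{k}  mesh: {avg_hops(k):5.2f} hops   torus: {avg_hops(k, torus=True):5.2f} hops")
```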
Analysis
Embedding switches in storage fabric uses fewer high-speed ports, but more low-speed ports
Router Placement in Cube-Style Topologies
Routers require nearly 100% of link bandwidth
Adjacent routers cause overload & underload
Use random placement; optimization possible?
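A small sketch of why random placement helps: routers placed next to each other share links and overload them, while a random scattering rarely produces adjacent pairs. Sizes and router counts below are illustrative, not the paper's configurations.

```python
import random
from itertools import combinations, product

K = 16          # illustrative 2-D torus (the paper's tori are higher-dimensional)
ROUTERS = 8     # illustrative number of router attachment points

def torus_dist(a, b, k=K):
    # Shortest-path (hop) distance between two nodes on a 2-D torus.
    dx = min(abs(a[0] - b[0]), k - abs(a[0] - b[0]))
    dy = min(abs(a[1] - b[1]), k - abs(a[1] - b[1]))
    return dx + dy

def adjacent_pairs(placement):
    # Pairs of routers on neighboring nodes, which would share (and overload) links.
    return sum(1 for a, b in combinations(placement, 2) if torus_dist(a, b) == 1)

random.seed(0)
nodes = list(product(range(K), repeat=2))
clustered = nodes[:ROUTERS]                 # routers packed along one row
scattered = random.sample(nodes, ROUTERS)   # random placement

print("clustered placement, adjacent pairs:", adjacent_pairs(clustered))
print("random placement,    adjacent pairs:", adjacent_pairs(scattered))
```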
Conclusions
Build multiprocessor-style network for storage
Commodity-based storage fabrics can improve reliability and performance, and scale well
Rely on large number of lower-speed links; limited number of high-speed links where necessary
Higher-dimension tori (4-D, 5-D) provide a reasonable solution for 100GB/s from 1PB