Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive
Applications
Presenter: He Wang
Department of Electrical and Computer Engineering
University of Florida
Outline
• Motivation and Background
• Introduction to Gordon’s system architecture
• Gordon’s storage system
• Configuring Gordon
Wiki
• Gordon
o A flash-based system architecture for massively parallel, data-centric computing
• Features
o Power efficiency
o Performance advantage
o Aimed at data-centric applications
Motivation and Background
• Challenges with large-scale data processing
o Slowdown in uni-processor performance
o Latency and BW bottleneck of HDD
o Power constraints
• Improve performance and power efficiency
• Progress
o Programming models that parallelize data-processing programs
o Increased BW and reduced latency with SSD
o Recent power efficient processors
Motivation and Background (cont.)
• Gordon
o Programming system that parallelizes data-processing programs (i.e., MapReduce)
• Abstractions for specifying data-parallel computation
• Automating the parallelism
o SSD
• Improved flash translation layer (FTL)
o Power efficient processors
• 100s or 1000s
• simple interconnect
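To make the MapReduce bullet concrete, here is a minimal word-count sketch in the map/reduce style. It is an illustration only: the function names (`map_words`, `reduce_counts`, `map_reduce`) are hypothetical, and the driver runs sequentially, whereas a real MapReduce system would distribute the map and reduce calls across nodes automatically.

```python
from collections import defaultdict

def map_words(line):
    # Map step: emit a (word, 1) pair for every word in one input line.
    return [(word, 1) for word in line.split()]

def reduce_counts(word, counts):
    # Reduce step: combine all values emitted for one key.
    return word, sum(counts)

def map_reduce(lines):
    # Sequential stand-in for the framework: group map output by key,
    # then call the reducer once per key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())
```

The programmer writes only the map and reduce functions. Grouping by key and scheduling the work is the part the framework automates.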
Gordon system architecture
• Gordon nodes
o 256 GB flash memory, flash storage controller, 2 GB SDRAM, 1.9 GHz Intel Atom processor
o Connected through a 1 Gb Ethernet-style network
o A standard rack holds 16 enclosures for 256 nodes with 64 TB storage and 230 GB/s I/O BW
o Independent computer
• OS
• Network interfaces
Gordon system architecture
• Gordon node features
o Power efficient
• 19W to 81W
o High BW
• 900 MB/s
Figure 1. Gordon system architecture
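The rack-level figures on these slides follow from the per-node numbers, and the short check below reproduces them. The 16 nodes per enclosure is an assumption inferred from 256 nodes in 16 enclosures. The unit conventions are also assumptions: 1 TB = 1024 GB for capacity and 1 GB/s = 1000 MB/s for bandwidth.

```python
ENCLOSURES_PER_RACK = 16
NODES_PER_ENCLOSURE = 16      # assumption: 256 nodes / 16 enclosures
FLASH_PER_NODE_GB = 256
BW_PER_NODE_MB_S = 900

def rack_totals():
    # Aggregate node count, flash capacity (TB) and I/O bandwidth (GB/s) per rack.
    nodes = ENCLOSURES_PER_RACK * NODES_PER_ENCLOSURE
    storage_tb = nodes * FLASH_PER_NODE_GB / 1024
    bandwidth_gb_s = nodes * BW_PER_NODE_MB_S / 1000
    return nodes, storage_tb, bandwidth_gb_s
```

Under these assumptions the result is 256 nodes, 64 TB and about 230 GB/s, matching the slide.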
Storage system
• Key to power efficiency and performance
• Support Erase, Program, Read operations
• Reliability issue
o Wear-out, needs wear-leveling
• Flash translation layer (FTL)
Storage system
• Flash controller
o Implements FTL
o Link between CPU and flash array
• Shared buses, up to 4 packages
Storage system
• Gordon FTL
o Operates a write point
• Pointer to a page of flash memory
o Maintains a summary page in each block
• Logical block address (LBA)-to-physical mapping
• Benefits of this indirection
• Address organization
• Wear-leveling
• Workflow
o Receive write command
o Locate data by write point
o Update LBA table
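The write flow above can be sketched as a toy FTL with a single write point and an LBA table. This is a simplified illustration, not Gordon's implementation: the class name `SimpleFTL` and its methods are hypothetical, and summary pages, erases, garbage collection and wear-leveling are not modeled.

```python
class SimpleFTL:
    """Toy flash translation layer with a single write point (illustrative only)."""

    def __init__(self, num_blocks, pages_per_block):
        self.num_pages = num_blocks * pages_per_block
        self.flash = [None] * self.num_pages  # physical pages
        self.lba_table = {}                   # LBA -> physical page number
        self.write_point = 0                  # next free physical page

    def write(self, lba, data):
        # 1. Receive the write command, 2. place data at the write point,
        # 3. update the LBA table, then advance the write point.
        if self.write_point >= self.num_pages:
            raise RuntimeError("flash full; garbage collection is not modeled")
        physical_page = self.write_point
        self.flash[physical_page] = data
        self.lba_table[lba] = physical_page   # any older copy becomes stale
        self.write_point += 1
        return physical_page

    def read(self, lba):
        # Follow the indirection from logical address to physical page.
        return self.flash[self.lba_table[lba]]
```

Rewriting the same LBA lands on a new physical page each time. That indirection is what lets an FTL spread writes over the array for wear-leveling.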
Storage system
• Gordon FTL advantage: write point
o Original FTL has only one write point, no parallelism
o Multiple write points with spread access
o Sequence number
• Avoid conflict with occupied write point
• Assign the write point with smallest available
Storage system
• Gordon FTL advantage: super-page
o Manage flash array with larger granularity with one write point for each
o Horizontal striping
o Vertical striping
o 2D striping
Storage system
• Super-page striping approaches
Figure 2. Three approaches to striping data across flash arrays
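As a rough illustration of the three striping approaches, the sketch below lists which (bus, package) positions contribute a physical page to one super-page. The mapping is an assumption made for illustration: horizontal striping is taken to span all buses, vertical striping to span the packages on one bus, and 2D striping to span both. The function name and parameters are hypothetical.

```python
def superpage_members(mode, num_buses, packages_per_bus, index):
    """Return the (bus, package) positions that make up super-page `index`."""
    if mode == "horizontal":
        # Assumed: one page from the same package position on every bus.
        package = index % packages_per_bus
        return [(bus, package) for bus in range(num_buses)]
    if mode == "vertical":
        # Assumed: one page from every package on a single bus.
        bus = index % num_buses
        return [(bus, package) for package in range(packages_per_bus)]
    if mode == "2d":
        # Assumed: one page from every package on every bus.
        return [(bus, package)
                for bus in range(num_buses)
                for package in range(packages_per_bus)]
    raise ValueError(f"unknown striping mode: {mode}")
```

With 4 buses and 2 packages per bus, a horizontal super-page combines 4 physical pages, a vertical one 2, and a 2D one 8. Larger super-pages mean fewer mapping entries, but a bigger unit to touch for a small access.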
Storage system
• Super-page
o Pros
• Reduced overhead
o Cons
• Latency for sub-page access
• Wear-out affects a larger portion
Configuring Gordon
• Workloads
o Benchmarks that use MapReduce
• Power model
o Direct measurement of a running system
o Datasheet
P = IdlePower * (1-ActivityFactor) + ActivePower * ActivityFactor
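The power formula above is a linear interpolation between idle and active power, weighted by the activity factor. A minimal sketch follows; the function name and the sample wattages in the usage note are hypothetical and are not measurements from the talk.

```python
def component_power(idle_power, active_power, activity_factor):
    # P = IdlePower * (1 - ActivityFactor) + ActivePower * ActivityFactor
    if not 0.0 <= activity_factor <= 1.0:
        raise ValueError("activity factor must be between 0 and 1")
    return idle_power * (1 - activity_factor) + active_power * activity_factor
```

For example, a component drawing 10 W idle and 20 W active (made-up values) with an activity factor of 0.25 would be modeled at 10 * 0.75 + 20 * 0.25 = 12.5 W.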
Configuring Gordon
• Measuring cluster performance
o High-level simulator to measure overall performance
• Model 32 nodes by running 4 VMware virtual machines on each of 8 servers
o Sync mode provides an upper bound on execution time
o Nosync mode provides a lower bound
o Storage simulator
Configuring Gordon
• Pareto-optimal Gordon system design
Figure 6. Pareto-optimal Gordon system designs
Configuring Gordon
• Optimal Gordon configurations
Figure 5. Optimal Gordon configuration
Outperforms disk-based clusters by 1.5x and delivers 2.5x more performance per watt
Configuring Gordon
• Gordon power consumption
o MaxE-flash consumes 40% of the energy of the disk-based configuration
o A factor of two increase in performance
Figure 6. Relative energy consumption