Application-Driven Datacenter Computing

Shiding Lin, EDCS-HPCA, Shenzhen, 2013/2/24

Page 1: Application Driven Datacenter Computing

Application-Driven Datacenter Computing

Shiding Lin EDCS-HPCA, Shenzhen

2013/2/24

Page 2: Application Driven Datacenter Computing

Let’s Start from the Search Engine…

Diagram labels: Web; Web Pages; Central Repository of Web Pages; Index Building; Inverted Index; Data Mining

Page 3: Application Driven Datacenter Computing

To Build a High-Throughput Storage System

Diagram labels: Memory; Disk; Update; Query; Dump; Commit; In-Memory Records {<key, data>}; Stream (Block 0, Block 1, Block X); Log (Block-1, Block-N); New stream (Block 0, Block 1, Block Y)

Log-based Structure
    Block I/O
    Batch Commit
    Stream R/W
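The flow on this slide (updates buffered in memory, dumped to disk as whole blocks, then made visible by a batch commit) can be sketched as follows. This is a minimal illustration, not the system described in the talk: the class `LogStore`, the constant `BLOCK_CAPACITY`, and the rule that queries see uncommitted in-memory records are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

BLOCK_CAPACITY = 2  # records per block; hypothetical and tiny for illustration


class LogStore:
    """Toy log-based store: updates buffer in memory, dumps append whole blocks."""

    def __init__(self) -> None:
        self.memory: Dict[str, str] = {}              # in-memory records {<key, data>}
        self.disk: List[List[Tuple[str, str]]] = []   # append-only list of blocks
        self.committed = 0                            # number of blocks visible to queries

    def update(self, key: str, data: str) -> None:
        """Apply an update to the in-memory records only."""
        self.memory[key] = data

    def dump(self) -> int:
        """Append the buffered records to disk as new blocks; return blocks written."""
        records = sorted(self.memory.items())
        blocks = [records[i:i + BLOCK_CAPACITY]
                  for i in range(0, len(records), BLOCK_CAPACITY)]
        self.disk.extend(blocks)
        self.memory.clear()
        return len(blocks)

    def commit(self) -> None:
        """Batch commit: publish every dumped block in a single step."""
        self.committed = len(self.disk)

    def query(self, key: str) -> Optional[str]:
        """Check memory first, then committed blocks from newest to oldest."""
        if key in self.memory:
            return self.memory[key]
        for block in reversed(self.disk[:self.committed]):
            for k, v in block:
                if k == key:
                    return v
        return None
```

Because blocks are only ever appended, a dump is sequential block I/O, and a single commit can cover any number of dumped blocks.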

Page 4: Application Driven Datacenter Computing

To Build a High-Throughput Storage System

Diagram: A Big Virtual File by Blocks
Diagram labels: Memory; Disk; <key, data>; dump; Block; Block @disk0; Block @disk1; Block @diskN

Maximize Parallelism
    NO RAID, Raw Disk
    Direct I/O
    Independent of FS
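One way to picture a big virtual file whose blocks live on separate raw disks is a plain address mapping. The sketch below assumes round-robin placement, which the slide does not state; the function names `locate_block` and `locate_offset` are hypothetical.

```python
from typing import Tuple


def locate_block(block_index: int, num_disks: int) -> Tuple[int, int]:
    """Map a virtual-file block index to (disk number, block slot on that disk).

    Round-robin placement is an assumption made for this sketch.
    """
    if num_disks <= 0:
        raise ValueError("num_disks must be positive")
    if block_index < 0:
        raise ValueError("block_index must be non-negative")
    return block_index % num_disks, block_index // num_disks


def locate_offset(offset: int, block_size: int, num_disks: int) -> Tuple[int, int]:
    """Map a byte offset in the virtual file to (disk number, byte offset on that disk)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    disk, slot = locate_block(offset // block_size, num_disks)
    return disk, slot * block_size + offset % block_size
```

With a mapping like this, consecutive blocks land on different disks, so a sequential stream read or write keeps every disk busy without a RAID layer or a file system in between.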

Page 5: Application Driven Datacenter Computing

3-Layer Architecture of a Typical Storage System

Diagram labels, by layer:
    Table
    Streams: Base Stream, Mod Stream, Index Stream, Patch Stream
    Blocks: Block, Block, Block, Block, Block …

Page 6: Application Driven Datacenter Computing

To Make It Large-Scale

Which Layer to Partition, and the Replication Granularity?
    Complexity
    Data Exchange Traffic
    Reliability

Page 7: Application Driven Datacenter Computing

Replication Scheme 1

3x Commit Cost
Local I/O Only

Diagram: Replica 1, Replica 2, and Replica 3 each hold a complete stack (Table; Base Stream, Mod Stream, Index Stream, Patch Stream; Block … Block).

Page 8: Application Driven Datacenter Computing

Replication Scheme 2

1x Commit Cost
Network & Disk I/O

Diagram: each stream (Base Stream, Mod Stream, Index Stream, Patch Stream) stores its blocks (Block …) as Replica 1, Replica 2, and Replica 3.
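The trade-off between the two replication schemes can be written down as a rough cost model. This is one reading of the slides, not a stated result: it assumes Scheme 1 runs the commit independently on every replica of the whole stack, and Scheme 2 commits once and ships the resulting blocks to the other replicas. The names `CommitCost`, `scheme1_cost`, and `scheme2_cost` are hypothetical, and only block traffic is counted.

```python
from dataclasses import dataclass


@dataclass
class CommitCost:
    commits: int              # how many times the commit work is executed
    blocks_over_network: int  # block copies shipped between nodes
    blocks_to_disk: int       # block writes summed over all replicas


def scheme1_cost(blocks: int, replicas: int = 3) -> CommitCost:
    """Whole-stack replication (assumed): every replica commits locally."""
    return CommitCost(commits=replicas,
                      blocks_over_network=0,
                      blocks_to_disk=blocks * replicas)


def scheme2_cost(blocks: int, replicas: int = 3) -> CommitCost:
    """Block-level replication (assumed): commit once, copy blocks to the others."""
    return CommitCost(commits=1,
                      blocks_over_network=blocks * (replicas - 1),
                      blocks_to_disk=blocks * replicas)
```

Under these assumptions both schemes write the same number of blocks to disk; Scheme 1 pays in repeated commit work, while Scheme 2 pays in network traffic.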

Page 9: Application Driven Datacenter Computing

Map to Physical Architecture

Logical Layer
    Table
    Stream
    Block

Physical Boundary
    Datacenter
    Cluster
    Rack
    Node

Physical Layer
    Memory
    Flash
    Disk

Page 10: Application Driven Datacenter Computing

What Has Changed?

Single-User Multi-Task → Multi-User Single-Task
Scale & Cost
Speed of Delivery

Page 11: Application Driven Datacenter Computing

Software Architecture Principles in Datacenter

Layered → Vertical
Out-of-the-Box
    Datacenter as a Computer
    To Tolerate Component Failure

Page 12: Application Driven Datacenter Computing

Hardware Architecture Principles in Datacenter

Dummy
    Control Logic Goes Software
    Replication/Checksum/Buffer Goes Global
Programmable
    Expose All Interfaces
    Collect All Data

Page 13: Application Driven Datacenter Computing

Hardware Architecture Principles in Datacenter

Modularized and Configurable
Reduce All the Unnecessary
Share All the Possible

Page 14: Application Driven Datacenter Computing

Practice 1: Baidu SSD

Raw Channels
No Shadow Buffer
No Wear Leveling

Page 15: Application Driven Datacenter Computing

Practice 2: Smart Disk Replacement

Failure  and  Repair  Logs  

Failure  Model  

Predict Failure
Reduce False-Alarm
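To make "reduce false alarms" measurable, one option is to score a failure predictor against the repair logs. The sketch below only computes the fraction of raised alarms that turned out to be false; the function name and the shape of the input records are assumptions, and the talk does not describe the actual failure model.

```python
from typing import Iterable, Tuple


def false_alarm_fraction(outcomes: Iterable[Tuple[bool, bool]]) -> float:
    """Fraction of raised alarms that were false.

    Each item is (predicted_failure, actually_failed) for one disk.
    This record layout is hypothetical.
    """
    alarms = [actual for predicted, actual in outcomes if predicted]
    if not alarms:
        return 0.0  # no alarms raised, so none were false
    return sum(1 for actual in alarms if not actual) / len(alarms)
```

A lower value means fewer healthy disks are pulled for replacement. It says nothing about missed failures, which would need a separate measure.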

Page 16: Application Driven Datacenter Computing

Practice 3: ARM Server

2U, 6 Nodes, 12 HDD/U
Internal Network Switch

Page 17: Application Driven Datacenter Computing