LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

DESCRIPTION

Resource: LCA13 Name: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013 Date: 06-03-2013 Speaker: Jason Taylor

TRANSCRIPT

Page 1: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013
Page 2: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

ARM & Disaggregated Rack: Facebook’s approach to smaller processors

Jason Taylor, PhD

Director, Capacity Engineering & Analysis

Page 3: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Agenda

1 Facebook Scale & Infrastructure

2 Mobile Processors

3 Disaggregated Rack

Page 4: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Facebook Scale

82% of users are outside of the U.S.

4 domestic regions today. Europe region will come online later this year.

Page 5: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Facebook Stats

• 1 billion users

• 350+ million photos added per day

• 4.2 billion likes, posts and comments per day

• 140+ billion friend connections

• 240+ billion photos

• 17 billion check-ins

Page 6: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Cost and Efficiency

• From our 10-Q filed with the SEC in October 2012:

  • “The first nine months of 2012 ... $1.0 billion for capital expenditures related to the purchase of servers, networking equipment, storage infrastructure, and the construction of data centers.”

• At this size, we spend a lot of time thinking about efficiency and costs.

Page 7: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Architecture

Front-End Cluster: Web (250 racks), Ads (30 racks), Cache (~144TB), Multifeed (9 racks), other small services

Service Cluster: Search, Photos, Msg, Others

Back-End Cluster: UDB, ADS-DB, Tao Leader

Page 8: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Lots of “vanity free” servers.

Page 9: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Multifeed rack

• The rack is our unit of capacity

• All 40 servers work together

• Leaf + agg code runs on all servers

• Leaf has most of the RAM

• Aggregator uses most of the CPU

• Lots of network BW within the rack

[Diagram: a column of servers, each labeled A (Aggregator) and L (Leaf)]
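
The leaf/aggregator split on this slide can be pictured as a scatter-gather query. The sketch below is a hypothetical illustration only, not Facebook's code: the `Leaf` class, the `aggregate` function, the story layout and the scores are all invented for the example. It shows a RAM-heavy leaf that stores stories and a CPU-heavy aggregator that queries every leaf and ranks the merged results.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Leaf:
    """Holds a slice of recent stories in memory (the RAM-heavy role)."""
    stories: dict  # hypothetical layout: author_id -> list of (score, story_id)

    def recent(self, author_ids):
        # Return every stored story written by the requested authors.
        found = []
        for author in author_ids:
            found.extend(self.stories.get(author, []))
        return found


def aggregate(leaves, friend_ids, k):
    """Query every leaf, then rank the merged candidates (the CPU-heavy role)."""
    candidates = []
    for leaf in leaves:  # fan out across the rack
        candidates.extend(leaf.recent(friend_ids))
    # Keep the k highest-scoring stories.
    return heapq.nlargest(k, candidates)


leaves = [
    Leaf({1: [(0.9, "s1")], 2: [(0.2, "s2")]}),
    Leaf({1: [(0.5, "s3")], 3: [(0.7, "s4")]}),
]
top = aggregate(leaves, friend_ids=[1, 3], k=2)
print(top)
```

Because each request touches every leaf, a design like this depends on the in-rack network bandwidth the slide mentions.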

Page 10: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Life of a “hit”

[Diagram: a request traced over time, from "request starts" to "request completes", across the Front-End and the Back-End. Components shown: Web, MC (x4), Ads, Database, Feed agg, and L (leaf) nodes.]

Page 11: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Five Standard Servers

Standard Systems | I Web | III Database | IV Hadoop | V Haystack | VI Feed
Memory | Low, 16GB | High, 144GB | Medium, 48GB | Low, 18GB | High, 144GB
Disk | Low, 250GB | High IOPS, 3.2 TB Flash | High, 12 x 3TB SATA | High, 12 x 3TB SATA | Medium, 2TB SATA
Services | Web, Chat | Database | Hadoop | Photos, Video | Multifeed, Search, Ads

CPU (four values listed for the five systems, in slide order): High, 2 x E5-2670; Med, 2 x X5650; Low, 1 x L5630; High, 2 x E5-2660

Page 12: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Five Server Types

Advantages:

• Volume pricing

• Re-purposing

• Easier operations - simpler repairs, drivers, DC headcount

• New servers allocated in hours rather than months

Drawbacks:

• 40 major services; 200 minor ones - not all fit perfectly

• Service needs change over time.

Page 13: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Agenda

1 Facebook Scale & Infrastructure

2 Mobile Processors

3 Disaggregated Rack

Page 14: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Server Processors

• Servers in datacenters use processors that were designed for desktop computers.

• Intel and AMD have dominated this market with big x86 processors.

Page 15: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Mobile Processors

• Smaller processors for smart phones will pass two criteria by 2014:

  • 64-bit instructions

  • High clock speed - ~2.4 GHz

• It is now reasonable to consider ARM, Atom and even MIPS processors for big compute jobs.

Page 16: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Compute Power

Page 17: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Cores Required

Page 18: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Watts Required

Page 19: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

The Problem

• Big processors provide a cost advantage by amortizing fixed costs in the servers.

• If all other costs remain the same then wimpy cores (ARM, MIPS, Atom) will effectively triple the price of fixed resources:

  • Rack, chassis, disk, RAM, NIC, etc.
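
The amortization argument above can be checked with a little arithmetic. The sketch below uses hypothetical numbers throughout: the dollar figure and the throughput values are invented, and the 3:1 throughput gap is an assumption chosen to match the slide's "triple".

```python
def fixed_cost_per_unit(fixed_cost, throughput):
    """Fixed server cost (rack, chassis, disk, RAM, NIC) per unit of throughput."""
    return fixed_cost / throughput


# Hypothetical numbers, chosen only to illustrate the ratio.
FIXED_COST = 3000.0      # dollars of non-CPU hardware per server
BIG_THROUGHPUT = 3.0     # work units per second with big processors
WIMPY_THROUGHPUT = 1.0   # same chassis fitted with wimpy cores

big = fixed_cost_per_unit(FIXED_COST, BIG_THROUGHPUT)
wimpy = fixed_cost_per_unit(FIXED_COST, WIMPY_THROUGHPUT)
ratio = wimpy / big
print(big, wimpy, ratio)
```

Under these assumptions the fixed hardware costs three times as much per unit of work on the wimpy-core server, which is the problem that sharing a chassis across many small processors is meant to address.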

Page 20: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Our Solution: Group Hug

• Facebook is driving a solution through the Open Compute initiative:

• Group Hug server board:

  • Allows up to 10 individual compute boards.

  • Single-processor PCIe-like cards

  • 1 Gb interfaces muxed up to a 10 Gb NIC

  • No drives, flash, or peripherals

• ==> 3 to 5x the processors compared to a dual-socket system

• ==> About the same throughput and power.

Page 21: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Agenda

1 Facebook Scale & Infrastructure

2 Mobile Processors

3 Disaggregated Rack

Page 22: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Disaggregated Rack Challenge

• Can we build hardware that will fit more services and still do well in terms of serviceability and cost?

• Can we build hardware that will grow with services over time?

• What might it look like to support Group Hug?

Page 23: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Server/Service Fit - across services

[Diagram: CPU and RAM usage on a TYPE-6 server for MultiFeed and for "Other Service A", with one region labeled WASTED CPU RESOURCE]

Page 24: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Server/Service Fit - over time

[Diagram: CPU and RAM usage on a TYPE-6 server in Year 1 and in Year 2 ("more CPU needed"), with the Year 2 case labeled NOT ENOUGH CPU]

Page 25: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

In-Rack Resources

Building blocks:

• CPU

• RAM (key/value pairs)

• Disk IOPS

• Disk space

• Flash IOPS

• Flash space

Common resource pairs:

• CPU vs RAM

• RAM vs Disk IOPS

• RAM vs Flash IOPS

Growth resources:

• RAM

• Disk space

• Flash space

Page 26: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Disaggregated Rack

How can we build hardware that is highly configurable and re-configurable but still cost effective?

Page 27: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

A rack of multifeed servers...

[Diagram: a Network Switch above a stack of Type-6 Servers]

40 Feed servers per rack, each server with:

• 2 x E5-2660

• 144GB RAM

• 2TB hard drives

• 760GB of flash

=> rack totals:

• COMPUTE: 80 processors, 640 cores

• RAM: 5.8 TB

• STORAGE: 80 TB

• FLASH: 30 TB

* We assume full line-rate network within the rack.
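
The rack totals on this slide follow from multiplying the per-server figures by 40. A minimal sketch of that arithmetic is below. The per-server numbers come from the slide; the 8 cores per processor is an assumption (it is what 640 cores across 80 processors implies), and the variable names are made up for the example.

```python
SERVERS_PER_RACK = 40

# Per-server figures from the slide.
PROCESSORS = 2
RAM_GB = 144
DISK_TB = 2
FLASH_GB = 760

# Assumption: 8 cores per processor, as implied by 640 cores / 80 processors.
CORES_PER_PROCESSOR = 8

rack = {
    "processors": SERVERS_PER_RACK * PROCESSORS,
    "cores": SERVERS_PER_RACK * PROCESSORS * CORES_PER_PROCESSOR,
    "ram_tb": SERVERS_PER_RACK * RAM_GB / 1000,
    "disk_tb": SERVERS_PER_RACK * DISK_TB,
    "flash_tb": SERVERS_PER_RACK * FLASH_GB / 1000,
}
print(rack)
```

The computed RAM (5.76 TB) and flash (30.4 TB) round to the 5.8 TB and 30 TB shown on the slide.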

Page 28: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Compute

• Standard Server

  • 2 processors

  • 8 or 16 DIMM slots

  • no hard drive - small flash boot partition

  • big NIC - 10 Gbps or more

• Group Hug

  • 10 individual single-proc servers

  • A few DIMMs

  • no hard drive - small flash boot partition

  • smaller NICs to 10 Gbps

Page 29: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

RAM Sled

• Hardware

  • 128GB to 512GB

  • compute: FPGA, ASIC, mobile processor or desktop processor

• Performance

  • 450k to 1 million key/value gets/sec

• Cost

  • Excluding RAM cost: $500 to $700 or a few dollars per GB
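
The "few dollars per GB" figure can be bracketed from the slide's own ranges. The sketch below simply divides the sled cost (excluding RAM) by the RAM capacity at the two extremes; the function name is made up for the example.

```python
def overhead_per_gb(sled_cost, capacity_gb):
    """Sled cost (excluding the RAM itself) spread over each GB of RAM."""
    return sled_cost / capacity_gb


best = overhead_per_gb(500, 512)   # cheapest sled, most RAM
worst = overhead_per_gb(700, 128)  # priciest sled, least RAM
print(round(best, 2), round(worst, 2))
```

The overhead ranges from roughly $1 to $5.50 per GB, which fits the slide's description.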

Page 30: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Storage Sled (Knox)

• Hardware

  • 15 drives

  • Replace SAS expander w/ small server

• Performance

  • 3k IOPS

• Cost

  • Excluding drives: $500 to $700 or less than $0.01 per GB

Page 31: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Flash Sled

• Hardware

  • 175GB to 18TB of flash

• Performance

  • 600k IOPS

• Cost

  • Excluding flash cost: $500 to $700

NIC (at 70% utilization) | IOPS | Capacity
1 Gbps | 21k | 175 GB
10 Gbps | 210k | 1.75 TB
25 Gbps | 525k | 4.4 TB
40 Gbps | 840k | 7.7 TB
50 Gbps | 1.05M | 8.8 TB
100 Gbps | 2.1M | 17.5 TB
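
The NIC table appears to scale the 1 Gbps row linearly with link speed. The sketch below assumes that linear model (it is an inference from the numbers, not something the slide states) and also derives the I/O size implied by 21k IOPS on 70% of a 1 Gbps link. All names are made up for the example.

```python
# Base row of the table: a 1 Gbps NIC at 70% utilization.
BASE_IOPS = 21_000
BASE_CAPACITY_GB = 175
UTILIZATION = 0.70


def sled_for_nic(gbps):
    """Scale the 1 Gbps row linearly to a faster NIC (an assumed model)."""
    return gbps * BASE_IOPS, gbps * BASE_CAPACITY_GB


# Bytes per I/O if 21k IOPS fill 70% of 1 Gbps (derived, not stated on the slide).
bytes_per_io = UTILIZATION * 1e9 / 8 / BASE_IOPS

rows = {gbps: sled_for_nic(gbps) for gbps in (1, 10, 25, 40, 50, 100)}
for gbps, (iops, capacity_gb) in rows.items():
    print(gbps, iops, capacity_gb)
```

Under this model each I/O carries about 4.2 KB, and every IOPS figure in the table is reproduced. The capacities match as well, except for the 40 Gbps row: linear scaling gives 7.0 TB, while the slide lists 7.7 TB.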

Page 32: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

A disaggregated rack for graph search...

[Diagram: a Network Switch above Compute servers, a Storage Sled, RAM Sleds and Flash Sleds]

• 20 Compute Servers

• 8 Flash Sleds

• 2 RAM Sleds

• 1 Storage Sled

=> rack totals:

• COMPUTE: 40 processors, 320 cores

• RAM: 3.1 TB

• STORAGE: 60 TB

• FLASH: 30 TB

=> 1:10 RAM:Flash ratio

* Add 4 more flash sleds in 2014 to get to a 1:15 RAM:Flash ratio
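
The two RAM:Flash ratios on this slide can be reproduced from the rack totals. The sketch below assumes every flash sled holds the same amount of flash (30 TB over 8 sleds, so 3.75 TB each); that per-sled figure is derived rather than stated, and the names are made up for the example.

```python
RAM_TB = 3.1
FLASH_TB = 30.0
FLASH_SLEDS = 8

# Assumption: every flash sled holds the same amount of flash.
flash_per_sled_tb = FLASH_TB / FLASH_SLEDS


def flash_per_ram(flash_sleds):
    """TB of flash per TB of RAM for a given number of flash sleds."""
    return flash_sleds * flash_per_sled_tb / RAM_TB


today = flash_per_ram(FLASH_SLEDS)
after_upgrade = flash_per_ram(FLASH_SLEDS + 4)
print(round(today, 1), round(after_upgrade, 1))
```

The results, about 9.7 and 14.5, round to the 1:10 and 1:15 ratios quoted on the slide. The point of the example is that the ratio changes by adding sleds rather than by replacing servers.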

Page 33: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Disaggregated Rack

Strengths:

• Volume pricing, serviceability, etc.

• Custom Configurations

• Hardware evolves with service

• Smarter Technology Refreshes

• Speed of Innovation

Potential issues:

• Physical changes required

• Interface overhead

Page 34: LCA13: Jason Taylor Keynote - ARM & Disaggregated Rack - LCA13-Hong - 6 March 2013

Questions?