
  • 7/24/2019 The Future of Innovation in Computing.pdf

    1/24

    Copyright IBM Corporation 2014

    The Future of Innovation in Computing

    Jeff Stuecheli

    Hardware Architect

    IBM


    2/24

    The last 20 years

    POWER1 (1990)

    Execution BW: 6 x 10^7 FLOPS

    Storage BW: 240 MB/sec to L1; 144 MB/sec to 1 GB DRAM

    POWER7 QCM (2011)

    Execution BW: 1 x 10^12 FLOPS (16,000x)

    Storage BW: 6 TB/sec to L1 (25,000x); 6 TB/sec to L2; 3 TB/sec to L3; 400 GB/sec to 1 TB DRAM (2,700x)
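The improvement factors on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the POWER7 QCM execution figure is on the order of 10^12 FLOPS, which is what the quoted 16,000x implies; the dictionary keys and the `improvement` helper are illustrative names, not part of the talk.

```python
# Figures quoted on the slide, in FLOPS and bytes/sec.
power1 = {"flops": 6e7, "l1_bw": 240e6, "dram_bw": 144e6}
# Assumption: ~1e12 FLOPS for the POWER7 QCM, consistent with the quoted 16,000x.
power7_qcm = {"flops": 1e12, "l1_bw": 6e12, "dram_bw": 400e9}


def improvement(old, new):
    """Return new/old for every metric present in `old`."""
    return {name: new[name] / old[name] for name in old}


ratios = improvement(power1, power7_qcm)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f}x")
```

The L1 ratio comes out at 25,000x, and the execution and DRAM ratios land close to the rounded 16,000x and 2,700x shown on the slide.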


    3/24

    Innovations in those 20 years

    1.0 µm to 45 nm feature size

    ~500x the transistor density

    25 MHz to 4 GHz clock rates

    133x the clock rate (enabled by faster gates and deeper pipelines)

    25 MHz to 6.4 GHz busses

    High frequency communication
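The ~500x density figure follows if transistor area is assumed to scale with the square of the feature size. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def density_gain(old_feature_nm: float, new_feature_nm: float) -> float:
    """Areal density gain, assuming area per device scales with feature size squared."""
    return (old_feature_nm / new_feature_nm) ** 2


gain = density_gain(1000.0, 45.0)  # 1.0 um down to 45 nm
print(f"density gain: ~{gain:.0f}x")
```

Real density also depends on layout rules and cell design, so the square law is only a first-order estimate.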


    4/24

    The next 20 years

    At the same rate (16k x), in 20 years 44 Watsons would fit in 1U of rack space!

    But,

    Gates would be 125 pm (< 1 Si atom wide)

    Voltage scaling limits

    Prior power increase offset by lower voltage operation (Dennard scaling)
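Why voltage scaling matters can be sketched with the standard dynamic power relation P = C * V^2 * f. The model below is a simplified illustration, not a figure from the talk: it assumes that shrinking by a factor k divides capacitance by k, multiplies frequency by k, and divides device area by k^2, and it ignores leakage.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Switching power of one device: P = C * V^2 * f (normalized units)."""
    return capacitance * voltage ** 2 * frequency


def power_density_ratio(k, scale_voltage):
    """Power density after a shrink by factor k, relative to before the shrink."""
    c, v, f, area = 1.0, 1.0, 1.0, 1.0
    before = dynamic_power(c, v, f) / area
    v_after = v / k if scale_voltage else v  # Dennard scaling lowers V with size
    after = dynamic_power(c / k, v_after, f * k) / (area / k ** 2)
    return after / before


with_dennard = power_density_ratio(2.0, scale_voltage=True)
without_dennard = power_density_ratio(2.0, scale_voltage=False)
print(with_dennard, without_dennard)
```

With voltage scaling the power density stays flat; once voltage can no longer drop, the same shrink raises power density by k^2 in this model.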


    5/24

    What will the future bring?

    For computer architects, it's likely more exciting than the last 20 years

    Vision presented in this talk: 16,000x is possible!

    More diverse innovations

    More gates without smaller devices (cheaply manufactured)

    3D structures

    Power reduction through more efficient gate utilization

    Gate leakage (Power gating)

    Integration

    Sophisticated power management/voltage control

    Reconfigurable logic

    Higher power density through advancement in packaging and cooling technologies

    Liquid replaces air cooling

    Energy recovery through reuse

    System integration enables higher bandwidth at reduced energy

    Si interposer based communication

    Optics


    6/24

    OpenPOWER: The Beginning

    [Figure: the IBM stack (research and innovation) opening up to OpenPOWER open innovation, with members IBM, Google, NVIDIA, TYAN, and Mellanox]

    What is OpenPOWER?

    Industry Consortium focused on Innovation

    - Across Server HW / SW stack

    - For customized servers and components

    - Leveraging complementary skills and investments

    - To provide differentiated architectural alternatives

    Benefits for Clients

    New Innovators on Power Platform = More Value

    OpenPOWER = Greater choice for IBM Clients

    More Innovation = Increased Adoption of Power

    7/24

    OpenPOWER: Today

    Boards / Systems

    I/O / Storage / Acceleration

    Chip / SOC

    System / Software / Services

    Implementation / HPC / Research

    8/24

    Growing transistors without process shrinks

    Current industry expectation is ~6nm in 2026 (International Technology Roadmap for Semiconductors)

    Moore's law would predict 0.25nm

    Density doubles every 4 years instead of 2 years

    How can we achieve more gates without smaller transistors?

    Every 4 years we need some other doubling improvement to stay on the 2-year growth rate

    Today's example: eDRAM

    eDRAM reduces both transistors and energy for semiconductor arrays

    ~equivalent to a new process generation

    Beyond more gates, more useful gates

    Remove gates through integration, optimization
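The gap between a 2-year and a 4-year doubling period can be made concrete with a short calculation. This is a sketch under an assumed 12-year horizon (roughly the time between this talk and the 2026 roadmap date); the horizon and the names are illustrative.

```python
def growth(years: float, doubling_period_years: float) -> float:
    """Density multiple after `years` at the given doubling period."""
    return 2.0 ** (years / doubling_period_years)


years = 12  # assumed horizon, not a figure from the talk
historical = growth(years, 2)    # doubling every 2 years
process_only = growth(years, 4)  # doubling every 4 years from shrinks alone
gap = historical / process_only  # must come from non-shrink innovations
extra_doublings_needed = years / 4

print(f"2-year rate: {historical:.0f}x, 4-year rate: {process_only:.0f}x, gap: {gap:.0f}x")
```

Over that horizon, one extra doubling every 4 years (eDRAM being the slide's example of one) is what closes the gap.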


    9/24

    3D Stacking

    Many levels of chips with low power and latency communication

    Enables larger caches stacked below the CPU

    Enables larger chips with good yield

    DRAM TSV enables larger capacity without power and frequency cuts


    10/24

    Gates in 3D are better than 2D

    CPU design example

    Today's 2D designs: larger structures introduce longer physical distances, creating a design conflict

    Example: multi-level design structures

    TLB (Translation Lookaside Buffer): POWER7 design uses a two-level structure. Second tier is outside the critical logic path, but adds area, making other structures farther apart.

    Data/Instruction caches: Tertiary levels are inherently forced to be adjacent to (vs inside) the CPU core. This results in wide, high power busses crossing large distances.

    [Figure: chip floorplan with labels Core, L2 Cache, Fast Local L3 Region, L3 Cache, Mem Ctrl, Local SMP Links, Remote SMP]

    11/24

    Gates in 3D advantage

    Critical high power execution core takes up the penthouse (where heat can more easily be removed)

    Smaller core yields higher frequency and reduced energy

    Second level structures pulled under the core

    Can grow to ideal size without hurting critical execution loop.

    L3 cache pulled to 3rd level

    4th level can be power control, IO transceivers, more cache


    12/24

    Staging current CPUs into many layers

    Key limiter is available vertical wiring channels

    1st generation: Pull external interface logic and voltage regulation into lower layer

    2nd generation: Add an L3 cache layer

    3rd generation: Move L2 cache and large second level core centric structures (TLB, predictors)

    4th generation: Two layer CPU, execution units on top, L1 caches below


    13/24

    Efficient integration with Si Interposers

    Use old manufacturing line to produce large active Si interconnect (base layer of 3D)

    Enables efficient communication between CPU compute stacks, memory stacks, accelerators, and optical transceivers.

    MCM (multi-chip module), where module is active Si logic.

    Enables very high bandwidth/low power interconnect

    Conceptually, system could fit on the Si carrier, with optical external attach points

    Much lower power interconnect

    Micro bumps


    14/24

    Si interposer communication advantages

    On vs Off chip communication

    On chip

    Bucket brigade

    Clock skew managed along path

    Wire pitch ~10s of nm

    Off chip

    Wave pulses along a string

    Clock skew managed at endpoint

    Wire pitch ~10s of µm


    15/24

    Circuit based energy improvements

    Power gating: Turn off voltage to prevent gate leakage

    Utilized in POWER7+ core (entire core as one domain)

    Multi-cycle transition

    Current server class designs gate large blocks (entire CPU)

    Required to provide voltage stability through capacitance in power grid

    Fine grain power gating will become possible in server space with sophisticated 3D based power delivery.

    Potential ~4x reduction in leakage power.


    16/24

    IBM Research Zurich

    Scalable Heat Removal by Interlayer Cooling

    3D integration requires interlayer cooling for stacked logic chips

    Bonding scheme to isolate electrical interconnects from coolant

    Through silicon via electrical bonding and water insulation scheme

    A large fraction of energy in computers is spent for data transport

    Shrinking computers saves energy

    Test vehicle with fluid manifold and connection

    Microchannel

    Pin fin

    17/24

    Future Memory

    Today

    DRAM ~100ns, Read and Write Durable, Volatile

    Technology scaling slowdown

    FLASH ~100us, Read Durable

    Disk ~10ms, Read and Write Durable

    Tomorrow

    Phase Change Memory (PCM)

    Resistive RAM (RRAM)

    ~100ns read

    ~1 usec write

    Read Durable

    Non-volatile
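The latency figures on this slide span five orders of magnitude, which is easier to see when each tier is expressed relative to DRAM. A minimal sketch using only the approximate latencies quoted above (the dictionary layout is illustrative):

```python
# Approximate access latencies from the slide, in nanoseconds.
latency_ns = {
    "DRAM": 100,
    "PCM/RRAM read": 100,
    "PCM/RRAM write": 1_000,
    "FLASH": 100_000,
    "Disk": 10_000_000,
}

relative_to_dram = {tier: ns / latency_ns["DRAM"] for tier, ns in latency_ns.items()}
for tier, factor in relative_to_dram.items():
    print(f"{tier}: {factor:,.0f}x DRAM latency")
```

On these numbers the new non-volatile memories read at DRAM speed and write about 10x slower, placing them far closer to main memory than to flash or disk.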


    18/24

    Disruptive Optics Evolution and Silicon Photonics

    Optical Interconnects

    10 Gb/s = 24 GB/s, 1 Color (Deuce)

    25 Gb/s = 60 GB/s, 1 Color (Deuce)

    25 Gb/s = 240 GB/s, 4 Color (Deuce)

    Silicon Photonics, Multi-wavelength, 25 Gb/s Optics

    POWER7 775 HPC system

    High density IO off module optical transceivers

    Physical escape density

    P7 775 HPC network chip shown below
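The slide pairs per-lane line rates (Gb/s) with aggregate bandwidths (GB/s) without stating how one leads to the other. One set of assumptions that reproduces all three figures is 24 parallel lanes with 8b/10b line coding (80% payload efficiency). Both values are guesses chosen to fit the numbers, not details given in the talk.

```python
def aggregate_gbytes_per_sec(line_rate_gbps, colors, lanes=24, coding_efficiency=0.8):
    """Aggregate payload bandwidth in GB/s.

    lanes and coding_efficiency are assumptions (24 lanes, 8b/10b coding),
    picked because they match the slide's figures.
    """
    payload_gbps = line_rate_gbps * coding_efficiency * lanes * colors
    return payload_gbps / 8  # bits to bytes


for rate, colors in [(10, 1), (25, 1), (25, 4)]:
    total = aggregate_gbytes_per_sec(rate, colors)
    print(f"{rate} Gb/s, {colors} color(s): {total:.0f} GB/s")
```

Under these assumptions, raising the line rate scales bandwidth by 2.5x and adding wavelengths ("colors") multiplies it again by the number of colors per lane.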


    19/24

    Heterogeneous Computing

    ASIC: An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed solely to run a specific cell phone is an ASIC.

    FPGA: A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing, hence "field-programmable".

    GPGPU: A General Purpose Graphics Processing Unit is a massively threaded processing engine capable of accelerating highly parallel computation programs using many very lightweight threads.


    20/24

    FPGA Trends

    This 3x rise in LE density occurred as technology shifted from lagging edge (180nm) to leading edge (40nm). This brings new speeds and capabilities and lower costs (still preserving >60% margins), while ASIC costs are expected to rise exponentially.

    => This has primed a tipping point in the industry.

    [Figure: High End FPGA Price to Logic Ratio (100Ku/yr.), in $/KLE's on a log scale from 0.1 to 100.0, for 1993 through 2014. Data labels in order: 74.4, 49.6, 35.6, 31.3, 26.0, 14.0, 9.0, 6.6, 6.2, 5.3, 3.8, 3.0, 2.8, 2.5, 2.0, 1.5, 1.4, 1.2, 1.1, 0.9, 0.8, 0.7. Annotations: FPGA Capability; Cost per LE (MID-RANGE); Current Field Deployed Appliances; WFO; PoC / Datapower. Note: DSP, memory blocks, hard IP (e.g. PCIe) added over time.]
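The chart's 22 data labels can be turned into an average yearly price change. This sketch assumes the labels map one-to-one, in order, onto the years 1993 through 2014; the chart residue suggests that mapping but does not state it.

```python
# Data labels from the chart, in $/KLE (assumed to be one value per year).
prices = [74.4, 49.6, 35.6, 31.3, 26.0, 14.0, 9.0, 6.6, 6.2, 5.3, 3.8,
          3.0, 2.8, 2.5, 2.0, 1.5, 1.4, 1.2, 1.1, 0.9, 0.8, 0.7]
years = list(range(1993, 2015))  # assumption: first label is 1993, last is 2014

intervals = len(prices) - 1
total_drop = prices[0] / prices[-1]
annual_change = (prices[-1] / prices[0]) ** (1 / intervals) - 1

print(f"{years[0]} to {years[-1]}: {total_drop:.0f}x cheaper, "
      f"{annual_change:.1%} per year on average")
```

Under that mapping the price per thousand logic elements falls by roughly two orders of magnitude over the period, or about a fifth per year compounded.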


    21/24

    CPU vs ASIC vs FPGA efficiency

    Custom logic: 4 GHz

    Highly optimized for one task

    At time of fabrication

    Generic logic: 250 MHz

    Configurable for ~any task

    Change at any time
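The clock gap between custom and generic logic sets how much parallelism reconfigurable logic needs before it pays off. A rough sketch, assuming throughput is simply clock rate times operations completed per cycle (a simplification that ignores memory and I/O limits):

```python
CUSTOM_LOGIC_HZ = 4e9     # custom logic clock from the slide
GENERIC_LOGIC_HZ = 250e6  # generic (FPGA) logic clock from the slide


def throughput(clock_hz: float, ops_per_cycle: float) -> float:
    """Operations per second under the simple clock * width model."""
    return clock_hz * ops_per_cycle


clock_ratio = CUSTOM_LOGIC_HZ / GENERIC_LOGIC_HZ
# Ops per cycle the generic logic needs to match a 1-op-per-cycle custom unit.
break_even_width = clock_ratio

print(f"clock ratio: {clock_ratio:.0f}x")
print(throughput(GENERIC_LOGIC_HZ, break_even_width) == throughput(CUSTOM_LOGIC_HZ, 1))
```

In this model, a task where configurable logic can perform more than 16 useful operations per cycle comes out ahead despite the slower clock.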

    [Figure: processor core floorplan with labels Pervasive, Instruction Fetch and Decode, Instruction Sequencing, Decimal Unit, Vector and Scalar Unit, Load/Store Unit, Fixed Point Unit]

    22/24

    FPGAs and Workload Optimized Systems

    Big data has inherent data parallel components

    Data compression

    Algorithms in logic 10-100x more efficient than CPU based

    Negligible latency

    Increases effective disk capacity

    Increases effective disk and network bandwidth

    FPGA logic can be used to sift large volumes of data, which is then passed to the CPUs for detailed analysis

    Packaged solutions hide FPGA programming complexity from user

    Workload optimized appliance delivery model
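How compression raises effective capacity and bandwidth can be illustrated in software. The sketch below uses Python's `zlib` as a stand-in for a compressor implemented in logic, with synthetic log-like data. The sample data, the 1 TB disk, and the 10 Gb/s link are assumptions for illustration only.

```python
import zlib


def effective_multiplier(data: bytes, level: int = 6) -> float:
    """Ratio of raw size to compressed size for the given data."""
    compressed = zlib.compress(data, level)
    # Round-trip check: compression must be lossless.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Synthetic, highly repetitive sample; real data will compress less well.
sample = b"timestamp=2014-01-01 status=OK latency_ms=12\n" * 10_000
ratio = effective_multiplier(sample)

disk_tb, link_gbps = 1.0, 10.0  # hypothetical raw capacity and link rate
print(f"compression ratio: {ratio:.1f}x")
print(f"effective disk: {disk_tb * ratio:.1f} TB, effective link: {link_gbps * ratio:.1f} Gb/s")
```

The same ratio applies to both storage and transfer, which is why the slide lists capacity and bandwidth together; doing the work in dedicated logic is what keeps the latency cost negligible.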


    23/24

    Software challenges

    All of the following apply to:

    Applications

    Middleware: Compilers, database, etc.

    System SW: OS, hypervisors, cluster, etc.

    Parallel programming

    Accelerator usage (heterogeneous computing)

    Workload partitioning

    FPGA compilation

    Tiered memory management

    More levels, diverse types

    Melding of main memory and storage

    EDA tools required to support complex design structures and circuit power optimization (e.g. productive fine grain power gating, diffraction mask generation, etc.)


    24/24

    IBM as the Innovator

    Only company with resources to design such integrated systems

    World leading

    Technology

    Research labs

    Hardware design

    Software design

    System design