
All Rights Reserved, Copyright© FUJITSU LIMITED 2017

SPARC64™ XII: Fujitsu’s latest 12 Core Processor

for Mission Critical Servers

April 20, 2017

Takumi Maruyama

Advanced System Research & Development Unit Fujitsu Limited


Agenda
• Fujitsu Processor Development History
• SPARC64™ XII
  – Design concept
  – Chip overview
  – Micro Architecture
  – System Architecture
  – Performance
• Summary


Fujitsu Processor Development: Perpetual Evolution

[Figure: Mainframe/UNIX/HPC incremental development, 2000~2003 through 2016~. Mainframe: GS8600, GS8800, GS8800B, GS8900, GS21 600, GS21 900, GS21 1600, GS21 2600. UNIX: SPARC64, SPARC64 II, SPARC64 GP, SPARC64 V, SPARC64 V+, SPARC64 VI, SPARC64 VII, SPARC64 X, SPARC64 X+, SPARC64 XII (servers shown: Fujitsu M10 server, Fujitsu SPARC M12 server). HPC: SPARC64 VIIIfx, SPARC64 IXfx, SPARC64 XIfx. Technology generations: 350nm, 250nm/220nm, 180nm, 130nm, 90nm, 65nm, 45nm, 40nm, 28nm, 20nm. Performance features: store ahead, branch history, prefetch, single-chip CPU, super-scalar, non-blocking cache, O-O-O execution, L2 cache on die, multi-core, multi-thread, system on chip, hardware barrier, HPC-ACE, virtual machine architecture, Software on Chip, high-speed interconnect. Reliability features: cache ECC, register/ALU parity, instruction retry, cache dynamic degradation, error checkers/history.]


SPARC64™ XII Design Concept

• High Performance: SPEED and THROUGHPUT
  – High SPEED (single thread performance): high CPU frequency, state-of-the-art O-O-O execution, rich execution units
  – High THROUGHPUT: many cores and threads, strong cache and memory
• Robust RAS

[Chart: SPARC64 performance, Throughput [GIPS] (0 to 500) vs. Speed [GHz] (0 to 5), with data points labeled SPARC64™ X+ and SPARC64™ XII and a "Multi-core Multi-thread" annotation]


SPARC64™ XII Chip Overview

Architecture Features
• 12 cores x 8 threads
• SWoC (Software on Chip)
• 32MB L3 cache
• Embedded Memory and IO Controller (PCIe GEN3)

20nm CMOS
• 25.8mm x 30.8mm
• 5,450M transistors
• 1,860 signal pins
• 4.25GHz (up to 4.35GHz with High Speed Mode enabled)

Performance (peak)
• 417GIPS / 835GFlops
• 153GB/s memory throughput

[Die photo: 12 cores, each with its own L2 cache; four L3 cache blocks; two MACs with DDR4 interfaces; PCIe Gen3; two SERDES blocks; interconnect; interconnect & coherence control]
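The peak figures above can be reproduced as products of per-cycle widths and clock frequency. The sketch below rests on our own reading, not on anything the slide states: peak GIPS counts the 4 decoded instructions per pipeline per cycle, peak GFlops counts the 4 FMA units per pipeline from the core spec at 2 flops each, both use the 4.35GHz High Speed Mode clock, and memory is assumed to be 8 channels of DDR4-2400 (the channel count is a hypothetical parameter).

```python
# Peak-rate arithmetic for SPARC64 XII, a sketch under stated assumptions:
# - peak GIPS counts 4 decoded instructions per pipeline per cycle,
# - peak GFlops counts 4 FMA units per pipeline, 2 flops per FMA,
# - both use the 4.35GHz High Speed Mode clock,
# - memory is 8 channels of DDR4-2400 at 8 bytes per transfer (channel count assumed).

CORES = 12
PIPELINES_PER_CORE = 2
DECODE_WIDTH = 4          # instructions decoded per pipeline per cycle
FMA_PER_PIPELINE = 4      # from "(FMA x4 / FDIV x2) x 2pipe"
FLOPS_PER_FMA = 2         # one multiply plus one add
CLOCK_GHZ = 4.35          # High Speed Mode

DDR4_CHANNELS = 8         # assumption, not stated on the slides
DDR4_MT_PER_S = 2400      # DDR4-2400: 2400 mega-transfers per second
BYTES_PER_TRANSFER = 8    # 64-bit channel


def peak_gips() -> float:
    """Giga-instructions per second across all cores and pipelines."""
    return CORES * PIPELINES_PER_CORE * DECODE_WIDTH * CLOCK_GHZ


def peak_gflops() -> float:
    """Giga floating-point operations per second, counting an FMA as two."""
    return CORES * PIPELINES_PER_CORE * FMA_PER_PIPELINE * FLOPS_PER_FMA * CLOCK_GHZ


def peak_memory_gbps() -> float:
    """Memory bandwidth in GB/s: channels x MT/s x bytes per transfer / 1000."""
    return DDR4_CHANNELS * DDR4_MT_PER_S * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    print(f"peak GIPS:        {peak_gips():.1f}")
    print(f"peak GFlops:      {peak_gflops():.1f}")
    print(f"peak memory GB/s: {peak_memory_gbps():.1f}")
```

Under these assumptions the products come to 417.6 GIPS, 835.2 GFlops and 153.6 GB/s, in line with the slide's rounded numbers; at the 4.25GHz base clock the same widths would give 408 GIPS.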


SPARC64™ XII Core Spec

Instruction Set Architecture: SPARC-V9/JPS, HPC-ACE, VM / SWoC
Integer Execution Units: (156 GPR x 4 + 96 GUB) x 2pipe; (ALU/SHIFT x2) x 2pipe; (ALU/AGEN x2) x 2pipe; (MULT/DIVIDE x1) x 2pipe
FP Execution Units: (128 FPR x 4 + 64 FUB) x 2pipe; (FMA x4 / FDIV x2) x 2pipe; (IMA/Logic x4) x 2pipe; (Decimal x1 / Cypher x2) x 2pipe
L1$: L1I$ 64KB/4way; L1D$ 32KB/4way x 2pipe
L2$: 512KB/16way
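The "x 4" and "x 2pipe" multipliers in the spec compound, so per-core totals are easy to misread. This sketch spells out one interpretation, which is our assumption rather than the slide's: "156 GPR x 4" is one 156-entry architectural GPR set per thread for the 4 threads of a pipeline, "96 GUB" is that pipeline's GPR update (rename) buffer, and the whole pipeline allotment is then doubled for the two pipelines. The helper name is hypothetical.

```python
# Per-core totals implied by the "x 2pipe" notation in the core spec.
# Our reading (an assumption): "156 GPR x 4" is one 156-entry architectural GPR
# set per thread for the 4 threads of a pipeline, "96 GUB" is that pipeline's
# GPR update (rename) buffer, and everything is then doubled for two pipelines.

PIPELINES_PER_CORE = 2
THREADS_PER_PIPELINE = 4


def registers_per_core(per_thread: int, update_buffer: int) -> int:
    """(per-thread set x threads + update buffer) x pipelines."""
    return (per_thread * THREADS_PER_PIPELINE + update_buffer) * PIPELINES_PER_CORE


int_entries = registers_per_core(per_thread=156, update_buffer=96)  # GPR + GUB
fp_entries = registers_per_core(per_thread=128, update_buffer=64)   # FPR + FUB
l1d_kb_per_core = 32 * PIPELINES_PER_CORE   # one 32KB L1D per pipeline
threads_per_core = THREADS_PER_PIPELINE * PIPELINES_PER_CORE

print(int_entries, fp_entries, l1d_kb_per_core, threads_per_core)
```

Read this way, a core holds 1,440 integer and 1,152 floating-point register entries, 64KB of L1 data cache, and 8 hardware threads, which agrees with the "12 cores x 8 threads" figure in the chip overview.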


[Figure: SPARC64™ XII pipeline (Fetch, Decode, Issue, Reg-Read, Execute, Memory Access, Commit); 12 cores, each with Pipeline-0 and Pipeline-1. Blocks shown: 64KB L1 instruction cache, branch prediction, instruction buffer and decode; reservation stations RSE (for execution), RSA (for address generation), RSF (for floating-point) and RSBR (for branch); GUB (GPR Update Buffer) and FUB (FPR Update Buffer); integer units EXA/EXB, address generators EAGA/EAGB, floating-point units FLA/FLB/FLC/FLD; 32KB L1 data cache with fetch port, store port, store buffer and dTLB; Commit Stack Entry; GPR, FPR, program counter and control registers, each x4. Outside the core: L2 cache, L3 cache, MAC, IOC and CPU-CPU i/f. Some blocks are marked "Enhanced in SPARC64™ XII".]
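The pipeline executes out of order but finishes at a single Commit stage fed by the Commit Stack Entry. We read the CSE as playing the role of a reorder buffer; that reading, the capacity and the commit width below are assumptions, not SPARC64™ XII values. This toy model shows the key property: instructions enter in program order, may complete in any order, and retire only from the oldest end.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    """One in-flight instruction tracked in program order."""
    name: str
    done: bool = False


class CommitStack:
    """Toy reorder buffer: allocate in order, complete in any order, commit in order.

    Capacity and commit width are hypothetical parameters, not SPARC64 XII values.
    """

    def __init__(self, capacity: int, commit_width: int):
        self.capacity = capacity
        self.commit_width = commit_width
        self.entries: deque[Entry] = deque()

    def dispatch(self, name: str) -> Entry:
        if len(self.entries) >= self.capacity:
            raise RuntimeError("commit stack full: decode must stall")
        entry = Entry(name)
        self.entries.append(entry)
        return entry

    def commit(self) -> list[str]:
        """Retire finished instructions from the oldest end, stopping at the first unfinished one."""
        retired: list[str] = []
        while self.entries and self.entries[0].done and len(retired) < self.commit_width:
            retired.append(self.entries.popleft().name)
        return retired


if __name__ == "__main__":
    cse = CommitStack(capacity=8, commit_width=4)
    load, add, mul = (cse.dispatch(n) for n in ("ld", "add", "mul"))
    add.done = mul.done = True      # younger instructions finish first (out of order)
    print(cse.commit())             # nothing retires: "ld" is still pending
    load.done = True
    print(cse.commit())             # all three retire, in program order
```

In-order retirement is what keeps software-visible state precise even though execution is reordered, which is also what makes flushing and re-executing a single instruction possible.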


Common Micro-Architecture Features between SPARC64™ X+ and XII

Processor Core
• Fetch 8 instructions, decode 4 instructions, and execute 6 instructions per cycle per instruction pipeline
• Aggressive O-O-O execution, including memory access (loads and stores)
• High single thread performance
• SWoC (Software on Chip): accelerates specific software functions in hardware

Outside the Core
• Embedded Memory and IO Controller (PCIe GEN3)
• Robust RAS (Reliability, Availability, Serviceability) ★


Reliability, Availability, Serviceability
→ Guarantees data integrity (number of checkers increased to ~66,000)

Units                        Error detection and correction scheme
Cache (Tag)                  ECC, Duplicate & Parity
Cache (Data)                 ECC, Parity
Register                     ECC (INT/FP), Parity (others)
Execution Unit               Parity/Residue
Cache dynamic degradation    Yes
HW Instruction Retry         Yes
History                      Yes

[Figure: SPARC64™ XII RAS diagram, color-coded by 1-bit error handling. Green: 1-bit error correctable; Yellow: 1-bit error detectable; Gray: 1-bit error harmless.]
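Two of the detection schemes in the table, parity and residue checking, are simple enough to show directly. This is a generic sketch of each technique; the modulus of 3, the word values and the function names are our illustrative choices, not SPARC64™ XII design details.

```python
# Two generic error checks of the kinds named in the RAS table: a parity bit
# over a word, and a modulo-3 residue check on an adder's result. The modulus
# and word values are illustrative choices, not SPARC64 XII design details.


def parity_bit(word: int) -> int:
    """Even-parity bit: 1 when the word has an odd number of 1 bits."""
    return bin(word).count("1") & 1


def residue_check_add(a: int, b: int, result: int, modulus: int = 3) -> bool:
    """True when result is consistent with a + b modulo `modulus`.

    In hardware the operand residues are combined by a small, separate circuit;
    here they are simply computed directly.
    """
    return (a % modulus + b % modulus) % modulus == result % modulus


if __name__ == "__main__":
    word = 0b1011_0010
    stored = parity_bit(word)
    corrupted = word ^ (1 << 4)                  # flip one bit
    print(parity_bit(corrupted) != stored)       # single-bit flip detected

    a, b = 1234, 5678
    good = a + b
    bad = good ^ (1 << 7)                        # single-bit error in the adder output
    print(residue_check_add(a, b, good), residue_check_add(a, b, bad))
```

Because 2^k mod 3 is never 0, flipping any single bit of the result changes its residue, so a mod-3 check catches every single-bit error in an adder's output at the cost of a tiny side computation.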


Hardware Instruction Retry

1. Error
2. Flush
3. Single step execution (instruction retry)
4. Update of SW visible resources
5. Back to normal execution after the re-executed instruction gets committed without an error.

[Figure: the retry sequence on the pipeline (PC; Fetch, Execute, Commit; SW visible resources: GPR, FPR, Memory). At the error (X), in-flight state in CSE, RSE, RSF, RSA, RSBR, GUB, FUB, IWR, IBF, EAGA/B, EXA/B and FLA/B/C/D is flushed; the instruction is then re-executed on its own through the ALU, EAGA/B, EXA/B and FLA/B before the SW visible resources are updated.]

When an error is detected, the hardware automatically re-executes the instruction, removing the transient error by itself.
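The five retry steps can be illustrated with a toy sequential model. Everything here is hypothetical: each instruction writes one register, `errors[pc]` says how many consecutive executions of that instruction see a transient fault, and `max_retries` stands in for whatever limit the hardware applies before reporting an uncorrectable error (the slide does not give one). The point is that a faulty result is discarded before it reaches software-visible state, and only a clean re-execution commits.

```python
from __future__ import annotations

from typing import Callable

# A toy model of hardware instruction retry. Each instruction writes one register;
# errors[pc] = how many consecutive executions of that instruction hit a transient
# fault. Names, the retry limit and the fault model are hypothetical.


def run(program: list[tuple[str, Callable[[dict[str, int]], int]]],
        errors: dict[int, int],
        max_retries: int = 3) -> tuple[dict[str, int], list[str]]:
    regs: dict[str, int] = {}       # software-visible state (GPR/FPR stand-in)
    log: list[str] = []
    for pc, (dest, op) in enumerate(program):
        failures = 0
        while True:
            value = op(regs)                          # execute; result not yet visible
            if errors.get(pc, 0) > failures:          # 1. error detected
                failures += 1
                log.append(f"pc={pc}: error, flush")  # 2. flush: drop the result
                if failures > max_retries:
                    raise RuntimeError(f"pc={pc}: retry failed, error is not transient")
                continue                              # 3. re-execute this instruction alone
            regs[dest] = value                        # 4. update SW-visible resources
            if failures:
                log.append(f"pc={pc}: retried, committed")
            break                                     # 5. back to normal execution
    return regs, log


if __name__ == "__main__":
    prog = [("r1", lambda r: 2), ("r2", lambda r: r["r1"] * 21)]
    regs, log = run(prog, errors={1: 1})
    print(regs)
    print(log)
```

With one transient fault on the second instruction, the final registers are the same as in a fault-free run; only the log shows that a retry happened.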


Enhanced Micro-Architecture from SPARC64™ X+

Processor Core
• Dual instruction pipelines sharing the L1 I-cache, and 8-way SMT (4-way SMT per instruction pipeline) ★
• Better branch prediction scheme
• Larger sizes for various queues
• Deeper pipeline to increase frequency
• High Speed Mode

Outside the Core
• 3-level cache hierarchy and 4 LCU (Last level cache and Core Unit) structure ★
• DDR4-2400 memory
• Doubled PCIe GEN3 ports (8 lanes x 4)
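For scale, the doubled PCIe port count can be turned into a raw bandwidth figure using the PCIe 3.0 signaling rate of 8 GT/s per lane with 128b/130b encoding. This sketch assumes the four ports are x8 links as listed and ignores protocol overhead (packet headers, flow control), so it is an upper bound rather than a SPARC64™ XII measurement.

```python
# PCIe Gen3 bandwidth for the doubled port count, from the PCIe 3.0 signaling rate.
# 8 GT/s per lane and 128b/130b encoding are PCIe 3.0 parameters; protocol overhead
# (TLP headers, flow control) is ignored, so real throughput is lower.

GT_PER_S = 8.0           # PCIe 3.0 transfer rate per lane
ENCODING = 128 / 130     # 128b/130b line code efficiency
LANES_PER_PORT = 8
PORTS = 4


def raw_gbytes_per_s(lanes: int) -> float:
    """Per-direction link bandwidth in GB/s before protocol overhead."""
    return lanes * GT_PER_S * ENCODING / 8   # 8 bits per byte


if __name__ == "__main__":
    per_port = raw_gbytes_per_s(LANES_PER_PORT)
    total = raw_gbytes_per_s(LANES_PER_PORT * PORTS)
    print(f"per x8 port: {per_port:.2f} GB/s, all {PORTS} ports: {total:.1f} GB/s per direction")
```

Each x8 port comes to roughly 7.9 GB/s per direction, or about 31.5 GB/s across all 32 lanes.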


Micro-architecture enhancements 1/2: increase core throughput while keeping single thread performance

• Dual instruction pipelines per core
  – Each pipeline has its own resources, except that the L1 I-cache and TLB are shared between the two pipelines (especially effective in DB apps)
• 4-way SMT per instruction pipeline
  – More throughput than the previous 2-way SMT
  – Resources are dynamically shared between the threads

Core Resource Allocation

#Active threads  L1 I$  IB        Rsv. Station  Rename Registers  Execution Units  FP/SP Port  L1 D$  CSE
1                100%   50%       50%           50%               50%              50%         50%    50%
2                100%   50% x2    50% x2        50% x2            100%             50% x2      100%   50% x2
4                100%   25% x4    25% x4        25% x4            100%             25% x4      100%   25% x4
8                100%   12.5% x8  50% x2        12.5% x8          100%             12.5% x8    100%   12.5% x8

(IB: instruction buffer; FP/SP Port: fetch port/store port; CSE: Commit Stack Entry; "x N" marks a resource split into N equal shares.)
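The allocation table can be checked mechanically. The sketch below copies the table's cells verbatim and computes, for each resource, the total fraction allotted (share x number of shares); the column names and helper functions are ours. Every resource column totals 100% once two or more threads are active, while a single thread is allotted 50% of everything except the shared L1 I-cache, consistent with (in our interpretation) one of the two pipelines sitting idle.

```python
from __future__ import annotations

from fractions import Fraction

# Core resource allocation table, copied from the slide; one row per number of active threads.
COLUMNS = ["L1 I$", "IB", "Rsv. Station", "Rename Registers",
           "Execution Units", "FP/SP Port", "L1 D$", "CSE"]
TABLE = {
    1: ["100%", "50%", "50%", "50%", "50%", "50%", "50%", "50%"],
    2: ["100%", "50% x2", "50% x2", "50% x2", "100%", "50% x2", "100%", "50% x2"],
    4: ["100%", "25% x4", "25% x4", "25% x4", "100%", "25% x4", "100%", "25% x4"],
    8: ["100%", "12.5% x8", "50% x2", "12.5% x8", "100%", "12.5% x8", "100%", "12.5% x8"],
}


def in_use(cell: str) -> Fraction:
    """Total fraction of a resource allotted: share x number of shares (default 1)."""
    share, _, count = cell.partition(" x")
    return Fraction(share.rstrip("%")) / 100 * int(count or 1)


def utilization(active_threads: int) -> dict[str, Fraction]:
    """Map each resource to the fraction of it allotted for a given thread count."""
    return {col: in_use(cell) for col, cell in zip(COLUMNS, TABLE[active_threads])}


if __name__ == "__main__":
    for n in TABLE:
        print(n, {col: f"{float(v):.0%}" for col, v in utilization(n).items()})
```

Exact fractions avoid rounding noise in cells such as 12.5% x8.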


Micro-architecture enhancements 2/2: 3-level cache and 4 LCU structure to realize low latency and high throughput

• L2 cache latency < L3 cache latency x 0.5
• L2 cache to core: 32 bytes x 2 pipelines
• L3 cache to core: 32 bytes x 3 cores

MEZASI: Fujitsu’s unique cache protocol for LCU
• ELC (Extra Level Cache): unused L3 cache blocks act as a victim cache
• IO-DCA (IO Direct Cache Access): direct DMA writes to the L3 cache from IO devices
• Speculative memory access: memory access issued in parallel with accesses to the other L3 cache blocks

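[Figure: 4 LCU structure. Each core has two pipelines (PIPE0, PIPE1) with a shared L1I and one L1D per pipeline, backed by the core's L2 cache; each of the 4 LCUs pairs an L3 cache block with 3 cores; the LCUs are joined by inter-L3-cache-block coherence control, which connects to the MAC, IOC and CPU-CPU i/f. Link widths of 16B and 32B are marked.]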
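ELC reuses idle L3 capacity as a victim cache. The sketch below shows the generic victim-cache idea only, not MEZASI itself: lines evicted from a small LRU cache are parked in a separate LRU victim buffer, and a later access that hits there swaps the line back instead of going to memory. The class name, capacities and fully associative organization are hypothetical simplifications.

```python
from collections import OrderedDict


class CacheWithVictimBuffer:
    """Fully associative LRU cache whose evicted lines move to a small LRU victim buffer.

    A generic victim-cache sketch: the victim buffer stands in for otherwise
    unused L3 capacity (ELC); sizes and policies are hypothetical.
    """

    def __init__(self, capacity: int, victim_capacity: int):
        self.capacity = capacity
        self.victim_capacity = victim_capacity
        self.lines = OrderedDict()    # address -> None, least recently used first
        self.victims = OrderedDict()  # address -> None, oldest victim first

    def access(self, addr: int) -> str:
        if addr in self.lines:
            self.lines.move_to_end(addr)        # refresh LRU position
            return "hit"
        if addr in self.victims:
            del self.victims[addr]              # swap the line back into the cache
            self._fill(addr)
            return "victim hit"
        self._fill(addr)                        # miss: fetch from memory
        return "miss"

    def _fill(self, addr: int) -> None:
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            evicted, _ = self.lines.popitem(last=False)   # evict the LRU line
            self.victims[evicted] = None                  # keep it as a victim
            if len(self.victims) > self.victim_capacity:
                self.victims.popitem(last=False)          # drop the oldest victim


if __name__ == "__main__":
    cache = CacheWithVictimBuffer(capacity=2, victim_capacity=2)
    print([cache.access(a) for a in (1, 2, 3, 1, 2, 4)])
```

With a 2-line LRU cache alone, that access sequence misses every time; the victim buffer turns the two re-references into victim hits, which is the benefit ELC aims for with spare L3 blocks.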


Fujitsu SPARC M12 server: Fujitsu’s latest UNIX server with SPARC64™ XII
• M12-2: mid-range server
• M12-2S: scalable high-end server


SPARC M12 System Architecture

• Scales from 1 to 32 CPU sockets (maximum thread count 2,048→3,072)
  – Directory-based cache coherency
  – High-speed interconnect, up to 25 Gbps per lane in SPARC64™ XII
• System configuration
  – A Building Block (BB) consists of 2 CPUs and 2 XBs
  – Up to 4 BBs can be connected by XBs
  – Up to 16 BBs can be connected via XB-Boxes
  – BB topology is inherited from the current M10-4S with SPARC64™ X+

[Figure: a Building Block (2 CPU sockets) with two CPUs and two XBs, links labeled 25Gbps and 14.5/14Gbps, and ~168GBps (IN/OUT) to other XBs; a 16-BB configuration (32 CPU sockets), BB#00 to BB#15, connected through XB-Boxes (XB-Box#0 and XB-Box#2 labeled)]
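The configuration rules above determine how many Building Blocks a given socket count needs and whether XB-Boxes are required. This sketch encodes them with one labeled assumption: sockets fill BBs two at a time, so an odd socket count uses a partially populated BB. The function and class names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

CORES_PER_CPU = 12
THREADS_PER_CORE = 8
CPUS_PER_BB = 2             # a Building Block holds 2 CPUs and 2 XBs
MAX_BBS_WITHOUT_XB_BOX = 4  # up to 4 BBs connect through their own XBs
MAX_BBS = 16                # up to 16 BBs through XB-Boxes


@dataclass
class Config:
    building_blocks: int
    threads: int
    needs_xb_box: bool


def m12_config(cpu_sockets: int) -> Config:
    """Size a configuration, assuming BBs are filled with 2 CPUs before adding another BB."""
    if not 1 <= cpu_sockets <= CPUS_PER_BB * MAX_BBS:
        raise ValueError("SPARC M12 scales from 1 to 32 CPU sockets")
    bbs = math.ceil(cpu_sockets / CPUS_PER_BB)
    return Config(
        building_blocks=bbs,
        threads=cpu_sockets * CORES_PER_CPU * THREADS_PER_CORE,
        needs_xb_box=bbs > MAX_BBS_WITHOUT_XB_BOX,
    )


if __name__ == "__main__":
    for sockets in (1, 8, 32):
        print(sockets, m12_config(sockets))
```

At the 32-socket maximum this gives 16 BBs, XB-Boxes, and 32 x 12 x 8 = 3,072 threads, matching the figure on the slide; 8 sockets is the largest configuration that fits within 4 BBs.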


SPARC M12 System Cooling: VLLC (Vapor and Liquid Loop Cooling)

• Newly developed cooling mechanism for SPARC M12
• Evaporative cooling: heat is taken away as the liquid changes to vapor
• Pumps circulate the coolant in the VLLC system
• The radiator dissipates the heat absorbed by the cooling plate into the air

➔ Achieves 2x the cooling performance of the current LLC (Liquid Loop Cooling)

[Figure: cooling loops with pump, cooling plate on the CPU and radiator; one loop carries liquid, the other carries vapor + liquid, illustrating the phase change of the liquid]


SPARC64™ XII Benchmark Numbers

Benchmark                            1CPU         2CPU         16CPU
SPECint®_rate2006                    956          1,910        14,500
SPECjbb®2015 Composite, Max          52,659 jOPS  98,333 jOPS  n/a
SPECjbb®2015 Composite, Critical     34,771 jOPS  63,354 jOPS  n/a
SPECjbb®2015 Multiple-JVM, Max       54,434 jOPS  n/a          n/a
SPECjbb®2015 Multiple-JVM, Critical  34,771 jOPS  n/a          n/a
STREAM triad                         127GB/s      n/a          1,533GB/s

Source (except STREAM triad): www.spec.org
System: Fujitsu SPARC M12-2S, SPARC64 XII (4.25GHz, up to 4.35GHz) x 1, 2, 16 CPUs; OS: Oracle Solaris 11.3; Compiler: Oracle Developer Studio 12.6; JVM: HotSpot™ 64-Bit Server VM, version 1.8.0_121
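The table also shows how well each workload scales with socket count. This sketch copies the measured numbers from the table and computes speedup over one CPU and per-CPU efficiency; the single-CPU STREAM result is also compared with the 153GB/s peak memory throughput from the chip overview. The analysis and names are ours, not Fujitsu's.

```python
from __future__ import annotations

# Measured results copied from the benchmark table; the analysis is ours.
RESULTS = {
    "SPECint_rate2006": {1: 956, 2: 1910, 16: 14500},
    "STREAM triad GB/s": {1: 127, 16: 1533},
}
PEAK_MEMORY_GBPS = 153   # peak memory throughput per chip, from the chip overview


def scaling(metric: str, cpus: int) -> tuple[float, float]:
    """Return (speedup over 1 CPU, per-CPU efficiency) for a measured configuration."""
    base = RESULTS[metric][1]
    speedup = RESULTS[metric][cpus] / base
    return speedup, speedup / cpus


if __name__ == "__main__":
    for metric, runs in RESULTS.items():
        for cpus in runs:
            if cpus > 1:
                speedup, eff = scaling(metric, cpus)
                print(f"{metric} x{cpus}: {speedup:.2f}x, {eff:.1%} per-CPU efficiency")
    stream_1cpu = RESULTS["STREAM triad GB/s"][1]
    print(f"STREAM triad, 1 CPU: {stream_1cpu / PEAK_MEMORY_GBPS:.1%} of peak")
```

SPECint®_rate2006 reaches about 15.2x at 16 CPUs (roughly 95% per-CPU efficiency), while STREAM triad reaches about 12.1x (roughly 75%); a single CPU sustains about 83% of its 153GB/s peak on STREAM triad.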


SPARC64™ XII Performance: 2x chip throughput and 2.5x core throughput compared to the previous SPARC64™ X+, while keeping single thread performance

Keys: 8-way SMT design, higher CPU frequency, and micro-architecture improvements

[Chart: SPARC64™ XII relative performance (SPARC64™ X+ = 100%), y-axis 0% to 350%. Categories: Single thread (Integer); Core throughput (Integer, Java, OLTP); Chip throughput (Integer, Java, OLTP); Memory (STREAM). Annotations: >1.0x single thread performance; 2.3x-2.9x core throughput; 1.8x-2.2x chip throughput.]
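The core and chip ratios are related through core count, since SPARC64™ XII has 12 cores while SPARC64™ X+ has 16 (a figure not stated on these slides). This sketch asks what per-core ratio the chip ratios would imply if core throughput were simply chip throughput divided by core count; that simplification is our assumption.

```python
X_PLUS_CORES = 16   # SPARC64 X+ cores per chip (not stated on these slides)
XII_CORES = 12


def implied_core_ratio(chip_ratio: float) -> float:
    """Per-core ratio if core throughput were chip throughput divided by core count."""
    return chip_ratio * X_PLUS_CORES / XII_CORES


if __name__ == "__main__":
    for chip in (1.8, 2.0, 2.2):
        print(f"chip {chip:.1f}x -> core {implied_core_ratio(chip):.2f}x")
```

The implied 2.4x to 2.9x is close to the stated 2.3x-2.9x core throughput range; the small gap may reflect that core throughput was measured on its own rather than derived this way.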


Summary

• SPARC64™ XII is Fujitsu’s 12th SPARC processor, designed for Fujitsu’s latest UNIX server.
• Its enhancements increase throughput while keeping high single thread performance and robust RAS features.
• Measured results show 2x the chip throughput of SPARC64™ X+.
• Fujitsu will continue to develop the SPARC64™ series to meet the needs of a new era.