
GOLDSTRIKE™ 1: COINTERRA’S FIRST GENERATION CRYPTO-CURRENCY PROCESSOR FOR BITCOIN MINING MACHINES

Javed Barkatullah, Ph.D., MBA

Timo Hanke, Ph.D.

Ravi Iyengar

Ricky Lewelling

Jim O’Connor

BITCOIN MINING WORK

For a message m, find a nonce n such that the 256-bit result

SHA-256(SHA-256(m || n))

has a specified number of leading zero bits (“target”).

• SHA-256 is a cryptographic hash function

• Cryptographic hash functions are “one way” functions

• This search problem is best solved by trial-and-error
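The trial-and-error search described above can be sketched in a few lines of Python. This is a minimal illustration, not the chip's method: the function names, the 4-byte little-endian nonce encoding, and reading the digest as a big-endian number when counting zero bits are all assumptions made for the example.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return SHA-256(SHA-256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def leading_zero_bits(digest: bytes) -> int:
    """Count leading zero bits, reading the digest as a big-endian integer."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def find_nonce(message: bytes, zero_bits: int, max_nonce: int = 2**32):
    """Try nonces 0, 1, 2, ... until the double hash has enough leading zeros.

    Returns the first matching nonce, or None if the range is exhausted.
    """
    for nonce in range(max_nonce):
        candidate = message + nonce.to_bytes(4, "little")  # assumed encoding of m || n
        if leading_zero_bits(double_sha256(candidate)) >= zero_bits:
            return nonce
    return None


if __name__ == "__main__":
    # A small target keeps the search short: about 2**12 trials on average.
    print(find_nonce(b"example message", 12))
```

Each extra zero bit required doubles the expected number of trials, which is why hardware that evaluates many nonces in parallel matters.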

Figure: The block header fields (Version, Previous Hash, Merkle Root, Target, Time, Nonce) form the message that is double-SHA-256 hashed. The Merkle Root is built by double-SHA-256 hashing pairs of transaction IDs (Tx_id_0, Tx_id_1, ..., Tx_id_n, Tx_id_n+1) up a tree. Each entry (Entry m, Entry m+1) carries the hash of the previous block header in its Previous Hash field.
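The pairwise hashing that produces the Merkle root can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, transaction IDs are taken as raw bytes, and the duplication of the last entry on odd-sized levels follows the Bitcoin convention rather than anything shown on the slide.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    """Return SHA-256(SHA-256(data))."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(tx_ids: list) -> bytes:
    """Reduce a list of transaction IDs (bytes) to a single root hash."""
    if not tx_ids:
        raise ValueError("at least one transaction ID is required")
    level = list(tx_ids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last entry with itself
        # Hash each adjacent pair to form the next level up the tree.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because the root changes whenever any transaction changes, a miner only needs to rehash the fixed-size block header while searching for a nonce.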

HISTORY OF BITCOIN MINING HARDWARE

Figure: Network hash rate (THash/s) and network difficulty (Difficulty × 10^6) over time, both on logarithmic axes from 1e-06 to 1e04. Four hardware eras are marked: First Bitcoin Network & CPU based Platform, GPU based Platform, FPGA based Platform, and Custom ASIC based Platform. Graph source: http://bitcoin.sipa.be

• ↑ Hashing Capacity, ↑ Network Difficulty, ↑ Platform Hashing Power

• Network Difficulty adjusted every 2016 blocks mined
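The 2016-block adjustment can be illustrated with a short calculation. The slide gives only the interval. The 10-minute block target and the factor-of-four clamp below come from the Bitcoin protocol rules, not from this deck, and the function name is hypothetical.

```python
RETARGET_INTERVAL_BLOCKS = 2016
TARGET_BLOCK_SECONDS = 600  # assumption: 10-minute block target from the Bitcoin protocol
EXPECTED_TIMESPAN = RETARGET_INTERVAL_BLOCKS * TARGET_BLOCK_SECONDS  # two weeks


def retarget_difficulty(old_difficulty: float, actual_timespan_seconds: int) -> float:
    """Scale difficulty so the next 2016 blocks take about two weeks.

    The measured timespan is clamped to [1/4, 4] times the expected value,
    so a single adjustment changes difficulty by at most a factor of four.
    """
    clamped = min(
        max(actual_timespan_seconds, EXPECTED_TIMESPAN // 4),
        EXPECTED_TIMESPAN * 4,
    )
    return old_difficulty * EXPECTED_TIMESPAN / clamped


if __name__ == "__main__":
    # Blocks arrived twice as fast as intended, so difficulty doubles.
    print(retarget_difficulty(100.0, EXPECTED_TIMESPAN // 2))
```

This is the feedback the graph reflects: added hashing capacity shortens block times, and the next adjustment raises difficulty to compensate.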

GOLDSTRIKE™ 1 DEVELOPMENT TIMELINE

• 4 months from RTL start to tape out!

• 49 days from tape out to first silicon

• Packaged silicon arrived on Dec. 28, 2013

• First system shipped to customer around mid January, 2014

Figure: Timeline from May 2013 to Feb. 2014 marking these milestones: Cointerra, Inc. Founded, RTL Start, Board & System Design Start, RTL Freeze, Tape Out, First Silicon, and System Shipped!

GOLDSTRIKE™ 1 ARCHITECTURE

• Motorola-compatible 4-pin SPI port

• PLL with simple bit-bang interface

• 120 hash engines arranged into 16 super-pipes

• 128-deep Input Work FIFO

• 128-deep Output Status FIFO

• 384-bit Pipe Control Register (PCR) to enable/disable individual hash engines

• Low I/O bandwidth requirement

• New work (384 bits) every 2^32 clock cycles per engine
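The low I/O bandwidth claim follows from simple arithmetic, sketched here. The figures of 120 engines, 384 bits per work item, and one work item per 2^32 clock cycles are taken from this slide. The 1.05 GHz clock is the operating point quoted on the package slide, and the function names are hypothetical.

```python
ENGINES_PER_DIE = 120
WORK_BITS = 384
CYCLES_PER_WORK_ITEM = 2**32  # one full nonce range per work item


def work_refill_seconds(clock_hz: float) -> float:
    """Time one engine takes to consume a work item."""
    return CYCLES_PER_WORK_ITEM / clock_hz


def input_bandwidth_bits_per_second(clock_hz: float, engines: int = ENGINES_PER_DIE) -> float:
    """Average rate at which new work must be delivered to keep all engines busy."""
    return engines * WORK_BITS / work_refill_seconds(clock_hz)


if __name__ == "__main__":
    clock_hz = 1.05e9  # assumption: clock frequency quoted for the package
    print(f"refill every {work_refill_seconds(clock_hz):.2f} s per engine")
    print(f"{input_bandwidth_bits_per_second(clock_hz) / 1e3:.1f} kbit/s per die")
```

At that clock each engine needs new work only about every four seconds, so a whole die needs on the order of 11 kbit/s of input, which a simple SPI port handles comfortably.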

HASH ENGINE

• Two rounds of SHA-256 processing

• Searches for a result in a 2^32 nonce range

• Each round consists of 64 iterations

• Fully unrolled iterations

• Two parallel but connected pipelines – message & compressor

• Generates a result out only if target criteria met

Figure: Hash engine block diagram. Work_in arrives through the Input Interface and Hash_out leaves through the Output Interface. A Nonce Counter and a Comparator sit alongside four 64-stage pipelines (Stage 0 to Stage 63): Message Round 1, Compressor Round 1, Message Round 2, and Compressor Round 2.

COMPRESSOR STAGE OF SHA-256 PIPELINE

Figure: One compressor stage. Inputs Ain to Hin, Kin, and Win pass through the S0, Maj, S1, and Ch blocks and a chain of adders into registers A to H, which drive outputs Aout to Hout.

S0(A) = (A >>> s1) ⊕ (A >>> s2) ⊕ (A >>> s3)

S1(E) = (E >>> s4) ⊕ (E >>> s5) ⊕ (E >>> s6)

Ch(E,F,G) = (E ∧ F) ⊕ (~E ∧ G)

Maj(A,B,C) = (A ∧ B) ⊕ (B ∧ C) ⊕ (C ∧ A)

All registers are 32 bits wide
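One compressor stage can be modeled in software as shown below. This is a sketch, not the chip's RTL: the slide leaves the rotation amounts symbolic (s1 to s6), so the values 2, 13, 22 and 6, 11, 25 are taken from the SHA-256 standard (FIPS 180-4), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # all registers are 32 bits wide


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(a: int) -> int:
    return rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)


def big_sigma1(e: int) -> int:
    return rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)


def ch(e: int, f: int, g: int) -> int:
    """Choose: each bit of e selects the matching bit of f (1) or g (0)."""
    return (e & f) ^ (~e & g)


def maj(a: int, b: int, c: int) -> int:
    """Majority: each output bit is the value held by at least two inputs."""
    return (a & b) ^ (b & c) ^ (c & a)


def compressor_stage(state, k_in: int, w_in: int):
    """Map (Ain..Hin, Kin, Win) to (Aout..Hout) for one SHA-256 iteration."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k_in + w_in) & MASK32
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK32
    return ((t1 + t2) & MASK32, a, b, c, (d + t1) & MASK32, e, f, g)
```

In a fully unrolled design, 64 copies of this stage are chained with a register set between each, so a new nonce can enter the pipeline every clock cycle.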

MESSAGE STAGE SHA-256 PIPELINE

• 512-bit message block divided into sixteen 32-bit words (W0 to W15)

• s0(W1,in) = (W1,in >>> s7) ⊕ (W1,in >>> s8) ⊕ (W1,in >> s9)

• s1(W14,in) = (W14,in >>> s10) ⊕ (W14,in >>> s11) ⊕ (W14,in >> s12)

• >>> is a right rotation; >> is a right shift, which SHA-256 uses for the third term of s0 and s1

Figure: One message stage, drawn as a 16-word shift register. Inputs W15,in to W0,in map to outputs W15 to W0; the s0 and s1 blocks feed an adder that produces the new word, and Win is sent to the compressor.
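The message stage can be modeled as a sliding 16-word window, sketched below. The shift amounts (7, 18, 3 and 17, 19, 10) and the recurrence W16 = s1(W14) + W9 + s0(W1) + W0 come from the SHA-256 standard (FIPS 180-4), since the slide leaves s7 to s12 symbolic. The standard uses a plain right shift for the third term of each function. The function names and the list representation of the window are assumptions.

```python
MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(w: int) -> int:
    return rotr(w, 7) ^ rotr(w, 18) ^ (w >> 3)


def small_sigma1(w: int) -> int:
    return rotr(w, 17) ^ rotr(w, 19) ^ (w >> 10)


def message_stage(window):
    """Advance the 16-word window by one position.

    window[i] holds W_i,in. Returns (w_in, next_window), where w_in is the
    word handed to the compressor and next_window holds the outputs W0..W15.
    """
    if len(window) != 16:
        raise ValueError("window must hold exactly 16 words")
    new_word = (
        small_sigma1(window[14]) + window[9] + small_sigma0(window[1]) + window[0]
    ) & MASK32
    return window[0], list(window[1:]) + [new_word]
```

Applying `message_stage` 64 times to the initial 16 words yields the 64 schedule words in order, one per compressor stage.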

DIE MICROGRAPH

• Global Foundries HKMG 28nm HPP process

• 9 metal layers

• 120 hash engines in 11x11 array (grey boxes)

• Top level logic block in the center

GOLDSTRIKE™-1 (GS1) PACKAGE

• 37.5 x 37.5 mm FCBGA package

• 4 bare dies per package

• 1296 pins

• > 500 GH/s @ 1.05 GHz & 0.7 V
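The quoted package hash rate can be cross-checked against the architecture figures. This sketch assumes each hash engine completes one double-SHA-256 result per clock cycle, which is what a fully unrolled pipeline implies but is not stated outright in the deck. The function name is hypothetical.

```python
def package_hash_rate(clock_hz: float, engines_per_die: int = 120, dies_per_package: int = 4) -> float:
    """Hashes per second for one package, assuming one hash per engine per clock."""
    return clock_hz * engines_per_die * dies_per_package


if __name__ == "__main__":
    rate = package_hash_rate(1.05e9)
    print(f"{rate / 1e9:.0f} GH/s per package")           # 4 dies x 120 engines x 1.05 GHz
    print(f"{4 * rate / 1e12:.3f} TH/s for four packages")
```

Under that assumption one package delivers 504 GH/s, consistent with the "> 500 GH/s" figure, and four packages give roughly 2 TH/s.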

HEAT DISSIPATION CHALLENGE

• Cooling options examined:

• Heat sink + airflow: common in CPU applications

• Liquid cooling: popular among over-clockers

• Immersion: efficient for data centers

• Liquid cooling with direct-attach cooling head selected

• Enables a common platform for both home & data center customers

Figure: Air temperatures on a plane 3 mm above the PCB top

TERRAMINER APPLIANCE

• Up to 2 TH/s hash rate per appliance

• Dual PCB with 4 GS1 packages total

• Power budget to meet household outlet capacity

• Layout: 4U chassis design

• Driven by cooling requirements: radiator cross-section and fan size

• Similar design for TerraMiner IV data center and home models

• Push-pull airflow design for maximum performance

• Fans chosen for balance between cost, performance & audible noise

• Dual 1U power supplies for minimal volume impact

TERRAMINER APPLIANCE IMAGES

Front View

Back View

TERRAMINER™ IN DATACENTER

ASIC DESIGN CHALLENGES & CHOICES

Challenges:

• High power density and high node toggle rates

• Power delivery

• Heat dissipation

• IR drop and di/dt noise

• Very high sequential cell count

• Reduce die area and power consumption

• Very short (4-month) schedule from RTL start to tape out

• Very small design team

Choices:

• Optimize common core blocks

• Maximize design repeat & reuse

• Utilize highly experienced design team

CONCLUDING REMARKS

• Continued demand for higher-performance and lower-power appliances

• Maintain Cointerra’s leadership position in the Bitcoin mining industry

• New designs with increased power efficiency and performance