LUM final presentation Chanit Giat Rachel Stahl Instructor: Artyom Borzin

Posted on 15-Jan-2016

Page 1: LUM final presentation Chanit Giat Rachel Stahl Instructor: Artyom Borzin

LUM final presentation

Chanit Giat

Rachel Stahl

Instructor: Artyom Borzin

Page 2

PROXY CACHE ENGINE

The proxy cache engine provides hardware support to a server’s OS in order to improve its service rate, and adds security features.

The main memory of a network server is its fast storage device, where recently accessed data is kept. When a new request for data is received, the application must search the memory. If the data are found, the response is sent; otherwise the data must be read from a slower storage device (disk, tape) and then sent to the user.

Page 3

PROXY CACHE ENGINE

The system stores the mapping information for all files in main memory and, if the requested file is present in main memory, calculates its exact path. If it is not present, the system orders the operating system to bring it from the storage device and supplies the path to a free memory space.

The system holds 2 main databases:

A main memory, which holds up to 2M paths into the server’s memory, together with their aging parameters.

A bit map table, which allows faster memory management by holding the free-space image of the main memory.
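To make the bit map idea concrete, here is a minimal software sketch, assuming one bit per path slot (1 = used, 0 = free) and reading "2M" as 2^21 slots, which would make the table 256 KiB. The class and method names are hypothetical; in the engine this search is done in hardware, not by a Python loop.

```python
class FreeSpaceBitMap:
    """Hypothetical software model of the free-space bit map:
    bit i is 1 when path slot i is in use, 0 when it is free."""

    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        self.bits = bytearray((num_slots + 7) // 8)  # all slots start free

    def mark_used(self, index: int) -> None:
        self.bits[index // 8] |= 1 << (index % 8)

    def mark_free(self, index: int) -> None:
        self.bits[index // 8] &= ~(1 << (index % 8)) & 0xFF

    def find_free(self) -> int:
        """Return the lowest free slot index, or -1 if every slot is used."""
        for byte_pos, byte in enumerate(self.bits):
            if byte == 0xFF:
                continue  # eight used slots skipped with one comparison
            for bit in range(8):
                index = byte_pos * 8 + bit
                if index < self.num_slots and not (byte >> bit) & 1:
                    return index
        return -1

    def count_free(self) -> int:
        """Number of free slots (the 'Count free' result)."""
        used = sum(bin(byte).count("1") for byte in self.bits)
        return self.num_slots - used
```

Skipping whole bytes that are all ones is the software analogue of why a bit map speeds up memory management: the free-space image can be scanned many slots at a time instead of inspecting each path record.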

Page 4

Main functions:

Search – returns the path into main memory if the file is present, or otherwise a path to a free space in the memory.

Set attributes – sets the file’s aging attributes, as supplied by the OS.

Delete – deletes a given path from the memory.

Count free – returns the number of free path slots in the memory.

Init – initializes the machine.

(Age – when the number of records exceeds a specified number, the system cleans some of them up.)

[Figure: SEARCH command format: Length, CID=1, ASIS, Site#, Data.]
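The main functions listed above can be pictured with a small behavioral model. It is a sketch under stated assumptions, not the engine's implementation: paths are dictionary keys, slots are integers, and the aging rule (evict the records with the lowest age value until the count is back at a threshold) is a stand-in for the cleanup policy, which the slides do not specify. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    slot: int      # index of the path slot in main memory
    age: int = 0   # aging attribute, as supplied by the OS (Set attributes)


class CacheEngineModel:
    """Hypothetical behavioral model of the engine's main functions."""

    def __init__(self, num_slots: int, age_threshold: int) -> None:
        self.num_slots = num_slots
        self.age_threshold = age_threshold  # assumed record-count limit for aging
        self.init()

    def init(self) -> None:
        """Init: forget every path and mark all slots free."""
        self.records: dict[str, Record] = {}
        # Stack of free slots; initially pop() hands out the lowest slot first.
        self.free_slots = list(range(self.num_slots - 1, -1, -1))

    def search(self, path: str) -> tuple[bool, int]:
        """Search: (True, slot) on a hit; on a miss, reserve a free slot and
        return (False, slot) so the OS can bring the file into it."""
        if path in self.records:
            return True, self.records[path].slot
        if not self.free_slots:
            raise RuntimeError("no free slot")
        slot = self.free_slots.pop()
        self.records[path] = Record(slot)
        if len(self.records) > self.age_threshold:
            self._age(keep=path)
        return False, slot

    def set_attributes(self, path: str, age: int) -> None:
        """Set attributes: store the aging value supplied by the OS."""
        self.records[path].age = age

    def delete(self, path: str) -> None:
        """Delete: drop the path and return its slot to the free pool."""
        self.free_slots.append(self.records.pop(path).slot)

    def count_free(self) -> int:
        """Count free: number of unused path slots."""
        return len(self.free_slots)

    def _age(self, keep: str) -> None:
        # Assumed policy: evict the records with the lowest age value first
        # (never the one just added) until the threshold is met again.
        victims = sorted((p for p in self.records if p != keep),
                         key=lambda p: self.records[p].age)
        for victim in victims:
            if len(self.records) <= self.age_threshold:
                break
            self.delete(victim)
```

For example, with `age_threshold=2`, adding a third path evicts whichever of the two older paths carries the lower age value, so `count_free()` is unchanged by that search.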

Page 5

Previous uArchitecture

[Figure: previous uArchitecture block diagram: Local Bus Interface, Reg. file, Data Stream controller, Output FIFO, Input FIFO, Decoder/CRC unit, Database Manager (DBM), UTCAM, SRAM (Bit Map).]

Page 6

uArchitecture changes:

Doubling the front end of the machine, including the Input FIFO, the Decoder and the CRC unit.

Buffering between the decoders and the DBM with a FIFO.

The search for a free index in the Bit Map is now done in parallel with the rest of the command execution.

Page 8

New uArchitecture

[Figure: new uArchitecture, build step: a single front end (Input FIFO, Decoder, CRC).]

Page 9

New uArchitecture

[Figure: new uArchitecture, build step: the front end doubled into FrontEnd0 and FrontEnd1, each with an Input FIFO, Decoder and CRC.]

Page 10

New uArchitecture

[Figure: new uArchitecture, build step: the doubled front end (FrontEnd0 and FrontEnd1, each with Input FIFO, Decoder and CRC) feeding the DBM FIFO.]

Page 11

New uArchitecture

[Figure: new uArchitecture, build step: Local Bus Interface, Reg. file, Data Stream Controller and Output FIFO added around the doubled front end (FrontEnd0, FrontEnd1) and the DBM FIFO.]

Page 12

New uArchitecture

[Figure: complete new uArchitecture: Local Bus Interface, Reg. file, Data Stream Controller, Output FIFO, doubled front end (FrontEnd0 and FrontEnd1, each with Input FIFO, Decoder and CRC), DBM FIFO and DBM.]

Page 13

Data Flow

[Figure: data flow through the new uArchitecture: Local Bus Interface, Reg. file, Data Stream Controller, FrontEnd0 and FrontEnd1 (Input FIFO, Decoder, CRC), DBM FIFO, DBM, Output FIFO.]

Page 16

Data stream ctrl

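[Figure: Data Stream Controller between the Local Bus Interface, the Reg. file, Input FIFO 0, Input FIFO 1 and the Output FIFO. Its state machine has two states, FIFO 0 and FIFO 1, is reset by Sys_clr, and both transitions between the states are labeled !sot & lwr.]

SOT – start of transaction. lwr – selects write/read from the system.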
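A minimal cycle-level sketch of how the data stream controller could alternate between the two input FIFOs. Assumptions: sot is active-low (the slides write !sot and speak of "Sot falls"), the start-of-transaction cycle carries no data, the first write after Sys_clr goes to FIFO 0 (matching the simulation annotations), and reads, such as register-file accesses, do not touch the FIFOs. All names are hypothetical.

```python
from __future__ import annotations

from collections import deque


class DataStreamController:
    """Hypothetical cycle-level model: each write transaction from the local
    bus is steered into one of the two input FIFOs, alternating between them."""

    def __init__(self) -> None:
        self.fifos = (deque(), deque())
        self.sys_clr()

    def sys_clr(self) -> None:
        self.next_fifo = 0                # assumed: first command goes to FIFO 0
        self.current: int | None = None  # FIFO receiving the current transaction
        for fifo in self.fifos:
            fifo.clear()

    def clock(self, sot_n: int, lwr: int, data: int | None = None) -> None:
        if sot_n == 0:                    # start of a transaction (sot active-low)
            if lwr:                       # !sot & lwr: a new write transaction
                self.current = self.next_fifo
                self.next_fifo ^= 1       # toggle FIFO 0 <-> FIFO 1
            else:                         # a read, e.g. from the register file
                self.current = None
        elif lwr and self.current is not None and data is not None:
            self.fifos[self.current].append(data)
```

Steering whole transactions rather than single words keeps every command intact inside one front end, so each decoder sees complete commands.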

Page 17

Sim: Data Stream ctrl

[Simulation waveform, annotated: reading from the register file (CRC); data enters FIFO 0; data enters FIFO 1.]

Page 18

DBM FIFO

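[Figure: the DBM FIFO between the two decoders and the DBM, controlled by a state machine with states WAIT ON GO0, DEC0, WAIT ON GO1 and DEC1, reset by Sys_clr; transition conditions go0 & !dbm_full, go1 & !dbm_full and fifo_wrdone (twice).]

go0/1: FrontEnd0/1 (decoder 0/1) is ready.

dbm_full: the DBM FIFO is full.

fifo_wrdone: the write to the FIFO is done.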
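The DBM FIFO control can be sketched as a four-state machine using the one-hot encodings from the DBM FIFO simulation slide (1, 2, 4, 8). The transitions are our reading of the diagram: each WAIT state waits for its own decoder's go signal while the FIFO is not full, and each DEC state ends on fifo_wrdone, handing the turn to the other decoder. The reset state and all Python names are assumptions.

```python
# State encodings as listed on the "Sim: DBM FIFO" slide (one-hot).
WAIT_ON_GO0, DEC0, WAIT_ON_GO1, DEC1 = 1, 2, 4, 8


class DbmFifoArbiter:
    """Hypothetical model of the DBM FIFO write control: the two decoders are
    served in strict alternation (assumed from the state diagram)."""

    def __init__(self) -> None:
        self.sys_clr()

    def sys_clr(self) -> None:
        self.state = WAIT_ON_GO0  # assumed reset state

    def step(self, go0: bool, go1: bool, dbm_full: bool, fifo_wrdone: bool) -> int:
        s = self.state
        if s == WAIT_ON_GO0 and go0 and not dbm_full:
            self.state = DEC0          # DBM FIFO samples data from decoder 0
        elif s == DEC0 and fifo_wrdone:
            self.state = WAIT_ON_GO1
        elif s == WAIT_ON_GO1 and go1 and not dbm_full:
            self.state = DEC1          # DBM FIFO samples data from decoder 1
        elif s == DEC1 and fifo_wrdone:
            self.state = WAIT_ON_GO0
        return self.state
```

Strict alternation would keep commands in arrival order, since the data stream controller also alternates between the input FIFOs; this is our interpretation, not something the slides state.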

Page 19

Sim: DBM FIFO

State encoding:

1 – wait on go0

2 – DEC0

4 – wait on go1

8 – DEC1

[Simulation waveform of the DBM FIFO state machine (WAIT ON GO0, DEC0, WAIT ON GO1, DEC1), annotated: the DBM FIFO samples data from decoder 0; the DBM FIFO samples data from decoder 1.]

Page 20

DBM DOUBLE

[Figure: DBM interface: DBM FIFO, issue logic, execution unit, request packet, packer and bit map unit.]

Saves the last bad decoder status, which goes to the Output FIFO with the next successful command.
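A sketch of the bad-status callout, assuming a single register that keeps only the most recent bad decoder status and is cleared once it has been reported with the next successful command. The class, field names and the clearing behavior are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputEntry:
    cid: int                           # command id of the successful command
    result: int                        # its result word
    bad_decoder_status: Optional[int]  # last bad status since the previous entry


class BadStatusLatch:
    """Hypothetical model of the 'last bad decoder status' register."""

    def __init__(self) -> None:
        self.saved: Optional[int] = None

    def decoder_failed(self, status: int) -> None:
        self.saved = status             # only the most recent bad status is kept

    def command_succeeded(self, cid: int, result: int) -> OutputEntry:
        entry = OutputEntry(cid, result, self.saved)
        self.saved = None               # assumed: cleared once reported
        return entry
```

Piggybacking the status on the next successful output avoids a separate output word for a command that failed in decoding.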

Page 21

Sim: bad decoder status

Page 22

Register file

Previously, the user could read the system’s current parameters from the register file: command ID, CRC value, the file’s site, etc.

Since there are now 2 pipes, the register file had to be changed: some registers contain data from both pipes, while for others the user must specify which pipe to read the parameters from.
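One way a two-pipe register file could look, as a purely hypothetical sketch: narrow registers (here a 16-bit command ID) are packed so that one read returns both pipes, while wide ones (CRC, site) are read through a pipe-select register. The register names, widths and packing are illustrative, not the design's actual register map.

```python
# Hypothetical register map; names, widths and packing are illustrative only.
SHARED = {"cmd_id"}          # narrow values: both pipes packed into one read
PER_PIPE = {"crc", "site"}   # wide values: read through a pipe-select register


class TwoPipeRegisterFile:
    def __init__(self) -> None:
        self.values = [dict(cmd_id=0, crc=0, site=0), dict(cmd_id=0, crc=0, site=0)]
        self.pipe_select = 0  # written by the user before reading a PER_PIPE register

    def update(self, pipe: int, name: str, value: int) -> None:
        """Called by pipe 0 or 1 when it produces a new parameter value."""
        self.values[pipe][name] = value

    def read(self, name: str) -> int:
        if name in SHARED:
            # Pipe 0 in bits 15..0, pipe 1 in bits 31..16.
            return (self.values[1][name] & 0xFFFF) << 16 | (self.values[0][name] & 0xFFFF)
        if name in PER_PIPE:
            return self.values[self.pipe_select][name]
        raise KeyError(name)
```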

Page 23

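ADD - old

[Figure: old ADD state machine with states IDLE, FND_NINDX, ADD_NINDX and ACK_NINDX; transition conditions (ad_en)&&(!ad_done), (bm_s4f_done) (twice), (ad_error) and (ad_new_done).]

Finding a new free index: ~40 clk cycles

Updating the bit map: ~10 clk cycles

s4f - old

[Figure: old s4f state machine with states IDLE, FNEW and ACKN; transition conditions !Sys_clr, Fnew_done, Ackn_done, Bm_s4f_new_ack and !Bm_s4f_new_ack.]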

Page 24

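ADD - new

[Figure: new ADD state machine with states IDLE, ADD_NINDX and ACK_NINDX, reset by !Sys_clr; transition conditions (ad_en)&&(!ad_done)&&(bm_index_valid), (ad_err), (ad_new_done) and (bm_ack_rcvd).]

The new index is found while the ‘ADD’ module is idle (which is for more than 50 cycles…)!

s4f - new

[Figure: new s4f state machine with states WT_FOR_ACK, FNEW and ACKN, reset by !Sys_clr; transition conditions Bm_index_valid, Ackn_done and Add_ack.]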
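The new s4f flow can be sketched as a three-state machine using the encodings from the simulation slides (0 wait for ack, 1 find new index, 2 ack old index). It keeps one free index ready, so ADD's condition bm_index_valid is normally already true when a command needs an index. Cycle counts are abstracted away, a Python list stands in for the bit map, and the reset state and names are assumptions.

```python
# State encodings as listed on the "Sim: add, s4f" slides.
WT_FOR_ACK, FNEW, ACKN = 0, 1, 2


class SearchForFree:
    """Hypothetical model of the new s4f unit: it keeps one free index ready
    (bm_index_valid) so that ADD does not wait for the bit-map search."""

    def __init__(self, num_slots: int) -> None:
        self.used = [False] * num_slots  # stand-in for the bit map
        self.state = FNEW                # assumed state after Sys_clr
        self.index = None                # prefetched free index, if any

    @property
    def bm_index_valid(self) -> bool:
        return self.state == WT_FOR_ACK

    def step(self, add_ack: bool = False) -> int:
        """Take one FSM transition (cycle counts are abstracted away)."""
        if self.state == FNEW:
            # Find a new free index (~40 clk cycles per the old ADD slide).
            free = next((i for i, u in enumerate(self.used) if not u), None)
            if free is not None:
                self.index = free
                self.state = WT_FOR_ACK      # Bm_index_valid
        elif self.state == WT_FOR_ACK and add_ack:
            self.state = ACKN                # Add_ack: ADD took the index
        elif self.state == ACKN:
            self.used[self.index] = True     # update the bit map (~10 clk cycles)
            self.index = None
            self.state = FNEW                # Ackn_done
        return self.state
```

Because FNEW and ACKN run while ADD is idle, both the search and the bit-map update leave the command's critical path; ADD only has to raise add_ack.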

Page 25

Sim: add, s4f

s4f state encoding:

0 – wait for ack

2 – ack old index

1 – find new index

add state encoding:

1 – idle

2 – add index

4 – ack index

Page 29

performance

The main function is the ‘search’ command:

Long path (up to 512 bytes) => long CRC calculation => long decoding stage.

Access to main memory => if the requested path is not found, a new record is added to the memory, which includes finding a new index and acknowledging the added record (at least 4 memory accesses).

Page 30

performance

2 input FIFOs – data is received from the OS at double the rate.

2 decoders – allow 2 commands to be decoded in parallel. This is significant for several long ‘search’ commands in a row.

DBM FIFO – decouples the decoding of commands from their execution, so the two can proceed in parallel.
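A rough timing model shows why parallel decoding and buffering help. Assumptions (all ours, not from the slides): every command is available at time 0, bus transfer time is ignored, each command needs a fixed d cycles to decode and e cycles to execute, in the old design the single decoder is blocked until the DBM accepts its command, and in the new design the two decoders take commands alternately while an unbounded DBM FIFO lets them run ahead of execution.

```python
def old_finish_time(n: int, d: int, e: int) -> int:
    """One decoder, no DBM FIFO: the decoder is blocked until the DBM takes its command."""
    handoff = 0    # time the decoder last handed a command to the DBM
    exec_done = 0  # time the DBM finished the previous command
    for _ in range(n):
        decoded = handoff + d
        handoff = max(decoded, exec_done)  # wait for the DBM to become free
        exec_done = handoff + e
    return exec_done


def new_finish_time(n: int, d: int, e: int) -> int:
    """Two decoders used alternately, with a DBM FIFO between decoding and execution."""
    decoder_free = [0, 0]
    exec_done = 0
    for i in range(n):
        k = i % 2                                # round-robin between the decoders
        decoded = decoder_free[k] + d
        decoder_free[k] = decoded                # FIFO accepts at once (never full here)
        exec_done = max(decoded, exec_done) + e  # commands execute in order
    return exec_done
```

With long decodes (d = 100, e = 50), four commands finish at 450 in the old model and at 300 in the new one. When e is at least d, both models give the same time, which fits the point that the gain matters most for several long ‘search’ commands in a row.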

Page 31

performance

2 search commands, each with a 102-byte path (on which the CRC is computed); simulation times in ns:

                                               Old architecture   New architecture
First search:
  Ads_n falls                                  628                628
  First dword in input FIFO (is_usedw)         719                718
  End of decoding (crc_done)                   6128
  DBM FIFO input ready (dbm_fifo->fifo_input)                     6202
  Pck__en rises                                9344               8560
Second search:
  Sot falls                                    9468               2318
  First dword in input FIFO (is_usedw)         2380               2380
  End of decoding (crc_done)                   14574              7869
  Pck__en rises                                15408              9486
Gap between the two Pck__en rises              6064               926

Further values shown on the slide without a label: 7942, 8625.
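The headline numbers can be derived from the timing values on this slide. Taking the rise of Pck__en as the point where a command's result is packed (our reading of the signal name), the code computes the time for both searches measured from the first Ads_n fall, the gap between the two results, and the resulting speed-up. The dictionary keys are hypothetical labels.

```python
from __future__ import annotations

# Simulation times in ns, copied from this slide.
OLD = {"ads_n_falls": 628, "pck_en_first": 9344, "pck_en_second": 15408}
NEW = {"ads_n_falls": 628, "pck_en_first": 8560, "pck_en_second": 9486}


def summarize(times: dict[str, int]) -> tuple[int, int]:
    """Return (time for both searches, gap between the two Pck__en rises)."""
    total = times["pck_en_second"] - times["ads_n_falls"]
    gap = times["pck_en_second"] - times["pck_en_first"]
    return total, gap


old_total, old_gap = summarize(OLD)
new_total, new_gap = summarize(NEW)
print(old_total, new_total, round(old_total / new_total, 2))  # 14780 8858 1.67
```

Under this reading both searches finish about 1.67 times sooner, and the second result follows the first after 926 ns instead of 6064 ns.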

Page 32

performance

The search for a free index now executes in parallel with the other execution stages of a command. This saves ~50 clock cycles per ‘search’ command, which usually takes ~400-1000 cycles.
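The claimed saving in relative terms, as a quick check: ~50 cycles out of ~400 to ~1000 is roughly 5% to 12.5% per ‘search’. The ~50 figure is consistent with the ~40 + ~10 cycles the old ADD flow spent finding an index and updating the bit map, though that link is our reading. The function name is hypothetical.

```python
def fraction_saved(saved_cycles: int, total_cycles: int) -> float:
    """Share of a command's run time removed by hiding the free-index search."""
    return saved_cycles / total_cycles


# ~50 saved cycles against a 'search' command of ~400 to ~1000 cycles.
for total in (400, 1000):
    print(f"{total} cycles: {fraction_saved(50, total):.1%} saved")
# 400 cycles: 12.5% saved
# 1000 cycles: 5.0% saved
```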

Page 33

The end…

Page 34

Sim: s4f

State encoding:

0 – wait for ack

2 – ack old index

1 – find new index

[Figure: new s4f state machine with states WT_FOR_ACK, FNEW and ACKN, reset by !Sys_clr; transition conditions Bm_index_valid, Ackn_done and Add_ack.]
