Application-Specific Signatures for Transactional Memory in Soft Processors
Martin Labrecque
Mark Jeffrey
Gregory Steffan
ECE Dept. University of Toronto
FPGAs for Systems-on-Chip
• Increasingly large Systems-on-Chip: many CPUs, accelerators, IP blocks
• Processors are easier to program than hardware
• FPGAs & multicores: similar parallel programming challenge
[Figure: FPGA with a soft processor (PC, Instr. Mem., Reg. Array, ALU, Data Mem.), a DDR controller, and Ethernet MAC controllers]
Why are parallel programs challenging?
Packet Processing Example
SINGLE-THREADED
packet = get_packet();
…
connection = database->lookup(packet);
if (connection == NULL)
    connection = database->add(packet);
connection->count++;
…
global_packet_count++;
MULTI-THREADED (two regions of the code are marked "Atomic")
packet = get_packet();
…
connection = database->lookup(packet);
if (connection == NULL)
    connection = database->add(packet);
connection->count++;
…
global_packet_count++;
Challenges:
1- Must correctly delimit atomic operations
2- Improve performance by finer-grain locking
Packet Processing Example (continued)
MULTI-THREADED (two regions of the code are marked "Atomic")
packet = get_packet();
…
connection = database->lookup(packet);
if (connection == NULL)
    connection = database->add(packet);
connection->count++;
…
global_packet_count++;
Annotations on the atomic regions:
• Optimistic Parallelism across Connections
• No Parallelism (next to global_packet_count++)
Opportunity for Parallelism
Exploit Opportunity for Parallelism
• Allow more than 1 thread in a critical section
• Will succeed if threads access different data
Transactional Memory
– the new hot topic for multiprocessor computers
– how to map TM to FPGAs?
Our Transactional Approach
• Modify main memory directly: reduce copies, faster commit
[Figure: processor1 and processor2 share a data cache backed by off-chip DDR; "x" marks on the data]
• Detect conflicts prior to corrupting main memory
• Undo changes on transaction abort
• How to efficiently detect conflicts?
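The direct-update approach on this slide (write main memory in place, undo on abort) can be illustrated with a small software undo log. This is only a sketch under stated assumptions: the names `undo_log_t`, `tx_store`, `tx_commit`, `tx_abort` and the fixed capacity are hypothetical and do not come from the slides, which describe a hardware mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNDO_CAPACITY 64  /* assumed log size, chosen for illustration */

typedef struct {
    uint32_t *addr[UNDO_CAPACITY];  /* locations overwritten by the transaction */
    uint32_t  old[UNDO_CAPACITY];   /* values they held before the write */
    size_t    count;
} undo_log_t;

/* Speculative store: save the old value, then update memory in place.
 * Returns 0 on success, -1 if the log is full. */
static int tx_store(undo_log_t *log, uint32_t *addr, uint32_t value) {
    assert(log != NULL && addr != NULL);
    if (log->count == UNDO_CAPACITY) return -1;
    log->addr[log->count] = addr;
    log->old[log->count]  = *addr;
    log->count++;
    *addr = value;
    return 0;
}

/* Commit: memory already holds the new values, so only the log is dropped. */
static void tx_commit(undo_log_t *log) {
    assert(log != NULL);
    log->count = 0;
}

/* Abort: restore the old values in reverse order of the writes. */
static void tx_abort(undo_log_t *log) {
    assert(log != NULL);
    while (log->count > 0) {
        log->count--;
        *log->addr[log->count] = log->old[log->count];
    }
}
```

Commit is cheap in this scheme because nothing has to be copied; the cost is paid only on abort.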
Conflict Detection
• Tracking speculative reads and writes
• Compare accesses across transactions:
Transaction1    Transaction2    Result
Read A          Read A          OK
Read B          Write B         CONFLICT
Write C         Read C          CONFLICT
Write D         Write D         CONFLICT
Must detect all conflicts for correctness
Reporting false conflicts is acceptable
Related Work on Conflict Detection
• FPGAs: test speculative bits in the cache
– Complex to evict cache lines
– Lots of additional state
– Too restrictive in terms of storage capacity
• ASIC: compare signatures
– Signature: bit vector recording TM memory accesses
– No previous signature FPGA implementation
Signatures well suited to FPGA bitwise operations
How can signatures be efficiently implemented?
Conflict Detection with Signatures
• Hash of an address indexes into a bit vector
- More bits per signature give more resolution
- FPGA timing and area limit the number of bits
- Hash functions have varying complexity/accuracy
[Figure: a processor1 load and a processor2 store each pass through the hash function into Read and Write signatures, which are compared with AND]
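The signature scheme above can be sketched in software as a pair of bit vectors per transaction. This is an illustration only: the 16-bit signature length, the modulo hash in `sig_hash`, and all identifiers are assumptions, not the hardware described in the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIG_BITS 16u  /* assumed signature length; must be <= 32 in this sketch */

typedef struct {
    uint32_t read;   /* bit vector of hashed read addresses */
    uint32_t write;  /* bit vector of hashed write addresses */
} signature_t;

/* Placeholder hash (assumption): word address modulo the signature length. */
static unsigned sig_hash(uint32_t addr) {
    return (addr >> 2) % SIG_BITS;
}

/* Record one speculative access in the matching signature. */
static void sig_record(signature_t *s, uint32_t addr, bool is_write) {
    assert(s != NULL);
    uint32_t bit = UINT32_C(1) << sig_hash(addr);
    if (is_write) s->write |= bit;
    else          s->read  |= bit;
}

/* Conflict if either transaction wrote something the other read or wrote.
 * Hash collisions can report false conflicts but never hide a real one. */
static bool sig_conflict(const signature_t *a, const signature_t *b) {
    return (a->write & (b->read | b->write)) != 0 ||
           (b->write & a->read) != 0;
}

/* Clearing a signature is a single assignment per bit vector. */
static void sig_clear(signature_t *s) {
    assert(s != NULL);
    s->read = 0;
    s->write = 0;
}
```

With this placeholder hash, addresses 0x100 and 0x140 map to the same bit, so a write to one and a read of the other is reported as a (false) conflict, which is acceptable for correctness but costs performance.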
Goals of this Work
• Implement efficient signatures for TM on FPGAs
• Use FPGA reconfigurability for better/more-efficient TM
• Evaluate with a real system
Existing Hash Functions
1. Bit Selection
Address bits: 0 1 1 0 ... ...
Hash = 0 1 1 0
A 4-bit hash indexes into 16 signature bits
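Bit selection amounts to a shift and a mask. A minimal sketch follows; the function name `bitsel_hash` and the choice of which address bits to select (`lsb`, `nbits`) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Select `nbits` consecutive address bits starting at bit position `lsb`. */
static uint32_t bitsel_hash(uint32_t addr, unsigned lsb, unsigned nbits) {
    assert(nbits >= 1 && nbits < 32 && lsb + nbits <= 32);
    return (addr >> lsb) & ((UINT32_C(1) << nbits) - 1u);
}
```

For example, selecting 4 bits starting at bit 4 of address 0x60 yields the hash 0110, which sets bit 6 of a 16-bit signature.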
Existing Hash Functions (continued)
2. H3: XOR random address bits
Address bits: 1 0 0 1 1 1 ...
Address bits: 0 0 1 1 0 1 ...
Hash_2 = 1 0
Hash_1 = 1 1
Multiple hash functions index different parts of the signature
We use 4 hash functions to improve performance/length
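An H3-style hash can be sketched as one parity computation per output bit, where each output bit XORs the address bits picked out by a mask. The masks are normally chosen at random; the function names and the caller-supplied mask array here are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parity (XOR of all bits) of a 32-bit word. */
static unsigned parity32(uint32_t x) {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}

/* Output bit i is the XOR of the address bits selected by masks[i]. */
static uint32_t h3_hash(uint32_t addr, const uint32_t *masks, unsigned nbits) {
    assert(masks != NULL && nbits <= 32);
    uint32_t h = 0;
    for (unsigned i = 0; i < nbits; i++)
        h |= (uint32_t)parity32(addr & masks[i]) << i;
    return h;
}
```

Using several such functions, each with its own masks and its own slice of the signature, corresponds to the "multiple hash functions" line on the slide.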
Existing Hash Functions (continued)
3. PBX: XOR high-order bits with low-order ones
Address bits: 1 1 0 1 ...
Hash_2 = 0 1
Address bits: 1 1 0 1 ...
Hash_1 = 0 1
Address bits: 0 0 1 0 ...
Hash_2 = 1 0
4. LE-PBX: XOR high-order bits with low-order ones, progressively omit low-order bits in hash functions
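A loose sketch of the PBX idea as worded on the slide: XOR a field of high-order address bits with a field of low-order ones. The field positions are parameters here because the slides do not give them; `pbx_hash` and its arguments are hypothetical. Raising `low_lsb` from one hash function to the next is one possible reading of how LE-PBX "progressively omits low-order bits".

```c
#include <assert.h>
#include <stdint.h>

/* XOR `nbits` bits starting at `high_lsb` with `nbits` bits starting at `low_lsb`. */
static uint32_t pbx_hash(uint32_t addr, unsigned low_lsb, unsigned high_lsb,
                         unsigned nbits) {
    assert(nbits >= 1 && nbits < 32);
    assert(low_lsb < 32 && high_lsb < 32);
    uint32_t mask = (UINT32_C(1) << nbits) - 1u;
    return ((addr >> high_lsb) ^ (addr >> low_lsb)) & mask;
}
```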
Signatures: an Opportunity for FPGAs
• ASIC hash functions on FPGA: very area consuming
• Due to locality:
– applications access certain memory locations more frequently
– certain locations will have more conflicts than others
• Via app-specific signatures:
– increase tracking resolution of conflicting memory locations
– decrease tracking resolution of others
• FPGAs allow a customized hash function for each application
Application-specific signatures!
Trie-based Hashing for Signatures
Binary addresses (profiling): 000, 011, 100, 101, 110, 111
[Figure: trie over the profiled addresses]
root
  1xx
    11x: 111, 110
    10x: 101, 100
  0xx
    01x: 011
    00x: 000
Leaves are distinct addresses (figure label: signature bits)
Trie gives control over the resolution for different memory regions
Complete trie of all TM accesses is HUGE
Which leaves in the trie can/cannot be merged?
Trie-Based Conflict Detection
Load/Store address bits: A2 A1 A0
[Figure: trie over address bits A2,A1,A0 with root xxx; nodes 1xx, 0xx, 11x, 10x, 01x, 00x; leaves 111, 110, 101, 100, 011, 000]
Simulation feedback:
• 3 leaves in trie: 3 signature bits encompass all accesses
• Compact trie by only evaluating nodes with remaining branching
Signature bit equations: A2 & A0, A2 & !A0, !A2
Representation is very efficient!
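The three signature-bit equations from this slide (A2 & A0, A2 & !A0, !A2) can be written out directly. In this sketch the assignment of equations to bit positions and the name `trie_sig_bits` are assumptions; the slides give only the equations.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 3-bit address to a one-hot 3-bit signature mask:
 * bit 0 = !A2, bit 1 = A2 & !A0, bit 2 = A2 & A0 (bit order is assumed). */
static uint8_t trie_sig_bits(uint8_t addr) {
    assert(addr < 8);
    unsigned a2 = (addr >> 2) & 1u;
    unsigned a0 = addr & 1u;
    unsigned not_a2 = a2 ^ 1u;
    unsigned not_a0 = a0 ^ 1u;
    return (uint8_t)(not_a2 | ((a2 & not_a0) << 1) | ((a2 & a0) << 2));
}
```

Every address sets exactly one of the three bits, and A1 is never examined, which is why the compacted trie needs so little logic.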
Trie-based Hash Function Evaluation
Training packet trace is different from test packet trace
Multiprocessor System
– NetFPGA: Virtex II Pro 50, 4 GigE + 1 PCI interfaces
– 2 processors @ 125 MHz (limited by FPGA)
– 64 MB DDR2 SDRAM @ 200 MHz
Real system executing real applications
[Figure: processor1 and processor2 (1 thread each, each with its own I$) share a data cache backed by off-chip DDR; instruction and data memories; packet input buffer with input mem. and packet output buffer with output mem.; synch. unit]
Simulated Ratio of False Conflicts versus Number of Signature Bits
[Figure: NAT, percent false conflicts (y-axis, 0 to 45) vs. number of signature bits (x-axis, log scale, 1 to 100000) for BitSel, H3, LE-PBX, Trie]
- Trie-based hashing function requires far fewer signature bits
Simulated Ratio of False Conflicts versus Number of Signature Bits
[Figure: four panels (Classifier, UDHCP, NAT, Intruder), each plotting percent false conflicts vs. number of signature bits (log scale, 1 to 100000) for BitSel, H3, LE-PBX, Trie]
- Trie-based hashing function requires far fewer signature bits
Simulated Packet Rate Normalized to Ideal Conflict Detection vs Trie-Based Signature Length
[Figure: normalized packet rate (y-axis, 0.5 to 1.1, with an "Ideal" reference) vs. trie-based signature length (x-axis, 0 to 200) for Classifier, UDHCP, Intruder, NAT]
Signatures are Critical to Performance
2 Best Implementation Options
Signatures, maximum design @ 125 MHz:
• Block RAM
– 2048 signature bits per thread
– Bit-Select hash function
• Registers
– ~100 signature bits per thread
– Arbitrary hash function
– We use trie-based signatures: they perform best at that size
Let's Compare!
Trie-based Hashing Normalized to Bit Selection
[Figure: throughput and area of trie-based hashing normalized to bit selection (y-axis, 0 to 1.8) for Classifier, NAT, UDHCP, Intruder; bar labels as extracted: +12%, +58%, +9%, +71%]
- Significantly fewer rollbacks, hence packet rate increase
- At most 5% area overhead
Conclusions
- Conflict detection significantly impacts performance
- Trie-based hashing reduces required signature bits
- Trie-based hashing can be implemented in LUTs: preserve frequency, 5% area overhead
- Retiming is required to implement in RAMs
- Increased performance (up to 71%) versus other best implementation (RAM-based bit-select)
- Application-specific signatures enable first fully integrated TM processor for FPGA
- We now have an extended version working with 8 threads
Thank you!
Transactional Memory: Parallel Programming Made Easy
• Reduce conservative synchronization overhead
Lock();
if (shared_1)
    array[i] = 0;
Unlock();
Only serialized when truly necessary
• Alleviate need for fine-grained synchronization
BEFORE
Bool val = f(shared_1);
if (val) {
    Lock();
    if (f(shared_1))
        shared_1 = 0;
    Unlock();
}
AFTER
Lock();
if (f(shared_1))
    shared_1 = 0;
Unlock();
Our Transactional Approach
• No program change required
• Modify main memory directly
[Figure: two processors share a data cache backed by off-chip DDR; "x" marks on the data]
• Detect conflicts prior to corrupting main memory
• Undo changes on transaction abort
sigsvn_udhcp/statsout: fp rates
sigsvn_other/mat: other stats
Transactional Single-Threaded Processor (simplified)
[Figure: pipeline with Instr. Cache, PC (+4), Reg. Array, ALU, Data Cache, and Hazard Detection Logic]
Hazard detection is too slow: use static hazard detection
Transactional Single-Threaded Processor (simplified)
[Figure: pipeline with Instr. Cache, PC (+4), Reg. Array, ALU, and Data Cache, with Conflict Detection and an Undo Log added; the Reg. Array and PC each appear twice]
Transactional Packet Processing
• Hardware support to revert speculative changes to:
– Register file
– Program counter
– Data memory
• To detect failed speculation:
– Record read and write sets of speculative threads
– Compare sets across threads
When does the set comparison take place?
Conflict Detection with Signatures
• Suited for FPGA bitwise operations
– Hash of an address sets bits in a bit vector
– Set comparison is an AND operation
– Clearing sets is done in 1 cycle
- Requires many bits per thread
- Timing constraints allow read and write set tracking for 2 threads
- Made a single-threaded 2-processor implementation
[Figure: Signature Thread 0 and Signature Thread 1, each with a write (W) and a read (R) bit vector; both start at W 00000000, R 00000000 and become W 01000000, R 00000000 after a processor access]
[Figure: compacted trie with root; nodes 1xx, 11x, 0xx, 00x; leaves 111, 110, 000]
A New Meaning for Locks
• Optimistically consider locks
• No program change required
Lock();
if ( f( ) )
    shared_1 = a();
else
    shared_2 = b();
Unlock();
[Figure: execution timelines of Thread1 to Thread4 under LOCKS and under TRANSACTIONAL execution; an "x" appears in the transactional timeline]
• Reduce conservative synchronization overhead
• Reduce challenge of fine-grained synchronization
• * can you list the apps?
• emphasize that train != test in methodology page