overview on hardware optimizations for database engines

Post on 03-Apr-2022

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Overview on Hardware Optimizations for Database EnginesAnnett Ungethüm, Dirk Habich, Tomas Karnagel, Sebastian Haas, Eric Mier, Gerhard Fettweis, Wolfgang Lehner

BTW 2017, Stuttgart, Germany, 2017-03-09

2

Interaction DB-Engine and Hardware

Applications/Database Engines

Modern Hardware

Well-Known Challenge:Exploit hardware technology by specific data management techniques (indexing, data storage, query & transaction processing)

1970 1980 1990 2000 2010 2020

10

100

1000

10000

1e+05

1e+06

1e+07 memory (KByte)

1970 1980 1990 2000 2010 2020

0

2

4

6

8

10 #cores

Main Memory CPU

3

Era of Dark Silicon

 MOORE‘S LAW

§ Number of transistors in a dense integrated circuit doubles approximately every two years.

 DARK SILICON

§ We can no longer power the transistors that Moore is giving us

1970 1980 1990 2000 2010 2020

110

1001000

100001e+051e+061e+07 #transistors (x1000)

process (nm)

http://engineering.nyu.edu/garg/node/31

4

HW/SW Co-Design for DB-Engines

Applications/Database Engines

Modern Hardware

Challenge:HW/SW Co-Design for Database EnginesSpecialization of Hardware to overcome Dark Silicon

5

Outline

 HARDWARE FOUNDATION

 EXTENSIONS FOR PROCESSING ELEMENTS

 INTELLIGENT DMA CONTROLLER

6

Hardware Foundation

 TOMAHAWK PLATFORM

7

Hardware Foundation – Zoom In

8

Hardware Foundation – Zoom In (2)

 CORE MANAGER (CM)

§ Extended Xtensa-LX5 from Tensilica (now Cadence)

§ 32KB for code§ 64KB for data

 PROCESSING ELEMENTS (PE)

§ Xtensa-LX5 from Tensilica (now Cadence)§ 32KB for code

§ 2x32KB for data on PE

 APPLICATION CORE (APP)

§ 570T core from Tensilica (now Cadence)

Control-Plane

Control-Plane

9

Outline

Control-Plane

Control-Plane

 PART I:

EXTENSIONS OF PROCESSING ELEMENTS

10

Development Flow

DEVELOPMENT OF INSTRUCTION SET EXTENSIONS WITH

TENSILICA TOOLS

§ Tensilica Instruction Extension (TIE) language

§ C/TIE compiler§ Cycle accurate simulator/debugger

§ Processor generator

SYNTHESIS OF RTL CODE

§ Synopsys Design Compiler, PrimeTime PX

§ TSMC CMOS LP 65nm libraries

int res= (v0 + v1 + v2) >> shift8;

// shift8 -> internal stateint res=add3_shift(v0, v1, v2);

11

Investigated Database Primitives

Bitmap Compression and Processing (AND,

OR, XOR)

Hashing Sorted Set Operations

WA

H

PLW

AH

CO

MP

AX

Has

h+

Lo

oku

p

Has

h +

Inse

rt

Has

h K

eys

Has

h S

amp

ling

Cit

yHas

h3

2

Me

rge

So

rt

Inte

rse

ctio

n

Un

ion

Dif

fere

nce

Sort

-Me

rge

Jo

in

Sort

-Me

rge

A

gg

reg

atio

n (

SUM

)

Primivites

2014

12

Basic RISC Instruction Set

Application-Specific Instruction Set

Instruction Set

Application-Specific States

Application-Specific Registers

Basic Registers

Register Files

Instructionfetch

Load-Store Unit 0

Load-Store Unit 1

Data Prefetcher

Inte

rcon

ne

ct

Local InstructionMemory

Local Data Memory 0

Local Data Memory 1

Extended Tensilica LX5 Processor

64 bit

128 bit

128 bit

General Approach for all Extensions

13

Bitmap Primitives

 BITMAPS ARE A SPECIAL KIND OF INDEX  BITMAPS COMPRESSION

§ bit length equals number of tuples

 WORD-ALIGNED HYBRID (WAH) CODE

§ Stateless compression§ Run-length-encoding (RLE)

- run of 0‘s and 1‘s

§ WAH bitmaps contain RLE- compressed fills and

- uncompressed literals

Compressing Bitmap Indexes for Faster Search Operations

Kesheng Wu, Ekow J. Otoo and Arie ShoshaniLawrence Berkeley National Laboratory

Berkeley, CA 94720, USAEmail: {kwu, ejotoo, ashoshani}@lbl.gov

Abstract

In this paper, we study the effects of compression onbitmap indexes. The main operations on the bitmaps dur-ing query processing are bitwise logical operations such asAND,OR,NOT, etc.Using the general purpose compres-sion schemes, such as gzip, the logical operations on thecompressed bitmaps are much slower than on the uncom-pressed bitmaps. Specialized compression schemes, likethe byte-aligned bitmap code (BBC), are usually fasterin performing logical operations than the general purposeschemes, but in many cases they are still orders of magni-tude slower than the uncompressed scheme. To make thecompressed bitmap indexes operate more efficiently, wedesigned a CPU-friendly scheme which we refer to as theword-aligned hybrid code (WAH). Tests on both syntheticand real application data show that the new scheme sig-nificantly outperforms well-known compression schemesat a modest increase in storage space. Compared to BBC,a scheme well-known for its operational efficiency, WAHperforms logical operations about 12 times faster and usesonly 60% more space. Compared to the uncompressedscheme, in most test cases WAH is faster while still usingless space. We further verified with additional tests thatthe improvement in logical operation speed translates tosimilar improvement in query processing speed.

1. Introduction

This research was originally motivated by the needto manage the volume of data produce by a high-energy experiment called STAR1 [25, 26]. In this ex-periment, information about each potentially interest-ing collision event is recorded and multi-terabyte (1012)of data is generated each year. One important way ofaccessing the data is to have the data management

1 Information about the project is also available athttp://www.star.bnl.gov/STAR.

bitmap indexOID X =0 =1 =2 =3

1 0 1 0 0 02 1 0 1 0 03 3 0 0 0 14 2 0 0 1 05 3 0 0 0 16 3 0 0 0 17 1 0 1 0 08 3 0 0 0 1

b1 b2 b3 b4

Figure 1. A sample bitmap index.

system retrieve the events satisfying some conditionsuch as “Energy > 15 GeV and 7 <= NumParticles

< 13” [5, 25]. The physicists have identified about 500attributes that are useful for this selection process anda typical condition may involve a handful of attributes.This type of queries are known as the partial rangequeries. Since the attributes are usually read not mod-ified, the characteristics of the dataset are very sim-ilar to those of commercial data warehouses. In datawarehouse applications, one of the best known index-ing strategies for processing the partial range queries isthe bitmap index [6, 8, 21, 30]. For this reason, we haveselected to use the bitmap index for the data manage-ment software [25].

Generally, a bitmap index consists of a set ofbitmaps and queries can be answered using bit-wise logical operations on the bitmaps. Figure 1 showsa set of such bitmaps for the attribute X of a tiny ta-ble (T) consisting of only eight tuples (rows). The at-tribute X can have one of four values, 0, 1, 2 and3. There are four bitmaps each corresponding toone of the four choices. For convenience, we have la-beled the four bit sequences b1, . . . , b4. To process thequery “select * from T where X < 2,” one per-forms the bitwise logical operation b1 OR b2. Since

LBNL-49627

select * from T where X < 2

Table T

Bit-wiseOR

14

Bit-Wise OR on Compressed Bitmaps

40000380 00000000 00000000 001FFFFFb1

40000380 8000002 001FFFFF

Literal 0 fill Literal

7FFFFFFF 7FFFFFFF 7C0001E0 3FE00000b2

WAHb1

C0000002 7C0001E0 3FE00000

1 fill Literal Literal

WAHb2

Bit-wise OR

32 bit wordsIn hex

OR OR OR OR

Logical operations (AND, OR, XOR) on two compressed bitmaps

1) Load WAH word(s)2) Calculate output (Fill-Fill,

Literal-Fill, Literal-Literal)3) Combine output

10<runlength>

11<runlength>

...

...

7FFFFFFF

00000000

15

C-Code

 WHILE(XIDX!=XSIZE && YIDX!=YSIZE) {

//new X or Y? Calculate new fill count …

if(XisFill==1 && YisFill==1) { //2 fills

if(XfillWords<YfillWords)

min=XfillWords;

else

min=YfillWords;

writeFill(comprResultBI,&Zidx,X[Xidx]|Y[Yidx],min);

XfillWords-=min;

YfillWords-=min;

}

else if((XisFill==1 && YisFill==0) || (XisFill==0 && YisFill==1)) {

if(XisFill==1){

XfillWords--;

if((X[Xidx]&0xC0000000)==0xC0000000) writeFill(comprResultBI, &Zidx, 0xC0000000, 1);

else { comprResultBI[Zidx]=Y[Yidx]; Zidx++; }

}

if(YisFill==1){

YfillWords--;

if((Y[Yidx]&0xC0000000)==0xC0000000)

writeFill(comprResultBI, &Zidx, 0xC0000000, 1);

else {comprResultBI[Zidx]=X[Xidx]; Zidx++; }

}

}

else {

result=X[Xidx]|Y[Yidx];

if((result&0x7FFFFFFF)==0x7FFFFFFF) writeFill(comprResultBI, &Zidx, 0xC0000000, 1);

else if((result&0x7FFFFFFF)==0) writeFill(comprResultBI, &Zidx, 0x80000000, 1);

else { comprResultBI[Zidx]=X[Xidx]|Y[Yidx]; Zidx++; }

}

}

Fill-Fill

Literal-Fill

Literal-Literal

16

Processing with PE Extension

Application specific states Preprocessing

Operation

Postprocessing

Application specific states

Initial Load Load Prepare Store Store

Memory 0

Memory 1

Memory 0

Memory 1

0000000F

00000003

40000380

80000002

001FFFFF

C0000002

7C0001E0

3FE00000

MEMORY

0

MEMORY

1

10000000..11000001..00101010..0111011.. 11000000..00101010..11000001..00110111..

10000000..11000001..00101010..01110111..

Is word fill or Literal? -> fill -> overwrite input words

11111111..11111111..11111111..11111111..

00000000..00000000..00000000..

11000000..00101010..11000001..0011011..00000000.. v 11111111... => 111111..

Write to output stream-> append or overwrite previous wordwith increased fill counter

00000000.0000000..00000..110011010..

Buffer result

11001110..

00000000..

00000000..

00000000..

MEMORY

0/1

Proceedtonext word(4x)

Align to 128-bit lines

Perform operation OR

ldXstream()

ldYstream()

4 x WAHinst()

17

Bit-Wise OR on Compressed Bitmaps

40000380 00000000 00000000 001FFFFFb1

40000380 8000002 001FFFFF

Literal 0 fill Literal

7FFFFFFF 7FFFFFFF 7C0001E0 3FE00000b2

WAHb1

C0000002 7C0001E0 3FE00000

1 fill Literal Literal

WAHb2

Bit-wise OR

32 bit wordsIn hex

OR OR OR OR

Code with Extension

do{ ldXstream();ldYstream();WAHinst(); WAHinst(); WAHinst();

} while(WAHinst());

18

Many More Extensions

Bitmap Compression andProcessing (AND,

OR, XOR)

Hashing Sorted Set OperationsW

AH

PLW

AH

CO

MP

AX

Has

h+

Lo

oku

p

Has

h +

Inse

rt

Has

h K

eys

Has

h S

amp

ling

Cit

yHas

h3

2

Me

rge

Sort

Inte

rse

ctio

n

Un

ion

Dif

fere

nce

Sort

-Me

rge

Join

Sort

-Me

rge

Ag

gre

gat

ion

(SU

M)

BitiX X X X

HASHI X X X X X

Titan3D X X X X X X X

Tomahawk DBA

X X X X X X

Processor

Extension

19

Evaluation

 REFERENCE PROCESSORS

§ Tomahawk DBA Processor --> Set of different DB-Extensions for WAH-Compression, Hashing, andSortes-Set Operations

Processor DescriptionTechnology

[nm]Atotal [mm²] fMAX [GHz]

PMAX [W] @ fMAX

Tomahawkwithout DBA

Basic Xtensa LX5 without instruction set extensions, 1 LSU, 32-bit memory interface

28 15.92 0.555 0.7

Tomahawk with DBA

Set of different DB-Extensions for WAH-Compression, Hashing and Sorted-Set Operations

28 18 0.5 0.753

Intel i7-6500ULow-power Intel 2-core processor based on Skylake architecture, 4MB L3 cache

14 99* 3.1 25

Comparison

20

Evaluation - Bitmaps

21

Outline

 PART 2:

 INTELLIGENT DMA CONTROLLER

22

Problem Statement

22

NoC

T2RISCCore

T2RISCCore

T2RISCCore

APP CM

Memory

MicronDDR2SDRAM

LocalMemoryLocalMemoryLocalMemory

MemoryControllerSynopsysDWC

DDR2

APPTensilica570T

CMLX4-ISA_E

LocalMemoryCache

tAN

tNMc

0 0xCCA

1 0x00B

2 0x0FA

3 0x1FD

4 0xDE1

5 0x0ED

6 0x00E

7 0xD0A

tNAtMcN

tMcM

tMMc

tAPP

Problem:Many round-trips for key lookups

Approach:“Teach B-trees to the memory controller“

23

Intelligent Main Memory Controller (iDMA)

23

NoC

Core Core Core

PointerChaser

APP CM

Memory

MicronDDR2SDRAM

LocalMemoryLocalMemoryLocalMemory

MemoryController

APP CM

MemoryControllerSynopsis

LocalMemoryCache

MemoryControllerSynopsysDWC

DDR2

0 0xCC6

1 0x000

2 0x0F0

3 0x1FD

4 0xDE1

5 0x0ED

6 0x00E

7 0xD0A

tCNtNC tMcM

tMMc

tNP

tPN

tPMc

tMcP

Vision (and first simulations)• Intelligent memory controller• Is aware of the semantics of

memory layout• Implements core operations (e.g. lookup)

Implementation (no yet in silicon)• 0,183mm² PE with 200Mhz

24

First iDMA Design

25

Evaluation using Simulator

26

Summary

 HARDWARE FOUNDATION

 EXTENSIONS FOR PROCESSING ELEMENTS

 INTELLIGENT DMA CONTROLLER

Overview on Hardware Optimizations for Database EnginesAnnett Ungethüm, Dirk Habich, Tomas Karnagel, Sebastian Haas, Eric Mier, Gerhard Fettweis, Wolfgang Lehner

BTW 2017, Stuttgart, Germany, 2017-03-09

top related