ukoug15 simd outside and inside oracle 12c (12.1.0.2)

SIMD Instructions outside and inside Oracle 12c Laurent Léturgez – 2015

Upload: laurent-leturgez

Post on 21-Jan-2017


TRANSCRIPT

Page 1: Ukoug15 SIMD outside and inside Oracle 12c (12.1.0.2)

SIMD Instructions outside and inside Oracle 12c
Laurent Léturgez – 2015

Page 2

ABOUT ME

- Oracle consultant since 2001
- Former developer (C, Java, Perl, PL/SQL)
- Blogger since 2004
  - http://laurent.leturgez.free.fr (in French, discontinued)
  - http://laurent-leturgez.com
- Twitter: @lleturgez
- OCM 11g

Page 3

Agenda

- SIMD Instructions, outside Oracle 12c
  - What is a SIMD instruction?
  - Will my application use SIMD?
  - Raw performance
- SIMD Instructions, inside Oracle 12c
  - How SIMD instructions are used inside Oracle 12c
  - Tracing SIMD in Oracle 12c

Page 4

Caveats

- Most of the topics come from:
  - my own research
  - my past life as a developer
- Some of the topics are about internals, so:
  - analysis and conclusions may be incomplete
  - future versions of Oracle may change the features
- Tests were done with Oracle 12.1.0.2 on Oracle Enterprise Linux 7.1, under VMware Fusion 7 (and VirtualBox)

Page 5

Before we start …

Some fundamentals (from Dennis Yurichev's book, Reverse Engineering for Beginners):

- CPU register: "[…] The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with a high-level PL [programming language] and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!"
- Instruction: a primitive CPU command. The simplest examples include moving data between registers, working with memory and arithmetic primitives. As a rule, each CPU has its own instruction set architecture (ISA).
- Assembly language: mnemonic code and some extensions like macros which are intended to make a programmer's life easier.

http://beginners.re/Reverse_Engineering_for_Beginners-en.pdf


Page 7

SIMD instructions … outside Oracle 12c

SIMD stands for Single Instruction, Multiple Data:

- Processes multiple data items in one CPU instruction
- Based on specific registers, and on specific CPU instructions and instruction sets
- Not Oracle-specific, but CPU-architecture-specific: Intel, IBM (AltiVec), SPARC (VIS)

This presentation is mainly about the Intel architecture.

Page 8

SIMD instructions … outside Oracle 12c

What is a SIMD register?

- It is a CPU register, wider than the traditional registers (RDI, RSI, R8, R9, etc.)
- From 128 up to 512 bits wide
- It contains several data items at once
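To make "several data items in one register" concrete, here is a minimal sketch assuming an x86-64 CPU and GCC or Clang with the SSE2 intrinsics documented in the Intel Intrinsics Guide; the function name xmm_lane_views is hypothetical. It fills one 128-bit XMM value with four 32-bit integers, then reads the same 16 bytes back as sixteen 8-bit lanes.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics: __m128i, _mm_set_epi32, _mm_storeu_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: put four 32-bit integers in one 128-bit XMM value,
   then expose the same 128 bits as 4 x int32 and as 16 x uint8 lanes. */
static void xmm_lane_views(int32_t out32[4], uint8_t out8[16]) {
    /* _mm_set_epi32 takes its arguments from the highest lane down to lane 0 */
    __m128i v = _mm_set_epi32(4, 3, 2, 1);
    _mm_storeu_si128((__m128i *)out32, v); /* 4 lanes of 32 bits */
    memcpy(out8, out32, 16);               /* same bits, 16 lanes of 8 bits */
}
```

On a little-endian x86 CPU, lane 0 holds 1, so byte 0 is 1 and byte 4 (the first byte of lane 1) is 2.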

Page 9

SIMD instructions … outside Oracle 12c

Scalar operation: an array of 4 integers {1,2,3,4}, add 1 to each value

(Figure: the CPU loads one value from RAM into Reg1, Reg2 holds the constant 1, Reg3 receives the sum, which is saved back to RAM; the sequence repeats for each value until the output is {2,3,4,5}.)

LOAD, ADD, SAVE for each value: 4 LOAD, 4 ADD, 4 SAVE
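The scalar walk-through above corresponds to a plain C loop like this sketch (the function name add_one_scalar is hypothetical). Each iteration performs one load, one add and one store, so four integers need four of each.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar version: one LOAD, one ADD and one SAVE per element,
   so an array of 4 integers needs 4 iterations. */
static void add_one_scalar(int *a, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] = a[i] + 1;
    }
}
```

Note that, when built with aggressive optimization (for example -O3 with GCC), a compiler may auto-vectorize such a loop on its own, which is one reason the optimization switches matter when checking whether an application uses SIMD.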

Page 10

SIMD instructions … outside Oracle 12c

SIMD operation: an array of 4 integers {1,2,3,4}, add 1 to each value

(Figure: one LOAD moves {1,2,3,4} from RAM into SIMD Reg1, SIMD Reg2 holds {1,1,1,1}, one ADD puts {2,3,4,5} into SIMD Reg3, and one SAVE writes it back to RAM.)

LOAD, ADD, SAVE: one of each for the whole array
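The same operation with SIMD can be sketched with SSE2 intrinsics, assuming x86-64 (where SSE2 is always available) and a 32-bit int. _mm_loadu_si128, _mm_add_epi32 and _mm_storeu_si128 are Intel intrinsics; add_one_sse2 is a hypothetical name. One LOAD, one ADD and one SAVE now process four values.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128 */
#include <stddef.h>

/* SIMD version: one LOAD, one ADD and one SAVE handle 4 x 32-bit ints at once. */
static void add_one_sse2(int *a, size_t n) {
    const __m128i ones = _mm_set1_epi32(1); /* {1,1,1,1}, like SIMD Reg2 */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i)); /* LOAD */
        v = _mm_add_epi32(v, ones);                             /* ADD  */
        _mm_storeu_si128((__m128i *)(a + i), v);                /* SAVE */
    }
    for (; i < n; i++) { /* scalar tail when n is not a multiple of 4 */
        a[i] += 1;
    }
}
```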

Page 11

SIMD instructions … outside Oracle 12c

| Instruction set | Register size | # Registers | Register names | Processors | Other |
|---|---|---|---|---|---|
| MMX | 64 bits | 8 | MM0 to MM7 | Pentium II | |
| SSE | 128 bits | 8 | XMM0 to XMM7 | Pentium III | Only four 32-bit single-precision floating-point numbers |
| SSE2/SSE3/SSSE3/SSE4 | 128 bits | 16 | XMM0 to XMM15 | Pentium IV to Nehalem | Usage expansion (two 64-bit double-precision numbers, four 32-bit integers, up to sixteen 8-bit bytes) |
| AVX/AVX2 | 256 bits | 16 | YMM0 to YMM15 | Sandy Bridge to Haswell | Three-operand (non-destructive) instructions: A+B=C rather than A=A+B; alignment requirements relaxed |
| AVX3 or AVX512 | 512 bits | 32 | ZMM0 to ZMM31 | Skylake | |
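A 256-bit YMM register holds eight 32-bit integers. Integer arithmetic on 256 bits requires AVX2 (AVX only widened floating-point operations to 256 bits), so this sketch compiles a single function for AVX2 with a GCC/Clang target attribute and only calls it when __builtin_cpu_supports("avx2") reports that it can run. Both features are GCC/Clang extensions; the function names are hypothetical.

```c
#include <assert.h>
#include <immintrin.h> /* AVX/AVX2 intrinsics: __m256i, _mm256_add_epi32, ... */
#include <stddef.h>

/* 256-bit version: 8 x 32-bit ints per YMM register. Compiled for AVX2 via a
   target attribute, so the rest of the program does not need -mavx2. */
static __attribute__((target("avx2"))) void add_one_avx2(int *a, size_t n) {
    const __m256i ones = _mm256_set1_epi32(1);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(a + i)); /* LOAD 8 ints */
        v = _mm256_add_epi32(v, ones);                             /* ADD 8 ints  */
        _mm256_storeu_si256((__m256i *)(a + i), v);                /* SAVE 8 ints */
    }
    for (; i < n; i++) a[i] += 1; /* scalar tail */
}

/* Take the AVX2 path only when the running machine reports AVX2 as usable. */
static void add_one_dispatch(int *a, size_t n) {
    if (__builtin_cpu_supports("avx2")) {
        add_one_avx2(a, n);
    } else {
        for (size_t i = 0; i < n; i++) a[i] += 1;
    }
}
```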

Page 12

SIMD instructions … outside Oracle 12c

- Intel API (C/C++): Intel Intrinsics Guide, https://software.intel.com/sites/landingpage/IntrinsicsGuide/
- Sample code: https://app.box.com/simdSampleC-2015


Page 14

Will my application use SIMD registers and instructions?

It depends on the hardware:

- Check the processor datasheets to see which instruction set extensions are supported (if any): http://ark.intel.com/#@Processors

It depends on the hypervisor:

- Some (old) hypervisors do not support modern extensions
- VirtualBox versions < 5.0 don't support SSE4, AVX and AVX2
- Hyper-V on Windows Server 2008 R2 SP1 needs a patch for some processors to support AVX

Page 15

Will my application use SIMD registers and instructions?

It depends on the operating system. AVX (256 bits) is supported from:

- Linux kernel 2.6.30 onwards (Red Hat EL5 ships kernel 2.6.18; Oracle Linux 5 with UEK ships 2.6.32). AVX requires kernel support for XSAVE (xsave flag)
- Solaris 10 Update 10 and Solaris 11
- Windows Server 2008 R2 SP1
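The OS dependency can be checked from user space. This sketch assumes x86 and GCC/Clang (for <cpuid.h> and inline assembly); avx_usable is a hypothetical name. It follows the documented Intel procedure: the CPU must report AVX and OSXSAVE in CPUID leaf 1, and XCR0 (read with XGETBV) must show that the OS saves both XMM and YMM state on context switches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helper: __get_cpuid */
#endif

/* AVX registers are only usable when all three conditions hold:
   1. the CPU implements AVX                       (CPUID leaf 1, ECX bit 28)
   2. the OS enabled XSAVE/XGETBV (OSXSAVE)        (CPUID leaf 1, ECX bit 27)
   3. the OS saves SSE and AVX state on switches   (XCR0 bits 1 and 2)        */
static bool avx_usable(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    const unsigned int osxsave_bit = 1u << 27, avx_bit = 1u << 28;
    if (!(ecx & osxsave_bit) || !(ecx & avx_bit)) return false;
    /* XGETBV with ECX = 0 reads XCR0; it is only legal once OSXSAVE is set */
    uint32_t xcr0_lo, xcr0_hi;
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    /* bit 1 = SSE (XMM) state, bit 2 = AVX (upper halves of YMM) state */
    return (xcr0_lo & 0x6) == 0x6;
#else
    return false; /* not an x86 CPU */
#endif
}
```

On a kernel without XSAVE support, condition 2 or 3 fails even if the CPU itself implements AVX.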

Page 16

Will my application use SIMD registers and instructions?

It depends on the compiler:

- GCC: > 4.6 for AVX support; use of specific switches (-msse2, -msse4.1, -msse4.2, -mavx, -mavx2 …)
- Intel C/C++ Compiler (ICC): > 11.1 for AVX support and > 13.0 for AVX2 support; use of specific switches (-xsse4.2, -xavx, -xcore-avx2 …)
- Beware of optimization switches (-O1, -O2, -O3)

To know more, disassemble the binary (if you are allowed to) and look at the registers and assembler instructions used.
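Compiler switches show up in the code through predefined macros. This sketch assumes GCC or Clang, which define __SSE2__, __SSE4_1__, __SSE4_2__, __AVX__, __AVX2__ and __AVX512F__ according to switches such as -msse4.2, -mavx or -mavx2 (or -march=...); the function name is hypothetical. It reports a compile-time decision only, not what the running CPU supports.

```c
#include <assert.h>
#include <string.h>

/* Widest SIMD extension the compiler was allowed to target for this file. */
static const char *widest_enabled_extension(void) {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}
```

Built with plain gcc on x86-64 it typically reports SSE2 (the x86-64 baseline); adding -mavx2 makes it report AVX2.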


Page 18

Raw performance

- Based on a C program
- CPU used: Haswell microarchitecture (Core i7-4960HQ), AVX/AVX2 enabled
- 3 tests: no SIMD, SSE4, AVX
- Input: one array containing 1 million values
- Goal: add 1 to each value, the million values being processed 4k, 8k, 16k and 32k times
- Measure: CPU time (s) = f(#rows)

"Quick and dirty" sample code available here: https://app.box.com/s/ibmnbblpho4xtbeq2x8ir60nrk37208v
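A harness in the spirit of this test could look like the sketch below. It is not the author's sample code: names are hypothetical, only the scalar and SSE2 variants are shown, and CPU time is measured with the standard clock() function. For a fair "no SIMD" baseline with GCC, compile with -O2 -fno-tree-vectorize so the scalar loop is not auto-vectorized.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

typedef void (*add_fn)(int *, size_t);

/* No SIMD: one element per iteration */
static void add_one_plain(int *a, size_t n) {
    for (size_t i = 0; i < n; i++) a[i] += 1;
}

/* SSE2 (XMM registers): four 32-bit elements per iteration */
static void add_one_xmm(int *a, size_t n) {
    const __m128i ones = _mm_set1_epi32(1);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_add_epi32(v, ones));
    }
    for (; i < n; i++) a[i] += 1;
}

/* Run f over the whole array `passes` times; return the CPU seconds used.
   Each element of `a` ends up increased by `passes`. */
static double time_passes(add_fn f, int *a, size_t n, unsigned passes) {
    clock_t start = clock();
    for (unsigned p = 0; p < passes; p++) f(a, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With a 1 M-element array, 4096 passes correspond to the "4096 M rows" point of the chart.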

Page 19

Raw performance

RAW Performance (CPU) for SIMD Instructions, CPU time (seconds):

| Rows processed | NO SIMD | SSE4 (XMM registers) | AVX (YMM registers) |
|---|---|---|---|
| 4096 M | 10.35 | 3.3 | 1.96 |
| 8192 M | 20.46 | 6.81 | 3.51 |
| 16384 M | 42.35 | 13.73 | 7.23 |
| 32768 M | 85.64 | 25.58 | 15.15 |
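The speedups implied by these CPU times can be worked out directly (values as read from the chart). As a point of comparison, and assuming the values are 32-bit integers, an XMM register holds 4 of them and a YMM register 8; the measured gains stay below those lane counts, plausibly because a simple add-one loop is partly bound by memory traffic rather than arithmetic.

```latex
\begin{aligned}
\text{speedup} &= t_{\text{no SIMD}} / t_{\text{SIMD}} \\
\text{32768 M rows:}\quad & 85.64 / 25.58 \approx 3.35 \ (\text{SSE4}), \qquad 85.64 / 15.15 \approx 5.65 \ (\text{AVX}) \\
\text{4096 M rows:}\quad & 10.35 / 3.3 \approx 3.14 \ (\text{SSE4}), \qquad 10.35 / 1.96 \approx 5.28 \ (\text{AVX}) \\
\text{32-bit lanes:}\quad & 128 / 32 = 4 \ (\text{XMM}), \qquad 256 / 32 = 8 \ (\text{YMM})
\end{aligned}
```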


Page 21

SIMD instructions … inside Oracle 12c

In-Memory data structure: the In-Memory Compression Unit (IMCU)

- The IMCU is the unit of column store allocation
- Target size is 1M rows (controlled by _inmemory_imcu_target_rows)
- One IMCU can contain more than one column
- Each column in one IMCU is a column unit (CU)

Page 22

SIMD instructions … inside Oracle 12c

In-Memory column store storage indexes:

- For each column unit, min and max values are maintained in a storage index
- Storage indexes provide CU pruning
- Information about CUs is available in GV$IM_COL_CU (undocumented, see Bug ID 19361690)

(Figure: IMCU pruning)
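The min/max pruning idea can be sketched as follows. The struct and function names are hypothetical and this is not Oracle's internal structure; it only shows why a storage index lets a scan skip a CU: for "col = value", a CU whose [min, max] range does not contain the value cannot hold a matching row.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical storage index entry: min and max of one column unit (CU). */
struct cu_minmax {
    int min;
    int max;
};

/* A CU may contain rows matching "col = value" only if min <= value <= max. */
static bool cu_may_match_eq(const struct cu_minmax *cu, int value) {
    return value >= cu->min && value <= cu->max;
}

/* Number of CUs that survive pruning and must actually be scanned. */
static size_t cus_to_scan_eq(const struct cu_minmax *cus, size_t ncu, int value) {
    size_t scanned = 0;
    for (size_t i = 0; i < ncu; i++) {
        if (cu_may_match_eq(&cus[i], value)) scanned++;
    }
    return scanned;
}
```

For CUs with ranges [1,10], [11,20] and [5,30], the predicate "col = 15" needs to scan 2 CUs, and "col = 40" needs none.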

Page 23

SIMD instructions … inside Oracle 12c

The way your data is sorted matters for the best IMCU pruning
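A small, hypothetical illustration of why sort order matters: the column is split into CUs of a fixed number of rows, each CU's min/max is computed, and the CUs that survive pruning for "col = value" are counted. Sorted data produces narrow, non-overlapping ranges, so fewer CUs survive.

```c
#include <assert.h>
#include <stddef.h>

/* Split `col` into CUs of `cu_rows` values (cu_rows > 0), compute each CU's
   min/max, and count the CUs whose range contains `value`. */
static size_t surviving_cus(const int *col, size_t nrows, size_t cu_rows, int value) {
    size_t survivors = 0;
    for (size_t start = 0; start < nrows; start += cu_rows) {
        size_t end = start + cu_rows < nrows ? start + cu_rows : nrows;
        int min = col[start], max = col[start];
        for (size_t i = start + 1; i < end; i++) {
            if (col[i] < min) min = col[i];
            if (col[i] > max) max = col[i];
        }
        if (value >= min && value <= max) survivors++;
    }
    return survivors;
}
```

With the same eight values split into CUs of two rows, the sorted column {1,1,2,2,3,3,4,4} leaves 1 CU to scan for "col = 3", while the interleaved column {1,2,3,4,1,2,3,4} leaves 2.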

Page 24

SIMD instructions … inside Oracle 12c

SIMD extensions are used with In-Memory storage indexes for efficient filtering:

1. IM storage indexes do IMCU pruning
2. SIMD instructions efficiently apply filter predicates

(Figure: IMCU pruning, then filtering with SIMD on a Prod-id column)
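Filtering with SIMD can be sketched as comparing several column values against the predicate value in one instruction and turning the result into a bitmap of matching rows. This is a hypothetical SSE2 illustration, not Oracle's kdzk code; _mm_cmpeq_epi32 and _mm_movemask_ps are Intel intrinsics, and filter_eq_sse2 is an invented name.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 (also pulls in SSE: _mm_movemask_ps) */
#include <stddef.h>
#include <stdint.h>

/* Evaluate "col = value" on 32-bit values, 4 at a time; set bit i of `bitmap`
   (which must hold (n + 7) / 8 bytes) when row i matches.
   Returns the number of matching rows. */
static size_t filter_eq_sse2(const int32_t *col, size_t n, int32_t value, uint8_t *bitmap) {
    const __m128i key = _mm_set1_epi32(value);
    size_t matches = 0, i = 0;
    for (size_t b = 0; b < (n + 7) / 8; b++) bitmap[b] = 0;
    for (; i + 4 <= n; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(col + i));
        __m128i eq = _mm_cmpeq_epi32(v, key);             /* all-ones lanes where equal */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(eq)); /* bit k set if lane k matched */
        for (int k = 0; k < 4; k++) {
            if (mask & (1 << k)) {
                bitmap[(i + k) / 8] |= (uint8_t)(1u << ((i + k) % 8));
                matches++;
            }
        }
    }
    for (; i < n; i++) { /* scalar tail */
        if (col[i] == value) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            matches++;
        }
    }
    return matches;
}
```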

Page 25

SIMD instructions … inside Oracle 12c

Oracle 12c uses specific libraries for SIMD (and compression), located in $ORACLE_HOME/lib:

- libshpksse4212.so for SSE4.2 extensions: compiled with ICC v12 with the specific -xsse4.2 switch
- libshpkavx12.so for AVX extensions: compiled with ICC v12 with the specific -xavx switch
- libshpkavx212.so for AVX2 extensions: not fully implemented yet (only 8 functions implemented). No ICC AVX2 switch is used because ICC v12 doesn't support AVX2

Thanks to Tanel Pöder

Page 26

SIMD instructions … inside Oracle 12c

Oracle SIMD-related functions:

- Located in the kdzk kernel module (HPK)
- Part of the Advanced Compression library (ADVCMP)
- Easily tracked with SystemTap

Demo #1

Page 27

SIMD instructions … inside Oracle 12c

How does Oracle use SIMD extensions?

It depends on many parameters. At OS level: /proc/cpuinfo

(Screenshots of /proc/cpuinfo: AVX and AVX2 support; SSE4 support only)
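Reading the CPU flags from /proc/cpuinfo can be sketched in C. The mapping to library names below is a hypothetical illustration built from the libraries listed earlier, not Oracle's actual selection logic; the flag names avx and sse4_2 are the ones Linux prints on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `flag` appears as a whole word on the first "flags" line of a
   cpuinfo-formatted file (e.g. /proc/cpuinfo on Linux x86). */
static bool cpuinfo_has_flag(const char *path, const char *flag) {
    FILE *f = fopen(path, "r");
    if (!f) return false;
    char line[8192];
    bool found = false;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) != 0) continue;
        char *p = strchr(line, ':');
        if (!p) break;
        for (char *tok = strtok(p + 1, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
            if (strcmp(tok, flag) == 0) { found = true; break; }
        }
        break; /* CPUs normally report identical flags: the first line is enough */
    }
    fclose(f);
    return found;
}

/* Hypothetical choice between the SIMD libraries named in this talk. */
static const char *pick_simd_library(const char *cpuinfo_path) {
    if (cpuinfo_has_flag(cpuinfo_path, "avx"))    return "libshpkavx12.so";
    if (cpuinfo_has_flag(cpuinfo_path, "sse4_2")) return "libshpksse4212.so";
    return NULL; /* no SIMD library */
}
```

Calling pick_simd_library("/proc/cpuinfo") on an AVX-capable host returns "libshpkavx12.so".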

Page 28

SIMD instructions … inside Oracle 12c

Which library am I using?

pmap

(Screenshots of pmap output: AVX support; SSE4 support)
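pmap reads the memory mappings of a process, which Linux exposes in /proc/&lt;pid&gt;/maps. This sketch (hypothetical function name, Linux only) prints the mapping lines whose path contains a given substring, so passing the PID of an Oracle process and "libshpk" would show which SIMD library is loaded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h> /* getpid */

/* Print the lines of /proc/<pid>/maps containing `needle`.
   Returns the number of matching lines, or -1 if the file cannot be opened. */
static int find_mapped_libs(pid_t pid, const char *needle) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/maps", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, f)) {
        if (strstr(line, needle)) {
            fputs(line, stdout);
            count++;
        }
    }
    fclose(f);
    return count;
}
```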

Page 29

SIMD instructions … inside Oracle 12c

Which compiler options have been used?

- Read the "comment" section of the ELF file
- Read the corresponding compiler documentation

[oracle@oel7 conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so | egrep -i 'intel|gcc' | egrep 'xavx|mavx'

[ 2c] -?comment:Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0 Build 20120731

…/…

-DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx
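What readelf -p .comment does can be sketched with the Linux <elf.h> definitions: find the section named .comment through the section-header string table and print its NUL-terminated strings. This is a minimal, hypothetical sketch for 64-bit ELF files only; extended section numbering is not handled.

```c
#include <assert.h>
#include <elf.h> /* Elf64_Ehdr, Elf64_Shdr, ELFMAG, ... (Linux/glibc header) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print each non-empty string of the .comment section with its offset.
   Returns the number of strings printed (0 if there is no .comment section)
   or -1 on error. */
static int print_elf_comment(const char *file) {
    FILE *f = fopen(file, "rb");
    if (!f) return -1;
    int printed = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab = NULL;
    char *names = NULL, *data = NULL;

    if (fread(&eh, sizeof eh, 1, f) != 1 || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read all section headers */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Section names live in the section-header string table */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (!names || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;
    names[strtab->sh_size] = '\0';

    printed = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name >= strtab->sh_size || strcmp(names + sh[i].sh_name, ".comment") != 0)
            continue;
        data = malloc(sh[i].sh_size + 1);
        if (!data || fseek(f, (long)sh[i].sh_offset, SEEK_SET) != 0 ||
            fread(data, 1, sh[i].sh_size, f) != sh[i].sh_size) {
            printed = -1;
            goto out;
        }
        data[sh[i].sh_size] = '\0';
        for (size_t off = 0; off < sh[i].sh_size; off += strlen(data + off) + 1) {
            if (data[off] != '\0') {
                printf("[%5zx] %s\n", off, data + off); /* offset, like readelf */
                printed++;
            }
        }
        break;
    }
out:
    free(data);
    free(names);
    free(sh);
    fclose(f);
    return printed;
}
```

Its output can be filtered the same way as readelf's, for example by piping it through egrep 'xavx|mavx'.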

Page 30

SIMD instructions … inside Oracle 12c

How are SIMD registers used by Oracle? GDB can be used:

- To get the call stack (backtrace)
- To set breakpoints on interesting functions
- To view register contents (traditional and SIMD):
  - "info registers" for traditional registers
  - "info all-registers" for all registers (SIMD registers included)
  - (gdb) print $ymmX.<format>, where format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32, v4_int64 or v2_int128

Demo #2

Page 31

SIMD instructions … inside Oracle 12c

(Screenshot of register contents: in red, registers whose content has been modified; in blue, the upper 128 bits of the SIMD registers, which are empty)

Page 32

SIMD instructions … inside Oracle 12c

Oracle IM can use AVX or SSE4 extensions for SIMD operations. When AVX is used, it uses only 128 bits out of the 256-bit-wide registers:

• AVX adds new register state through the 256-bit-wide YMM register file
• Explicit operating system support is required to properly save and restore AVX's expanded registers between context switches
• Without this, only AVX 128-bit is supported

Page 33

SIMD instructions … inside Oracle 12c

The culprit:

- Oracle 12.1.0.2 is supported from EL5 onwards
- The EL5 Red Hat kernel is 2.6.18, and the xsave flag is supported from 2.6.30 kernels onwards
- For compatibility reasons, Oracle has to compile its code on 2.6.18 kernels


Page 35

Tracing SIMD in Oracle 12c

Oradebug has 2 components related to IM

Page 36

Tracing SIMD in Oracle 12c

Interesting components to trace for SIMD and/or IMCU pruning are:

- IM_optimizer: gives information about CBO calculations related to IM
- ADVCMP_DECOMP.*:
  - ADVCMP_DECOMP_HPK: SIMD functions
  - ADVCMP_DECOMP_PCODE: Portable Code Machine (usually comparison functions and results)

Page 37

Tracing SIMD in Oracle 12c: IM_optimizer

Information available in the trace file:

- IMCU pruning ratio
- CU decompression costing (per IMCU)
- Predicate evaluation costing (per row)

The statement has to be parsed to get results.

Page 38

Tracing SIMD in Oracle 12c

select prod_id,cust_id,time_id from laurent.s_capa_high where amount_sold=20;

Page 39

Tracing SIMD in Oracle 12c

This information is available in the CBO trace file (event 10053 or SQL_costing event).

Page 40

Tracing SIMD in Oracle 12c: ADVCMP_DECOMP

ADVCMP_DECOMP_HPK: information is available in the trace file (for each IMCU processed):

- Library and function used
- Number of rows and counting algorithm
- Processing rate (comparison, and decompression if relevant)
- But nothing on the results of the processing

Page 41

Tracing SIMD in Oracle 12c: ADVCMP_DECOMP

ADVCMP_DECOMP_HPK gives information about SIMD function usage and filtering (after IMCU pruning).

Example: in-memory table with NO MEMCOMPRESS or MEMCOMPRESS FOR DML

Page 42

Tracing SIMD in Oracle 12c: ADVCMP_DECOMP

ADVCMP_DECOMP_HPK example: in-memory compressed table. SIMD instructions are used only in the kdzk_eq_dict functions.

Page 43

Tracing SIMD in Oracle 12c: my thoughts about compression/decompression

- NO MEMCOMPRESS / MEMCOMPRESS FOR DML: kdzk*dynp* functions (ex: kdzk_eq_dynp_16bit, kdzk_le_dynp_32bit etc.)
- MEMCOMPRESS FOR QUERY LOW / QUERY HIGH:
  - Dictionary encoding (LZW?): kdzk_*dict* functions (ex: kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
  - Run-length encoding: kdzk_burst_rle* functions (ex: kdzk_burst_rle_8bit, kdzk_burst_rle_16bit …)
  - Bit-packing compression: kdzk*fixed* functions (ex: kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit …)
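Why an equality predicate on dictionary-encoded data suits SIMD can be sketched as follows: the predicate value is looked up once in the dictionary, then only small integer codes are compared, and a 128-bit register can hold sixteen 8-bit codes. This is a hypothetical scalar illustration of the encoding, not the layout or code of the kdzk_*dict* functions; the struct and function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dictionary-encoded column: each row stores an 8-bit code
   that indexes a dictionary of distinct values. */
struct dict_column {
    const char *const *dict; /* distinct values, index = code */
    size_t ndict;
    const uint8_t *codes;    /* one code per row */
    size_t nrows;
};

/* "col = value": translate value to its code once, then compare codes only.
   Returns the number of matching rows. */
static size_t eq_dict_count(const struct dict_column *c, const char *value) {
    size_t code = c->ndict;
    for (size_t d = 0; d < c->ndict; d++) {
        if (strcmp(c->dict[d], value) == 0) { code = d; break; }
    }
    if (code == c->ndict) return 0; /* value not in dictionary: no row can match */
    size_t matches = 0;
    for (size_t r = 0; r < c->nrows; r++) { /* the loop a SIMD version would vectorize */
        if ((size_t)c->codes[r] == code) matches++;
    }
    return matches;
}
```

With the dictionary {"FR","UK","US"} and codes {0,1,1,2,0,1}, "col = 'UK'" matches 3 rows and "col = 'DE'" matches none, without a single string comparison per row.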

Page 44

Tracing SIMD in Oracle 12c: my thoughts about compression/decompression

- FOR CAPACITY LOW: FOR QUERY LOW + additional proprietary compression (OZIP). Functions: ozip_decode_dict*, kdzk_ozip_decode* (ex: kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
- FOR CAPACITY HIGH: FOR QUERY HIGH + heavyweight compression algorithm

The compression/decompression method depends on:

- Datatype
- Column compression unit size
- Column contents

Page 45

[email protected]

http://laurent-leturgez.com

@lleturgez