
QEmu TCG Enhancements for Speeding-up the Emulation of SIMD instructions

Luc Michel, Nicolas Fournel and Frédéric Pétrot

TIMA Laboratory

System Level Synthesis Group

DATE’11 W8, 18/03/2011


Outline

1 Introduction
  About QEmu
  About SIMD instructions

2 QEmu operation
  The intermediate representation
  The helpers

3 Improving Neon instructions translation
  A solution to improve the translation
  Intermediate representation extension choices

4 Tests and results
  Test protocol
  Performance measurement



QEmu: a fast and portable dynamic translator

Simulation with QEmu

Open-source simulation and virtualization software,

Dynamic binary translation of the code of a target architecture,

To be executed on a host architecture.

Precise goal of the present work

Accelerate the cross-execution of the Neon instructions.


What are SIMD instructions?

SIMD Instructions: Single Instruction, Multiple Data

Same operation on multiple data in parallel,

Very efficient for optimizing some algorithms: parts of media codecs, of radio processing, ...,

64-bit or 128-bit data vectors,

8-, 16-, 32- or 64-bit data elements, depending on the instruction.


Example: vadd.i16

Taken from the ARM Neon instruction set
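
The figure for this slide did not survive extraction. As a stand-in, here is a minimal Python sketch of what vadd.i16 computes on a 64-bit vector: four independent 16-bit additions, each wrapping modulo 2^16. The function names (unpack_lanes, pack_lanes, vadd_i16) are illustrative assumptions, not taken from the slides or from QEmu.

```python
# Illustrative model of vadd.i16 on 64-bit vectors (four 16-bit lanes).
# All names here are hypothetical; this is not QEmu code.
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def unpack_lanes(vector):
    """Split a 64-bit integer into four 16-bit lanes, lowest lane first."""
    return [(vector >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def pack_lanes(lanes):
    """Rebuild the 64-bit integer from its lanes."""
    vector = 0
    for i, lane in enumerate(lanes):
        vector |= (lane & LANE_MASK) << (i * LANE_BITS)
    return vector


def vadd_i16(a, b):
    """Add lane by lane; each lane wraps modulo 2**16, no carry crosses lanes."""
    return pack_lanes([(x + y) & LANE_MASK
                       for x, y in zip(unpack_lanes(a), unpack_lanes(b))])


a = pack_lanes([1, 2, 0xFFFF, 4])
b = pack_lanes([10, 20, 1, 40])
print(unpack_lanes(vadd_i16(a, b)))  # [11, 22, 0, 44]
```

The third lane shows the point of the lane structure: 0xFFFF + 1 wraps to 0 without disturbing its neighbour.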


The intermediate representation of QEmu

The architecture-independent intermediate representation consists of micro-operations, such as:

add_i32

mov_i32

or_i32

Two-step translation

1 Target architecture code → micro-operations,

2 micro-operations → host architecture code.

Intermediate representation benefits

Independence between target and host architectures.
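
The two steps can be sketched as a toy translator in Python. Only the micro-operation names mov_i32 and add_i32 come from the slide; the target syntax, the temporary tmp0, the register-to-memory mapping and the emitted x86-style text are assumptions made for illustration, not QEmu's actual output.

```python
# Toy two-step translator. The target syntax, the register mapping and the
# emitted host assembly are hypothetical; only the op names mov_i32 and
# add_i32 come from the slide.

def target_to_microops(instruction):
    """Step 1: one target instruction -> list of micro-operations."""
    mnemonic, operands = instruction.split(None, 1)
    dst, src1, src2 = [op.strip() for op in operands.split(",")]
    if mnemonic == "add":
        return [("mov_i32", "tmp0", src1),
                ("add_i32", "tmp0", "tmp0", src2),
                ("mov_i32", dst, "tmp0")]
    raise NotImplementedError(mnemonic)


# Hypothetical placement of guest registers and the temporary on the host.
HOST_REGS = {"tmp0": "%eax", "r0": "0(%ebp)", "r1": "4(%ebp)", "r2": "8(%ebp)"}


def microops_to_host(microops):
    """Step 2: micro-operations -> host assembly text (x86, AT&T syntax)."""
    host = []
    for op, dst, *srcs in microops:
        if op == "mov_i32":
            host.append(f"movl {HOST_REGS[srcs[0]]}, {HOST_REGS[dst]}")
        elif op == "add_i32":
            # Two-address host form: dst must already hold the first source.
            assert srcs[0] == dst
            host.append(f"addl {HOST_REGS[srcs[1]]}, {HOST_REGS[dst]}")
        else:
            raise NotImplementedError(op)
    return host


ops = target_to_microops("add r0, r1, r2")
for line in microops_to_host(ops):
    print(line)
```

Because step 1 knows nothing about the host and step 2 knows nothing about the target, supporting a new target or a new host means rewriting only one of the two functions.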


Binary translation example


Neon instructions translation method: the helpers

The helpers

C functions that each simulate an instruction,

Compiled as part of QEmu,

Invoked through a call emitted when the corresponding Neon instruction is translated.
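
To make the idea concrete, here is a Python sketch of the kind of work such a helper does for a 16-bit lane addition on one 32-bit word. QEmu helpers are C functions; this model, its name helper_add_u16 and its masking approach are assumptions for illustration and are not presented as QEmu's source.

```python
# Python model of a helper that adds two 16-bit lanes packed in a 32-bit word.
# The name and the exact technique are hypothetical, not QEmu's actual code.
MSB_MASK = 0x80008000  # top bit of each 16-bit lane


def helper_add_u16(a, b):
    """Lane-wise 16-bit add on 32-bit words, with no carry between lanes."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    # Top bit of each lane is added separately (XOR) so that the plain
    # 32-bit addition below can never carry from the low lane into the high.
    top = (a ^ b) & MSB_MASK
    low = (a & ~MSB_MASK) + (b & ~MSB_MASK)
    return (low ^ top) & 0xFFFFFFFF


print(hex(helper_add_u16(0x0001FFFF, 0x00010001)))  # 0x20000
```

In the example the low lane wraps (0xFFFF + 1 gives 0) while the high lane is unaffected (1 + 1 gives 2).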


Example with a helper


Helpers overhead

Function call:

Adapting the arguments,

Passing the arguments,

Getting the result.

Multiple calls, because each 64-bit/128-bit vector is split into 32-bit parts.
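
The cost of the splitting can be sketched as follows: a vector is cut into 32-bit words, a helper is called once per word, and the results are reassembled. The function names and the stand-in helper add_u16_word are hypothetical; the sketch only counts calls and does not model the argument-marshalling cost.

```python
# Counting helper calls when a vector is processed in 32-bit parts.
# All names are illustrative assumptions.
WORD_MASK = 0xFFFFFFFF


def split_words(vector, vector_bits):
    """Cut a vector into 32-bit words, lowest word first."""
    return [(vector >> shift) & WORD_MASK for shift in range(0, vector_bits, 32)]


def join_words(words):
    """Reassemble 32-bit words into one integer."""
    vector = 0
    for i, word in enumerate(words):
        vector |= (word & WORD_MASK) << (32 * i)
    return vector


def add_u16_word(a, b):
    """Stand-in helper: adds the two 16-bit lanes of one 32-bit word."""
    lo = (a + b) & 0xFFFF
    hi = ((a >> 16) + (b >> 16)) & 0xFFFF
    return (hi << 16) | lo


def emulate_with_helper(helper, a, b, vector_bits):
    """Apply a 32-bit helper to each part; return (result, number of calls)."""
    calls = 0
    out = []
    for wa, wb in zip(split_words(a, vector_bits), split_words(b, vector_bits)):
        out.append(helper(wa, wb))
        calls += 1
    return join_words(out), calls


a = 0x0001000200030004000500060007FFFF
b = 0x00010001000100010001000100010001
result, calls = emulate_with_helper(add_u16_word, a, b, 128)
print(hex(result), calls)  # 0x20003000400050006000700080000 4
```

A single 128-bit Neon addition therefore costs four helper calls in this model, and a 64-bit one costs two.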


A solution to improve the translation

The idea

Be able to take advantage of the host SIMD capabilities,

Add some SIMD micro-operations to the QEmu IR,

Translate these micro-operations to host SIMD instructions.

The practical example of this work

ARM Neon instruction set → Intel x86 MMX/SSE instruction set.


How to extend the IR

Two ways to extend the QEmu IR

Add a micro-operation for each target instruction,

Keep the IR small and add only elementary micro-operations.

Our choice

Try to keep the IR as simple as possible.


Examples of mapping between Neon and MMX/SSE

Direct mapping between two instructions

The most favorable case,

A single micro-operation carries the semantics shared by these two instructions.

Mapping between vadd.i16 (Neon) and paddw (MMX/SSE)
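
The mapping figure is missing from the transcript. A minimal sketch of a one-to-one mapping, under stated assumptions: the micro-operation name simd_64_add_i16 is invented here by analogy with the micro-op name shown later in the deck, the register mm0 is an arbitrary choice, and the bracketed operands such as [d1] stand for wherever the guest register is stored on the host. Only vadd.i16 and paddw come from the slide.

```python
# Hypothetical table-driven direct mapping:
# Neon mnemonic -> SIMD micro-op -> host MMX/SSE mnemonic.
NEON_TO_MICROOP = {"vadd.i16": "simd_64_add_i16"}   # micro-op name is assumed
MICROOP_TO_HOST = {"simd_64_add_i16": "paddw"}


def translate_direct(neon_instruction):
    """Translate one three-operand Neon instruction to host text (Intel syntax)."""
    mnemonic, operands = neon_instruction.split(None, 1)
    dst, src1, src2 = [op.strip() for op in operands.split(",")]
    microop = NEON_TO_MICROOP[mnemonic]
    host_op = MICROOP_TO_HOST[microop]
    # paddw overwrites its first operand, so load src1 into a scratch
    # register first, then store the result to the destination.
    return [f"movq mm0, [{src1}]",
            f"{host_op} mm0, [{src2}]",
            f"movq [{dst}], mm0"]


for line in translate_direct("vadd.i16 d0, d1, d2"):
    print(line)
```

The whole vector is handled by one host instruction, with no function call and no splitting into 32-bit parts.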


A Neon instruction emits multiple micro-operations

The Neon instruction is not elementary,

It is split into several elementary micro-operations.

Translating the vsra.u32 (Neon) instruction
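
The figure is missing here as well. vsra is a shift-right-and-accumulate: each destination lane receives its old value plus the source lane shifted right by an immediate. A plausible decomposition, sketched in Python with vectors modelled as lists of 32-bit lanes, uses one shift micro-operation followed by one add micro-operation. The micro-operation names are assumptions; the slides do not list the ones actually emitted.

```python
# vsra.u32 modelled as two elementary lane-wise steps.
# Function names stand for hypothetical micro-operations.
U32 = 0xFFFFFFFF


def simd_shr_u32(src, imm):
    """Logical right shift of every 32-bit lane by an immediate."""
    return [(x & U32) >> imm for x in src]


def simd_add_i32(a, b):
    """Lane-wise 32-bit addition, wrapping modulo 2**32."""
    return [(x + y) & U32 for x, y in zip(a, b)]


def vsra_u32(acc, src, imm):
    """acc[i] + (src[i] >> imm) for every lane, built from the two steps."""
    return simd_add_i32(acc, simd_shr_u32(src, imm))


print(vsra_u32([1, 0xFFFFFFFF], [8, 16], 3))  # [2, 1]
```

Both building blocks are reusable by other Neon instructions, which is what keeps the IR small.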


A micro-operation generates multiple host instructions

No equivalent for this micro-operation on the host,

micro-operation behavior reproduced with host instructions,

Harder to implement in QEmu than the previous case.

The simd_128_shl_i8 micro-op emits several host instructions
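
The host sequence shown on the original slide is not recoverable. One common way to obtain a per-byte left shift on SSE2, which has 16-bit shifts (psllw) but no 8-bit ones, is to shift 16-bit lanes and then mask off the bits that leaked from the low byte into the high byte (pand). The Python sketch below checks that this two-step approach matches a per-byte reference; it is an assumed technique, not necessarily the sequence the authors emit.

```python
# Per-byte left shift on a 128-bit value, built from a 16-bit lane shift plus
# a mask. This models a psllw + pand style sequence; it is an assumption, not
# the authors' documented implementation.

def shl_i8_reference(vector, n):
    """Reference: shift each of the 16 bytes left by n, independently."""
    data = vector.to_bytes(16, "little")
    return int.from_bytes(bytes((b << n) & 0xFF for b in data), "little")


def shl_i8_via_16bit(vector, n):
    """Same result using only a 16-bit lane shift and a bitwise AND."""
    shifted = 0
    for i in range(8):  # psllw-like step: eight 16-bit lanes
        lane = (vector >> (16 * i)) & 0xFFFF
        shifted |= ((lane << n) & 0xFFFF) << (16 * i)
    # pand-like step: clear the low n bits of every byte, which removes the
    # bits that crossed from each low byte into its high neighbour.
    byte_mask = (0xFF << n) & 0xFF
    mask = int.from_bytes(bytes([byte_mask]) * 16, "little")
    return shifted & mask


v = 0x0123456789ABCDEFFEDCBA9876543210
print(all(shl_i8_via_16bit(v, n) == shl_i8_reference(v, n) for n in range(8)))
```

One micro-operation thus expands to at least two host instructions plus the construction of the mask constant.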


What kind of tests?

Unit tests

Ensure correctness of the translation,

Detect regressions during the development phase.

Performance measurement

Execution time.


Test environment

Linux in QEmu

Minimalist Linux system,

Cross-compilation toolchain to compile some programs for the test system.

Real BeagleBoard system

Board embedding an ARM Cortex-A8 CPU with the Neon extension,

Used to validate our unit tests.


Performance tests

The three chosen instructions

vadd.i16,

vsra.u16,

vshl.u8.

For each instruction...

101 assembly functions,

Containing from 0% to 100% of this Neon instruction,

Padded with classical instructions,

Executed several times in a loop,

Total execution time measured for both the helper and the mapping strategies.
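
The protocol above can be sketched as a small generator producing the 101 function bodies. The body length of 100 instructions, the filler instruction, the operand choice and the even interleaving are all assumptions; the slides only state that there are 101 functions covering 0% to 100% of the Neon instruction.

```python
# Generating 101 instruction mixes, from 0% to 100% Neon instructions.
# SLOTS, the filler instruction and the interleaving are hypothetical choices.
SLOTS = 100


def make_body(neon_instruction, percent, filler="add r0, r0, r1"):
    """Return SLOTS instructions, exactly `percent` of them Neon, evenly spread."""
    body = []
    for i in range(SLOTS):
        # Emit a Neon instruction whenever the running quota increases.
        is_neon = (i + 1) * percent // SLOTS > i * percent // SLOTS
        body.append(neon_instruction if is_neon else filler)
    return body


def make_all_bodies(neon_instruction):
    """One body per percentage, 0 to 100 inclusive."""
    return [make_body(neon_instruction, p) for p in range(101)]


bodies = make_all_bodies("vadd.i16 d0, d1, d2")
print(len(bodies), bodies[25].count("vadd.i16 d0, d1, d2"))  # 101 25
```

Each body would then be wrapped in a function, run in a loop, and timed once with the helper-based translation and once with the mapping-based one.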


Performance tests results

[Figure: relative execution time (%) compared to helpers (y-axis, 0 to 110) as a function of the proportion of SIMD instructions (%) (x-axis, 0 to 100), with one curve each for vadd.i16, vsra.u16 and vshl.u8.]


Take-away message

Conclusion

Results are very encouraging, but Amdahl’s law still rules
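
Amdahl's law bounds the overall gain by the share of time spent in the accelerated part. A small sketch, assuming every instruction costs the same under the helper-based baseline (so the instruction share equals the time share) and using an arbitrary per-instruction speedup of 4 that is purely illustrative and not a measured result:

```python
# Amdahl's law applied to the SIMD share of a program.
# The speedup factor used below is an arbitrary example, not a measurement.

def relative_time(simd_fraction, simd_speedup):
    """Execution time relative to the baseline (1.0 means no gain)."""
    return (1.0 - simd_fraction) + simd_fraction / simd_speedup


for fraction in (0.0, 0.5, 1.0):
    print(fraction, relative_time(fraction, 4.0))
```

With half of the time spent in SIMD instructions, even an infinite per-instruction speedup could not bring the relative time below 0.5.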

What to do next?

Extend the implementation to more SIMD instruction sets,

Probably with the help of automation tools

Call to the QEmu development community

Should this approach be promoted into mainstream QEmu?


Thanks for your attention

And now ready to answer your questions!
