LCA14-301: AArch64: Media, libs and GUI plans & status

Wed-5-Mar, 10:05am, Ragesh Radhakrishnan, James Yu and Tom Gall

Upload: linaro

Posted on 18-Nov-2014


DESCRIPTION

Resource: LCA14
Name: LCA14-301: AArch64: Media, libs and GUI plans & status
Date: 05-03-2014
Speaker: Ragesh Radhakrishnan
Video: https://www.youtube.com/watch?v=CqbKJyLvcqI
Website: http://www.linaro.org/
Linaro Connect: http://connect.linaro.org/
Slide: https://www.slideshare.net/linaroorg/lca14-301-aarch64medialibsguiplansstatus

TRANSCRIPT

Page 1: LCA14: LCA14-301: AArch64: Media, libs and GUI plans & status

Wed-5-Mar, 10:05am, Ragesh Radhakrishnan, James Yu and Tom Gall

LCA14-301: AArch64: Media, libs & GUI plans & status

Page 2: Porting Strategy for AArch64

1) Pick important libraries that have existing ARMv7 (32-bit) NEON optimizations
2) Avoid creating more hand-coded NEON assembler; use NEON intrinsics instead
3) Set expectations
   - We have to run in the model
   - The model is not cycle accurate
4) Push results upstream to development versions of the library
5) As appropriate, create versions against stable library versions for use in products if requested

Page 3: Porting Strategy for AArch64

● libpng - James Yu
● libvpx - James Yu (VP8, VP9)
● libjpeg-turbo - Ragesh Radhakrishnan
● pixman - Ragesh Radhakrishnan
● xfce image - Tom Gall
● chromium browser - Tom Gall

Page 4: libpng

Source code: git://git.code.sf.net/p/libpng/code
AArch64 supported from version 1.6.7, Nov. 2013.
Has been tested on iOS 7.

Benchmark result:
Version: 1.6.10beta01 [February 9, 2014]
Toolchain: gcc version 4.8.3 20140106 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11)
CPU: Cortex-A8 800 MHz, single core.
Test image: goldhill.png, 720x576

Small performance loss from using intrinsics instead of assembly.

                  Total time*   Performance   Relative to NEON assembly
No NEON           50.519 s      100.00%
NEON assembly     42.899 s      117.76%       100.00%
NEON intrinsics   44.081 s      114.61%       97.32%

* Total time = decode 100 times.
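The percentage columns in the libpng table follow from the total times. A minimal sketch of that arithmetic, assuming performance is defined as baseline time divided by measured time (the function and dictionary names are illustrative, not from the slides):

```python
# Total decode times in seconds (100 decodes of goldhill.png), from the table above.
TIMES = {
    "no_neon": 50.519,
    "neon_assembly": 42.899,
    "neon_intrinsics": 44.081,
}


def relative_performance(baseline_s: float, measured_s: float) -> float:
    """Performance of a run relative to a baseline, in percent.

    A shorter time means higher performance, so the ratio is
    baseline / measured (an assumed definition, see lead-in).
    """
    return 100.0 * baseline_s / measured_s


if __name__ == "__main__":
    for name, seconds in TIMES.items():
        vs_scalar = relative_performance(TIMES["no_neon"], seconds)
        vs_asm = relative_performance(TIMES["neon_assembly"], seconds)
        print(f"{name:16s} {vs_scalar:7.2f}% of no-NEON  {vs_asm:7.2f}% of assembly")
```

Computed from the rounded times, the intrinsics row comes out at about 114.60% rather than the listed 114.61%; the difference is presumably rounding in the published timings.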

Page 5: libvpx - VP8/VP9

1. Part of the Google WebM project.
2. Source code - https://chromium.googlesource.com/webm/libvpx
3. Status -
   * Completely rewrote the NEON assembly as intrinsics.
   * Optimized performance on ARMv7.
   * Posted a total of 49 patches for VP8/VP9.
     - VP8: review in progress.
     - VP9: posted, waiting for review. (27/Feb/2014)
   * Work to run on the ARMv8 architecture is in progress.

Page 6: libvpx - VP8/VP9

Benchmark result:
Version: 1.3.0 [February 26, 2014]
Toolchain: gcc version 4.8.3 20140106 (prerelease) (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11)
CPU: Cortex-A8 800 MHz, single core.
Test video: Tears of Steel, 1080p, 12:15 mins, in both VP8 and VP9 versions.

9.5% performance loss from using intrinsics instead of assembly.

                               FPS     Performance   Relative to NEON assembly
VP8 decode   No NEON           2.82    100.00%
             NEON assembly     13.23   469.14%       100.00%
             NEON intrinsics   11.97   424.46%       90.48%
VP9 decode   No NEON           2.22    100.00%
             NEON assembly     8.37    377.03%       100.00%
             NEON intrinsics   7.56    340.54%       90.32%
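The quoted loss of about 9.5% for intrinsics can be checked against the frame rates in the libvpx table. A small sketch, assuming the loss is measured as the drop in FPS relative to the assembly build (names are illustrative):

```python
# Decode rates in frames per second, copied from the libvpx table above.
FPS = {
    "VP8": {"assembly": 13.23, "intrinsics": 11.97},
    "VP9": {"assembly": 8.37, "intrinsics": 7.56},
}


def intrinsics_loss_percent(assembly_fps: float, intrinsics_fps: float) -> float:
    """Throughput lost by the intrinsics build relative to assembly, in percent."""
    return 100.0 * (1.0 - intrinsics_fps / assembly_fps)


if __name__ == "__main__":
    for codec, rates in FPS.items():
        loss = intrinsics_loss_percent(rates["assembly"], rates["intrinsics"])
        print(f"{codec}: {loss:.2f}% slower with intrinsics")
```

With these inputs the loss is roughly 9.5% for VP8 and 9.7% for VP9, consistent with the 90.48% and 90.32% figures in the last column.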

Page 7: libjpeg-turbo

ARMv7 Android refresh: features from AOSP integrated into libjpeg-turbo version 1.3: tile decode, color conversion (rgb565 & rgb8888), Ashmem backing store.
Status: Upstreaming to libjpeg-turbo in progress.
Source: git://git.linaro.org/people/ragesh.radhakrishnan/libjpeg-turbo.git

jpeglib decompression benchmark on PandaBoard using tjbench

                       Image resolution   Performance (fps)   Throughput (MP/sec)
Linaro libjpeg-turbo   3008*2000          3.8                 22.8563
                       227*149            829.1839            28.04
AOSP jpeglib           3008*2000          1.454               8.474
                       227*149            285.302             9.64
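The throughput column relates to the frame rate through the image size. A minimal sketch, assuming throughput is simply pixels per frame times frames per second, expressed in megapixels (the function name is illustrative and this is not how tjbench itself is implemented):

```python
def throughput_mp_per_s(width: int, height: int, fps: float) -> float:
    """Decoded megapixels per second for a given image size and frame rate."""
    return width * height * fps / 1e6


if __name__ == "__main__":
    # (label, width, height, fps) rows taken from the benchmark table above.
    rows = [
        ("linaro libjpeg-turbo", 3008, 2000, 3.8),
        ("linaro libjpeg-turbo", 227, 149, 829.1839),
        ("AOSP jpeglib", 3008, 2000, 1.454),
        ("AOSP jpeglib", 227, 149, 285.302),
    ]
    for label, w, h, fps in rows:
        print(f"{label:22s} {w}*{h}: {throughput_mp_per_s(w, h, fps):.3f} MP/s")
```

Three of the four rows reproduce the listed throughput to within rounding. The AOSP 3008*2000 row works out to about 8.75 MP/s from 1.454 fps, not the listed 8.474, so one of those two figures may have been mistyped on the slide or measured differently.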

Page 8: libjpeg-turbo

ARMv8 port: hand-coded ARMv8 port of the JPEG decoder routines listed below. This port is tested using ARM RTSM.
Status: Decoder routines upstreamed to libjpeg-turbo
Source: git://git.linaro.org/people/ragesh.radhakrishnan/libjpeg-turbo.git
Branch: libjpeg-turbo-armv8

#   JPEG functions ported       Remarks
1   IDCT_Slow                   IDCT integer version
2   IDCT_Fast                   IDCT non-accurate version
3   IDCT_2x2                    IDCT 2x2 size reduction
4   IDCT_4x4                    IDCT 4x4 size reduction
5   Color conversion routines   yuv to rgb, yuv to bgr, yuv to grayscale etc.
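To make the color conversion routines concrete, here is a scalar sketch of what such a routine computes per pixel, using the standard JFIF full-range YCbCr equations in floating point. It is an illustration only: the ported routines are hand-coded NEON, and the helper names below are hypothetical rather than libjpeg-turbo functions. The rgb565 packing corresponds to the rgb565 output format mentioned on the previous slide.

```python
def clamp8(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple:
    """Convert one full-range JFIF YCbCr pixel to 8-bit RGB."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit RGB into 16 bits: 5 bits red, 6 bits green, 5 bits blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


if __name__ == "__main__":
    rgb = ycbcr_to_rgb(128, 128, 128)  # neutral chroma gives a mid gray
    print(rgb, hex(pack_rgb565(*rgb)))
```

A BGR output uses the same equations with the channel order swapped, and grayscale output is just the Y component.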

Page 9: Pixman

Pixman ARMv8 port: rewriting ARMv7 functions for ARMv8.
Approach: using intrinsics
Status: rewriting of the bilinear scanline function in progress
Test environment: ARMv8 xfce stack on ARM RTSM.

List of functionalities and progress

#   Main functions to be ported                              Remarks
1   Bilinear scanline functions                              80% ported
2   Pixman composite function (pixel processing functions)   Not started
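For readers unfamiliar with the bilinear scanline functions, the following sketch shows the general technique: each output pixel blends two horizontal neighbours in a top and a bottom source row, then blends the two results vertically. It assumes a single channel, floating-point weights and non-negative source coordinates; pixman's real code works on packed pixels with fixed-point weights, and the function and parameter names here are hypothetical.

```python
def bilinear_scanline(top, bottom, wy, x0, dx, width):
    """Produce one output scanline by bilinear interpolation.

    top, bottom: two adjacent source rows (equal-length sequences of numbers)
    wy:          vertical weight of the bottom row, in [0, 1]
    x0, dx:      source x coordinate of the first output pixel and the step
    width:       number of output pixels
    """
    out = []
    last = len(top) - 1
    x = x0
    for _ in range(width):
        xi = min(int(x), last)   # left neighbour, clamped to the row
        xr = min(xi + 1, last)   # right neighbour, clamped at the edge
        wx = x - xi              # horizontal weight of the right neighbour
        t = top[xi] * (1 - wx) + top[xr] * wx
        b = bottom[xi] * (1 - wx) + bottom[xr] * wx
        out.append(t * (1 - wy) + b * wy)
        x += dx
    return out


if __name__ == "__main__":
    # Upscale a two-pixel row to three pixels, halfway between the two rows.
    print(bilinear_scanline([0, 10], [20, 30], 0.5, 0.0, 0.5, 3))
```

The inner loop is independent per output pixel, which is the property that makes this kind of function a natural fit for NEON vectorization.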

Page 10: xfce image

OE based
git.linaro.org
Works in the model
Patches need to flow to respective upstreams

Page 11: Chromium Porting to AArch64

Status
  Chromium-24 src + 32 patches
  binary built
  tests built (most run without problem)

Model
  Networking broken (VFP “upgrade”)
  2 Gig RAM limit
  dual core
  slow

Page 12: Chromium on AArch64

Plan
  libv8 ToT enables ToT Chromium
  Forward port
  Push upstream to Chromium community

Page 13: Discussion

Any input on next libraries?
Any libraries you’d like to see Linaro optimize?

Page 14:

More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members