Journey of Pixels in Adobe Photoshop on Intel HD Graphics

Murali Madhanagopal (Intel), Jerry Harris (Adobe), Yuyan Song (Adobe), Joseph Hsieh (Adobe)

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Any code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.

Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps.

Performance claims: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.Intel.com/performance

Intel, Intel Inside, the Intel logo, Centrino, Intel Core, Intel Atom, Pentium, and Ultrabook are trademarks of Intel Corporation in the United States and other countries.

Legal

Agenda

Photoshop performance on Intel HD Graphics – Murali Madhanagopal

Photoshop GPU Usage – Jerry Harris

Photoshop Blur Gallery – Yuyan Song

OpenCL in Photoshop Creative Cloud – Joseph Hsieh

Conclusion

Demos

Processor Graphics Brief History

Integrated graphics since the late 90s

Focused on enterprise and entry-level consumer graphics

Focus on performance has increased significantly over the past few years

Market segment share (MSS) leader for the past several years

54.8% MSS for desktop and 64.6% MSS for notebooks as of Q3’12

Annual volume of >280M units, with a ramp of >150M units in the year of launch

CPU + Gfx on the same package since 2010, on the same die since 2011

Source: John Peddie Research, Q3’12

Graphics Strategy: the Change

| Past | Present |
|---|---|
| 1 die (e.g., G45, Core i5/i3) | Multi-die strategy: 4th Gen Core i7/i5/i3 products will have 3 dies (GT1/GT2/GT3) |
| (n-1) fab technology: CPU on 45nm, GPU on 65nm in 2009; CPU on 32nm, GPU on 45nm in 2010 | n generation fab technology: CPU and GPU on 32nm in 2011, 22nm in 2012; CPU and GPU on the same fab technology going forward |
| No power sharing | Turbo power sharing: dynamically shift voltage and frequency between CPU and GPU based on demand to maximize perf/watt for CPU- and GPU-based work |
| Less die area for Gfx | Greater die area for graphics: nearly doubling transistor count every year |
| (n-1) API support | n generation API support |
| Media capabilities | Game-changing media capabilities: decode, image processing, encode |
| Chipset integration | CPU integration: LLC cache sharing (8MB) for low-latency access to Gfx data |

Higher performance, better features, and time-to-market (TTM) execution

Workstation pGFX Gen-to-Gen Comparison

| | E3 (SNB) | E3 v2 (IVB) | E3 v3 (HSW) | Broadwell |
|---|---|---|---|---|
| Process | 32nm GPU & CPU | 22nm GPU & CPU | 22nm GPU & CPU | 14nm GPU & CPU |
| Year and memory | 2011: 32 GB sys mem, up to 1.5 GB allocated as video RAM | 2012: 32 GB sys mem, up to 1.5 GB allocated as video RAM | 2013: 32 GB sys mem, up to 1.5 GB allocated as video RAM | 2014: TBD |
| APIs | DX 10.1, OGL 3.2 | DX 11.0, OCL 1.1, OGL 4.0 | DX 11.1, OCL 1.2, OGL 4.1 | |
| EUs | GT2: 12 EUs | GT2: 16 EUs | GT2: 20 EUs, GT3: 40 EUs | |

Broadwell adds significant improvements for GFX applications in 2014

Photoshop Gen to Gen GPU Features

CS4: GPU canvas/3D interactions. OpenGL: Smooth Zoom, Panning, Canvas Rotate, Pixel Grid, 3D Axis/Lights. Modes: Edit->Preferences->Enable OpenGL Drawing

CS5: GPU-enabled UI, Pixel Bender plugin. OpenGL: Scrubby Zoom, HUD Color Picker, Color Sampling Ring, Repoussé, 3D Overlays. OpenGL modes: Basic, Normal, Advanced

CS6: Content editing. OpenGL: Adaptive Wide Angle, Liquify, Oil Paint, Puppet Warp, Lighting Effects, 3D enhancements. OpenCL: Field/Iris/Tilt-Shift Blur

Creative Cloud (CC): OpenCL: Smart Sharpen. Modes: Use Graphics Processor, Use OpenCL

Ironlake (G45, 2010) was enabled for Basic mode. SNB, IVB, and HSW all support Advanced GL mode.

IVB and HSW support OpenCL acceleration in Photoshop CS6/CC.

Photoshop OGL performance

Photoshop performance scales with EUs and memory bandwidth!

| GPU Utility test | Intel® HD Graphics P3000 (SNB GT2) | Intel® HD Graphics P4000 (IVB GT2) | Intel® HD Graphics P4600 (HSW GT2) | Intel® Iris™ Pro Graphics 5200 |
|---|---|---|---|---|
| Birds Eye View test | 60.44 | 88.29 | 89.98 | 187.54 |
| Hand Toss Test | 58.65 | 66.75 | 66.91 | 79.08 |
| Paint Brush Size 300 | 41.07 | 48.88 | 52.29 | 49.41 |
| Paint Brush Size 500 | 37.90 | 47.35 | 43.93 | 46.20 |
| Rotate Test | 67.20 | 98.94 | 79.85 | 92.61 |
| Scrubby Zoom Test | 65.09 | 51.26 | 102.12 | 202.78 |
| Smooth Zoom Test | 60.85 | 94.35 | 85.83 | 204.41 |
| Averages (fps) | 52.69 | 65.51 | 77.69 | 135.69 |
| Gain over previous GPU | | 24% | 19% | 74% |

* Photoshop CC 64-bit on Windows 7 64-bit, 16 GB 1600 MHz DDR3
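
The percentage row can be checked against the averages: each value is one GPU's average fps divided by the previous GPU's, minus one. A minimal Python check using only the "Averages (fps)" row (the helper name `gen_over_gen` is hypothetical):

```python
# Average fps per GPU, in table order (from the "Averages (fps)" row).
averages = {
    "HD Graphics P3000 (SNB GT2)": 52.69,
    "HD Graphics P4000 (IVB GT2)": 65.51,
    "HD Graphics P4600 (HSW GT2)": 77.69,
    "Iris Pro Graphics 5200": 135.69,
}

def gen_over_gen(avgs):
    """Percentage gain of each GPU's average fps over the GPU listed before it."""
    names = list(avgs)  # dicts preserve insertion order (Python 3.7+)
    return [(cur, 100.0 * (avgs[cur] / avgs[prev] - 1.0))
            for prev, cur in zip(names, names[1:])]

for name, pct in gen_over_gen(averages):
    print(f"{name}: +{pct:.1f}%")
```

This prints about +24.3%, +18.6%, and +74.7%. Rounded to whole percent that is 24%, 19%, and 75%, so the slide's 74% for Iris Pro 5200 appears truncated rather than rounded. The averages are used as reported: they do not equal the plain mean of the seven rows shown (for P3000 that mean is about 55.9 fps).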

Performance (contd)

Liquify filter shows a 2.5X improvement and Field Blur a 6X improvement with GPU acceleration

Chart: Liquify Filter, Photoshop CC OpenGL (OpenGL On vs. OpenGL Off)

Chart: Field Blur, Photoshop CC OpenCL (OpenCL On vs. OpenCL Off)

* Liquify and Field Blur processing times normalized to 1 for GPU acceleration off on HSW GT2

What is happening under the hood?

GPUView is a Microsoft tool that shows CPU-GPU interaction

Both CPU and GPU are efficiently utilized

Photoshop is multithreaded, and many CPU cores submit work to the GPU

The GPU is mostly busy, at 70%-98% utilization. EU utilization is 85%.

Memory utilization is 36% for GT2 and goes down to 10% with GT3e

No stalls in the GPU pipeline.

* Liquify filter applied to a 60 MB image

© 2012 Adobe Systems Incorporated. All Rights Reserved.

Photoshop GPU Usage

Jerry Harris | Principal Scientist, Photoshop

User Expectations – Mobility World – aka Post WIMP

User Expectations – Pre WIMP

NUI Evolution - iEnvy moves to the desktop

NUI Evolution - iEnvy moves to mobile x86

Photoshop Use Cases

Simple Photoshop Layer structure

Virtual tiled array of planar components

8-bit, 15-bit, and 32-bit float (32f) values

0..64 possible channels

Unassociated alpha (not preweighted, i.e., not premultiplied)

Sheet mask == alpha

Other masks include the user mask and clipping path

Other layers include placed content:

Smart Docs (other PS docs)

PDF

3D, Type, Shapes

Movies
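
A minimal Python sketch of two ideas on this slide: a channel stored as a virtual, sparse tiled plane (a tile is allocated only when first written), and "over" compositing with unassociated alpha, where color is weighted by alpha for the blend and divided back out. The class and function names are hypothetical, and the 360-pixel tile edge is borrowed from the GL texture tiles described later; the slides do not give Photoshop's CPU-side tile size.

```python
from dataclasses import dataclass, field

TILE = 360  # hypothetical CPU-side tile edge, borrowed from the 360x360 GL texture tiles

@dataclass
class PlanarChannel:
    """One component plane of a layer, stored as a sparse dict of TILE x TILE tiles.

    Tiles that were never written are not allocated and read back as `fill`."""
    width: int
    height: int
    fill: float = 0.0
    tiles: dict = field(default_factory=dict)

    def _locate(self, x, y):
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError((x, y))
        return (x // TILE, y // TILE), (y % TILE) * TILE + (x % TILE)

    def get(self, x, y):
        key, i = self._locate(x, y)
        tile = self.tiles.get(key)
        return self.fill if tile is None else tile[i]

    def set(self, x, y, value):
        key, i = self._locate(x, y)
        tile = self.tiles.get(key)
        if tile is None:  # allocate the tile on first write
            tile = self.tiles[key] = [self.fill] * (TILE * TILE)
        tile[i] = value

def over_unassociated(src_rgb, src_a, dst_rgb, dst_a):
    """Porter-Duff 'over' for unassociated (straight, not premultiplied) alpha in [0, 1].

    Color is weighted by alpha for the blend and divided back out, so the
    result is again unassociated."""
    out_a = src_a + dst_a * (1.0 - src_a)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0), 0.0
    out_rgb = tuple((s * src_a + d * dst_a * (1.0 - src_a)) / out_a
                    for s, d in zip(src_rgb, dst_rgb))
    return out_rgb, out_a
```

For example, half-transparent red over opaque blue gives `((0.5, 0.0, 0.5), 1.0)`. Storing alpha unassociated keeps each channel plane independent, at the cost of the multiply and divide inside the blend.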

Photoshop Layer Stack View Updating (Simplified)

Array of layers

Composite of the layers below the target layer is cached

Layers above the current target layer are recomposited

The pyramid level closest to the view scale is composited

Updates occur one tile at a time

Some edits occur at the max pyramid level

Some previews apply edits at the viewing level
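
The update strategy can be sketched as follows, assuming a power-of-two pyramid (level 0 is full resolution, level k is scaled by 1/2^k) and measuring "closest" in log2 scale; the slide defines neither. The target layer and everything above it are recomposited on top of the cached composite of the layers below, one tile at a time. All function names are hypothetical.

```python
import math

def closest_pyramid_level(view_scale, num_levels):
    """Level k holds the image at scale 1 / 2**k; pick the k closest to view_scale in log2 terms."""
    if view_scale >= 1.0:
        return 0  # zoomed in at or past 100%: use full resolution
    return min(round(-math.log2(view_scale)), num_levels - 1)

def composite(layers, blend, base):
    """Blend layers bottom to top onto base."""
    acc = base
    for layer in layers:
        acc = blend(acc, layer)
    return acc

def cache_below(layers, target, blend, base):
    """Composite of every layer below the target; reusable until a layer below it changes."""
    return composite(layers[:target], blend, base)

def update_tile(layers, target, cached_below, blend):
    """Recomposite only the target layer and the layers above it for one tile."""
    return composite(layers[target:], blend, cached_below)

if __name__ == "__main__":
    stack = ["background", "photo", "adjustment", "text"]
    record = lambda acc, layer: acc + [layer]  # stand-in blend that records the order
    below = cache_below(stack, 2, record, [])
    print(closest_pyramid_level(0.3, 5), below, update_tile(stack, 2, below, record))
```

The stand-in `record` blend shows that the cached path yields the same layer order as a full recomposite while touching only the layers from the target upward.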

Photoshop GL Texture Usage: Data Structure

Sparse tiled pyramid

Tiles are 360x360

Texture2D

Advanced mode: internal format GL_RGBA16F_ARB

Shader for checkerboard compositing

Shader for tone mapping

Shader for color matching

Non-Advanced mode: internal format GL_RGBA8

Pixels are screen-ready
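
A short sketch of the bookkeeping behind a sparse tiled pyramid of 360x360 tiles: how many tiles each level needs, and how much texture memory one resident tile costs in each internal format. The power-of-two level spacing and the nominal bytes per texel (drivers may pad or reorganize storage) are assumptions. The 5616x3744 size is the 21 MP test image used in the Smart Sharpen benchmark. Function names are hypothetical.

```python
import math

TILE = 360  # tile edge in pixels, from the slide
# Nominal storage per texel: 4 channels x 2 bytes (half float) or 4 channels x 1 byte.
BYTES_PER_TEXEL = {"GL_RGBA16F_ARB": 8, "GL_RGBA8": 4}

def tiles_in_level(width, height, tile=TILE):
    """Number of tiles needed to cover one pyramid level."""
    return math.ceil(width / tile) * math.ceil(height / tile)

def pyramid_tile_counts(width, height, tile=TILE):
    """Tile count per level of a power-of-two pyramid, halving until one tile covers the level."""
    counts = [tiles_in_level(width, height, tile)]
    while width > tile or height > tile:
        width, height = math.ceil(width / 2), math.ceil(height / 2)
        counts.append(tiles_in_level(width, height, tile))
    return counts

def texture_bytes(num_tiles, internal_format, tile=TILE):
    """Nominal texture memory for num_tiles resident tiles."""
    return num_tiles * tile * tile * BYTES_PER_TEXEL[internal_format]

counts = pyramid_tile_counts(5616, 3744)
print(counts, sum(counts))
print(texture_bytes(counts[0], "GL_RGBA16F_ARB") / 2**20, "MiB for a fully resident level 0")
```

Because the pyramid is sparse, these counts are an upper bound: only tiles that are actually loaded need textures. A fully resident level 0 for this image would take about 174 MiB at GL_RGBA16F_ARB versus about 87 MiB at GL_RGBA8.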

GL Canvas: User-Visible Features

Smooth zoom animations

Smooth canvas toss animations

Canvas Rotation

Temporal refinement when panning large images (blurry to sharp)

Pixel Grid

Improved filtering at non-power-of-two zoom levels

Antialiasing of Paths, Shapes, and Overlays

HDR Tone Mapping

ACE Color Matching

Brush Resizing feedback

GPU Tiled Pyramid Update Management

Just prior to redraw: load as many tiles as possible at the pyramid level that best matches the viewing conditions, then render and swap buffers.

Idle time: backfill the pyramid with no updates

When a navigation modality is engaged, start prefetching:

Pan Tool: around the periphery of the current view frustum

Zoom Tool: above and/or below the current view frustum
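
A sketch of the two prefetch sets, under stated assumptions: "around the periphery" is taken to mean a ring of tiles one tile wide around the visible tiles at the current level, and "above and/or below" to mean the same view region one level coarser and one level finer in a power-of-two pyramid (level 0 is full resolution). Views are pixel rectangles `(x0, y0, x1, y1)` with exclusive upper bounds. All names are hypothetical.

```python
import math

TILE = 360  # tile edge in pixels, from the slides

def tiles_covering(x0, y0, x1, y1, tile=TILE):
    """Tile coordinates (tx, ty) covering the pixel rectangle [x0, x1) x [y0, y1)."""
    return {(tx, ty)
            for ty in range(y0 // tile, (y1 - 1) // tile + 1)
            for tx in range(x0 // tile, (x1 - 1) // tile + 1)}

def pan_prefetch(view, level_w, level_h, margin=1, tile=TILE):
    """Pan: tiles in a ring `margin` tiles wide around the view, clipped to the level."""
    visible = tiles_covering(*view, tile=tile)
    nx, ny = math.ceil(level_w / tile), math.ceil(level_h / tile)
    xs = [tx for tx, _ in visible]
    ys = [ty for _, ty in visible]
    ring = {(tx, ty)
            for ty in range(max(0, min(ys) - margin), min(ny, max(ys) + margin + 1))
            for tx in range(max(0, min(xs) - margin), min(nx, max(xs) + margin + 1))}
    return ring - visible

def zoom_prefetch(view, level, num_levels, tile=TILE):
    """Zoom: the same view region one level finer (below) and one level coarser (above)."""
    x0, y0, x1, y1 = view  # pixel rectangle in `level` coordinates
    out = {}
    if level > 0:  # finer level: coordinates double
        out[level - 1] = tiles_covering(2 * x0, 2 * y0, 2 * x1, 2 * y1, tile)
    if level + 1 < num_levels:  # coarser level: coordinates halve (rounding outward)
        out[level + 1] = tiles_covering(x0 // 2, y0 // 2, (x1 + 1) // 2, (y1 + 1) // 2, tile)
    return out
```

For a 720x360 view at the top-left of a 2000x2000 level, `pan_prefetch` returns the four neighbouring tiles to the right and below; the ring is clipped because there is nothing to fetch past the image edge.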

The toll for full-screen immersion

| Display (full screen at 66.7% zoom) | Image area shown | 8-bit | 16-bit | 32-bit |
|---|---|---|---|---|
| 2650x1600 | 4016 x 2424 | 200 MB/frame = 12 GB/s | 400 MB/frame = 24 GB/s | 800 MB/frame = 48 GB/s |
| 3840 x 2160 | 5818 x 3272 | 400 MB/frame = 24 GB/s | 800 MB/frame = 48 GB/s | 1600 MB/frame = 96 GB/s |

5 passes over the data: COW (copy-on-write), Modify, Composite, Interleave, Upload
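
The per-frame figures follow from image pixels x bytes per pixel x 5 passes, and the per-second figures from a frame rate. The slide states neither the channel count nor the frame rate; the sketch below assumes RGBA (4 channels) and 60 fps (inferred from 200 MB per frame giving 12 GB per second), which reproduces the slide's numbers to within rounding. The function name `frame_traffic` is hypothetical.

```python
# The 5 passes named on the slide; each is assumed to touch every pixel of the frame once.
PASSES = ("COW", "Modify", "Composite", "Interleave", "Upload")
# Assumption: 4 channels (RGBA) at 1, 2, or 4 bytes per channel.
BYTES_PER_PIXEL = {8: 4, 16: 8, 32: 16}
# Assumption: 60 fps, inferred from 200 MB per frame -> 12 GB per second.
FPS = 60

def frame_traffic(width, height, bits, passes=len(PASSES), fps=FPS):
    """Return (bytes moved per frame, bytes moved per second)."""
    per_frame = width * height * BYTES_PER_PIXEL[bits] * passes
    return per_frame, per_frame * fps

displays = {"2650x1600 @ 66.7%": (4016, 2424), "3840x2160 @ 66.7%": (5818, 3272)}
for label, (w, h) in displays.items():
    for bits in (8, 16, 32):
        per_frame, per_second = frame_traffic(w, h, bits)
        print(f"{label} {bits:>2}-bit: {per_frame / 1e6:7.1f} MB/frame, {per_second / 1e9:5.1f} GB/s")
```

Under these assumptions the 8-bit case on the 2650x1600 display moves about 195 MB per frame (11.7 GB/s), and the 3840 x 2160 case about 381 MB per frame (22.8 GB/s); the slide rounds these up to 200 MB/12 GB/s and 400 MB/24 GB/s.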

Future Directions

Avoid reincarnating the pixel data.

Adopt the INTEL_map_texture extension

Exploit OpenGL/OpenCL interop

Photoshop Blur Gallery

Yuyan Song, Computer Scientist, Adobe Systems Inc.

Why OpenCL

Only cross-platform GPU computing solution

Advantages over OpenGL: learning curve, data format, debugging

Increasing maturity and ubiquity

Blur Gallery Demo

Field Blur, Iris Blur, Tilt-Shift

How did we do it?

OpenCL kernels were ported from optimized CPU code

The image is broken into 2K x 2K blocks for the GPU

A 1K x 1K scaled-down image is used for mouse-down interaction
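
One way to organize this, sketched in Python: split the full-resolution image into 2K x 2K blocks for the final render, and compute a preview size whose longer side fits 1K for interaction while the mouse is down. The halo border around each block is an assumption (a blur near a block edge needs pixels from the neighbouring block); the slide does not say how block seams are handled. Function names are hypothetical.

```python
BLOCK = 2048  # "2K x 2K blocks for GPU"
PROXY = 1024  # "1K x 1K scale down image for mouse-down interaction"

def blocks(width, height, block=BLOCK, halo=0):
    """Yield (core, padded) rectangles as (x0, y0, x1, y1), row by row.

    The cores tile the image exactly; padded grows each core by `halo` pixels
    (clamped to the image) so a blur kernel of that radius can read across block edges."""
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            x1, y1 = min(x0 + block, width), min(y0 + block, height)
            padded = (max(0, x0 - halo), max(0, y0 - halo),
                      min(width, x1 + halo), min(height, y1 + halo))
            yield (x0, y0, x1, y1), padded

def proxy_size(width, height, limit=PROXY):
    """Size of a scaled-down preview whose longer side is at most `limit` pixels."""
    scale = min(1.0, limit / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))

if __name__ == "__main__":
    for core, padded in blocks(5616, 3744, halo=100):
        print(core, padded)
    print(proxy_size(5616, 3744))
```

For the 21 MP (5616x3744) image this gives six blocks and a 1024x683 interactive proxy; only the padded rectangle is uploaded, but only the core is written back.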

Challenges

Need good candidate algorithms: bandwidth, compute, parallel

Need a debugged C algorithm first

Challenges

Issues using multiple command queues in multiple threads

Resource limits: Win/Mac timeout issues on low-end cards; out-of-memory issues on low-end cards

Platform variation: driver issues, various compiler issues

Performance Comparison

Systems of standard configuration show a 4-8x gain for typical use cases

Gains improve with blur radius (results from Intel® Iris™ Pro Graphics 5200 running Windows 7 listed below)

General application processing accounts for the majority of the time in smaller workloads

| Radius in pixels (21 MP image) | OpenCL on | OpenCL off |
|---|---|---|
| 100 | 5.2 s | 19.6 s |
| 250 | 7.8 s | 53.8 s |
| 500 | 17.3 s | 120.8 s |
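
Dividing the OpenCL-off time by the OpenCL-on time makes the trend in the Iris Pro 5200 table explicit (the helper name `speedups` is hypothetical):

```python
# (blur radius in px, OpenCL on [s], OpenCL off [s]) for the 21 MP image on Iris Pro 5200, Windows 7
RESULTS = [(100, 5.2, 19.6), (250, 7.8, 53.8), (500, 17.3, 120.8)]

def speedups(results):
    """OpenCL speedup (off time / on time) for each radius."""
    return [off / on for _, on, off in results]

for (radius, _, _), s in zip(RESULTS, speedups(RESULTS)):
    print(f"radius {radius:>3} px: {s:.1f}x faster with OpenCL")
```

This gives about 3.8x at a 100-pixel radius and about 6.9x and 7.0x at 250 and 500 pixels, consistent with gains growing with blur radius as fixed application overhead becomes a smaller share of the total.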

OpenCL in Photoshop CC

Joseph Hsieh, Computer Scientist II, Adobe Systems Inc.

Renovated Smart Sharpen Feature

Adobe renovated the legacy Smart Sharpen by introducing a patch-based denoise and sharpen algorithm. The new patch-based algorithm produces a sharpened image without halo artifacts. Furthermore, the denoise step suppresses the “noise gets boosted when you sharpen” issue.
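
The slides describe the new algorithm only as "patch based". For illustration, here is a minimal non-local means denoiser in pure Python. Non-local means is a well-known patch-based technique, swapped in here as an example and not necessarily what Adobe implemented: each output pixel is a weighted average of pixels in a search window, with weights that fall off as their surrounding patches differ from the patch around the pixel being denoised. The parameters `patch_r`, `search_r`, and `h` are hypothetical. Every candidate comparison re-reads a full patch, which hints at why such algorithms tend to be memory bound.

```python
import math

def nlm_denoise(img, patch_r=1, search_r=3, h=0.1):
    """Non-local means on a grayscale image given as a list of rows of floats.

    Each output pixel is a weighted mean over a (2*search_r+1)^2 search window; a
    candidate's weight is exp(-d / h^2), where d is the mean squared difference between
    the (2*patch_r+1)^2 patches around the candidate and around the pixel itself."""
    hgt, wid = len(img), len(img[0])

    def px(y, x):  # clamp-to-edge sampling for patches that overhang the border
        return img[min(max(y, 0), hgt - 1)][min(max(x, 0), wid - 1)]

    def patch_dist(y0, x0, y1, x1):
        total = 0.0
        for dy in range(-patch_r, patch_r + 1):
            for dx in range(-patch_r, patch_r + 1):
                diff = px(y0 + dy, x0 + dx) - px(y1 + dy, x1 + dx)
                total += diff * diff
        return total / (2 * patch_r + 1) ** 2

    out = [[0.0] * wid for _ in range(hgt)]
    for y in range(hgt):
        for x in range(wid):
            num = den = 0.0
            for qy in range(max(0, y - search_r), min(hgt, y + search_r + 1)):
                for qx in range(max(0, x - search_r), min(wid, x + search_r + 1)):
                    w = math.exp(-patch_dist(y, x, qy, qx) / (h * h))
                    num += w * img[qy][qx]
                    den += w
            out[y][x] = num / den  # den >= 1 because the pixel's own weight is exp(0)
    return out
```

A larger `h` averages more aggressively; a smaller one keeps more detail but also more noise. An isolated one-pixel spike on a flat background is pulled almost all the way down with `h=1.0`.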

Original Picture

After Applying the Legacy Smart Sharpen

After Applying the Patch-Based Smart Sharpen

Challenges

Our patch-based denoise algorithm is heavily memory bound. We cannot cache a portion of the input image in local memory.

Attempted Solutions

1. For each pixel, is there redundant work in comparing up to 65 patches?

2. Maybe use local memory in some way to relieve the pressure on global memory.

3. On lower-end GPUs, the intuitive approach of using read_imagef() is slower than high-end CPUs (even with a 90%+ cache hit rate).
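
One possible answer to question 1 is yes: neighbouring pixels' patches overlap, so a brute-force search recomputes the same pixel differences many times. A known acceleration for non-local-means-style filters (not necessarily what Adobe did) fixes one search offset at a time, computes each squared difference once, and gets every patch distance from a summed-area table with four lookups. The pure-Python sketch below checks the fast version against brute force; `px`, `patch_ssd_direct`, and `patch_ssd_all` are hypothetical names.

```python
def px(img, y, x):
    """Clamp-to-edge sampling."""
    return img[min(max(y, 0), len(img) - 1)][min(max(x, 0), len(img[0]) - 1)]

def patch_ssd_direct(img, y, x, sy, sx, r):
    """Brute force: squared differences between the patch at (y, x) and at (y+sy, x+sx)."""
    return sum((px(img, y + dy, x + dx) - px(img, y + sy + dy, x + sx + dx)) ** 2
               for dy in range(-r, r + 1) for dx in range(-r, r + 1))

def patch_ssd_all(img, sy, sx, r):
    """The same quantity for every pixel at once, for one search offset (sy, sx).

    Each squared difference is computed once on a grid padded by r on every side;
    a summed-area table then gives each patch sum in four lookups."""
    hgt, wid = len(img), len(img[0])
    ph, pw = hgt + 2 * r, wid + 2 * r
    sat = [[0.0] * (pw + 1) for _ in range(ph + 1)]  # sat[i][j]: sum over rows < i, cols < j
    for i in range(ph):
        row_sum = 0.0
        for j in range(pw):
            y, x = i - r, j - r  # padded index -> image coordinate
            diff = px(img, y, x) - px(img, y + sy, x + sx)
            row_sum += diff * diff
            sat[i + 1][j + 1] = sat[i][j + 1] + row_sum
    out = [[0.0] * wid for _ in range(hgt)]
    for y in range(hgt):
        for x in range(wid):
            # The patch around (y, x) spans padded rows y..y+2r and columns x..x+2r.
            i0, j0, i1, j1 = y, x, y + 2 * r + 1, x + 2 * r + 1
            out[y][x] = sat[i1][j1] - sat[i0][j1] - sat[i1][j0] + sat[i0][j0]
    return out
```

The cost per offset no longer depends on patch size, but on a GPU it trades repeated patch reads for an extra per-offset buffer, so whether it helps a memory-bound kernel depends on the hardware.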

Benchmark

7x faster than the CPU standalone, depending on the video adapter. But…

Chart: processing time with OCL On vs. OCL Off at denoise amounts of 1%, 10%, and 100% (bar labels: 3.1 s, 3.8 s, 3.4 s, 5.4 s, 3.7 s, 8.4 s)

Intel HD P4600 (GPU) vs. Xeon E3-1285 3.6GHz (CPU)

Win 8 with blur radius 0, using a 5616x3744 (21M pixels) RGB image

The OCL pipeline is not yet optimized in this version. The chart shows that using OpenCL for denoise already essentially nullifies the performance impact of the denoise step (~1 s).

Ongoing Work

Full Smart Sharpen feature pipeline on OpenCL

Apply OpenCL to other features

Conclusion

Xeon E3 v3/HSW GT2 (Intel® HD Graphics 4600) delivers Photoshop performance on par with an entry-level professional card.

Intel® Iris™ Pro Graphics at 48W delivers performance comparable to a mid-level discrete card.

HSW Ultrabooks at 15W can handle mid-sized (21 MP) images.

OpenCL makes it possible for Adobe to introduce more advanced features to our customers without compromising the user experience and responsiveness of our tools.

Intel and Adobe are working together to enable Photoshop to travel with you in the new mobile form factors.

Demos

Smart Sharpen Filter

Liquify Filter

Adaptive Wide Angle Correction Filter