The power of C++ Project Austin app Ale Contenti Visual C++ | Principal Dev Manager 4-001


Post on 23-Feb-2016



TRANSCRIPT

Page 1: The power of C++  Project Austin app

The power of C++
Project Austin app

Ale Contenti
Visual C++ | Principal Dev Manager
4-001

Page 2

Diving deep into project Austin

• What’s Austin
• Why we built it
• C++ at work
• Go build amazing apps!

Page 3

Austin

Austin is a digital note-taking app for Windows 8.

• You can add pages to your notebook, delete them, or move them around

• You can use digital ink to write or draw things on those pages

• You can add photos from your computer, from SkyDrive, or directly from your computer's camera

• You can share the notes you create with other Windows 8 apps, such as Email or SkyDrive

Beautiful and simple

Page 4

Austin: just a pen and a piece of paper

Page 5

Austin: why we built it

We used Visual C++ 2012 to build an amazing app:
• Written in “modern C++”
• DirectX and XAML for the UI
• C++/CX to interact with WinRT
• Auto-vectorizer for faster ink smoothing
• C++ AMP for faster page curling
• …and it was fun (the code is available on CodePlex, too)

Showcase the power of Windows 8, the native platform and C++

Page 6

• Modern C++
• DirectX and XAML UI
• C++/CX layer

Page 7

Modern C++

We strove to write Austin in a “modern” way:
• C++ Standard Library, augmented with PPL and Boost
• Smart pointers instead of raw pointers
• Pervasive RAII pattern
• Errors handled using C++ exceptions
• Coding conventions inspired by Boost

No bare pointers, no delete
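As a minimal sketch of these conventions (not code from Austin), the snippet below pairs a std::unique_ptr with a custom deleter so that a C FILE* is closed by RAII, reports failure with an exception, and lets a notebook own its pages through smart pointers. FileCloser, openFile, Page, and Notebook are hypothetical names chosen for illustration.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// RAII: the deleter closes the FILE* automatically when the owning
// unique_ptr goes out of scope, even if an exception is thrown.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { if (f) std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Errors are reported with exceptions rather than return codes.
FilePtr openFile(const std::string& path, const char* mode) {
    FilePtr f(std::fopen(path.c_str(), mode));
    if (!f) throw std::runtime_error("cannot open " + path);
    return f;
}

// Hypothetical notebook model: pages are owned by smart pointers,
// so no code ever calls delete on them.
struct Page { std::string title; };

class Notebook {
public:
    Page& addPage(std::string title) {
        pages_.push_back(std::make_unique<Page>(Page{std::move(title)}));
        return *pages_.back();
    }
    std::size_t pageCount() const { return pages_.size(); }
private:
    std::vector<std::unique_ptr<Page>> pages_;
};
```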

Page 8

DirectX and XAML

• DirectX is used to create an immersive, fluid user interface, built as a 3D scene with lights, shadows, and a camera

• On the DirectX render target, we draw the notebook's pages, photos, ink strokes, and background

• A 3D engine library abstracts some of the DirectX complexity

DirectX for a fast, fluid, real-to-life experience

• XAML UI is used for the settings menu, the app bar, and the rest of the user interface

• A SwapChainBackgroundPanel hosts the 3D scene inside the XAML UI page

Page 9

C++/CX

C++/CX is used at the “boundary”, to interact with Windows (via the WinRT objects) and to leverage the XAML UI:
• Used for loading and saving images, the file picker, the camera, storage files and folders (SkyDrive, etc.), and implementing the “share” contract

• Very useful for XAML UI: UI elements and events hook-ups

• We were careful not to let C++/CX code “bleed” too much into our Standard C++ code (15 files out of 350)

Windows is the RunTime
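One way to keep C++/CX at the boundary, sketched here under assumptions, is to let the Standard C++ core depend only on an abstract interface and implement that interface in the few C++/CX files. IImageStore, PhotoImporter, and FakeImageStore are hypothetical names; a real implementation would wrap WinRT storage APIs, which are not shown, and an in-memory fake stands in for it here.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: the portable core sees only this, never WinRT types.
class IImageStore {
public:
    virtual ~IImageStore() = default;
    virtual std::vector<unsigned char> load(const std::wstring& name) = 0;
};

// Core logic written in Standard C++ only.
class PhotoImporter {
public:
    explicit PhotoImporter(std::shared_ptr<IImageStore> store) : store_(std::move(store)) {}
    std::size_t importBytes(const std::wstring& name) { return store_->load(name).size(); }
private:
    std::shared_ptr<IImageStore> store_;
};

// Stand-in for the platform layer: in the app, a class in one of the
// C++/CX files would implement IImageStore on top of WinRT storage.
class FakeImageStore : public IImageStore {
public:
    std::vector<unsigned char> load(const std::wstring&) override { return {1, 2, 3}; }
};
```

With this split, only the class implementing IImageStore needs to be compiled as C++/CX; everything that consumes it stays portable and can be unit-tested with a fake.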

Page 10

Ink smoothing and auto-vectorizer

Page 11

Ink smoothing: the problem

We have on the order of 5 ms or less to smooth the strokes

In real time, please…

Page 12

Ink smoothing: the code

for (int j = 0; j < numPoints; j++)
{
    // t goes from 0 to 1 along the segment between p2 and p3
    float t = (float)j / (float)(numPoints - 1);
    // pressure is interpolated linearly
    smoothedPressure[j] = (1 - t) * p2p + t * p3p;
    // cubic Hermite basis; tangents L*(p3 - p1) and L*(p4 - p2)
    smoothedPoints_X[j] = (2*t*t*t - 3*t*t + 1) * p2x
                        + (-2*t*t*t + 3*t*t) * p3x
                        + (t*t*t - 2*t*t + t) * L*(p3x - p1x)
                        + (t*t*t - t*t) * L*(p4x - p2x);
    smoothedPoints_Y[j] = (2*t*t*t - 3*t*t + 1) * p2y
                        + (-2*t*t*t + 3*t*t) * p3y
                        + (t*t*t - 2*t*t + t) * L*(p3y - p1y)
                        + (t*t*t - t*t) * L*(p4y - p2y);
}

The C++ compiler is obsessed with optimization: in this case, it will auto-vectorize the loop.
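The smoothing loop can be packaged as a self-contained function for experimentation. This is a sketch under assumptions: Point and smoothSegment are hypothetical names, the pressure channel is left out, and L is treated as a caller-supplied tension factor (0.5 gives the Catmull-Rom spline). The per-coordinate arithmetic is the same as in the slide's loop.

```cpp
#include <vector>

struct Point { float x, y; };

// Smooths the segment between p2 and p3 with a cubic Hermite curve whose
// tangents follow the cardinal-spline rule L * (next - previous).
// Requires numPoints >= 2; the first output equals p2 and the last equals p3.
std::vector<Point> smoothSegment(Point p1, Point p2, Point p3, Point p4,
                                 float L, int numPoints) {
    std::vector<Point> out(numPoints);
    for (int j = 0; j < numPoints; ++j) {
        float t = static_cast<float>(j) / static_cast<float>(numPoints - 1);
        float t2 = t * t, t3 = t2 * t;
        float h00 = 2 * t3 - 3 * t2 + 1;  // weight of p2
        float h01 = -2 * t3 + 3 * t2;     // weight of p3
        float h10 = t3 - 2 * t2 + t;      // weight of the tangent at p2
        float h11 = t3 - t2;              // weight of the tangent at p3
        out[j].x = h00 * p2.x + h01 * p3.x + h10 * L * (p3.x - p1.x) + h11 * L * (p4.x - p2.x);
        out[j].y = h00 * p2.y + h01 * p3.y + h10 * L * (p3.y - p1.y) + h11 * L * (p4.y - p2.y);
    }
    return out;
}
```

The slide's version writes X and Y into separate arrays (smoothedPoints_X, smoothedPoints_Y); that structure-of-arrays layout is generally friendlier to the auto-vectorizer than the array of Point structs used here for brevity.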

Page 13

Auto-vectorizer (super simplified view)

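[Diagram: four elements A[0]…A[3] sit in one SSE register and four elements B[0]…B[3] in another (xmm0 and xmm1); a single “addps xmm1, xmm0” instruction computes A[0] + B[0], A[1] + B[1], A[2] + B[2], A[3] + B[3] and leaves the results in xmm1.]

for (i = 0; i < 1000; i++) { C[i] = A[i] + B[i]; }

becomes, conceptually:

for (i = 0; i < 1000; i += 4) { C[i:i+3] = A[i:i+3] + B[i:i+3]; }  // pseudo-code: four elements per instruction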
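In portable C++, the shape of the vectorized loop can be sketched by hand: process four floats per iteration, then finish any leftover elements one at a time. This only illustrates the strip-mining the compiler performs (addArrays is a hypothetical name); the compiler emits the SIMD instructions itself, so hand-unrolling like this is not needed to get them.

```cpp
#include <cstddef>

// Scalar sketch of the vectorized loop for C[i] = A[i] + B[i].
// Main loop: groups of four elements (one 128-bit SSE register holds four
// floats, so each group corresponds to a single addps).
// Remainder loop: the leftover elements when n is not a multiple of 4.
void addArrays(const float* A, const float* B, float* C, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        // conceptually: C[i:i+3] = A[i:i+3] + B[i:i+3]
        C[i + 0] = A[i + 0] + B[i + 0];
        C[i + 1] = A[i + 1] + B[i + 1];
        C[i + 2] = A[i + 2] + B[i + 2];
        C[i + 3] = A[i + 3] + B[i + 3];
    }
    for (; i < n; ++i) {  // scalar remainder
        C[i] = A[i] + B[i];
    }
}
```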

Page 14

Auto-vectorizer: info from the compiler

When does the auto-vectorizer kick in? On the command line:
• /Qvec-report:1 will report the vectorized loops
• /Qvec-report:2 will report both vectorized and non-vectorized loops, and the reason why some loops were not vectorized
• Refer to the Vectorizer and Parallelizer Messages in MSDN

From the build output, with /Qvec-report:1:
ink_renderer.cpp(1092) : info C5001: loop vectorized

Page 15

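Auto-vectorizer: it’s not always easy

#include <vector>

void test1()
{
    std::vector<int> a(100000), b(100000), c(100000);
    for (int i = 0; i < a.size(); ++i)  // a.size() is evaluated on every iteration
    {
        a[i] = b[i] + c[i];
    }
}

info C5002: loop not vectorized due to reason ‘501’

Reason 501 means that the induction variable is not local, or that the upper bound is not loop-invariant: here the bound a.size() is re-evaluated on every iteration.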


Page 17

Auto-vectorizer: it’s not always easy

#include <vector>

void test1()
{
    std::vector<int> a(100000), b(100000), c(100000);
    for (int i = 0, iMax = (int)a.size(); i < iMax; ++i)  // bound read once, before the loop
    {
        a[i] = b[i] + c[i];
    }
}

info C5001: loop vectorized
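A variant of test1 written as a reusable function, as a sketch: it reads the trip count into a local std::size_t before the loop, so the bound is loop-invariant, and it sizes the result from the shorter input so every access stays in bounds. addVectors is a hypothetical name, and whether a given compiler version vectorizes this exact loop should be confirmed with /Qvec-report:2.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<int> addVectors(const std::vector<int>& b, const std::vector<int>& c) {
    // Loop bound computed once, before the loop: it cannot change inside it.
    const std::size_t n = std::min(b.size(), c.size());
    std::vector<int> a(n);
    // Local induction variable with a simple +1 step and an invariant bound.
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] + c[i];
    }
    return a;
}
```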

Page 18

Auto-vectorizer at work in Austin

• For the ink-smoothing algorithm, we got a 30% speed-up
• For the first part of the page curling algorithm, we got a 175% speed-up
• The auto-vectorizer can analyze very complex loops
• Always measure with a profiler to understand which loops you need to speed up
• Leverage the Vectorizer and Parallelizer Messages guide for help

The compiler will analyze the loop and emit the right code
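For a quick before/after comparison of a single loop, a small stopwatch helper based on std::chrono::steady_clock can be handy. This is a sketch (averageMicroseconds is a hypothetical name); it complements rather than replaces a profiler, which is what tells you which loops are worth optimizing in the first place.

```cpp
#include <chrono>

// Runs f `repetitions` times (must be > 0) and returns the average
// wall-clock duration of one run, in microseconds.
template <typename F>
double averageMicroseconds(F&& f, int repetitions) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < repetitions; ++r) f();
    const std::chrono::duration<double, std::micro> elapsed = clock::now() - start;
    return elapsed.count() / repetitions;
}
```

Timing a release build several times, with and without a change, gives a rough number; the optimizer can remove work whose result is unused, so the measured code should produce an observable result.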

Page 19

Page curling and C++ AMP

Page 20

Page curling: calculating normals

// pseudo-code
for each triangle
{
    Position vertex1Pos = triangle.vertex1.position;
    Position vertex2Pos = triangle.vertex2.position;
    Position vertex3Pos = triangle.vertex3.position;

    Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
    triangleNormal.normalize();

    triangle.vertex1.normal += triangleNormal;
    triangle.vertex2.normal += triangleNormal;
    triangle.vertex3.normal += triangleNormal;
}

Lots of triangles: we have less than 15 ms to “turn a page” in real time, so we need to parallelize this algorithm.

C++ AMP is a good candidate, since the data size is pretty large.
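A runnable serial version of the pseudo-code, as a sketch under assumptions: the mesh is given as a vertex position array plus triangles stored as three vertex indices, and Vec3, cross, normalize, and vertexNormalsSerial are hypothetical helpers. It also normalizes each accumulated vertex normal at the end, as the C++ AMP version does.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3& operator+=(Vec3& a, Vec3 b) { a.x += b.x; a.y += b.y; a.z += b.z; return a; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0 ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

// Each triangle adds its unit normal into the normals of its three vertices.
// Shared vertices are written by several triangles, which is why this loop
// cannot be run in parallel as-is.
std::vector<Vec3> vertexNormalsSerial(const std::vector<Vec3>& positions,
                                      const std::vector<std::array<std::size_t, 3>>& triangles) {
    std::vector<Vec3> normals(positions.size(), Vec3{0, 0, 0});
    for (const auto& tri : triangles) {
        Vec3 p1 = positions[tri[0]], p2 = positions[tri[1]], p3 = positions[tri[2]];
        Vec3 n = normalize(cross(p2 - p1, p3 - p1));
        normals[tri[0]] += n;
        normals[tri[1]] += n;
        normals[tri[2]] += n;
    }
    for (auto& n : normals) n = normalize(n);  // final per-vertex normalization
    return normals;
}
```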

Page 21

Page curling: calculating normals

We’re looping over each triangle, with the same pseudo-code as on the previous page:
• Computing the triangle normal is safe, because it works on a single triangle at a time: no races
• But accumulating into the vertex normals updates vertices that are shared between triangles -> race! As written, this algorithm only works on a single thread

Page 22

Page curling: split the loop to make it parallelizable

Before (one loop):
for each triangle
    calculate triangle normals
    calculate vertex normals

After (two loops):
for each triangle
    calculate triangle normals
    cache triangle normals
for each vertex
    calculate vertex normals

Page 23

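First, loop for each triangle…

c::array<b::float32, 2> tempTriangleNormals(3, (int)triangleCount());

// concurrency::array must be captured by reference in a restrict(amp) lambda
parallel_for_each(extent<1>(triangleCount()), [=, &tempTriangleNormals](index<1> idx) restrict(amp)
{
    // simplified: 'triangle' stands for the triangle at index idx[0] (see the Appendix for the full lookup)
    Position vertex1Pos = triangle.vertex1.position;
    Position vertex2Pos = triangle.vertex2.position;
    Position vertex3Pos = triangle.vertex3.position;

    Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);
    triangleNormal.normalize();

    // store x, y, z in column idx[0] of the 3 x triangleCount array
    tempTriangleNormals(0, idx[0]) = triangleNormal.x;
    tempTriangleNormals(1, idx[0]) = triangleNormal.y;
    tempTriangleNormals(2, idx[0]) = triangleNormal.z;
});

We use C++ AMP. As before, we calculate the normal of each triangle; we collect the normals into a temporary array, which stays in GPU memory.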

Page 24

…then, loop for each vertex

parallel_for_each(extent<2>(vertexCountY, vertexCountX), [=](index<2> idx) restrict(amp)
{
    Normal vertexNormal = vertexNormalView(idx);

    // go find the normals from nearby triangles
    vertexNormal += sumTriangleNormals(idx);

    vertexNormal.normalize();

    vertexNormalView(idx) = vertexNormal;
});

We go over each vertex, so there are no races.

In sumTriangleNormals, we fetch the normals from tempTriangleNormals, i.e., the temporary we kept in GPU memory.
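The two-loop structure can also be written in Standard C++ as a reference for the C++ AMP code. In this sketch (all names hypothetical), pass 1 writes one cached normal per triangle and pass 2 lets each vertex gather the cached normals of the triangles that contain it, so every iteration of either loop writes only its own output element and could be handed to a parallel construct. The vertex-to-triangle adjacency list is an assumption of this sketch; Austin's sumTriangleNormals works on the page's 2D vertex grid, and its details are not shown.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3& operator+=(Vec3& a, Vec3 b) { a.x += b.x; a.y += b.y; a.z += b.z; return a; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0 ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

using Triangle = std::array<std::size_t, 3>;

// Pass 1: one independent iteration per triangle, each writing only its own slot.
std::vector<Vec3> computeTriangleNormals(const std::vector<Vec3>& positions,
                                        const std::vector<Triangle>& triangles) {
    std::vector<Vec3> triNormals(triangles.size());
    for (std::size_t t = 0; t < triangles.size(); ++t) {
        const Triangle& tri = triangles[t];
        Vec3 p1 = positions[tri[0]], p2 = positions[tri[1]], p3 = positions[tri[2]];
        triNormals[t] = normalize(cross(p2 - p1, p3 - p1));
    }
    return triNormals;
}

// Built once: for each vertex, the indices of the triangles that contain it.
std::vector<std::vector<std::size_t>> buildVertexTriangles(std::size_t vertexCount,
                                                           const std::vector<Triangle>& triangles) {
    std::vector<std::vector<std::size_t>> adjacency(vertexCount);
    for (std::size_t t = 0; t < triangles.size(); ++t)
        for (std::size_t v : triangles[t]) adjacency[v].push_back(t);
    return adjacency;
}

// Pass 2: one independent iteration per vertex; it only reads the cached
// triangle normals and writes its own output slot, so there is no race.
std::vector<Vec3> computeVertexNormals(const std::vector<Vec3>& triNormals,
                                      const std::vector<std::vector<std::size_t>>& adjacency) {
    std::vector<Vec3> normals(adjacency.size(), Vec3{0, 0, 0});
    for (std::size_t v = 0; v < adjacency.size(); ++v) {
        Vec3 sum{0, 0, 0};
        for (std::size_t t : adjacency[v]) sum += triNormals[t];
        normals[v] = normalize(sum);
    }
    return normals;
}
```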

Page 25

• Running this algorithm on the GPU yields speed-ups between 3x and 7x

• The CPU is now free to execute other code
• Even when DirectX 11-capable GPU hardware is not present, C++ AMP will fall back to WARP, which leverages multi-core CPUs and SSE2

Massive Parallelism with GPU and WARP

Page curling: C++ AMP at work

Page 26

Key takeaways

Page 27

Key takeaways
• Use modern C++: RAII, r-value references, lambdas, const, the Standard C++ Library, Boost, other third-party libraries, etc.
• DirectX for fast and powerful graphics
• XAML UI for standard UI elements
• C++/CX to talk to Windows, to other components, and to other languages (e.g., JS)
• Auto-vectorizer and PPL to distribute work on the CPU
• C++ AMP to leverage the massively parallel compute power of the GPU
• C++ Rocks! Go write great apps!!

Page 28

Resources

Page 29

• Tue/5:45/B92 Odyssey | Connecting C++ Apps to the Cloud via Casablanca
• Wed/11:15/B92 Odyssey | It’s all about performance: Using Visual C++ 2012 to make the best use of your hardware
• Wed/1:45/B92 Stinger | DirectX Graphics Development with Visual Studio 2012

Related Sessions

Page 30

• Wed/5:15/B33 Cascade | Diving deep into C++/CX and WinRT
• Thu/5:15/B92 Nexus/Normandy | Building a Windows Store app using XAML and C++: Photo app, the Hilo project
• Fri/12:45/B33 McKinley | The Future of C++

Related Sessions

Page 32

ENROLL TODAY!

MICROSOFT DEVELOPER DIVISION DESIGN RESEARCH

EXPERIENCE DEVELOPMENT TOOLS AND FEATURES EARLY IN THEIR DESIGN AND DEVELOPMENT

INFLUENCE FUTURE DESIGN DECISIONS

FILL IT ONLINE AT http://bit.ly/x6dtHt

Participate in Design Research

Page 33

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Page 34

Appendix

Page 35

• The line must be continuous, and so must its first and second derivatives

• We approximate this with the “cardinal” spline solution

• With the auto-vectorizer, we get a nice 30% speed-up

Ink smoothing: the math
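Written out, the loop on the ink-smoothing code slide evaluates a cubic Hermite curve between P_2 and P_3 whose tangents follow the cardinal-spline rule, with t = j / (numPoints - 1). The notation below (P_1…P_4 for the four input points, m_2 and m_3 for the tangents, p_2 and p_3 for the pen pressures) is introduced here to match that code term by term.

```latex
\begin{aligned}
P(t) &= h_{00}(t)\,P_2 + h_{01}(t)\,P_3 + h_{10}(t)\,m_2 + h_{11}(t)\,m_3, \qquad t \in [0,1],\\
h_{00}(t) &= 2t^3 - 3t^2 + 1, \qquad h_{01}(t) = -2t^3 + 3t^2,\\
h_{10}(t) &= t^3 - 2t^2 + t, \qquad h_{11}(t) = t^3 - t^2,\\
m_2 &= L\,(P_3 - P_1), \qquad m_3 = L\,(P_4 - P_2),\\
p(t) &= (1 - t)\,p_2 + t\,p_3 \quad \text{(pen pressure)}.
\end{aligned}
```

A cardinal spline of this form has a continuous first derivative where segments meet, but in general not a continuous second derivative, which is why it only approximates the requirement above. For a tension parameter c, L = (1 - c)/2; L = 1/2 gives the Catmull-Rom spline.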

Page 36

• Brilliant paper by Hong et al., “Turning Pages of 3D Electronic Books”

• Turning a page of a physical book can be simulated as deforming a page around a cone

• Each “page” in Austin is made of a bunch of triangles

• In C++, we apply the page-turning algorithm to all triangles

• The auto-vectorizer comes to the rescue again with a sweet 1.7x speed-up

Page curling: how do we turn the page
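One common way to write the cone deformation is shown below, as a hedged sketch: it is a standard formulation of wrapping a flat sheet around a cone, not necessarily the exact formula from the paper or from Austin. curlAroundCone is a hypothetical name; the cone's apex sits at (0, A, 0) on the page's spine (the y axis, with A below the page so that R > 0), and theta is the cone's half-angle in (0, π/2].

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Maps a flat page point (x, y) onto a cone with apex (0, A, 0) and half-angle theta.
// Each point keeps its distance R from the apex and its arc length R * asin(x / R)
// around it, so the page bends without stretching. theta = pi/2 leaves the page
// flat; smaller angles curl it more tightly. (A full page turn would also rotate
// the result about the spine; that step is omitted here.)
Vec3 curlAroundCone(float x, float y, float A, float theta) {
    float R = std::sqrt(x * x + (y - A) * (y - A));  // distance from the apex on the flat page
    float r = R * std::sin(theta);                    // radius of this point's circle on the cone
    float beta = std::asin(x / R) / std::sin(theta);  // angle around the cone's axis
    Vec3 p;
    p.x = r * std::sin(beta);
    p.y = R + A - r * (1 - std::cos(beta)) * std::sin(theta);
    p.z = r * (1 - std::cos(beta)) * std::cos(theta);
    return p;
}
```

Applying such a function to every vertex of every triangle is the kind of uniform, independent per-element loop the auto-vectorizer handles well.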

Page 37

• Vertex normals are typically calculated as the normalized average of the surface normals of all triangles containing the vertex

• Using this approach, computing the vertex normals on the CPU simply involves iterating over all triangles depicting the page surface and accumulating the triangle normals in the normals of the respective vertices

• To me, the above screams “massively parallel”

Page curling: vertex normals and shading

Page 38

Page curling: C++ AMP

// first calculate the triangle normals
c::array<b::float32, 2> triangleNormals(3, (int)triangleCount());
c::parallel_for_each(c::extent<1>(triangleCount()), [=, &triangleNormals](c::index<1> idx) restrict(amp)
{
    // fetch the positions of the triangle's three vertices
    b::float32 v1PosX = vertexPositionArray(0, indexArray(2, idx[0])[0]);
    b::float32 v1PosY = vertexPositionArray(1, indexArray(2, idx[0])[0]);
    b::float32 v1PosZ = vertexPositionArray(2, indexArray(2, idx[0])[0]);
    b::float32 v2PosX = vertexPositionArray(0, indexArray(1, idx[0])[0]);
    b::float32 v2PosY = vertexPositionArray(1, indexArray(1, idx[0])[0]);
    b::float32 v2PosZ = vertexPositionArray(2, indexArray(1, idx[0])[0]);
    b::float32 v3PosX = vertexPositionArray(0, indexArray(0, idx[0])[0]);
    b::float32 v3PosY = vertexPositionArray(1, indexArray(0, idx[0])[0]);
    b::float32 v3PosZ = vertexPositionArray(2, indexArray(0, idx[0])[0]);

    // edge vectors from vertex 1
    b::float32 x1 = v2PosX - v1PosX;
    b::float32 y1 = v2PosY - v1PosY;
    b::float32 z1 = v2PosZ - v1PosZ;
    b::float32 x2 = v3PosX - v1PosX;
    b::float32 y2 = v3PosY - v1PosY;
    b::float32 z2 = v3PosZ - v1PosZ;

    // cross them
    b::float32 x3 = y1 * z2 - z1 * y2;
    b::float32 y3 = z1 * x2 - x1 * z2;
    b::float32 z3 = x1 * y2 - y1 * x2;
    NORMALIZE(x3, y3, z3);

    // store the unit normal in column idx[0]
    triangleNormals(0, idx[0]) = x3;
    triangleNormals(1, idx[0]) = y3;
    triangleNormals(2, idx[0]) = z3;
});