the power of c++ project austin app

The power of C++: Project Austin app

Ale Contenti

Visual C++ | Principal Dev Manager

4-001

• What’s Austin

• Why we built it

• C++ at work

• Go build amazing apps!

Diving deep into project Austin

Austin

Austin is a digital note-taking app for Windows 8

• You can add pages to your notebook, delete them, or move them around

• You can use digital ink to write or draw things on those pages

• You can add photos from your computer, from SkyDrive, or directly from your computer's camera

• You can share the notes you create with other Windows 8 apps such as Email or SkyDrive

Beautiful and simple

Austin: just a pen and a piece of paper

Austin: why we built it

We used Visual C++ 2012 to build an amazing app:

• Written in “modern C++”

• DirectX, XAML for UI

• C++/CX to interact with WinRT

• Auto-vectorizer for faster ink smoothing

• C++ AMP for faster page curling

• …and it was fun (the code is available on CodePlex, too)

Showcase the power of Windows 8, the native platform and C++

http://austin.codeplex.com/

• Modern C++

• DirectX and XAML UI

• C++/CX layer

Modern C++

We strove to write Austin in a “modern” way:

• C++ Standard Library, augmented with PPL and Boost

• Smart pointers instead of raw pointers

• Pervasive RAII pattern

• Handle errors using C++ exceptions

• Coding conventions inspired by Boost

No bare pointers, no delete

http://msdn.microsoft.com/library/hh279654(v=vs.110).aspx
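A minimal sketch of the “no bare pointers, no delete” style. The `Page` and `Notebook` types are hypothetical and are not taken from the Austin sources; they only illustrate ownership expressed through standard smart pointers and RAII.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not taken from the Austin sources.
struct Page {
    std::string title;
    explicit Page(std::string t) : title(std::move(t)) {}
};

class Notebook {
public:
    // The notebook owns its pages through smart pointers,
    // so no explicit delete is ever written.
    void add_page(std::string title) {
        pages_.push_back(std::make_shared<Page>(std::move(title)));
    }

    // Hand out shared ownership instead of a raw pointer.
    std::shared_ptr<Page> page(std::size_t i) const {
        assert(i < pages_.size());
        return pages_[i];
    }

    std::size_t size() const { return pages_.size(); }

private:
    std::vector<std::shared_ptr<Page>> pages_;
};
```

When a `Notebook` goes out of scope, its vector releases every page automatically; a page survives only as long as some other `shared_ptr` still refers to it.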

DirectX and XAML

• DirectX to create an immersive, fluid user interface that's built as a 3D scene with lights, shadows, and a camera

• On the DirectX render target, we draw the notebook's pages, photos, ink strokes, and background

• A 3D engine library abstracts some of the DirectX complexity

DirectX for a fast, fluid, true-to-life experience

• XAML UI is used for the settings menu, the app bar, and the rest of the user interface

• The SwapChainBackgroundPanel hosts the 3D scene inside the XAML UI page

http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-blogs-components-weblogfiles/00-00-00-65-69/6763.image_5F00_3.jpg

C++/CX

C++/CX is used at the “boundary” to interact with Windows (via the WinRT objects) and to leverage XAML UI

• Used for loading and saving images, file picker, camera, storage files and folders (SkyDrive, etc.), implementing the “share” contract

• Very useful for XAML UI: UI elements and event hook-ups

• We were careful not to let C++/CX code “bleed” too much into our Standard C++ code (15 files out of 350)

Windows is the RunTime

Ink smoothing and auto-vectorizer

Ink smoothing: the problem

We have on the order of 5 ms or less to smooth the strokes

In real time, please…

Ink smoothing: the code

The C++ compiler is obsessed with optimization: in this case, it will auto-vectorize the loop

    for (int j = 0; j < numPoints; j++)
    {
        float t = (float)j / (float)(numPoints - 1);
        smoothedPressure[j] = (1 - t) * p2p + t * p3p;
        smoothedPoints_X[j] = (2*t*t*t - 3*t*t + 1) * p2x
                            + (-2*t*t*t + 3*t*t) * p3x
                            + (t*t*t - 2*t*t + t) * L * (p3x - p1x)
                            + (t*t*t - t*t) * L * (p4x - p2x);
        smoothedPoints_Y[j] = (2*t*t*t - 3*t*t + 1) * p2y
                            + (-2*t*t*t + 3*t*t) * p3y
                            + (t*t*t - 2*t*t + t) * L * (p3y - p1y)
                            + (t*t*t - t*t) * L * (p4y - p2y);
    }
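The loop above can be wrapped into a self-contained function for experimentation. This is a sketch under assumptions: the `InkPoint` struct and the `smooth_segment` name are made up here, and the tension factor `L` is passed in as a parameter because its value in Austin is not given on the slide.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sample type: position plus pen pressure.
struct InkPoint { float x, y, pressure; };

// Interpolates numPoints samples between p2 and p3. p1 and p4 are the
// neighbouring samples used to estimate the tangents, and L scales them.
std::vector<InkPoint> smooth_segment(const InkPoint& p1, const InkPoint& p2,
                                     const InkPoint& p3, const InkPoint& p4,
                                     int numPoints, float L)
{
    assert(numPoints >= 2);
    std::vector<InkPoint> out(numPoints);
    for (int j = 0; j < numPoints; ++j) {
        const float t  = static_cast<float>(j) / static_cast<float>(numPoints - 1);
        const float t2 = t * t;
        const float t3 = t2 * t;

        // Same four polynomial weights as in the slide's loop.
        const float h00 =  2 * t3 - 3 * t2 + 1;
        const float h01 = -2 * t3 + 3 * t2;
        const float h10 =  t3 - 2 * t2 + t;
        const float h11 =  t3 - t2;

        out[j].x = h00 * p2.x + h01 * p3.x
                 + h10 * L * (p3.x - p1.x) + h11 * L * (p4.x - p2.x);
        out[j].y = h00 * p2.y + h01 * p3.y
                 + h10 * L * (p3.y - p1.y) + h11 * L * (p4.y - p2.y);

        // Pressure is interpolated linearly.
        out[j].pressure = (1 - t) * p2.pressure + t * p3.pressure;
    }
    return out;
}
```

The first output sample coincides with `p2` and the last with `p3`. The slide keeps X, Y, and pressure in separate arrays; the struct-of-points layout here is chosen only for readability and may vectorize differently.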

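Auto-vectorizer (super simplified view)

[Figure: two registers, xmm0 and xmm1, each hold four elements, A[0]..A[3] and B[0]..B[3]. A single “addps xmm1, xmm0” instruction adds them lane by lane and leaves A[0] + B[0], A[1] + B[1], A[2] + B[2], A[3] + B[3] in xmm1.]

The scalar loop

    for (i = 0; i < 1000; i++)
    {
        C[i] = A[i] + B[i];
    }

is turned into (pseudo-code):

    for (i = 0; i < 1000; i += 4)
    {
        C[i:i+3] = A[i:i+3] + B[i:i+3];
    }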
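The transformation can be imitated by hand in portable C++: process four elements per iteration, then finish any leftover elements one at a time. This only illustrates the shape of the code the compiler generates; the function name `add_arrays_by_four` is made up for this sketch, and a real vectorizer emits one SIMD instruction per block rather than four scalar additions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes c[i] = a[i] + b[i] in blocks of four.
std::vector<float> add_arrays_by_four(const std::vector<float>& a,
                                      const std::vector<float>& b)
{
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> c(n);

    std::size_t i = 0;
    // Main loop: each iteration stands for one 4-wide addition.
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}
```

The remainder loop matters whenever the element count is not a multiple of four; with 1000 elements, as on the slide, it never runs.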

Auto-vectorizer: info from the compiler

When does the auto-vectorizer kick in? On the command line:

• /Qvec-report:1 will report the vectorized loops

• /Qvec-report:2 will report both vectorized and non-vectorized loops, and the reason why some loops were not vectorized

• Refer to the Vectorizer and Parallelizer Messages in MSDN: http://msdn.microsoft.com/en-us/library/jj658585.aspx

From the build output, with /Qvec-report:1:

ink_renderer.cpp(1092) : info C5001: loop vectorized




Auto-vectorizer: it’s not always easy

This loop is not vectorized: reason 501 covers an upper bound, here a.size(), that the compiler cannot prove is loop-invariant

    #include <vector>

    void test1()
    {
        std::vector<int> a(100000), b(100000), c(100000);
        for (int i = 0; i < a.size(); ++i)
        {
            a[i] = b[i] + c[i];
        }
    }

info C5002: loop not vectorized due to reason ‘501’

Hoisting the upper bound into a local variable lets the compiler vectorize the loop

    #include <vector>

    void test1()
    {
        std::vector<int> a(100000), b(100000), c(100000);
        for (int i = 0, iMax = (int)a.size(); i < iMax; ++i)
        {
            a[i] = b[i] + c[i];
        }
    }

info C5001: loop vectorized

• For the ink-smoothing algorithm, we got a 30% speed-up

• For the first part of the page curling algorithm, we got a 175% speed-up

• Auto-vectorizer can analyze very complex loops

• Always measure with a profiler to understand which loops you need to speed up

• Leverage the Vectorizer and Parallelizer Messages guide for help

The compiler will analyze the loop and emit the right code

Auto-vectorizer at work in Austin






Page curling and C++ AMP

Page curling: calculating normals

    // pseudo-code
    for each triangle
    {
        Position vertex1Pos = triangle.vertex1.position;
        Position vertex2Pos = triangle.vertex2.position;
        Position vertex3Pos = triangle.vertex3.position;

        Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);

        triangleNormal.normalize();

        vertex1.normal += triangleNormal;
        vertex2.normal += triangleNormal;
        vertex3.normal += triangleNormal;
    }

Lots of triangles: we have less than 15 ms to “turn a page” in real time; we need to parallelize this algorithm

C++ AMP is a good candidate, since the data size is pretty large

Page curling: calculating normals

    // pseudo-code
    // We're looping over each triangle
    for each triangle
    {
        // This set of operations is safe: it works on a single
        // triangle at a time, so there are no races
        Position vertex1Pos = triangle.vertex1.position;
        Position vertex2Pos = triangle.vertex2.position;
        Position vertex3Pos = triangle.vertex3.position;

        Normal triangleNormal = cross(vertex2Pos - vertex1Pos, vertex3Pos - vertex1Pos);

        triangleNormal.normalize();

        // But here we're updating vertices, which are shared
        // between triangles -> race!
        vertex1.normal += triangleNormal;
        vertex2.normal += triangleNormal;
        vertex3.normal += triangleNormal;
    }

As written, this algorithm only works on a single thread

Page curling: split the loop to make it parallelizable

Before, a single loop did both steps:

• for each triangle: calculate triangle normals, then calculate vertex normals

After the split, there are two loops:

• for each triangle: calculate triangle normals and cache them

• for each vertex: calculate vertex normals

First, loop for each triangle…

    // We use C++ AMP
    c::array<b::float32, 2> tempTriangleNormals(3, (int)triangleCount());

    parallel_for_each(extent<1>(triangleCount()), [=, &tempTriangleNormals](index<1> idx) restrict(amp)
    {
        Position vertex1Pos = triangle.vertex1.position;
        Position vertex2Pos = triangle.vertex2.position;
        Position vertex3Pos = triangle.vertex3.position;

        // Same as before, we calculate the normal for each triangle
        // (triangleNormal is computed from the three positions)

        // We collect the normals into a temporary array,
        // which stays in GPU memory
        tempTriangleNormals[idx] = triangleNormal;
    });

…then, loop for each vertex

    parallel_for_each(extent<2>(vertexCountY, vertexCountX), [=](index<2> idx) restrict(amp)
    {
        Normal vertexNormal = vertexNormalView(idx);

        // go find the normals from nearby triangles
        vertexNormal += sumTriangleNormals(idx);

        vertexNormal.normalize();

        vertexNormalView(idx) = vertexNormal;
    });

We go over each vertex, so no races

In sumTriangleNormals, we fetch the normals from tempTriangleNormals, i.e., the temporary array we kept in GPU memory
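The two-loop structure can be tried out on the CPU in standard C++, without C++ AMP. This is a sketch under assumptions: the `Vec3` and `Triangle` types, the helper functions, and the per-vertex adjacency list are invented for illustration (Austin looks up nearby triangles from its grid layout in sumTriangleNormals). What it preserves is the key property: each iteration of either loop writes only its own output slot, so both loops are safe to run in parallel.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return Vec3{a.x - b.x, a.y - b.y, a.z - b.z}; }

static Vec3 cross(Vec3 a, Vec3 b) {
    return Vec3{a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

static Vec3 normalized(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return len > 0.0f ? Vec3{v.x / len, v.y / len, v.z / len} : v;
}

using Triangle = std::array<std::size_t, 3>;  // three vertex indices

std::vector<Vec3> vertex_normals(const std::vector<Vec3>& positions,
                                 const std::vector<Triangle>& triangles)
{
    // Pass 1: one normal per triangle. Iteration t writes only triNormals[t].
    std::vector<Vec3> triNormals(triangles.size());
    for (std::size_t t = 0; t < triangles.size(); ++t) {
        const Triangle& tri = triangles[t];
        assert(tri[0] < positions.size() && tri[1] < positions.size() &&
               tri[2] < positions.size());
        triNormals[t] = normalized(cross(sub(positions[tri[1]], positions[tri[0]]),
                                         sub(positions[tri[2]], positions[tri[0]])));
    }

    // Adjacency: which triangles touch each vertex (built once, serially).
    std::vector<std::vector<std::size_t>> touching(positions.size());
    for (std::size_t t = 0; t < triangles.size(); ++t)
        for (std::size_t v : triangles[t])
            touching[v].push_back(t);

    // Pass 2: one normal per vertex. Iteration v writes only normals[v].
    std::vector<Vec3> normals(positions.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (std::size_t v = 0; v < positions.size(); ++v) {
        Vec3 sum{0.0f, 0.0f, 0.0f};
        for (std::size_t t : touching[v]) {
            sum.x += triNormals[t].x;
            sum.y += triNormals[t].y;
            sum.z += triNormals[t].z;
        }
        normals[v] = normalized(sum);
    }
    return normals;
}
```

For a flat quad in the XY plane split into two counter-clockwise triangles, every vertex normal comes out as (0, 0, 1).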

• Running this algorithm on the GPU yields between 3x and 7x speed-ups

• CPU is now free to execute other code

• Even when DirectX 11 capable GPU hardware is not present, C++ AMP will fall back to WARP, which leverages multi-core and SSE2

Massive Parallelism with GPU and WARP

Page curling: C++ AMP at work

Key takeaways

• Use modern C++: RAII, r-value references, lambdas, const, Standard C++ Libraries, Boost, other 3rd party libraries, etc.

• DirectX for fast and powerful graphics

• XAML UI for standard UI elements

• C++/CX to talk to Windows, to other components and to other languages (e.g., JS)

• Auto-vectorizer and PPL to distribute work on the CPU

• C++ AMP to leverage the GPU's massively parallel compute power

• C++ Rocks! Go write great apps!!

Resources

Related Sessions

• Tue/5:45/B92 Odyssey: Connecting C++ Apps to the Cloud via Casablanca

• Wed/11:15/B92 Odyssey: It’s all about performance: Using Visual C++ 2012 to make the best use of your hardware

• Wed/1:45/B92 Stinger: DirectX Graphics Development with Visual Studio 2012

• Wed/5:15/B33 Cascade: Diving deep into C++/CX and WinRT

• Thu/5:15/B92 Nexus/Normandy: Building a Windows Store app using XAML and C++ - Photo app, the Hilo project

• Fri/12:45/B33 McKinley: The Future of C++

Resources

• vcblog: http://blogs.msdn.com/b/vcblog

• Project Austin Part 1 of 6: Introduction: http://blogs.msdn.com/b/vcblog/archive/2012/09/20/10348466.aspx

• Project Austin on CodePlex: http://austin.codeplex.com/

• Auto-Vectorizer in Visual Studio 2012: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/04/12/auto-vectorizer-in-visual-studio-11-overview.aspx

• C++ AMP in a nutshell: http://blogs.msdn.com/b/nativeconcurrency/archive/2011/09/13/c-amp-in-a-nutshell.aspx

• Parallel Patterns Library (PPL): http://msdn.microsoft.com/library/dd492418.aspx

Please submit session evals on the Build Windows 8 App or at http://aka.ms/BuildSessions

Participate in Design Research

ENROLL TODAY!

MICROSOFT DEVELOPER DIVISION DESIGN RESEARCH

EXPERIENCE DEVELOPMENT TOOLS AND FEATURES EARLY IN THEIR DESIGN AND DEVELOPMENT

INFLUENCE FUTURE DESIGN DECISIONS

FILL IT ONLINE AT http://bit.ly/x6dtHt

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Appendix

Ink smoothing: the math

• The line must be continuous, as must its first and second derivatives

• We approximate with the “cardinal” spline solution

• With auto-vectorizer, we get a nice 30% speed-up
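Written out, the per-coordinate formula used in the ink-smoothing loop is a cubic blend of the two middle points and two tangent estimates. The symbols p1 to p4, L, and t come from that loop; the names h00, h01, h10, and h11 for the four polynomial weights are introduced here only for readability.

```latex
\begin{aligned}
P(t) &= h_{00}(t)\,p_2 + h_{01}(t)\,p_3
      + h_{10}(t)\,L\,(p_3 - p_1) + h_{11}(t)\,L\,(p_4 - p_2),
      \qquad t \in [0, 1] \\
h_{00}(t) &= 2t^3 - 3t^2 + 1 \\
h_{01}(t) &= -2t^3 + 3t^2 \\
h_{10}(t) &= t^3 - 2t^2 + t \\
h_{11}(t) &= t^3 - t^2
\end{aligned}
```

Substituting t = 0 gives P(0) = p2 and t = 1 gives P(1) = p3, so each segment starts and ends on an input point. Differentiating gives P'(0) = L(p3 - p1) and P'(1) = L(p4 - p2), so neighbouring segments share the same tangent where they meet.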

Page curling: how do we turn the page

• Brilliant paper by Hong et al., Turning Pages of 3D Electronic Books

• Turning a page of a physical book can be simulated as deforming a page around a cone

• Each “page” in Austin is made of a bunch of triangles

• In C++, we apply the page turning algorithm to all triangles

• The auto-vectorizer comes to the rescue again with a sweet 1.7x speed-up

• Vertex normals are typically calculated as the normalized average of the surface normals of all triangles containing the vertex

• Using this approach, computing the vertex normals on the CPU simply involves iterating over all triangles depicting the page surface and accumulating the triangle normals in the normals of the respective vertices

• To me, the above screams “massively parallel”

Page curling: vertex normals and shading

Page curling: C++ AMP

    // first calculate the triangle normals
    c::array<b::float32, 2> triangleNormals(3, (int)triangleCount());
    c::parallel_for_each(c::extent<1>(triangleCount()), [=, &triangleNormals](c::index<1> idx) restrict(amp)
    {
        b::float32 v1PosX = vertexPositionArray(0, indexArray(2, idx[0])[0]);
        b::float32 v1PosY = vertexPositionArray(1, indexArray(2, idx[0])[0]);
        b::float32 v1PosZ = vertexPositionArray(2, indexArray(2, idx[0])[0]);
        b::float32 v2PosX = vertexPositionArray(0, indexArray(1, idx[0])[0]);
        b::float32 v2PosY = vertexPositionArray(1, indexArray(1, idx[0])[0]);
        b::float32 v2PosZ = vertexPositionArray(2, indexArray(1, idx[0])[0]);
        b::float32 v3PosX = vertexPositionArray(0, indexArray(0, idx[0])[0]);
        b::float32 v3PosY = vertexPositionArray(1, indexArray(0, idx[0])[0]);
        b::float32 v3PosZ = vertexPositionArray(2, indexArray(0, idx[0])[0]);

        b::float32 x1 = v2PosX - v1PosX;
        b::float32 y1 = v2PosY - v1PosY;
        b::float32 z1 = v2PosZ - v1PosZ;
        b::float32 x2 = v3PosX - v1PosX;
        b::float32 y2 = v3PosY - v1PosY;
        b::float32 z2 = v3PosZ - v1PosZ;

        // cross them
        b::float32 x3 = y1 * z2 - z1 * y2;
        b::float32 y3 = z1 * x2 - x1 * z2;
        b::float32 z3 = x1 * y2 - y1 * x2;

        NORMALIZE(x3, y3, z3);

        triangleNormals(0, idx[0]) = x3;
        triangleNormals(1, idx[0]) = y3;
        triangleNormals(2, idx[0]) = z3;
    });
