IMA Workshop, Minneapolis, MN, January 9-16, 2003: Automatic Differentiation and Its Role in Simulation-Based Optimization





TRANSCRIPT

IMA Workshop, Minneapolis, MN, January 9-16, 2003
Automatic Differentiation and Its Role in Simulation-Based Optimization

Outline
- Black Box approaches (NAND)
- Gray Box approaches (NAND)
- All-at-once methods (SAND)
- Research agenda

Group Members
- Uwe Naumann: graph theory
- Boyana Norris: software engineering
- (Beata Winnicka: compiler technology)
- Arno Rasch: visiting from Aachen
- Alumni: Abate, Bischof, Griewank, Khademi, Kim, Roh
- Funding: DOE, NASA, NSF

AD in a Nutshell
- Technique for computing analytic derivatives of programs (millions of lines of code)
- Derivatives used in optimization, nonlinear PDEs, sensitivity analysis, inverse problems, etc.
- AD = analytic differentiation of elementary functions + propagation by chain rule
- Associativity of the chain rule leads to two main modes: forward and reverse
- Can be implemented using source transformation or operator overloading

Black Box Methods
- Apply AD to a complete simulation
- Use derivatives (gradients) for optimization
- Examples:
  - NASA Langley MDO
  - Fluent
  - Sea Ice
- Other applications: atmospheric chemistry, water reservoir simulation, breast cancer modeling, semiconductor device simulation

Multidisciplinary Design Optimization of Airfoils
- NASA Langley funded ADIFOR and ADIC development to support their MDO efforts
- Requirements: differentiated versions of
  - Surface grid generator (F77)
  - CSCMDO volume grid generator (C)
  - CFL3D Navier-Stokes solver (F77)
  - Other structural analysis, CFD apps
- In contrast to previous efforts, AD enabled incorporation of a turbulence model and accurate grid sensitivities
- Plot shows sensitivity of z grid coordinate w.r.t. change in root chord

Automatic Tuning of Sea Ice Model Parameters
Objective: Develop a methodology for automatically tuning model parameters whose exact values are not known.
Approach:
- Pose this as an inverse problem: find parameter values that minimize the difference between observations and simulation results
- Use a bound-constrained optimization algorithm, initially a quasi-Newton method
- Use automatic differentiation to compute the analytic derivatives
  - More robust and often faster than finite differences
  - Less time-consuming and less error-prone than hand-coding

Simulated observational data
- Normalized parameters: (C_w/0.0055, C_a/0.0012, P*/27500, D_1/2.284, /25°)
- Generated by: (C_w=0.0055, C_a=0.0012, P*=27500, D_1=2.284, =25°)

IABP buoy data
- Tuned parameters: (C_w=.0846, C_a=.0031, P*=50490, D_1=3.594, =53°)
- Starting points: (C_w= , C_a=0.0063, P*=2601, D_1=0.151, =2.7°)

Parameter sets
1) Randomly selected starting parameters: (C_w= , C_a=0.0063, P*=2640, D_1=.249, =1.7°)
2) Tuned parameters: (C_w=.0846, C_a=.0031, P*=50490, D_1=3.594, =53°)
3) Standard parameters: (C_w=0.0055, C_a=0.0012, P*=27500, D_1=2.284, =25°)

Ice drift
[Figure, panels "Tuned parameters" and "Standard parameters": simulated (yellow) and observed (green) average ice drift velocity]

Contours of ice drift speed
[Figure, panels "Tuned parameters" and "Standard parameters": simulated (yellow) and observed (green) average ice drift speed (cm/sec)]

Ice thickness distribution
[Figure, panels "Tuned parameters" and "Standard parameters": simulated (yellow) and observed (green) March ice thickness (m)]

Fluent V4.52 source code
- Fortran 77, partially F90 (dynamic memory management) and C (system functions)
- Approx. 1,600,000 lines of F77/F90 code (670,000 non-comment), ~1500 files and 2400 subroutines
- Approx. 16,000 lines of C code
- Context of Aachen research project (SFB540):
  - Wavy film running down a vertical wall
  - k-ε turbulence model (2 phase)
  - Model parameters: c_1, c_2, c_μ, σ_k, σ_ε
- Goal: derivatives of field values w.r.t. the model parameters. In all, five derivatives need to be computed.
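
The Fluent goal above (five derivatives of a field value with respect to five model parameters) maps naturally onto vector forward mode, where each intermediate value carries one derivative per parameter. The following is a minimal sketch using operator overloading. The `Dual` class, the `seed` helper, and `toy_model` are hypothetical names introduced for illustration, and `toy_model` is an arbitrary smooth function, not anything from Fluent. Only c_1 = 1.44 comes from the slides; the other parameter values are illustrative.

```python
import math


class Dual:
    """A value plus one directional derivative per independent parameter."""

    def __init__(self, val, grad):
        self.val = float(val)
        self.grad = list(grad)

    @staticmethod
    def _lift(x, n):
        # Promote a plain number to a Dual with zero derivatives.
        return x if isinstance(x, Dual) else Dual(x, [0.0] * n)

    def __add__(self, other):
        o = Dual._lift(other, len(self.grad))
        return Dual(self.val + o.val, [a + b for a, b in zip(self.grad, o.grad)])

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual._lift(other, len(self.grad))
        return Dual(self.val - o.val, [a - b for a, b in zip(self.grad, o.grad)])

    def __rsub__(self, other):
        return Dual._lift(other, len(self.grad)) - self

    def __mul__(self, other):
        o = Dual._lift(other, len(self.grad))
        # Product rule, applied to every derivative component.
        return Dual(self.val * o.val,
                    [self.val * b + o.val * a for a, b in zip(self.grad, o.grad)])

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual._lift(other, len(self.grad))
        q = self.val / o.val
        # Quotient rule: d(a/b) = (da - (a/b) * db) / b.
        return Dual(q, [(a - q * b) / o.val for a, b in zip(self.grad, o.grad)])

    def __rtruediv__(self, other):
        return Dual._lift(other, len(self.grad)) / self


def dexp(x):
    """exp for Dual numbers: the elementary derivative is exp itself."""
    e = math.exp(x.val)
    return Dual(e, [e * g for g in x.grad])


def seed(values):
    """Seed each parameter with a unit vector so grad[i] is d/d(parameter i)."""
    n = len(values)
    return [Dual(v, [1.0 if i == j else 0.0 for j in range(n)])
            for i, v in enumerate(values)]


def toy_model(c1, c2, c_mu, sigma_k, sigma_eps):
    """Hypothetical stand-in for a simulation output; not a turbulence model."""
    return c_mu * dexp(c1 - c2) / (sigma_k * sigma_eps) + c1 * c2


# c1 = 1.44 is from the slides; the remaining values are illustrative only.
params = [1.44, 1.92, 0.09, 1.0, 1.3]
out = toy_model(*seed(params))
print(out.val, out.grad)  # one function value and five derivatives in one pass
```

A single evaluation yields all five derivatives, at the price of carrying a length-five derivative vector through every operation. This is consistent in spirit with the memory and runtime overheads reported for the differentiated code.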
AD of Fluent
                                      Fluent       Fluent.AD    Ratio AD/Original
Lines of code (w/ comments)           1,592,188    1,620,
Lines of code (w/o comments)          673,774      706,
Number of files
Number of subroutines and functions

Fluent Sample Problem
- Water and air in a spinning bowl
- [Plots: pressure and swirl velocity]

Fluent Sample Problem (cont.)

First results (cont'd)
- Shown is the value of pressure p for the first 300 grid points (2 grid lines); -1600 < p < 200.
- For the perturbed results, the value of c_1 = 1.44 was changed by the indicated step size.
- Optimization during compilation was turned off.
- FLUENT.AD encountered overflows in the derivative computation in single precision.

FLUENT.AD - first results
- Derivative of pressure w.r.t. turbulence parameters c_1 and c_2

FLUENT.AD - first results (cont.)
- Derivative of velocity and swirl velocity w.r.t. c_2

Resource Requirements
- Memory usage: 32 MB for the original code, 168 MB for the differentiated code
- Runtime on Intel PIII (600 MHz): derivatives take 7.5 times longer than running the code alone (-O2 optimization level on PGF; no experimentation with other options yet)

Resource Requirements
Derivatives of pressure, u/v velocity, and swirl w.r.t. the five turbulence parameters of the k-ε model.
             Ultrasparc III 750 MHz (-O)   Memory        Pentium III 500 MHz (pgf -O2)
FLUENT       67 sec                        32 MBytes     73 sec
FLUENT.AD    470 sec                       168 MBytes    557 sec

Issues with Black Box Differentiation
- Source code may not be available or may be difficult to work with
- Simulation may not be (chain rule) differentiable
  - Feedback due to adaptive algorithms
  - Nondifferentiable functions
  - Noisy functions
  - Convergence rates
  - Etc.
- Typically, only gradient-based methods are used
- Accurate derivatives may not be needed (FD might be cheaper)

Wisconsin Sea Ice Model
- Lots of massaging to get the code to the Fortran 77 standard
- Code prepared for ADIFOR delivers the same function values

First Results with Fluent
- Function values of Fluent and Fluent.AD are consistent, but there is a lot of noise in the results:
  - Single vs. double precision
  - Absoft vs. PGI compiler
  - Inconsistency of FD approximations for different step sizes

Influence of Perturbations on Film Problem (Aachen)
- Double precision, p(x_0 + h) - p(x_0)
- Compiled with PGF for Linux
- Static version of code
- Values of p in [-1600, 200]

Influence of compilers (single precision)
- Single precision, p(Absoft) - p(PGF)
- Compiled with Absoft 5.0 and PGF for Linux
- Dynamic and static version of original code

Difficulties during Fluent Preprocessing
- Type mismatches
- Inconsistent number of arguments in subroutine calls
- while statement in subroutine testbt
- Lots of I/O statements contain function invocations

Overflows in Fluent.AD
The dynamic range of derivative code is often larger than that of the original code. This may lead to overflows in the derivative code, in particular in 32-bit arithmetic.

Original code:
      if (cendiv.eq.0.0) then
        cendiv = zero
      endif
      axp = axp+ap/cendiv

AD-generated code:
      r4_v = ap / cendiv
      r4_b = 1.0 / cendiv
      r5_b = (-r4_v) / cendiv
      do g_i_ = 1, g_pmax_
        g_axp(g_i_) = g_axp(g_i_) + r4_b * g_ap(g_i_) + r5_b * g_cendiv(g_i_)
      enddo
      axp = axp + r4_v

Note: the value of zero is a small number, not 0.0.

Gray Box Methods
- Exploit knowledge of the problem to increase accuracy, decrease cost
- Can reduce cost of g by exploiting math
  - SensPVODE, DASPK 3.0, ad_PETSc
  - Griewank's piggyback schemes
- Can reduce cost of f via grid sequencing, hot restarts

SensPVODE: Objective
- ODE solver (PVODE): takes F(y,p), y|t=0, and p; returns y|t=t1, t2,...
- Sensitivity solver: takes y|t=0 and p; returns y|t=t1, t2,... and dy/dp|t=t1, t2,... automatically

Possible Approaches
- ad_PVODE: takes ad_F(y,ad_y,p,ad_p), y, ad_y|t=0, and p, ad_p; returns y, ad_y|t=t1, t2,...
- SensPVODE: takes y|t=0 and p; returns y, dy/dp|t=t1, t2,...
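
The SensPVODE approach can be illustrated on a scalar problem. The sketch below augments the ODE y' = F(y, p) with the standard forward sensitivity equation s' = (dF/dy) s + dF/dp for s = dy/dp, and obtains the right-hand side of that equation by calling a tangent routine with the slides' signature `ad_F(y, ad_y, p, ad_p)`. Several things here are assumptions made for illustration: the test equation F(y, p) = -p*y, the hand-written body of `ad_F` (an AD tool would generate it), and the fixed-step RK4 integrator, which is swapped in for PVODE's own integrator purely to keep the example self-contained.

```python
import math


def ad_F(y, ad_y, p, ad_p):
    """Tangent of F(y, p) = -p * y. Written by hand here as a stand-in
    for AD-generated code; returns (F, directional derivative of F)."""
    f = -p * y
    ad_f = -ad_p * y - p * ad_y
    return f, ad_f


def augmented_rhs(state, p):
    """Right-hand side of the augmented system (y, s) with s = dy/dp.
    Seeding ad_y = s and ad_p = 1 gives (dF/dy) * s + dF/dp."""
    y, s = state
    f, ds = ad_F(y, s, p, 1.0)
    return (f, ds)


def rk4(rhs, state, p, t_end, steps):
    """Classical fixed-step RK4 (illustrative replacement for PVODE)."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(state, p)
        k2 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k1)), p)
        k3 = rhs(tuple(x + 0.5 * h * k for x, k in zip(state, k2)), p)
        k4 = rhs(tuple(x + h * k for x, k in zip(state, k3)), p)
        state = tuple(x + h / 6.0 * (a + 2 * b + 2 * c + d)
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


y0, p, t_end = 2.0, 0.5, 1.0
# y(0) does not depend on p, so the sensitivity starts at 0.
y, dy_dp = rk4(augmented_rhs, (y0, 0.0), p, t_end, 200)

# Analytic reference: y = y0 * exp(-p t), dy/dp = -t * y0 * exp(-p t).
print(y, y0 * math.exp(-p * t_end))
print(dy_dp, -t_end * y0 * math.exp(-p * t_end))
```

The design point is that only F is differentiated. The integrator is left untouched, so its adaptive step-size logic is never differentiated, in contrast to the ad_PVODE approach.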
SensPVODE uses the AD-generated right-hand side ad_F(y,ad_y,p,ad_p).
The two approaches in short: apply AD to PVODE (ad_PVODE), or solve the sensitivity equations (SensPVODE).

PVODE as ODE + sensitivity solver
- Augmented ODE initial-value problem (state plus sensitivities)

SensPVODE: Test Problem
- Diurnal kinetics advection-diffusion equation
- 100x100 structured grid
- 16 Pentium III nodes

SensPVODE: Number of Timesteps

SensPVODE: Time/Timestep

Differentiated PETSc Linear Solver (SLES)

AD, CD More Accurate Than DD

All-at-once (SAND) methods
- Use AD to provide J, Jv, J^T v, H, for all-at-once methods
- Preliminary investigations with Keyes
  - Leverage automated AD work in PETSc, TAO
- Driving R&D agenda
  - Reverse mode (imperative for g, J^T v)
  - Efficient evaluation of F for partially separable F
  - Simultaneous evaluation of Jv and J^T w
  - Preconditioning

LNKS: parameter identification model problem
- Nonlinear diffusion PDE BVP:
- Parameters to be identified: (x),
- Dirichlet conditions in x, homogeneous Neumann in all other dimensions (so the solution has 1D character, but arbitrarily large parallel test cases can be set up)
- Objective: where is synthetic data specified from a priori solution with given (x) piecewise constant, = 2.5 (Brisk-Spitzer approximation for radiation diffusion)

Automating AD (User's Perspective)
- User provides subdomain function (FormFunctionLocal)
- Using AD requires:
  - Changing the DMMGSetSNES(dmmg,FormFunction,0) call to DMMGSetSNESLocal(dmmg,FormFunctionLocal,0, ad_FormFunctionLocal, admf_FormFunctionLocal)
  - Adding a comment of the form /* Process adiC: FormFunctionLocal */
- Switch at runtime between AD and FD using the options dmmg_jacobian_ad and dmmg_jacobian_fd (equivalent options for matrix-free methods exist)

AD/PETSc Automation
      ex19 -grashof lidvelocity 100 -da_grid_x 6 -da_grid_y 6 -dmmg_nlevels 5 \
        -dmmg_grid_sequence -ksp_type -pc_type ilu -pc_ilu_levels 2 -snes_mf_err \
        -snes_monitor -ksp_monitor -ksp_max_it 500 -dmmg_jacobian_mf_[ad|fd]_operator

Tools: ADIC 2.0
- Based on Sage 3 (LLNL) / Edison Design Group
  - Support for C, C++
- Integrated support for MPI
- XAIF-based component system
  - New transformation modules and runtime libraries
  - Optimal statement level preaccumulation
- More coming...
  - More sophisticated code optimization
  - Collaborations with Texas, LLNL, GrammaTech

R&D Activities
- Tool development
  - Support for Fortran 95, C++
  - Support for mixed-language applications
  - Single precision derivatives
- Algorithms/Theory research
  - Algorithms for Hessian accumulation
  - Algorithms for simultaneous Jv, J^T w accumulation
  - Algorithms to minimize memory traffic rather than flops
  - Is optimal Jacobian accumulation NP complete?
  - Accuracy/stability of accumulation algorithms
  - Sharp bounds on the cost of Jacobian accumulation

Addressing Limitations in Black Box AD
- Detect points of nondifferentiability, proceed with a subgradient: currently supported for intrinsic functions, but not conditional statements
- Support AD of object code
- Use gray box methods to avoid differentiating through an adaptive algorithm
- Reduce (relative) cost of AD derivatives
  - Compute multiple derivatives simultaneously
  - Use single precision
  - Use AD only when needed

Points of Nondifferentiability
- Due to intrinsic functions: ABS, MAX, MIN, SQRT, INT, **
  - Exceptions: various ways of handling
    - Verbose reports (file, line, type of exception)
    - Terse summary (like IEEE flags)
    - Ignore
- Due to conditional branches
  - May be able to handle using trust regions

Hybrid AD/FD Method (switching)

Hybrid AD/FD Method (Turner-Walker)

Hybrid AD/FD Method (Combination)

Conclusions
- AD is frequently used for NAND methods
- AD has the potential to impact SAND methods
- AD eliminates "derivatives are too expensive" and "derivatives are too hard to code" as reasons not to use analytic derivatives (there may still be good reasons not to)
- AD has limitations
- Some of these limitations can be overcome

[Figure: Lossy Compression (CPR vs. lossy)]
[Diagram: Discretize, Grid, SNES, UserFun, UserJac, UserTens, OT, CostFun, and CostGrad with their AD-generated counterparts ad_Discretize, ad_Grid, and ad_SNES]
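
The exception handling listed under Points of Nondifferentiability can be sketched as follows: derivative versions of ABS and MAX detect the nondifferentiable point, record an exception, and proceed with a valid subgradient, while a log object offers the three reporting modes named on the slide (verbose, terse, ignore). All names here (`ExceptionLog`, `ad_abs`, `ad_max`) and the location strings such as "model.f:12" are hypothetical and chosen for illustration. This is not the interface of ADIFOR or ADIC.

```python
from collections import Counter


class ExceptionLog:
    """Collects AD exceptions; mode is 'verbose', 'terse', or 'ignore'."""

    def __init__(self, mode="terse"):
        self.mode = mode
        self.events = []

    def record(self, kind, where):
        if self.mode != "ignore":
            self.events.append((kind, where))

    def report(self):
        if self.mode == "verbose":
            # One line per event: location and type of exception.
            return [f"{where}: {kind}" for kind, where in self.events]
        if self.mode == "terse":
            # Flag-style summary: how often each exception type occurred.
            return dict(Counter(kind for kind, _ in self.events))
        return None


def ad_abs(x, dx, log, where="?"):
    """|x| and its directional derivative; at x == 0 use the subgradient 0."""
    if x == 0.0:
        log.record("ABS at 0", where)
        return 0.0, 0.0
    return abs(x), (dx if x > 0 else -dx)


def ad_max(a, da, b, db, log, where="?"):
    """max(a, b) with derivative; on a tie, keep the first argument's
    derivative, which is one valid generalized derivative."""
    if a == b:
        log.record("MAX tie", where)
        return a, da
    return (a, da) if a > b else (b, db)


# f(x) = |x| + max(x, 0), evaluated at the kink x = 0 with dx = 1.
log = ExceptionLog("terse")
x, dx = 0.0, 1.0
a, da = ad_abs(x, dx, log, "model.f:12")
m, dm = ad_max(x, dx, 0.0, 0.0, log, "model.f:13")
val, dval = a + m, da + dm
print(val, dval, log.report())
```

The computation continues past the kink instead of aborting, and the caller decides afterwards whether the recorded exceptions matter for the optimization at hand.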