
An Overview of TSFCore

Roscoe A. Bartlett

9211, Optimization and Uncertainty Estimation

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

TSFCore SAND Reports

Get most recent copy at: Trilinos/doc/TSFCore

Nonlinear Equations : Foundation for all our Work!

Applications
• Discretized PDEs (e.g. finite element, finite volume, finite difference etc.)
• Network problems (e.g. Xyce)

Nonlinear Equations : Sensitivities

Related Algorithms
• Gradient-based optimization
• SAND
• NAND
• Nonlinear equations (NLS)
• Multidisciplinary analysis
• Linear (matrix) analysis
• Block iterative solvers
• Eigenvalue problems
• Uncertainty quantification
• SFE
• Stability analysis / continuation
• Transients (ODEs, DAEs)

B. van Bloemen Waanders, R. A. Bartlett, K. R. Long and P. T. Boggs. Large Scale Non-Linear Programming: PDE Applications and Dynamical Systems, Sandia National Laboratories, SAND2002-3198, 2002

Applications, Algorithms, Linear-Algebra Software

APP : Application (e.g. MPSalsa, Xyce, SIERRA, NEVADA etc.)
LAL : Linear-Algebra Library (e.g. Petra/Ifpack, PETSc, Aztec etc.)
ANA : Abstract Numerical Algorithm (e.g. optimization, nonlinear solvers, stability analysis, SFE, transient solvers etc.)

[Diagram: UML relationships among APP, APP Interface, ANA and the LAL objects (Vec, Mat, Preconditioner), with multiplicities 1 and 1..*]

Computes functions

Key points
• Complex algorithms
• Complex software
• Complex interfaces
• Complex computers
• Duplication of effort?

Examples: Epetra_RowMatrix fei::Matrix TSF::MatrixOperator

TSFCore

[Diagram: UML relationships among APP, APP Interface, ANA and the LAL objects (Vec, Mat, Preconditioner), with multiplicities 1 and 1..*]

Computes functions

Key points
• Maximizing development impact
• Software can be run on more sophisticated computers
• Fosters improved algorithm development

TSFCore

TSFCore::Nonlin

Requirements for TSFCore

TSFCore should:

Be portable to ASCI platforms

Provide for stable and accurate numerical computations

Represent a minimal but complete interface that will result in implementations that are:

Near optimal in computational speed

Near optimal in storage

Be independent of computing environment (SPMD, MS, CS etc.)

Be easy to develop adapters for existing libraries (e.g. Epetra, PETSc etc.)

Example ANA : Linear Conjugate Gradient Solver

TSFCore : Basic Linear Algebra Interfaces

[Diagram: UML class diagram of OpBase, LinearOp, VectorSpace, Vector, MultiVector and RTOpT, with associations labeled range, domain, space and columns (1..*)]

An operator knows its domain and range spaces

A linear operator is a kind of operator

Warning! Unified Modeling Language (UML) Notation!

A Vector knows its VectorSpace

<<create>>

VectorSpaces create Vectors!





A MultiVector is a linear operator!

A MultiVector has a collection of column vectors!

A MultiVector is a tall thin dense matrix

<<create>>

VectorSpaces create MultiVectors!




The Key to success! Reduction/Transformation Operators
• Supports all needed vector operations
• Data/parallel independence
• Optimal performance


R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, Accepted to ACM TOMS, 2003

Background for TSFCore

1996 : Hilbert Class Library (HCL), [Symes and Gockenbach]

Abstract vector spaces, vectors, linear operators

2000 : Epetra, [Heroux]

Concrete multi-vectors

2001 : Trilinos Solver Framework (TSF) 0.1, [Long]

2001 : AbstractLinAlgPack (ALAP) (MOOCHO LA interfaces), [Bartlett]

Reduction/transformation operators (RTOp)

Abstract multi-vectors


[Diagram: UML class diagram of the TSFCore core interfaces. Classes and members as listed below; association labels: domain, range, space, columns (1..*), smallVecSpcFcty, «create»]

TSFCore::VectorSpace
  dim : int
  createMember() : Vector
  createMembers(in numMembers : int) : MultiVector
  isCompatible(in vecSpc : VectorSpace) : bool
  scalarProd(in x : Vector, in y : Vector) : Scalar

TSFCore::Vector
  applyOp(in op : RTOpT, inout ...)

RTOpPack::RTOpT
  apply_op(inout ...)
  reduce_reduct_objs(inout ...)

TSFCore::MultiVector
  applyOp(in op : RTOpT, inout ...)
  subView(in col_rng : Range1D) : MultiVector
  subView(in numCols : int, in cols[1..numCols] : int) : MultiVector

TSFCore::OpBase
  opSupported(in M_trans) : bool

TSFCore::LinearOp
  apply(in M_trans, in x : Vector, out y : Vector, in ...)
  apply(in M_trans, in X : MultiVector, out Y : MultiVector, in ...)

TSFCore::VectorSpaceFactory
  createVecSpc(in dim : int) : VectorSpace

VectorSpaceFactory is related to MultiVectors

VectorSpaces create Vectors and MultiVectors!

MultiVector subviews can be created!

Vector and MultiVector versions of apply(…)!

Adjoints supported but are optional!

Only one vector method!

TSFCore Details

All interfaces are templated on Scalar type (support real and complex)

Smart reference counted pointer class Teuchos::RefCountPtr<> used for all dynamic memory management

Many operations have default implementations based on very few pure virtual methods

RTOp operators (and wrapper functions) are provided for many common level-1 vector and multi-vector operations

Default implementation provided for MultiVector (MultiVectorCols)

Default implementations provided for serial computation: VectorSpace (SerialVectorSpace), VectorSpaceFactory (SerialVectorSpaceFactory), Vector (SerialVector)

Vector-Vector Operations Provided with TSFCore

namespace TSFCore {

template<class Scalar> Scalar sum( const Vector<Scalar>& v ); // result = sum(v(i))

template<class Scalar> Scalar norm_1( const Vector<Scalar>& v ); // result = ||v||1

template<class Scalar> Scalar norm_2( const Vector<Scalar>& v ); // result = ||v||2

template<class Scalar> Scalar norm_inf( const Vector<Scalar>& v_rhs ); // result = ||v||inf

template<class Scalar> Scalar dot( const Vector<Scalar>& x

,const Vector<Scalar>& y ); // result = x'*y

template<class Scalar> Scalar get_ele( const Vector<Scalar>& v, Index i ); // result = v(i)

template<class Scalar> void set_ele( Index i, Scalar alpha

,Vector<Scalar>* v ); // v(i) = alpha

template<class Scalar> void assign( Vector<Scalar>* y, const Scalar& alpha ); // y = alpha

template<class Scalar> void assign( Vector<Scalar>* y

,const Vector<Scalar>& x ); // y = x

template<class Scalar> void Vp_S( Vector<Scalar>* y, const Scalar& alpha ); // y += alpha

template<class Scalar> void Vt_S( Vector<Scalar>* y, const Scalar& alpha ); // y *= alpha

template<class Scalar> void Vp_StV( Vector<Scalar>* y, const Scalar& alpha

,const Vector<Scalar>& x ); // y = alpha*x + y

template<class Scalar> void ele_wise_prod( const Scalar& alpha

,const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y ); // y(i)+=alpha*x(i)*v(i)

template<class Scalar> void ele_wise_divide( const Scalar& alpha

,const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y ); // y(i)=alpha*x(i)/v(i)

template<class Scalar> void seed_randomize( unsigned int ); // Seed for randomize()

template<class Scalar> void randomize( Scalar l, Scalar u, Vector<Scalar>* v ); // v(i) = random(l,u)

} // end namespace TSFCore

TSFCore : Vectors and Vector Spaces

C++ code:

template<class Scalar>
Scalar foo( const VectorSpace<Scalar>& S )
{
  Teuchos::RefCountPtr<Vector<Scalar> >
    x = S.createMember(),          // create x
    y = S.createMember();          // create y
  assign( &*x, 1.0 );              // x = 1
  randomize( -1.0, +1.0, &*y );    // y = rand(-1,1)
  Vp_StV( &*y, -2.0, *x );         // y += -2.0 * x
  Scalar gamma = dot(*x,*y);       // gamma = x'*y
  return gamma;
}
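The pattern in foo() (the space allocates the vectors, and free functions operate on them) can be tried out without Trilinos. The following is only a minimal serial sketch under stated assumptions: ToyVector, ToyVectorSpace and fooDeterministic are hypothetical stand-ins invented for illustration, std::shared_ptr replaces Teuchos::RefCountPtr, and a constant fill replaces randomize() so that the result is predictable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical serial stand-in for a vector (not the TSFCore class).
template<class Scalar>
class ToyVector {
public:
  explicit ToyVector(std::size_t dim) : data_(dim, Scalar(0)) {}
  std::size_t dim() const { return data_.size(); }
  Scalar& operator[](std::size_t i) { return data_[i]; }
  const Scalar& operator[](std::size_t i) const { return data_[i]; }
private:
  std::vector<Scalar> data_;
};

// Hypothetical stand-in for a vector space: the only place vectors are allocated.
template<class Scalar>
class ToyVectorSpace {
public:
  explicit ToyVectorSpace(std::size_t dim) : dim_(dim) {}
  std::size_t dim() const { return dim_; }
  std::shared_ptr<ToyVector<Scalar>> createMember() const {
    return std::make_shared<ToyVector<Scalar>>(dim_);
  }
private:
  std::size_t dim_;
};

// y = alpha
template<class Scalar>
void assign(ToyVector<Scalar>* y, const Scalar& alpha) {
  for (std::size_t i = 0; i < y->dim(); ++i) (*y)[i] = alpha;
}

// y += alpha * x
template<class Scalar>
void Vp_StV(ToyVector<Scalar>* y, const Scalar& alpha, const ToyVector<Scalar>& x) {
  assert(y->dim() == x.dim());
  for (std::size_t i = 0; i < y->dim(); ++i) (*y)[i] += alpha * x[i];
}

// result = x' * y (real scalars only in this sketch)
template<class Scalar>
Scalar dot(const ToyVector<Scalar>& x, const ToyVector<Scalar>& y) {
  assert(x.dim() == y.dim());
  Scalar sum = Scalar(0);
  for (std::size_t i = 0; i < x.dim(); ++i) sum += x[i] * y[i];
  return sum;
}

// Same steps as foo(), with the constant fill y = 3 instead of a random fill.
template<class Scalar>
Scalar fooDeterministic(const ToyVectorSpace<Scalar>& S) {
  auto x = S.createMember();          // create x
  auto y = S.createMember();          // create y
  assign(x.get(), Scalar(1));         // x = 1
  assign(y.get(), Scalar(3));         // y = 3
  Vp_StV(y.get(), Scalar(-2), *x);    // y += -2 * x, so y = 1
  return dot(*x, *y);                 // gamma = x'*y = dim
}
```

Because the caller never names a concrete vector type, swapping ToyVectorSpace for a parallel implementation would leave fooDeterministic() unchanged, which is the point the slide makes about abstract numerical algorithms.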

Mathematical notation:

TSFCore : Applying a Linear Operator

C++ Prototype:

namespace TSFCore {
  enum ETransp { NOTRANS, TRANS, CONJTRANS };
  template<class Scalar>
  class LinearOp : public virtual OpBase<Scalar> {
  public:
    virtual void apply(
      ETransp M_trans, const Vector<Scalar> &x, Vector<Scalar> *y
      ,Scalar alpha = 1.0, Scalar beta = 0.0
      ) const = 0;
  };
}

Example:

template<class Scalar>
void myOp( const Vector<Scalar> &x, const LinearOp<Scalar> &M, Vector<Scalar> *y )
{
  M.apply( NOTRANS, x, y );
}
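To make the role of the alpha and beta arguments concrete, here is a small self-contained sketch. It assumes the usual BLAS-like meaning y = alpha*op(M)*x + beta*y, which the slide does not spell out, and it uses std::vector with the hypothetical classes ToyLinearOp and ToyDiagonalOp rather than the real TSFCore types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum ETransp { NOTRANS, TRANS, CONJTRANS };

// Hypothetical abstract operator acting on std::vector (not TSFCore::LinearOp).
template<class Scalar>
class ToyLinearOp {
public:
  virtual ~ToyLinearOp() {}
  // Assumed semantics: y = alpha*op(M)*x + beta*y
  virtual void apply(ETransp M_trans, const std::vector<Scalar>& x,
                     std::vector<Scalar>* y,
                     Scalar alpha = Scalar(1), Scalar beta = Scalar(0)) const = 0;
};

// Concrete operator: a real diagonal matrix, which equals its own transpose,
// so the M_trans argument is ignored here.
template<class Scalar>
class ToyDiagonalOp : public ToyLinearOp<Scalar> {
public:
  explicit ToyDiagonalOp(const std::vector<Scalar>& diag) : diag_(diag) {}
  void apply(ETransp, const std::vector<Scalar>& x, std::vector<Scalar>* y,
             Scalar alpha = Scalar(1), Scalar beta = Scalar(0)) const override {
    assert(x.size() == diag_.size() && y->size() == diag_.size());
    for (std::size_t i = 0; i < diag_.size(); ++i)
      (*y)[i] = alpha * diag_[i] * x[i] + beta * (*y)[i];
  }
private:
  std::vector<Scalar> diag_;
};

// ANA-side code sees only the abstract interface, as in myOp() on the slide.
template<class Scalar>
void myOp(const std::vector<Scalar>& x, const ToyLinearOp<Scalar>& M,
          std::vector<Scalar>* y) {
  M.apply(NOTRANS, x, y);   // defaults: alpha = 1, beta = 0, so y = M*x
}
```

With the defaults the previous contents of y are overwritten; passing beta = 1 accumulates into y instead, which lets an algorithm form updates such as y += 2*M*x without a temporary vector.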

Example ANA : Linear Conjugate Gradient Solver

Multi-vector Conjugate-Gradient Solver : Single Iteration

template<class Scalar>
void CGSolver<Scalar>::doIteration(
  const LinearOp<Scalar> &M, ETransp opM_notrans, ETransp opM_trans
  ,MultiVector<Scalar> *X, Scalar a
  ,const LinearOp<Scalar> *M_tilde_inv
  ,ETransp opM_tilde_inv_notrans, ETransp opM_tilde_inv_trans
  ) const
{
  const Index m = currNumSystems_;
  int j;
  if( M_tilde_inv )
    M_tilde_inv->apply( opM_tilde_inv_notrans, *R_, &*Z_ );  // preconditioned residual
  else
    assign( &*Z_, *R_ );                                     // Z = R
  dot( *Z_, *R_, &rho_[0] );                                 // rho(j) = Z(j)'*R(j)
  if( currIteration_ == 1 ) {
    assign( &*P_, *Z_ );                                     // P = Z
  }
  else {
    for(j=0;j<m;++j) beta_[j] = rho_[j]/rho_old_[j];
    update( *Z_, &beta_[0], 1.0, &*P_ );                     // update search directions P
  }
  M.apply( opM_notrans, *P_, &*Q_ );                         // Q = M*P
  dot( *P_, *Q_, &gamma_[0] );                               // gamma(j) = P(j)'*Q(j)
  for(j=0;j<m;++j) alpha_[j] = rho_[j]/gamma_[j];
  update( &alpha_[0], +1.0, *P_, X );                        // update solutions X
  update( &alpha_[0], -1.0, *Q_, &*R_ );                     // update residuals R
}
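For readers who want to run the recurrence, the following is a minimal single-right-hand-side, unpreconditioned sketch of the same iteration on std::vector<double>. It is not the CGSolver class from the slide: the names Vec, ApplyFn, dotProd and cgSolve are hypothetical, the operator is passed as a std::function, and the residual-norm stopping test is an assumption added here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

typedef std::vector<double> Vec;
typedef std::function<void(const Vec&, Vec*)> ApplyFn;  // computes out = M*in

double dotProd(const Vec& a, const Vec& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return sum;
}

// Solves M*x = b for a symmetric positive definite M, starting from *x.
// Returns the number of iterations performed.
int cgSolve(const ApplyFn& applyM, const Vec& b, Vec* x, double tol, int maxIter) {
  const std::size_t n = b.size();
  assert(x->size() == n);
  Vec r(n), p(n), q(n);
  applyM(*x, &q);
  for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - q[i];   // r = b - M*x
  double rhoOld = 0.0;
  for (int iter = 1; iter <= maxIter; ++iter) {
    const double rho = dotProd(r, r);        // z = r (no preconditioner)
    if (std::sqrt(rho) <= tol) return iter - 1;
    if (iter == 1) {
      p = r;                                 // p = z
    } else {
      const double beta = rho / rhoOld;
      for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
    }
    applyM(p, &q);                           // q = M*p
    const double alpha = rho / dotProd(p, q);
    for (std::size_t i = 0; i < n; ++i) {
      (*x)[i] += alpha * p[i];               // x += alpha*p
      r[i] -= alpha * q[i];                  // r -= alpha*q
    }
    rhoOld = rho;
  }
  return maxIter;
}
```

The multi-vector version on the slide performs exactly these steps for m systems at once, with rho, beta, gamma and alpha stored as arrays of length m.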

The TSFCore Trilinos package

packages/TSFCore
  src
    interfaces
      Core : VectorSpace, Vector, LinearOp etc …
      Solvers : Iterative linear solver interfaces (unofficial!)
      Nonlin : Nonlinear problem interfaces (unofficial!)
    utilities
      Core : Testing etc …
      Solvers : Some iterative solvers (CG, BiCG, GMRES)
      Nonlin : Testing etc …
    adapters
      mpi-base : Node classes for MPI-based vector spaces
      Epetra : EpetraVectorSpace, EpetraVector etc …
  examples
    …

TSFCore::Nonlin : Interfaces to Nonlinear Problems

Supported Areas
• NAND optimization
• SAND optimization
• Nonlinear equations
• Multidisciplinary analysis
• Stability analysis / continuation
• SFE

Function evaluations:

NonlinearProblem

(nonsingular) State Jacobian evaluations :

Auxiliary Jacobian evaluations :

NonlinearProblemFirstOrder

State constraints and response functions

TSFCore::Nonlin : Interfaces to Nonlinear Problems

[Diagram: UML class diagram of the TSFCore::Nonlin interfaces. Classes, members and association labels as listed below]

Nonlin::NonlinearProblem
  Nu : int
  numResponseFunctions : Index
  yL : Vector
  yU : Vector
  uL[1..Nu] : Vector
  uU[1..Nu] : Vector
  gL : Vector
  gU : Vector
  y0 : Vector
  u0[1..Nu] : Vector
  initialize()
  isInitialized() : bool
  set_c(in c : Vector)
  set_g(in g : Vector)
  unsetQuantities()
  calc_c(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_g(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  Associations to TSFCore::VectorSpace: space_y (1), space_u (1..*), space_c (1), space_g (1)

Nonlin::NonlinearProblemFirstOrder
  adjointSupported() : bool
  set_DcDy(in DcDy : LinearOpWithSolve)
  set_DcDu(in l : int, in DcDu : LinearOp)
  set_DgDy(in DgDy : MultiVector)
  set_DgDu(in l : int, in DgDu : MultiVector)
  calc_DcDy(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_DcDu(in l : int, in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_DgDy(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_DgDu(in l : int, in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  Associations: factory_DcDy (1) : AbstractFactory<LinearOpNonsing>, factory_DcDu (1..*) : AbstractFactory<LinearOp>

Nonlin::LinearSolveOp
  nonsingStatus() : ENonsingStatus
  solve(in M_trans, in x : Vector, out y : Vector, in ...)
  solve(in M_trans, in X : MultiVector, out Y : MultiVector, in ...)

Nonlin::LinearOpWithSolve
  Association: preconditioner (0..1)

Other classes shown: TSFCore::OpBase, TSFCore::LinearOp, Solvers::ConvergenceTester; «create» dependencies from the factories

Supported Areas
• SAND
• Nonlinear equations
• Multidisciplinary analysis
• Stability analysis / continuation
• SFE

Summary

SAND Reports

R. A. Bartlett, M. A. Heroux and K. R. Long. TSFCore : A Package of Light-Weight Object-Oriented Abstractions for the Development of Abstract Numerical Algorithms and Interfacing to Linear Algebra Libraries and Applications, Sandia National Laboratories, SAND2003-1378, 2003

R. A. Bartlett, TSFCore::Nonlin : An Extension of TSFCore for the Development of Nonlinear Abstract Numerical Algorithms and Interfacing to Nonlinear Applications, Sandia National Laboratories, SAND2003-1377, 2003

Location: Trilinos/doc/TSFCore

The End

Thank You!

Extra Slides

Examples of Non-Standard Vector Operations

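Examples from OOQP (Gertz, Wright)

$y_i \leftarrow y_i / x_i, \quad i = 1 \ldots n$

$y_i \leftarrow y_i \, x_i \, z_i, \quad i = 1 \ldots n$

$y_i \leftarrow \begin{cases} y^{\min} - y_i & \text{if } y_i < y^{\min} \\ y^{\max} - y_i & \text{if } y_i > y^{\max} \\ 0 & \text{if } y^{\min} \le y_i \le y^{\max} \end{cases}, \quad i = 1 \ldots n$

[Equation: a maximum step length along $d$ from $x$; the remaining symbols did not survive extraction]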

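Example from TRICE (Dennis, Heinkenschloss, Vicente)

$d_i = \begin{cases} (b - u)_i^{1/2} & \text{if } w_i < 0 \text{ and } b_i < +\infty \\ 1 & \text{if } w_i < 0 \text{ and } b_i = +\infty \\ (u - a)_i^{1/2} & \text{if } w_i \ge 0 \text{ and } a_i > -\infty \\ 1 & \text{if } w_i \ge 0 \text{ and } a_i = -\infty \end{cases}, \quad i = 1 \ldots n$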

Example from IPOPT (Waechter)

[Equation: three-case element-wise update of $x_i$, $i = 1 \ldots n$, using modified bounds $\hat{x}^L_i$ and $\hat{x}^U_i$ that are defined through a min and a max of expressions in $x^L_i$ and $x^U_i$]

Currently in MOOCHO : > 40 vector operations!

Goals for a Vector Interface

Compute efficiency => Near optimal performance

Optimization developers add new operations => Independence of linear algebra library developers

Compute environment independence => Flexible optimization software

Minimal number of methods => Easy to write adapters

Approaches to Developing Vector Interfaces

(1) Linear algebra library allows direct access to vector elements

(2) Optimizer-specific interfaces

(3) General-purpose primitive vector operations

Vector Reduction/Transformation Operators Defined

Reduction/Transformation Operators (RTOp) Defined

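Element-wise transformation: $z^1_i \ldots z^q_i \leftarrow \mathrm{op}_t( i, v^1_i \ldots v^p_i, z^1_i \ldots z^q_i )$

Element-wise reduction: $\beta \leftarrow \mathrm{op}_r( i, v^1_i \ldots v^p_i, z^1_i \ldots z^q_i )$

Reduction of intermediate reduction objects: $\beta_2 \leftarrow \mathrm{op}_{rr}( \beta_1, \beta_2 )$

• $v^1 \ldots v^p \in \mathbb{R}^n$ : p non-mutable input vectors
• $z^1 \ldots z^q \in \mathbb{R}^n$ : q mutable input/output vectors
• $\beta$ : reduction target object (may be non-scalar (e.g. $\{ y_k, k \}$), or NULL)

Key to Optimal Performance

• $\mathrm{op}_t(\ldots)$ and $\mathrm{op}_r(\ldots)$ applied to entire sets of subvectors ($i = a \ldots b$) independently:
$z^1_{a:b} \ldots z^q_{a:b}, \beta \leftarrow \mathrm{op}( a, b, v^1_{a:b} \ldots v^p_{a:b}, z^1_{a:b} \ldots z^q_{a:b}, \beta )$
• Communication between sets of subvectors only for $\beta \ne$ NULL, $\mathrm{op}_{rr}( \beta_1, \beta_2 ) \rightarrow \beta_2$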
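The split between the element-wise reduction applied to whole subvectors and the combination of intermediate reduction objects can be sketched in a few lines. This is only an illustration under stated assumptions: ToyReductionOp, ToyDotOp, ToyMaxAbsDiffOp and applyOpChunked are hypothetical names, the reduction object is restricted to a single double, the operators take exactly two non-mutable input vectors, and contiguous chunks of one std::vector stand in for the subvectors owned by different processes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical, simplified reduction operator with a scalar reduction object.
class ToyReductionOp {
public:
  virtual ~ToyReductionOp() {}
  virtual double initialValue() const = 0;
  // Element-wise reduction over one contiguous subvector of length len.
  virtual void applyOp(const double* v0, const double* v1, std::size_t len,
                       double* reductObj) const = 0;
  // Combines an intermediate reduction object into another one.
  virtual void reduceReductObjs(double inObj, double* inoutObj) const = 0;
};

// Dot product: sum of v0(i)*v1(i).
class ToyDotOp : public ToyReductionOp {
public:
  double initialValue() const override { return 0.0; }
  void applyOp(const double* v0, const double* v1, std::size_t len,
               double* reductObj) const override {
    for (std::size_t k = 0; k < len; ++k) *reductObj += v0[k] * v1[k];
  }
  void reduceReductObjs(double inObj, double* inoutObj) const override {
    *inoutObj += inObj;
  }
};

// A second operator added without touching the "vector" code below:
// max over i of |v0(i) - v1(i)|.
class ToyMaxAbsDiffOp : public ToyReductionOp {
public:
  double initialValue() const override { return 0.0; }
  void applyOp(const double* v0, const double* v1, std::size_t len,
               double* reductObj) const override {
    for (std::size_t k = 0; k < len; ++k)
      *reductObj = std::max(*reductObj, std::fabs(v0[k] - v1[k]));
  }
  void reduceReductObjs(double inObj, double* inoutObj) const override {
    *inoutObj = std::max(*inoutObj, inObj);
  }
};

// "Vector" side: applies the operator to each chunk independently, then
// combines the per-chunk reduction objects (the only cross-chunk step).
double applyOpChunked(const ToyReductionOp& op, const std::vector<double>& v0,
                      const std::vector<double>& v1, std::size_t numChunks) {
  assert(v0.size() == v1.size() && numChunks > 0);
  const std::size_t n = v0.size();
  double global = op.initialValue();
  for (std::size_t c = 0; c < numChunks; ++c) {
    const std::size_t a = c * n / numChunks, b = (c + 1) * n / numChunks;
    double local = op.initialValue();
    op.applyOp(v0.data() + a, v1.data() + a, b - a, &local);
    op.reduceReductObjs(local, &global);
  }
  return global;
}
```

In a distributed setting the loop over chunks would run in parallel and the call to reduceReductObjs() would sit inside a global reduction, but the operator classes themselves would not change.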

Object-Oriented Design for User Defined RTOp Operators

Advantages:
• Functionality
  • Linear-algebra implementations can be changed with no impact on optimizer
  • Optimizer developers can unilaterally add new vector operations
• Performance
  • Near optimal performance (large subvectors)
  • Multiple simultaneous global reductions => no sequential bottlenecks
  • No unnecessary temporary vectors or multiple vector read/writes

Disadvantages:
• New concepts, initially harder to understand interfaces?

[Diagram: UML class diagram of the RTOp design. Classes and members as listed below]

Optimizer
  Applies RTOp operators to Vectors

Vector (abstracts / encapsulates vectors)
  apply_op( op:RTOp, ... , reduct_obj:ReductionTarget, ... )
  Subclasses: SerialVector, MPIVector, OutOfCoreVector, each with apply_op(...)

RTOp (implements vector operations)
  apply_op( ..., reduct_obj:ReductionTarget )
  reduce_reduct_objs( ... )
  Subclasses:
    AssignScalarOp : set_alpha(alpha), apply_op(...)
    DotProductOp : apply_op(...), reduce_reduct_objs(...)
    MaxStepOp : set_beta(beta), apply_op(...), reduce_reduct_objs(...)
    ...

ReductionTarget

RTOp vs. Primitives : Communication

• Compare
– RTOp (all-at-once reduction (i.e. ISIS++ QMR solver)): five scalars from a single reduction, $\{ (x^T x)^{1/2}, (v^T v)^{1/2}, (w^T w)^{1/2}, w^T v, v^T t \}$
– Primitives (5 separate reductions): $(x^T x)^{1/2}$, $(v^T v)^{1/2}$, $(w^T w)^{1/2}$, $w^T v$, $v^T t$

[Figure: ratio of CPU times, (all-at-once) / (5 primitives), on a scale from 0 to 1, plotted against num_axpys (0 to 400) for local dimensions 50, 500, 5,000 and 50,000]

* 128 processors on CPlant®

RTOp vs. Primitives : Multiple Ops and Temporaries

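• Compare
– RTOp (all-at-once reduction): $\{ \max \alpha : x + \alpha d \ge \beta \} = \min\{ \max( (\beta - x_i)/d_i, 0 ), \text{ for } i = 1 \ldots n \}$
– Primitives (5 temporaries, 6 vector operations): $-x_i \rightarrow u_i$, $\beta + u_i \rightarrow v_i$, $v_i / d_i \rightarrow w_i$, $0 \rightarrow y_i$, $\max\{ w_i, y_i \} \rightarrow z_i$, $\min\{ z_i, i = 1 \ldots n \} \rightarrow \alpha$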
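The cost difference between the two approaches is easy to see in code. The sketch below assumes that the quantity being computed is min over i of max((beta - x_i)/d_i, 0), where beta is a scalar bound (the symbol is an assumption, since it is not legible on the slide), and that no d_i is zero. The function names maxStepFused and maxStepPrimitives are hypothetical, and the chain of primitive operations is one plausible choice rather than the exact chain used in the timing study.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Vec;

// All-at-once: a single pass over x and d, with no temporary vectors.
double maxStepFused(const Vec& x, const Vec& d, double beta) {
  assert(x.size() == d.size() && !x.empty());
  double alpha = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < x.size(); ++i)
    alpha = std::min(alpha, std::max((beta - x[i]) / d[i], 0.0));
  return alpha;
}

// Primitives: six separate vector operations and five temporary vectors.
double maxStepPrimitives(const Vec& x, const Vec& d, double beta) {
  assert(x.size() == d.size() && !x.empty());
  const std::size_t n = x.size();
  Vec u(n), v(n), w(n), y(n), z(n);
  for (std::size_t i = 0; i < n; ++i) u[i] = -x[i];                  // u = -x
  for (std::size_t i = 0; i < n; ++i) v[i] = beta + u[i];            // v = beta + u
  for (std::size_t i = 0; i < n; ++i) w[i] = v[i] / d[i];            // w = v ./ d
  for (std::size_t i = 0; i < n; ++i) y[i] = 0.0;                    // y = 0
  for (std::size_t i = 0; i < n; ++i) z[i] = std::max(w[i], y[i]);   // z = max(w, y)
  return *std::min_element(z.begin(), z.end());                      // min over z
}
```

Both functions return the same value; the fused version reads each input element once and writes nothing, while the primitive version makes six sweeps over memory and allocates five vectors of length n.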

* 1 processor (gcc 3.1 under Linux)

Question: Does OO C++ allow for good scalability for massively parallel computing (i.e. 100 to 10000 processors)?

Parallel Scalability of MOOCHO

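Scalable example NLP (m = n/2)

$\min \; f(x) = \frac{1}{2} \sum_{i=1}^{n} x_i^2$

$\text{s.t.} \; c_j(x) = x_j ( x_{m+j} - 1 ) - 10 \, x_{m+j} = 0, \quad j = 1, \ldots, n/2$

$A^T = [\, C \;\; N \,], \qquad Z = \begin{bmatrix} -C^{-1} N \\ I \end{bmatrix}$

Variable reduction range / null space decomposition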

• Diagonal matrices => All vector ops!

Where is the parallel bottleneck?

Is it OO C++ or MPI?

* Red Hat Linux cluster (4 nodes)
• 2.0 GHz Intel P4 processors
• MPICH 1.2.2.1

Answer => MPI

Serial overhead of MOOCHO (n=2, Np=1) 0.41 milliseconds per rSQP iteration

Overhead of MPI communication (Np=4) 0.42 milliseconds per global reduction

[Figure: parallel speedup (1 to 4) versus Np, the number of processors (1 to 4), for global dimensions n = 20,000, 200,000 and 2,000,000]
