An Overview of TSFCore
Roscoe A. Bartlett
9211, Optimization and Uncertainty Estimation
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.
TSFCore SAND Reports
Get most recent copy at: Trilinos/doc/TSFCore
Nonlinear Equations : Foundation for all our Work!
Applications
• Discretized PDEs (e.g. finite element, finite volume, finite difference etc.)
• Network problems (e.g. Xyce)
Nonlinear Equations : Sensitivities
Related Algorithms
• Gradient-based optimization
  • SAND
  • NAND
• Nonlinear equations (NLS)
• Multidisciplinary analysis
• Linear (matrix) analysis
• Block iterative solvers
• Eigenvalue problems
• Uncertainty quantification
• SFE
• Stability analysis / continuation
• Transients (ODEs, DAEs)
B. van Bloemen Waanders, R. A. Bartlett, K. R. Long and P. T. Boggs. Large Scale Non-Linear Programming: PDE Applications and Dynamical Systems, Sandia National Laboratories, SAND2002-3198, 2002
Applications, Algorithms, Linear-Algebra Software
APP : Application (e.g. MPSalsa, Xyce, SIERRA, NEVADA etc.)
LAL : Linear-Algebra Library (e.g. Petra/Ifpack, PETSc, Aztec etc.)
ANA : Abstract Numerical Algorithm (e.g. optimization, nonlinear solvers, stability analysis, SFE, transient solvers etc.)
[Diagram: the APP (through its APP Interface, which computes functions), the ANA, and the LAL (Vec, Mat, Preconditioner), connected by associations with multiplicities 1 and 1..*.]
Key points
• Complex algorithms
• Complex software
• Complex interfaces
• Complex computers
• Duplication of effort?
Examples: Epetra_RowMatrix, fei::Matrix, TSF::MatrixOperator
TSFCore
[Diagram: the APP (through its APP Interface, which computes functions), the ANA, and the LAL (Vec, Mat, Preconditioner), connected by associations with multiplicities 1 and 1..*.]
Key points
• Maximizing development impact
• Software can be run on more sophisticated computers
• Fosters improved algorithm development
TSFCore
TSFCore::Nonlin
Requirements for TSFCore
TSFCore should:
Be portable to ASCI platforms
Provide for stable and accurate numerical computations
Represent a minimal but complete interface that will result in implementations that are:
Near optimal in computational speed
Near optimal in storage
Be independent of computing environment (SPMD, MS, CS etc.)
Be easy to develop adapters for existing libraries (e.g. Epetra, PETSc etc.)
Example ANA : Linear Conjugate Gradient Solver
TSFCore : Basic Linear Algebra Interfaces
[UML class diagram: OpBase, LinearOp, VectorSpace, Vector, MultiVector, and RTOpT, with associations domain, range, space, and columns (1..*), and «create» dependencies.]
Warning! Unified Modeling Language (UML) Notation!
An operator knows its domain and range spaces
A linear operator is a kind of operator
A Vector knows its VectorSpace
VectorSpaces create Vectors!
A MultiVector is a linear operator!
A MultiVector has a collection of column vectors!
A MultiVector is a tall thin dense matrix
VectorSpaces create MultiVectors!
The Key to success! Reduction/Transformation Operators
• Supports all needed vector operations
• Data/parallel independence
• Optimal performance
R. A. Bartlett, B. G. van Bloemen Waanders and M. A. Heroux. Vector Reduction/Transformation Operators, Accepted to ACM TOMS, 2003
Background for TSFCore
1996 : Hilbert Class Library (HCL), [Symes and Gockenbach]
Abstract vector spaces, vectors, linear operators
2000 : Epetra, [Heroux]
Concrete multi-vectors
2001 : Trilinos Solver Framework (TSF) 0.1, [Long]
2001 : AbstractLinAlgPack (ALAP) (MOOCHO LA interfaces), [Bartlett]
Reduction/transformation operators (RTOp)
Abstract multi-vectors
TSFCore : Basic Linear Algebra Interfaces
TSFCore::VectorSpace
  dim : int
  createMember() : Vector
  createMembers(in numMembers : int) : MultiVector
  isCompatible(in vecSpc : VectorSpace) : bool
  scalarProd(in x : Vector, in y : Vector) : Scalar
TSFCore::Vector
  applyOp(in op : RTOpT, inout ...)
RTOpPack::RTOpT
  apply_op(inout ...)
  reduce_reduct_objs(inout ...)
TSFCore::MultiVector
  applyOp(in op : RTOpT, inout ...)
  subView(in col_rng : Range1D) : MultiVector
  subView(in numCols : int, in cols[1..numCols] : int) : MultiVector
TSFCore::OpBase
  opSupported(in M_trans) : bool
TSFCore::LinearOp
  apply(in M_trans, in x : Vector, out y : Vector, in ...)
  apply(in M_trans, in X : MultiVector, out Y : MultiVector, in ...)
TSFCore::VectorSpaceFactory
  createVecSpc(in dim : int) : VectorSpace
Associations shown: columns (1..*), domain, range, space, smallVecSpcFcty, and three «create» dependencies.
VectorSpaceFactory is related to MultiVectors
VectorSpaces create Vectors and MultiVectors!
MultiVector subviews can be created!
Vector and MultiVector versions of apply(…)!
Adjoints supported but are optional!
Only one vector method!
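The relationships annotated above (a vector knows its space, and spaces create vectors) can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the class names `SketchVector`, `SketchVectorSpace`, `SerialSketchVector` and `SerialSketchVectorSpace` are hypothetical, `std::shared_ptr` stands in for `Teuchos::RefCountPtr`, and none of this is the actual TSFCore API.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

template <class Scalar> class SketchVectorSpace;

// Hypothetical abstract vector: it knows the space it belongs to.
template <class Scalar>
class SketchVector {
public:
  virtual ~SketchVector() = default;
  virtual std::shared_ptr<const SketchVectorSpace<Scalar>> space() const = 0;
  virtual Scalar& operator[](std::size_t i) = 0;
};

// Hypothetical abstract vector space: the factory for its own vectors.
template <class Scalar>
class SketchVectorSpace
  : public std::enable_shared_from_this<SketchVectorSpace<Scalar>> {
public:
  virtual ~SketchVectorSpace() = default;
  virtual std::size_t dim() const = 0;
  virtual std::shared_ptr<SketchVector<Scalar>> createMember() const = 0;
  virtual bool isCompatible(const SketchVectorSpace<Scalar>& other) const {
    return dim() == other.dim();
  }
};

// Serial vector backed by std::vector, zero-initialized.
template <class Scalar>
class SerialSketchVector : public SketchVector<Scalar> {
public:
  explicit SerialSketchVector(std::shared_ptr<const SketchVectorSpace<Scalar>> s)
    : space_(std::move(s)), data_(space_->dim(), Scalar(0)) {}
  std::shared_ptr<const SketchVectorSpace<Scalar>> space() const override {
    return space_;
  }
  Scalar& operator[](std::size_t i) override { return data_[i]; }
private:
  std::shared_ptr<const SketchVectorSpace<Scalar>> space_;
  std::vector<Scalar> data_;
};

template <class Scalar>
class SerialSketchVectorSpace : public SketchVectorSpace<Scalar> {
public:
  explicit SerialSketchVectorSpace(std::size_t n) : dim_(n) {}
  std::size_t dim() const override { return dim_; }
  // The space must itself be owned by a std::shared_ptr for this call to work.
  std::shared_ptr<SketchVector<Scalar>> createMember() const override {
    return std::make_shared<SerialSketchVector<Scalar>>(this->shared_from_this());
  }
private:
  std::size_t dim_;
};
```

An algorithm written against `SketchVectorSpace` never names the concrete vector type, which is the point of routing creation through the space.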
TSFCore Details
All interfaces are templated on Scalar type (support real and complex)
Smart reference counted pointer class Teuchos::RefCountPtr<> used for all dynamic memory management
Many operations have default implementations based on very few pure virtual methods
RTOp operators (and wrapper functions) are provided for many common level-1 vector and multi-vector operations
Default implementation provided for MultiVector (MultiVectorCols)
Default implementations provided for serial computation: VectorSpace (SerialVectorSpace), VectorSpaceFactory (SerialVectorSpaceFactory), Vector (SerialVector)
Vector-Vector Operations Provided with TSFCore
namespace TSFCore {
template<class Scalar> Scalar sum( const Vector<Scalar>& v ); // result = sum(v(i))
template<class Scalar> Scalar norm_1( const Vector<Scalar>& v ); // result = ||v||1
template<class Scalar> Scalar norm_2( const Vector<Scalar>& v ); // result = ||v||2
template<class Scalar> Scalar norm_inf( const Vector<Scalar>& v_rhs ); // result = ||v||inf
template<class Scalar> Scalar dot( const Vector<Scalar>& x
,const Vector<Scalar>& y ); // result = x'*y
template<class Scalar> Scalar get_ele( const Vector<Scalar>& v, Index i ); // result = v(i)
template<class Scalar> void set_ele( Index i, Scalar alpha
,Vector<Scalar>* v ); // v(i) = alpha
template<class Scalar> void assign( Vector<Scalar>* y, const Scalar& alpha ); // y = alpha
template<class Scalar> void assign( Vector<Scalar>* y
,const Vector<Scalar>& x ); // y = x
template<class Scalar> void Vp_S( Vector<Scalar>* y, const Scalar& alpha ); // y += alpha
template<class Scalar> void Vt_S( Vector<Scalar>* y, const Scalar& alpha ); // y *= alpha
template<class Scalar> void Vp_StV( Vector<Scalar>* y, const Scalar& alpha
,const Vector<Scalar>& x ); // y = alpha*x + y
template<class Scalar> void ele_wise_prod( const Scalar& alpha
,const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y ); // y(i)+=alpha*x(i)*v(i)
template<class Scalar> void ele_wise_divide( const Scalar& alpha
,const Vector<Scalar>& x, const Vector<Scalar>& v, Vector<Scalar>* y ); // y(i)=alpha*x(i)/v(i)
template<class Scalar> void seed_randomize( unsigned int ); // Seed for randomize()
template<class Scalar> void randomize( Scalar l, Scalar u, Vector<Scalar>* v ); // v(i) = random(l,u)
} // end namespace TSFCore
TSFCore : Vectors and Vector Spaces
C++ code:
template<class Scalar>
Scalar foo( const VectorSpace<Scalar>& S )
{
  Teuchos::RefCountPtr<Vector<Scalar> >
    x = S.createMember(),                        // create x
    y = S.createMember();                        // create y
  assign( &*x, Scalar(1.0) );                    // x = 1
  randomize( Scalar(-1.0), Scalar(+1.0), &*y );  // y = rand(-1,1)
  Vp_StV( &*y, Scalar(-2.0), *x );               // y += -2.0 * x
  Scalar gamma = dot(*x,*y);                     // gamma = x'*y
  return gamma;
}
Mathematical notation: $x = 1$, $y = \mathrm{rand}(-1, 1)$, $y \leftarrow y - 2x$, $\gamma = x^T y$
TSFCore : Applying a Linear Operator
C++ Prototype:
namespace TSFCore {
  enum ETransp { NOTRANS, TRANS, CONJTRANS };
  template<class Scalar>
  class LinearOp : public virtual OpBase<Scalar> {
  public:
    virtual void apply(
      ETransp M_trans, const Vector<Scalar> &x, Vector<Scalar> *y
      ,Scalar alpha = 1.0, Scalar beta = 0.0
      ) const = 0;
  };
}
Example:
template<class Scalar>
void myOp( const Vector<Scalar> &x, const LinearOp<Scalar> &M
  ,Vector<Scalar> *y )
{
  M.apply( NOTRANS, x, y );
}
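To make the role of the `alpha` and `beta` defaults concrete, here is a small dense sketch, assuming the usual BLAS-like meaning y = alpha*op(M)*x + beta*y (the slide itself does not spell this out). `DenseSketchOp` and `SketchTransp` are hypothetical names, not TSFCore classes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

enum SketchTransp { SK_NOTRANS, SK_TRANS };

// Hypothetical dense, row-major operator used only for illustration.
class DenseSketchOp {
public:
  DenseSketchOp(std::size_t rows, std::size_t cols, std::vector<double> a)
    : rows_(rows), cols_(cols), a_(std::move(a)) {}

  // Assumed semantics: y = alpha*op(M)*x + beta*y, with op(M) = M or M'.
  void apply(SketchTransp trans, const std::vector<double>& x,
             std::vector<double>* y,
             double alpha = 1.0, double beta = 0.0) const {
    const std::size_t m = (trans == SK_NOTRANS) ? rows_ : cols_;
    const std::size_t n = (trans == SK_NOTRANS) ? cols_ : rows_;
    for (std::size_t i = 0; i < m; ++i) {
      double sum = 0.0;
      for (std::size_t j = 0; j < n; ++j) {
        const double mij =
          (trans == SK_NOTRANS) ? a_[i * cols_ + j] : a_[j * cols_ + i];
        sum += mij * x[j];
      }
      // With beta == 0 the old contents of y are ignored entirely.
      (*y)[i] = alpha * sum + ((beta == 0.0) ? 0.0 : beta * (*y)[i]);
    }
  }

private:
  std::size_t rows_, cols_;
  std::vector<double> a_;
};
```

With the defaults, a call such as `M.apply(SK_NOTRANS, x, &y)` simply overwrites `y` with `M*x`, which mirrors the three-argument call in `myOp`.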
Example ANA : Linear Conjugate Gradient Solver
Multi-vector Conjugate-Gradient Solver : Single Iteration
template<class Scalar>
void CGSolver<Scalar>::doIteration(
  const LinearOp<Scalar> &M, ETransp opM_notrans, ETransp opM_trans
  ,MultiVector<Scalar> *X, Scalar a
  ,const LinearOp<Scalar> *M_tilde_inv
  ,ETransp opM_tilde_inv_notrans, ETransp opM_tilde_inv_trans
  ) const
{
  const Index m = currNumSystems_;
  int j;
  // Apply the preconditioner if there is one, otherwise Z = R
  if( M_tilde_inv )
    M_tilde_inv->apply( opM_tilde_inv_notrans, *R_, &*Z_ );
  else
    assign( &*Z_, *R_ );
  dot( *Z_, *R_, &rho_[0] );          // rho[j] = Z[j]'*R[j]
  if( currIteration_ == 1 ) {
    assign( &*P_, *Z_ );              // first search direction
  }
  else {
    for(j=0;j<m;++j) beta_[j] = rho_[j]/rho_old_[j];
    update( *Z_, &beta_[0], 1.0, &*P_ );
  }
  M.apply( opM_notrans, *P_, &*Q_ );  // Q = op(M)*P
  dot( *P_, *Q_, &gamma_[0] );        // gamma[j] = P[j]'*Q[j]
  for(j=0;j<m;++j) alpha_[j] = rho_[j]/gamma_[j];
  update( &alpha_[0], +1.0, *P_, X );
  update( &alpha_[0], -1.0, *Q_, &*R_ );
}
The TSFCore Trilinos package
packages/TSFCore
  src
    interfaces
      Core : VectorSpace, Vector, LinearOp etc …
      Solvers : Iterative linear solver interfaces (unofficial!)
      Nonlin : Nonlinear problem interfaces (unofficial!)
    utilities
      Core : Testing etc …
      Solvers : Some iterative solvers (CG, BiCG, GMRES)
      Nonlin : Testing etc …
    adapters
      mpi-base : Node classes for MPI-based vector spaces
      Epetra : EpetraVectorSpace, EpetraVector etc …
  examples …
TSFCore::Nonlin : Interfaces to Nonlinear Problems
Supported Areas
• NAND optimization
• SAND optimization
• Nonlinear equations
• Multidisciplinary analysis
• Stability analysis / continuation
• SFE
Function evaluations:
NonlinearProblem
(nonsingular) State Jacobian evaluations :
Auxiliary Jacobian evaluations :
NonlinearProblemFirstOrder
State constraints and response functions
TSFCore::Nonlin : Interfaces to Nonlinear Problems
Nonlin::NonlinearProblem
  Nu : int
  numResponseFunctions : Index
  yL : Vector, yU : Vector
  uL[1..Nu] : Vector, uU[1..Nu] : Vector
  gL : Vector, gU : Vector
  y0 : Vector, u0[1..Nu] : Vector
  initialize()
  isInitialized() : bool
  set_c(in c : Vector)
  set_g(in g : Vector)
  unsetQuantities()
  calc_c(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_g(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  Associations to TSFCore::VectorSpace: space_y (1), space_u (1..*), space_c (1), space_g (1)
Nonlin::NonlinearProblemFirstOrder
  adjointSupported() : bool
  set_DcDy(in DcDy : LinearOpWithSolve)
  set_DcDu(in l : int, in DcDu : LinearOp)
  set_DgDy(in DgDy : MultiVector)
  set_DgDu(in l : int, in DgDu : MultiVector)
  calc_DcDy(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_DcDu(in l : int, in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_DgDy(in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  calc_DgDu(in l : int, in y : Vector, in u[1..Nu] : Vector = NULL, in newPoint : bool = true)
  Associations: factory_DcDy (1) : AbstractFactory<LinearOpNonsing>, factory_DcDu (1..*) : AbstractFactory<LinearOp>
Nonlin::LinearSolveOp
  nonsingStatus() : ENonsingStatus
  solve(in M_trans, in x : Vector, out y : Vector, in ...)
  solve(in M_trans, in X : MultiVector, out Y : MultiVector, in ...)
Nonlin::LinearOpWithSolve
  Association: preconditioner (0..1)
Other classes shown: TSFCore::LinearOp, TSFCore::OpBase, Solvers::ConvergenceTester, with «create» dependencies.
Supported Areas
• SAND
• Nonlinear equations
• Multidisciplinary analysis
• Stability analysis / continuation
• SFE
Summary
SAND Reports
R. A. Bartlett, M. A. Heroux and K. R. Long. TSFCore : A Package of Light-Weight Object-Oriented Abstractions for the Development of Abstract Numerical Algorithms and Interfacing to Linear Algebra Libraries and Applications, Sandia National Laboratories, SAND2003-1378, 2003
R. A. Bartlett, TSFCore::Nonlin : An Extension of TSFCore for the Development of Nonlinear Abstract Numerical Algorithms and Interfacing to Nonlinear Applications, Sandia National Laboratories, SAND2003-1377, 2003
Location: Trilinos/doc/TSFCore
The End
Thank You!
Extra Slides
Examples of Non-Standard Vector Operations
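Examples from OOQP (Gertz, Wright)
$y_i \leftarrow y_i / x_i, \quad i = 1 \ldots n$
$y_i \leftarrow y_i x_i z_i, \quad i = 1 \ldots n$
$y_i \leftarrow \begin{cases} y^{\min} & \text{if } y_i < y^{\min} \\ y^{\max} & \text{if } y_i > y^{\max} \\ 0 & \text{if } y^{\min} \le y_i \le y^{\max} \end{cases}, \quad i = 1 \ldots n$
$\alpha \leftarrow \max\{\alpha : x + \alpha d \ge \beta\}$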
Example from TRICE (Dennis, Heinkenschloss, Vicente)
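$d_i = \begin{cases} (b - u)_i^{1/2} & \text{if } w_i < 0 \text{ and } b_i < +\infty \\ 1 & \text{if } w_i < 0 \text{ and } b_i = +\infty \\ (u - a)_i^{1/2} & \text{if } w_i \ge 0 \text{ and } a_i > -\infty \\ 1 & \text{if } w_i \ge 0 \text{ and } a_i = -\infty \end{cases}, \quad i = 1 \ldots n$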
Example from IPOPT (Waechter)
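[Equation: an element-wise update of $x_i$, $i = 1 \ldots n$, defined case by case ("if") in terms of modified bounds $\hat{x}^L_i$ and $\hat{x}^U_i$, which are themselves defined through min and max expressions involving $x^L_i$ and $x^U_i$.]
Currently in MOOCHO : > 40 vector operations!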
Goals for a Vector Interface
Compute efficiency => Near optimal performance
Optimization developers add new operations => Independence of linear algebra library developers
Compute environment independence => Flexible optimization software
Minimal number of methods => Easy to write adapters
Approaches to Developing Vector Interfaces
(1) Linear algebra library allows direct access to vector elements
(2) Optimizer-specific interfaces
(3) General-purpose primitive vector operations
Vector Reduction/Transformation Operators Defined
Reduction/Transformation Operators (RTOp) Defined
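$z^1_i \ldots z^q_i \leftarrow \mathrm{op}_t(i, v^1_i \ldots v^p_i, z^1_i \ldots z^q_i)$ : element-wise transformation
$\beta \leftarrow \mathrm{op}_r(i, v^1_i \ldots v^p_i, z^1_i \ldots z^q_i)$ : element-wise reduction
$\beta_2 \leftarrow \mathrm{op}_{rr}(\beta_1, \beta_2)$ : reduction of intermediate reduction objects
• $v^1 \ldots v^p \in \mathbb{R}^n$ : p non-mutable input vectors
• $z^1 \ldots z^q \in \mathbb{R}^n$ : q mutable input/output vectors
• $\beta$ : reduction target object (may be non-scalar (e.g. $\{y_k, k\}$), or NULL)
Key to Optimal Performance
• $\mathrm{op}_t(\ldots)$ and $\mathrm{op}_r(\ldots)$ applied to entire sets of subvectors ($i = a \ldots b$) independently: $z^1_{a:b} \ldots z^q_{a:b}, \beta \leftarrow \mathrm{op}(a, b, v^1_{a:b} \ldots v^p_{a:b}, z^1_{a:b} \ldots z^q_{a:b}, \beta)$
• Communication between sets of subvectors only for $\beta \ne$ NULL, $\mathrm{op}_{rr}(\beta_1, \beta_2) \rightarrow \beta_2$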
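The split between a per-subvector operation and a separate combine step for intermediate reduction objects can be sketched as follows. This is a hypothetical illustration, not `RTOpPack::RTOpT`: the names `DotProdSketchOp`, `applyOp`, `reduceReductObjs` and `applyChunked` are assumptions, and the chunks of a single `std::vector` stand in for the subvectors that separate processes would own.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical reduction operator for a dot product.
struct DotProdSketchOp {
  using ReductTarget = double;

  // op_r applied to a whole subvector at once: accumulate into *beta.
  void applyOp(const double* v1, const double* v2, std::size_t len,
               ReductTarget* beta) const {
    for (std::size_t i = 0; i < len; ++i) *beta += v1[i] * v2[i];
  }

  // op_rr: fold the intermediate object beta1 into *beta2.
  void reduceReductObjs(const ReductTarget& beta1, ReductTarget* beta2) const {
    *beta2 += beta1;
  }
};

// Applies the operator chunk by chunk, the way a vector implementation
// might apply it to each locally owned subvector. Requires chunk > 0.
template <class RedOp>
typename RedOp::ReductTarget applyChunked(const RedOp& op,
                                          const std::vector<double>& x,
                                          const std::vector<double>& y,
                                          std::size_t chunk) {
  typename RedOp::ReductTarget total = 0;
  for (std::size_t a = 0; a < x.size(); a += chunk) {
    const std::size_t len = std::min(chunk, x.size() - a);
    typename RedOp::ReductTarget partial = 0;
    op.applyOp(&x[a], &y[a], len, &partial);   // local work, no communication
    op.reduceReductObjs(partial, &total);      // the only cross-chunk step
  }
  return total;
}
```

The vector implementation decides how data is partitioned and how partial results travel; the operator only supplies the two small functions, so a new operation needs no change to the vector classes.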
Object-Oriented Design for User Defined RTOp Operators
Advantages:
• Functionality
  • Linear-algebra implementations can be changed with no impact on optimizer
  • Optimizer developers can unilaterally add new vector operations
• Performance
  • Near optimal performance (large subvectors)
  • Multiple simultaneous global reductions => no sequential bottlenecks
  • No unnecessary temporary vectors or multiple vector read/writes
Disadvantages:
• New concepts, initially harder to understand interfaces?
[UML class diagram: an Optimizer applies RTOp operators to a Vector through apply_op( op:RTOp, ... , reduct_obj:ReductionTarget, ... ). Vector abstracts / encapsulates vectors and has the implementations SerialVector, MPIVector, and OutOfCoreVector, each with apply_op(...). RTOp implements vector operations and works with a ReductionTarget; the operator subclasses shown are AssignScalarOp, DotProductOp, and MaxStepOp, with methods such as set_alpha(alpha), set_beta(beta), apply_op(...), and reduce_reduct_objs(...).]
RTOp vs. Primitives : Communication
• Compare
– RTOp (all-at-once reduction, as in the ISIS++ QMR solver): five scalars computed together, $\{(x^T x)^{1/2}, (v^T v)^{1/2}, (w^T w)^{1/2}, w^T v, v^T t\}$
– Primitives (5 separate reductions): $(x^T x)^{1/2}$, $(v^T v)^{1/2}$, $(w^T w)^{1/2}$, $w^T v$, $v^T t$
[Figure: ratio (all-at-once CPU time) / (5 primitives CPU time), on a scale from 0 to 1, versus num_axpys (0 to 400), for local dim = 50, 500, 5,000, and 50,000.]
* 128 processors on CPlant®
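The all-at-once idea in this comparison can be sketched as a single pass that accumulates all five sums, followed by one combine step on a five-element object instead of five separate global reductions. This is an illustration only: `Sums`, `localSums`, `combine` and `finalize` are hypothetical names, and `combine` merely stands in for whatever global reduction (for example over MPI) a parallel vector would perform.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Sums = std::array<double, 5>;  // x'x, v'v, w'w, w'v, v't

// One pass over the local index range [a, b), accumulating all five sums.
inline Sums localSums(const Vec& x, const Vec& v, const Vec& w, const Vec& t,
                      std::size_t a, std::size_t b) {
  Sums s{};  // zero-initialized
  for (std::size_t i = a; i < b; ++i) {
    s[0] += x[i] * x[i];
    s[1] += v[i] * v[i];
    s[2] += w[i] * w[i];
    s[3] += w[i] * v[i];
    s[4] += v[i] * t[i];
  }
  return s;
}

// Stand-in for a single global reduction over all five partial sums.
inline Sums combine(const Sums& s1, const Sums& s2) {
  Sums out{};
  for (std::size_t k = 0; k < out.size(); ++k) out[k] = s1[k] + s2[k];
  return out;
}

// Square roots are taken only for the three norms, after the combine.
inline Sums finalize(Sums s) {
  for (std::size_t k = 0; k < 3; ++k) s[k] = std::sqrt(s[k]);
  return s;
}
```

With primitives, each of the five quantities would pay its own communication latency; here the latency is paid once, which is why the ratio in the figure drops as the local dimension shrinks.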
RTOp vs. Primitives : Multiple Ops and Temporaries
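• Compare
– RTOp (all-at-once reduction): $\{\max \alpha : x + \alpha d \ge \beta\} = \min\{\max((\beta - x_i)/d_i, 0), \text{ for } i = 1 \ldots n\}$
– Primitives (5 temporaries, 6 vector operations): $-x_i \to u_i$, $u_i + \beta \to v_i$, $v_i / d_i \to w_i$, $0 \to y_i$, $\max\{w_i, y_i\} \to z_i$, $\min\{z_i, i = 1 \ldots n\} \to \alpha$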
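The contrast between the two approaches can be sketched directly. The fused version evaluates min over i of max((beta - x_i)/d_i, 0) in one pass with no temporaries; the primitives version builds the same value from six whole-vector operations and five temporaries. Both functions are hypothetical illustrations (`maxStepFused` and `maxStepPrimitives` are assumed names), they evaluate the expression as written, and they assume every d_i is nonzero and the vectors are non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<double>;

// Single pass, no temporaries: alpha = min_i max((beta - x_i)/d_i, 0).
inline double maxStepFused(const Vec& x, const Vec& d, double beta) {
  double alpha = std::numeric_limits<double>::infinity();
  for (std::size_t i = 0; i < x.size(); ++i)
    alpha = std::min(alpha, std::max((beta - x[i]) / d[i], 0.0));
  return alpha;
}

// Same value from six primitive vector operations and five temporaries.
inline double maxStepPrimitives(const Vec& x, const Vec& d, double beta) {
  const std::size_t n = x.size();
  Vec u(n), v(n), w(n), y(n), z(n);
  for (std::size_t i = 0; i < n; ++i) u[i] = -x[i];                  // u = -x
  for (std::size_t i = 0; i < n; ++i) v[i] = u[i] + beta;            // v = u + beta
  for (std::size_t i = 0; i < n; ++i) w[i] = v[i] / d[i];            // w = v ./ d
  for (std::size_t i = 0; i < n; ++i) y[i] = 0.0;                    // y = 0
  for (std::size_t i = 0; i < n; ++i) z[i] = std::max(w[i], y[i]);   // z = max(w, y)
  return *std::min_element(z.begin(), z.end());                      // alpha = min(z)
}
```

The primitives version reads and writes memory six times over and allocates five vectors, which is the overhead the slide's comparison refers to.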
* 1 processor (gcc 3.1 under Linux)
Question: Does OO C++ allow for good scalability for massively parallel computing (i.e. 100 to 10000 processors)?
Parallel Scalability of MOOCHO
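Scalable example NLP (m = n/2):
$\min f(x) = \frac{1}{2} \sum_{i=1}^{n} x_i^2$
$\text{s.t. } c_j(x) = x_j (x_{m+j} - 1) - 10 x_{m+j} = 0, \quad j = 1, \ldots, n/2$
Variable reduction range / null space decomposition:
$A^T = \begin{bmatrix} C & N \end{bmatrix}, \quad Z = \begin{bmatrix} -C^{-1} N \\ I \end{bmatrix}$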
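A small sketch shows why this example reduces to vector operations. It assumes the problem reads f(x) = (1/2) sum of x_i^2 with constraints c_j(x) = x_j (x_{m+j} - 1) - 10 x_{m+j} for j = 1..m and m = n/2; under that assumption the blocks C = dc/dx_{1:m} and N = dc/dx_{m+1:n} are both diagonal. The function names are hypothetical and indices are zero-based in the code.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// f(x) = 1/2 * sum_i x_i^2
inline double evalF(const Vec& x) {
  double s = 0.0;
  for (double xi : x) s += xi * xi;
  return 0.5 * s;
}

// c_j(x) = x_j * (x_{m+j} - 1) - 10 * x_{m+j}, with m = n/2
inline Vec evalC(const Vec& x) {
  const std::size_t m = x.size() / 2;
  Vec c(m);
  for (std::size_t j = 0; j < m; ++j)
    c[j] = x[j] * (x[m + j] - 1.0) - 10.0 * x[m + j];
  return c;
}

// Diagonal of C: d c_j / d x_j = x_{m+j} - 1
inline Vec diagC(const Vec& x) {
  const std::size_t m = x.size() / 2;
  Vec dC(m);
  for (std::size_t j = 0; j < m; ++j) dC[j] = x[m + j] - 1.0;
  return dC;
}

// Diagonal of N: d c_j / d x_{m+j} = x_j - 10
inline Vec diagN(const Vec& x) {
  const std::size_t m = x.size() / 2;
  Vec dN(m);
  for (std::size_t j = 0; j < m; ++j) dN[j] = x[j] - 10.0;
  return dN;
}
```

Because C and N are stored as vectors, applying C, N, or the inverse of C is an element-wise multiply or divide, so every step of the algorithm exercises only the vector interface.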
• Diagonal matrices => All vector ops!
Where is the parallel bottleneck?
Is it OO C++ or MPI?
* Red Hat Linux cluster (4 nodes)
• 2.0 GHz Intel P4 processors
• MPICH 1.2.2.1
Answer => MPI
Serial overhead of MOOCHO (n=2, Np=1) 0.41 milliseconds per rSQP iteration
Overhead of MPI communication (Np=4) 0.42 milliseconds per global reduction
[Figure: parallel speedup (1 to 4) versus Np (number of processors, 1 to 4), for n (global dim) = 20,000, 200,000, and 2,000,000.]