The Component Retrieval Language

SIMPLIFYING COMPLEX SOFTWARE ASSEMBLY THE COMPONENT RETRIEVAL LANGUAGE AND IMPLEMENTATION Presenter: Eric Seidel Dept. of Computer Science City College of New York [email protected] Co-authors: Gabrielle Allen, Steven Brandt, Frank Löffler, and Erik Schnetter Center for Computation & Technology Louisiana State University Wednesday, August 4, 2010


Page 1: The Component Retrieval Language

SIMPLIFYING COMPLEX SOFTWARE ASSEMBLY

THE COMPONENT RETRIEVAL LANGUAGE AND IMPLEMENTATION

Presenter: Eric Seidel
Dept. of Computer Science
City College of New York
[email protected]

Co-authors: Gabrielle Allen, Steven Brandt, Frank Löffler, and Erik Schnetter
Center for Computation & Technology
Louisiana State University

Wednesday, August 4, 2010

Page 2: The Component Retrieval Language

COMPONENT FRAMEWORKS

• Set of individual software modules coordinated by a glue framework

• Each component (module) performs a specific task and encapsulates a set of related functions and data

• Frameworks can range from having a few components to many

• Components communicate via interfaces

• Used for various purposes; HPC examples include

• Cactus Framework

• CCA Frameworks (e.g. Ccaffeine)

• Domain-specific frameworks (e.g. Earth System Modeling Framework)


Page 3: The Component Retrieval Language

CACTUS

• Component Framework

• Over 500 unique components

• Distributed around the world

• Flesh

• Core application

• Thorns

• Independent modules

• Perform actual computation

• High Performance Computing

• Massively parallel

• Runs on high end supercomputer clusters

• Supports many applications

• Numerical Relativity

• Quantum Gravity

• Computational Fluid Dynamics

www.cactuscode.org


Page 4: The Component Retrieval Language

CACTUS WORKFLOW

• Managed using “Thornlists”

• Plaintext list of thorns required for a specific configuration

• Used to checkout, update, build, and test the source code

!REPOSITORY_TYPE pserver
!REPOSITORY_LOCATION cvs.cactuscode.org
!REPOSITORY_NAME /cactusdevcvs
!REPOSITORY_USER eric9

CactusBase/Boundary
CactusBase/CartGrid3D
CactusBase/CoordBase
CactusBase/IOASCII
CactusBase/IOBasic
CactusBase/IOUtil
CactusBase/InitBase
CactusBase/LocalInterp


Page 5: The Component Retrieval Language

EINSTEIN TOOLKIT

• Toolkit for relativistic astrophysical simulations

• Developed using Cactus

• Comprises 135 thorns

• Initial Data, Evolution/Analysis methods, Utilities

• First official release 2 months ago

www.einsteintoolkit.org


Page 6: The Component Retrieval Language

MOTIVATION

• Distributed software frameworks are hard to assemble and manage

• The Einstein Toolkit comprises 135 individual components

• Very tedious to check out or update manually

• Large barrier to entry for new users


Page 7: The Component Retrieval Language

VERSION CONTROL SYSTEMS

• Used to track revisions in source code

• Concurrent Versions System (cvs)

• Released in 1990

• Uses client-server model

• Server stores full history of repository

• Clients retrieve specific revision

• Subversion (svn)

• Released in 2000

• Successor to cvs

• Also uses client-server model

• Git

• Released in 2005

• Uses distributed model

• Everyone has a copy of the full history

Image: http://en.wikipedia.org/wiki/File:Revision_controlled_project_visualization-2010-24-02.svg


Page 8: The Component Retrieval Language

GETCACTUS

• Designed to checkout and update Cactus thorns and flesh

• Specific to Cactus Framework

• Originally designed for CVS

• SVN and git added later

• Still difficult to distribute the framework

• Users must edit the thornlist

!REPOSITORY_TYPE pserver
!REPOSITORY_LOCATION cvs.cactuscode.org
!REPOSITORY_NAME /cactusdevcvs
!REPOSITORY_USER eric9

CactusBase/Boundary
CactusBase/CartGrid3D
CactusBase/CoordBase
CactusBase/IOASCII
CactusBase/IOBasic
CactusBase/IOUtil
CactusBase/InitBase
CactusBase/LocalInterp


Page 9: The Component Retrieval Language

# NAME is an alphanumeric or '.' character

DOCUMENT : DIRECTIVES ;

DIRECTIVE : DEFINE NAME '=' PATH EOL
          | CHECKOUT '=' COMPONENTLIST EOL
          | CHECKOUT '=' EOL COMPONENTLIST EOL
          | REPO_LOC '=' LOC EOL
          | AUTH_LOC '=' LOC EOL
          | PATH_DIRECTIVE '=' PATH EOL
            # !REPO_PATH, !CHECKOUT, !TARGET,
            # !ANON_PASS, !NAME
          | NAME_DIRECTIVE '=' NAME EOL
            # !CRL_VERSION, !AUTH_USER,
            # !ANON_USER, !TYPE
          ;

DIRECTIVES : DIRECTIVE
           | DIRECTIVES DIRECTIVE
           ;

LOC : PSERVER PATH              # CVS repository
    | NAME ':' '/' '/' PATH     # Git/SVN repository
    | NAME '@' NAME ':' PATH    # Git repository
    ;

PATH : NAME
     | '/' NAME
     | PATH '/' NAME
     ;

COMPONENTLIST : PATH
              | COMPONENTLIST EOL PATH
              ;

Figure 2: Grammar for the CRL in Bison format
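To make the grammar concrete, the following is a minimal Python sketch of a line-oriented reader for CRL text. It is not the GetComponents parser. It assumes that a new component block starts at each !TARGET directive, that lines starting with '#' are comments, and that names set by !DEFINE are substituted as $NAME; the function name parse_crl and the returned dictionary layout are hypothetical.

```python
import re

DEFINE = re.compile(r"^!DEFINE\s+(\w+)\s*=\s*(.*)$")
DIRECTIVE = re.compile(r"^!(\w+)\s*=\s*(.*)$")
VARIABLE = re.compile(r"\$(\w+)")


def parse_crl(text):
    """Return a list of component blocks, each a dict of directive values.

    Assumption: every block begins with a !TARGET directive.
    """
    defines = {}
    blocks = []
    in_checkout = False

    def expand(value):
        # Replace $NAME with its !DEFINE value; unknown names such as
        # $1 and $2 are left untouched for later expansion.
        return VARIABLE.sub(lambda m: defines.get(m.group(1), m.group(0)), value)

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = DEFINE.match(line)
        if match:
            defines[match.group(1)] = expand(match.group(2).strip())
            in_checkout = False
            continue
        match = DIRECTIVE.match(line)
        if match:
            key, value = match.group(1), expand(match.group(2).strip())
            if key == "TARGET" or not blocks:
                blocks.append({"CHECKOUT": []})
            in_checkout = key == "CHECKOUT"
            if in_checkout:
                # Components may follow on the same line or on later lines.
                blocks[-1]["CHECKOUT"].extend(value.split())
            else:
                blocks[-1][key] = value
            continue
        if in_checkout:
            blocks[-1]["CHECKOUT"].extend(line.split())
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return blocks


SAMPLE = """
!DEFINE ROOT = Cactus
!DEFINE ARR = $ROOT/arrangements

!TARGET = $ARR
!TYPE = svn
!URL = http://svn.cactuscode.org/arrangements/$1/$2/trunk
!CHECKOUT =
CactusBase/Boundary
CactusBase/CartGrid3D
"""

blocks = parse_crl(SAMPLE)
```

A regular-expression reader is enough here because each directive occupies one line; a generated Bison parser would be the closer match to Figure 2.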

anonymous methods. The auto-update option will bypass the user prompt and update any components that have been previously checked out; this allows GetComponents to be safely called by another program as a background process.

Authentication and updates are handled by the underlying version control tools, with GetComponents providing a uniform layer between the user and the underlying tools. Figure 3 shows the general authentication process used by GetComponents, which is called once for each component block, unless anonymous mode has been selected. It first checks for !AUTH_URL, which specifies authenticated access to the repository. It then attempts to match the AUTH_URL to the GetComponents users file (located by default in $HOME/.crl/users). If a match is found, GetComponents will use the associated username and then proceed to processing the next component block. If no match is found, GetComponents will prompt the user for their username, and attempt to log in to the repository using the appropriate command (e.g. cvs login), after which it will save the username and URL in the users file. This has the security benefit of keeping passwords visible only to the actual retrieval tools. The user may also specify a '-' at this prompt to indicate they wish to perform an anonymous checkout for all components in the block. GetComponents will store this as well in the users file, so the user is not forced to specify anonymous access repeatedly. If the user mistakenly entered the wrong username, or wishes to change access methods, they may specify the -reset-authentication option, which will delete the users file and allow the user to reenter their usernames.
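The lookup described above can be sketched in Python as follows. This is an illustration only: the on-disk format of the users file (one "URL username" pair per line) is an assumption, and the helper names load_users, save_users, resolve_username and is_anonymous are hypothetical. The sketch never handles a password, which mirrors the point that passwords stay with the retrieval tools.

```python
from pathlib import Path

USERS_FILE = Path.home() / ".crl" / "users"  # default location named in the text
ANONYMOUS = "-"  # answer that selects anonymous checkout for a block


def load_users(path=USERS_FILE):
    """Read stored pairs; the 'URL username' line format is an assumption."""
    users = {}
    if path.exists():
        for line in path.read_text().splitlines():
            fields = line.split()
            if len(fields) == 2:
                users[fields[0]] = fields[1]
    return users


def save_users(users, path=USERS_FILE):
    """Write all pairs back, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{url} {user}\n" for url, user in users.items()))


def resolve_username(auth_url, users, prompt=input):
    """Return the stored username for auth_url, prompting only if unknown."""
    if auth_url not in users:
        answer = prompt(f"Username for {auth_url} ('-' for anonymous): ")
        users[auth_url] = answer.strip()
    return users[auth_url]


def is_anonymous(username):
    """True if the user chose anonymous access at the prompt."""
    return username == ANONYMOUS
```

In a full tool the login attempt with the version control client would happen between prompting and saving, so that a mistyped username is not stored.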

If errors occur during the checkout process, GetComponents stores the name of the component that caused the error, and prints out a list of all components that had errors before exiting. In addition, any error will be logged, including the exact command that was called and the error that was returned by the checkout tool. GetComponents will also time the entire checkout/update process and print the total time elapsed before exiting.
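A minimal Python sketch of this bookkeeping is shown below. It is an assumption about structure, not the GetComponents code: run_checkouts and report are hypothetical names, and each command is given as a (component, argument list) pair.

```python
import subprocess
import sys
import time


def run_checkouts(commands):
    """Run each (component, argv) pair; return (failures, elapsed_seconds).

    Each failure records the component, the exact command, and the error text.
    """
    failures = []
    start = time.monotonic()
    for component, argv in commands:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
        except OSError as exc:  # e.g. the tool is not installed
            failures.append((component, argv, str(exc)))
            continue
        if result.returncode != 0:
            failures.append((component, argv, result.stderr.strip()))
    return failures, time.monotonic() - start


def report(failures, elapsed):
    """Print the components that failed and the total time taken."""
    for component, argv, error in failures:
        print(f"FAILED {component}: {' '.join(argv)}\n  {error}")
    print(f"Time elapsed: {elapsed:.1f} s")


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without any repository access.
    demo = [
        ("good", [sys.executable, "-c", "pass"]),
        ("bad", [sys.executable, "-c", "import sys; sys.exit('boom')"]),
    ]
    report(*run_checkouts(demo))
```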

Multiple component lists may be specified together, in which case GetComponents will concatenate the lists and process them as one. The component list may also be specified as a URL, which GetComponents will download and then process normally. This further simplifies the code assembly process, as the user need only download GetComponents to initiate the assembly. In addition, the anonymous checkout process is shortened by performing a shallow checkout of git repositories. Because git is a distributed version control system, cloning a git repository retrieves the entire repository, along with its full history. Over time, this history accumulates and can consume a large amount of disk space. A shallow checkout of a git repository only clones the most recent changeset, thereby reducing (sometimes greatly) the size of the resulting local copy. For example, the Carpet repository can be reduced from 115MB to 76MB, about one third smaller, by performing a shallow checkout.
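As an illustration of the shallow checkout, the sketch below builds the argument list for a git clone in Python. The --depth option is a standard git clone option that truncates history to the given number of commits; the function name git_clone_command and the choice to tie shallow clones to the anonymous flag are assumptions drawn from the paragraph above, not from the GetComponents source.

```python
def git_clone_command(url, target, anonymous):
    """Build the git argv; anonymous checkouts fetch only the latest commit."""
    argv = ["git", "clone"]
    if anonymous:
        # A depth of 1 skips the accumulated history of the repository.
        argv += ["--depth", "1"]
    return argv + [url, target]


shallow = git_clone_command(
    "git://carpetcode.dyndns.org/McLachlan", "McLachlan", anonymous=True
)
full = git_clone_command(
    "git://carpetcode.dyndns.org/McLachlan", "McLachlan", anonymous=False
)
```

Developers with authenticated access usually want the full history for committing and merging, which is one reason to keep shallow clones for the anonymous case.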

GetComponents was written to be very modular, and it can easily be extended to include other versioning tools. All of the tools are handled by their own subroutine, and are pointed to by a single hash, which GetComponents compares with the !TYPE directive in each component. To add new functionality, one would only have to write a subroutine for the new tool, and add an entry to the checkout_types hash.
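The dispatch pattern described here can be sketched in Python with a dictionary in place of the hash. Only the name checkout_types comes from the text; the handler functions, their signatures, and the use of Mercurial as the newly added tool are illustrative assumptions.

```python
def svn_command(url, target):
    """Argument list for a Subversion checkout."""
    return ["svn", "checkout", url, target]


def git_command(url, target):
    """Argument list for a git clone."""
    return ["git", "clone", url, target]


# One entry per supported tool, keyed by the value of the !TYPE directive.
checkout_types = {"svn": svn_command, "git": git_command}


def command_for(block_type, url, target):
    """Look up the handler for a block's !TYPE and build its command."""
    try:
        handler = checkout_types[block_type]
    except KeyError:
        raise ValueError(f"unsupported !TYPE: {block_type}") from None
    return handler(url, target)


# Adding a tool takes one new function and one new table entry.
def hg_command(url, target):
    """Argument list for a Mercurial clone (hypothetical addition)."""
    return ["hg", "clone", url, target]


checkout_types["hg"] = hg_command
```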

7. EXAMPLE: EINSTEIN TOOLKIT

The Einstein Toolkit [7] is a collection of software components and tools for simulating and analyzing general relativistic astrophysical systems. Such systems include gravitational wave space-times, collisions of compact objects such as black holes or neutron stars, accretion onto compact objects, supernova core collapse and gamma-ray bursts. Different research teams typically use the Einstein Toolkit as the basis of their group codes, where they supplement the toolkit with additional modules for initial data, evolution, analysis, etc.

The Einstein Toolkit uses a distributed development model where its software modules are developed, distributed and supported either by the core maintainers team or by individual groups. Where modules are provided by external groups, the Einstein Toolkit maintainers provide quality control for modules for inclusion in the toolkit and coordinate support and releases. While the core of the toolkit is a set of Cactus thorns (distributed from different repositories), the toolkit also contains example parameter files, documentation, and tools for visualization, debugging, and simulation deployment.

The component list (einsteintoolkit.th [2]) for the Einstein Toolkit uses the CRL for distribution of its currently 130 different software components. All the components of the Einstein Toolkit are available by anony-

[2] https://svn.einsteintoolkit.org/manifest/einsteintoolkit.th

COMPONENT RETRIEVAL LANGUAGE

• Designed to fix problems with original GetCactus script

• Provides unified, tool-agnostic syntax

• Abstracts authentication procedures

• General-purpose

• No longer specific to Cactus


Page 10: The Component Retrieval Language

SAMPLE CRL FILE

!DEFINE ROOT = Cactus
!DEFINE ARR = $ROOT/arrangements

!TARGET = $ROOT
!TYPE = svn
!AUTH_URL = https://svn.cactuscode.org/flesh/trunk
!URL = http://svn.cactuscode.org/flesh/trunk
!CHECKOUT = Cactus
!NAME = .

!TARGET = $ROOT
!TYPE = svn
!URL = https://svn.cct.lsu.edu/repos/numrel/$1/trunk
!CHECKOUT = simfactory

!TARGET = $ARR
!TYPE = svn
!AUTH_URL = https://svn.cactuscode.org/arrangements/$1/$2/trunk
!URL = http://svn.cactuscode.org/arrangements/$1/$2/trunk
!CHECKOUT =
CactusArchive/ADM
CactusBase/Boundary
CactusBase/CartGrid3D
CactusBase/CoordBase

!TARGET = $ARR
!TYPE = git
!URL = git://github.com/ianhinder/Kranc.git
!AUTH_URL = [email protected]:ianhinder/Kranc.git
!REPO_PATH= Auxiliary/Cactus
!CHECKOUT =
KrancNumericalTools/GenericFD

# McLachlan, the spacetime code
!TARGET = $ARR
!TYPE = git
!URL = git://carpetcode.dyndns.org/McLachlan
!AUTH_URL = [email protected]:McLachlan
!REPO_PATH= $2
!CHECKOUT = McLachlan/doc McLachlan/m McLachlan/par
McLachlan/ML_BSSN
McLachlan/ML_BSSN_Helper
McLachlan/ML_BSSN_O2
McLachlan/ML_BSSN_O2_Helper
McLachlan/ML_BSSN_Test
McLachlan/ML_ADMConstraints
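The sample file uses $1 and $2 inside URLs. Judging from the entries above, these appear to stand for the first and second path segments of each !CHECKOUT item (for example, $1 = CactusBase and $2 = Boundary); that reading is an inference from the sample, and the helper name expand_url is hypothetical. A minimal Python sketch under that assumption:

```python
import re


def expand_url(url_template, component):
    """Substitute $1, $2, ... with the path segments of `component`.

    Assumption: $N refers to the N-th '/'-separated segment of the
    checkout item, counted from 1.
    """
    parts = component.split("/")

    def segment(match):
        index = int(match.group(1))
        if index < 1 or index > len(parts):
            raise ValueError(f"{component!r} has no segment ${index}")
        return parts[index - 1]

    return re.sub(r"\$(\d+)", segment, url_template)


thorn_url = expand_url(
    "http://svn.cactuscode.org/arrangements/$1/$2/trunk", "CactusBase/Boundary"
)
tool_url = expand_url("https://svn.cct.lsu.edu/repos/numrel/$1/trunk", "simfactory")
```

Under this reading, one block with a templated URL can describe any number of components that share a repository layout.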


Page 11: The Component Retrieval Language

GETCOMPONENTS

• Designed to be very modular

• Currently supports 5 version control systems and http/ftp downloads

• Very easy to add more

• Can take input as local file or URL

• Manages all authentication issues

[Figure: GetComponents assembling a simulation from distributed repositories. Einstein Toolkit: Cactus Flesh and CCTK (svn.cactuscode.org), Carpet AMR (git.carpetcode.org), Core Einstein Toolkit (svn.einsteintoolkit.org), Einstein Toolkit (svn.partnersite.org). Research Groups: Group Modules (svn.groupthorns.org), Individual Modules (git.mythorns.org), Group Modules (ftp.groupthorns.org). Tools, Parameter Files, & Data (svn.einsteintoolkit.org).]

./GetComponents http://tinyurl.com/einsteintoolkit-2010-06

Page 12: The Component Retrieval Language

AUTHENTICATION

[Figure: authentication flowchart with yes/no branches. Decisions: "Anonymous mode selected?", "Are components available anonymously?", "Is username for URL known?". Actions: "Use anonymous checkout", "Print error and ignore component", "Prompt for username", "Use known username", "Verify access", "Checkout components".]

• Authentication handled entirely by VCS tools

• GetComponents stores list of authenticated repositories and users

• Also tracks repositories with specified anonymous access

• Very secure

• GetComponents never sees any passwords!


Page 13: The Component Retrieval Language

CHECKOUT VS. UPDATE SPEED

[Figure: bar chart of time in seconds (axis from 0 to 1300) per TeraGrid resource (Abe, Frost, Kraken, Lincoln, LoneStar, Longhorn, Queen Bee, Ranger, Spur, Steele), comparing Serial Checkout with Parallel Checkout.]


Page 14: The Component Retrieval Language

GETCOMPONENTS

• Generating component lists is still time-consuming and tedious

• A barrier, or outright impossible, for new users

• Not all Einstein Toolkit modules are needed to run a simulation

• How to determine which components are needed for a particular simulation?

• e.g. what is needed to model two black holes, or a coastal surge?


Page 15: The Component Retrieval Language

COMPONENT DEPENDENCIES

[Figure: small dependency graph with nodes Boundary, SymBase, PUGH, WaveToyC, CartGrid3D, IDScalarWaveC, CoordBase.]

• Dependency tracking could allow custom-built simulations

• Specify one component containing data about the simulation

• Initial values, type of simulation, etc.

• Then recursively check component dependencies
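The recursive check in the last bullet can be sketched in Python as a transitive closure over a dependency table. The thorn names are taken from the slide, but the edges in DEPENDENCIES are invented for illustration, and required_components is a hypothetical name; the slides do not describe an implementation.

```python
def required_components(root, dependencies):
    """Return root plus everything it depends on, directly or transitively."""
    needed = set()

    def visit(name):
        if name in needed:  # already handled; also guards against cycles
            return
        needed.add(name)
        for dependency in dependencies.get(name, ()):
            visit(dependency)

    visit(root)
    return needed


# Hypothetical edges, for illustration only.
DEPENDENCIES = {
    "IDScalarWaveC": ["WaveToyC"],
    "WaveToyC": ["Boundary", "CartGrid3D"],
    "Boundary": ["SymBase"],
    "CartGrid3D": ["CoordBase", "SymBase"],
}

thorns = required_components("WaveToyC", DEPENDENCIES)
```

The resulting set could then be written out as the !CHECKOUT list of a generated component list, so that only the thorns a simulation needs are retrieved.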


Page 16: The Component Retrieval Language

COMPONENT DEPENDENCIES -- WAVETOY EXAMPLE

[Figure: dependency graph for the WaveToy example. Nodes: WaveToyC, Boundary, CartGrid3D, IDScalarWaveC, IsoSurfacer, WaveBinarySource, CoordBase, HTTPDExtra, HTTPD, IOAscii, IOBasic, IOJpeg, IOUtil, jpeg6b, LocalInterp, LocalReduce, PUGHReduce, PUGH, PUGHSlab, Socket, SymBase, Time. Edge types: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]


Page 17: The Component Retrieval Language

COMPONENT DEPENDENCIES -- QUANTUM GRAVITY

[Figure: dependency graph for a quantum gravity application. Nodes: Nerve, AntichainEvol, BinaryCauset, CFlatSprinkle, RandomAntichain, Distributions, IOUtil, MonteCarlo, PUGH, RNGs. Edge types: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]


Page 18: The Component Retrieval Language

COMPONENT DEPENDENCIES -- EINSTEIN TOOLKIT

[Figure: dependency graph of the Einstein Toolkit thorns, one node per thorn. Edge types: Interface Inheritance, Function Requirement, Direct Thorn Dependency, Shared Variable Dependency, Capability Requirement.]



Page 20: The Component Retrieval Language

DISTRIBUTION

• GetComponents is freely available with an open-source license

• www.eseidel.org/download/GetComponents

• Full documentation available

• ./GetComponents --man


Page 21: The Component Retrieval Language

ACKNOWLEDGEMENTS

•Many thanks to Gabrielle Allen, Steve Brandt, Frank Löffler, and Erik Schnetter
