

A Data Analysis Framework for the Neutron Community

Michael M. McKerns

Materials Science and Applied Physics

Center for Advanced Computing Research

California Institute of Technology

(Distributed Data Analysis for Neutron Scattering Experiments):

Serving a Growing Community

• With the availability of OPAL, SNS, and J-PARC fast approaching, the neutron community has the potential to undergo a large growth spurt.

• Software is a vital part of scattering research, and unless the software is both robust and easy to use, that growth may be limited.

• Mature packages do exist (McStas, ISAW, DAVE, …), and commercial packages are also used (Matlab, IDL, Abaqus, IGOR Pro, …) in the analysis process. However, groups often use cryptic legacy code for at least one step.

• To grow as a community, we need:

– a way to cultivate and maintain the valuable portions of these legacy codes

– to make legacy and community-standard codes interoperable

• define common data structures and interfaces

• stop duplication of effort

– to allow scientists to concentrate more on science by lowering the barriers to software engineering

There is much to do…

• Software is needed to support the massive quantity of data that will be produced at modern neutron facilities.

• Existing software may be incapable of utilizing the full richness of the data that will be produced.

• Although the barrier to developing new software must be reduced, it is also critical to enable more complex software technologies (e.g., high-performance and grid-based computing).

• Time is short – we must use the best existing tools to provide a robust solution… yet be flexible enough to allow for the easy substitution of better future solutions.

Software User Stereotypes I

• Instrument Scientist

– author of prepackaged and specialized tools

– wants:

• portable building and debugging tools

• large toolkit of robust modules and support code

• rapid application development

• GUI builder to compose interactive widgets, forms, and wizards

• to focus on supporting the instrument, not writing software

• Visiting Scientist

– user of prepackaged and specialized tools

– wants:

• UI that is simple to understand & easy to use

• reasonable defaults for most choices

• well diagnosed and explained error messages

• intelligently concealed complexity

Software User Stereotypes II

• Established Researcher

– coordinator/author/reviewer, designer of new applications

– wants:

• flexible UI that enables interactive exploration

• access to a comprehensive set of data transformations

• access to modeling and simulation packages

• tools to compare outputs of different analyses

• casually usable high-end graphics

• Beginning Student

– user of tools and documentation as learning environment

– wants:

• well documented interface and modules

• access to a set of standard applications

• flexible UI that enables interactive exploration

Software User Stereotypes III

• Analysis Expert

– author of analysis, modeling, or simulation software

– wants:

• portable building and debugging tools

• large toolkit of robust modules and support code

• easy access to sample data

• to solve physics problems, not software engineering problems

• Software Engineer

– binds software to common environment, extends software to the framework

– wants:

• portable building and debugging tools

• large toolkit of robust modules and support code

• well documented access to the software and framework integration layer

• validation, verification, and regression testing

• Framework Maintainer

– maintains and extends the software infrastructure

What is DANSE?

• a $12M, five-year NSF IMR-MIP software construction project

• a collaborative effort between software professionals, neutron scattering scientists, and facilities

• a software engineering effort

– open-source development environment

– framework for the interoperability of modular components

– integration of legacy codes and community-standard software

– connectivity to facility databases and software repositories

• a scientific endeavor

– to develop software modules for different subfields of neutron scattering

– to enhance neutron scattering research and facilitate new science

– to build tools for education, collaboration, and plausibility assessment

• an integration framework for building data analysis, visualization, modeling, and instrument simulation tools for all areas of neutron scattering

The Power of Python

• The fundamental commodity for neutron scattering software is found within the cores of time-tested community-standard software. Rather than rewrite or duplicate this software, we can use Python to provide an integration path into a common language.

• Python is

– a modern object-oriented language

– robust, portable, mature, well-supported, well-documented

– easily extendable

– supports rapid application development

• Python scripting enables us to

– compose computations at runtime and discover capabilities without recompilation or relinking

– organize large numbers of user-tunable parameters

• Binding Python to other languages (C++, Fortran, …) allows integration without measurable impact on performance or scalability
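As an illustration of composing computations at runtime, the sketch below resolves processing steps from dotted names given as strings and chains them into a pipeline, with no recompilation or relinking. The helper names `resolve` and `compose` are hypothetical and are not part of DANSE or Pyre; the steps come from the standard `math` module purely for demonstration.

```python
import importlib


def resolve(dotted_name):
    """Look up a callable such as 'math.sqrt' from its dotted name at runtime."""
    module_name, _, attribute = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def compose(step_names):
    """Build a pipeline that applies the named steps in order."""
    steps = [resolve(name) for name in step_names]

    def pipeline(value):
        for step in steps:
            value = step(value)
        return value

    return pipeline


# The list of steps could come from a script, a config file, or a user prompt.
pipeline = compose(["math.sqrt", "math.floor"])
result = pipeline(17.0)
print(result)
```

The same lookup mechanism would work for a compiled extension module, since Python imports those by name in the same way.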

Building a Scientific Toolkit

• Through Python, DANSE will have access to many tools

– basic data structures, optimization algorithms, numerical libraries

– basic data reduction library [obtain I(Q), S(Q), S(E), S(Q,E)]

– graphical/plotting environments

• IDL, Matlab, Matplotlib, Gnuplot, Grace, ParaView, ACIS (AutoCAD), …

– instrument simulation

• McStas, VITESS, sample simulation framework, …

– materials simulation

• ABINIT, VASP, GAMESS, NWChem, NAMD, CHARMM, …

– crystallography

• cctbx, FOX, ObjCryst++, …

– molecular viewers and format translators

• OpenBabel, Molden, PyMol, ViewMol, DRAWxtl, VMD, AtomEye, …

– and MORE!

• ISAW, texture analysis (MAUD), SLD calculator, scattering intensity, …

The Power of a Framework

• While a single application can be built relatively quickly without using a framework, much effort will be spent on error handling, logging, UI construction, and other services.

• A software framework provides

– a specification for organization of the software

– a description of the crucial structural elements and their interfaces

– a specification of the possible collaborations of these elements

– a strategy for the composition of new elements

– flexibility and robustness under evolutionary pressures

– services

• life cycle management, logging and monitoring

• network client and server support, authentication

• should not be rewritten for every application, but simply reused

• A framework increases reusability & decreases the development time

DANSE uses Pyre Framework

• Pyre software architecture

– robust, stable, open-source foundation

– >75,000 lines of Python; 30,000 lines of C++

• component-based runtime environment

– components are pre-compiled and connected by the user at runtime

– user directs component interconnections using visual, script-based, or shell programming

• a set of co-operating abstract services

– framework provides structural girdle

– executive layer manages application life cycle

– applications built from modular components

– components tie software cores to data streams

– UI independent of underlying framework

[Figure: Pyre layers (application-general, application-specific, framework, computational engines) and a Component wrapping a CORE]

Modularity of Components

• granularity allows reusability of object-oriented components

• rebinning application

• modularity provides flexibility and extensibility

[Figure: rebinning application built from the components NeXusReader, Selector, Bckgrnd, Energy, times, and NeXusWriter, with the labels instrument info, raw counts, filename, time interval, and energy bins]
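A rebinning chain like the one in the figure can be sketched as a few small, reusable steps. This is a minimal sketch in plain Python and not the DANSE implementation: the function names, the sample numbers, and the flight-path length are all assumptions, file reading and writing are left out, and each time channel is simply assigned to the energy bin that contains it (a real reduction would also account for bin widths).

```python
NEUTRON_MASS_KG = 1.67492749804e-27  # neutron rest mass
JOULES_PER_MEV = 1.602176634e-22     # one milli-electronvolt in joules


def subtract_background(counts, background):
    """Remove a per-channel background, clipping at zero."""
    return [max(c - b, 0.0) for c, b in zip(counts, background)]


def select_interval(times, counts, t_min, t_max):
    """Keep only the channels whose time of flight lies in [t_min, t_max]."""
    kept = [(t, c) for t, c in zip(times, counts) if t_min <= t <= t_max]
    return [t for t, _ in kept], [c for _, c in kept]


def tof_to_energy_mev(tof_s, path_m):
    """Kinetic energy E = m v^2 / 2 for a neutron covering path_m in tof_s."""
    speed = path_m / tof_s
    return 0.5 * NEUTRON_MASS_KG * speed ** 2 / JOULES_PER_MEV


def rebin_to_energy(times, counts, path_m, edges):
    """Accumulate counts into energy bins defined by ascending edges (meV)."""
    histogram = [0.0] * (len(edges) - 1)
    for t, c in zip(times, counts):
        energy = tof_to_energy_mev(t, path_m)
        for i in range(len(histogram)):
            if edges[i] <= energy < edges[i + 1]:
                histogram[i] += c
                break
    return histogram


# Made-up sample data: times of flight in seconds and raw counts per channel.
times = [2e-3, 4e-3, 5e-3, 8e-3]
raw_counts = [10, 20, 30, 40]
background = [1, 1, 1, 1]

counts = subtract_background(raw_counts, background)
times, counts = select_interval(times, counts, 3e-3, 9e-3)
spectrum = rebin_to_energy(times, counts, path_m=10.0, edges=[0.0, 10.0, 25.0, 50.0])
print(spectrum)
```

Because each step only takes and returns plain lists, any one of them can be swapped out or reused in another chain, which is the point the slide makes about granularity.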

Component Data Flow Paradigm

• scientific analysis codes constitute the cores of software components

• components mediate interaction between cores and environment

– inherit methods (such as message passing and error handling) from environment

– responsible for initialization of programs within their component core

– access centralized mechanism for logging status, errors, and history

– negotiate data exchanges with XML-based data exchange protocols

• components utilize data streams to pass information between ports

– interact with executive layer to negotiate execution flow

– facilitate physical decoupling of computation among distributed resources

[Figure: two connected Components, each wrapping a CORE]
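The data flow idea can be sketched with a small class that wraps a core function, logs through one shared logger, and forwards its result to whatever is connected downstream. The `Component` class here is a hypothetical stand-in written for this illustration; it is not the Pyre component API, and it passes Python objects directly rather than using an XML-based exchange protocol.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dataflow-sketch")  # one central place for status and errors


class Component:
    """Wraps a core callable and streams its output to connected components."""

    def __init__(self, name, core):
        self.name = name
        self.core = core
        self.outputs = []

    def connect(self, downstream):
        """Attach a downstream component; returns it so calls can be chained."""
        self.outputs.append(downstream)
        return downstream

    def send(self, data):
        """Run the core on incoming data and pass the result along."""
        try:
            result = self.core(data)
        except Exception:
            log.exception("%s failed", self.name)
            raise
        log.info("%s produced %r", self.name, result)
        for downstream in self.outputs:
            downstream.send(result)


collected = []


def normalize(counts):
    peak = max(counts)
    return [c / peak for c in counts]


def to_percent(values):
    return [100.0 * v for v in values]


def collect(values):
    collected.append(values)
    return values


source = Component("normalize", normalize)
source.connect(Component("percent", to_percent)).connect(Component("collect", collect))
source.send([2, 4, 8])
print(collected)
```

The cores know nothing about logging or routing; the wrapper supplies both, so the same core could be reused under a different environment.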

Component Implementation

• build core engine (Python, Fortran, C++, Java, Matlab, IDL, …)

– legacy or custom code and third-party libraries

– provide life-cycle management and exception handling strategy

• construct Python bindings

– select entry points to expose to Python

– modularize entry points to monolithic compiled libraries

• cast as a component

– extend and leverage framework services

– describe user-configurable parameters

– provide meta-data that specify the IO port characteristics

• test code

– satisfy functional requirements with concurrent test development

– utilize interactive runtime testing within Python interpreter

– demonstrate integration with other components

[Figure: a Component wrapping a CORE]
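The steps above can be sketched in miniature: a core routine, a wrapper that declares its user-configurable parameter and its port metadata, and a quick interactive-style check. Everything here is hypothetical and written in plain Python for illustration; `scale_core` stands in for a legacy or compiled routine, and `ScaleComponent` does not use the Pyre base classes.

```python
from dataclasses import dataclass, field


def scale_core(values, factor):
    """Stand-in for a legacy numerical routine exposed to Python."""
    return [factor * v for v in values]


@dataclass
class ScaleComponent:
    """Casts scale_core as a component with a declared parameter and ports."""

    # User-configurable parameter with a reasonable default.
    factor: float = 1.0
    # Metadata describing what each port carries.
    ports: dict = field(default_factory=lambda: {
        "input": {"kind": "sequence of numbers"},
        "output": {"kind": "list of numbers"},
    })

    def run(self, values):
        """Validate the input port, then delegate to the core."""
        if not all(isinstance(v, (int, float)) for v in values):
            raise TypeError("input port expects a sequence of numbers")
        return scale_core(values, self.factor)


# Interactive-style test, as one might type in the Python interpreter.
component = ScaleComponent(factor=2.5)
scaled = component.run([1.0, 2.0, 4.0])
print(scaled, sorted(component.ports))
```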

Building Abstract Applications

• DANSE uses a design pattern that enables the assembly of components at runtime under user control

• Facilities are named abstract application requirements

• Components are concrete named engines that satisfy the requirements

• Power of an API

– the application author provides:

• a specification of the application facilities as part of the application definition

• a component to be used as the default

– the application user can construct scripts that create alternative components that comply with the facility interface

– the end user can:

• configure the properties of the component

• select which component is to be bound to a given facility

• Abstraction is required for dynamic and distributed applications
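The facility and component pattern can be sketched as follows: the application author names a facility, states the method a component must offer, and supplies a default; the end user may bind a different component when the application is created. The class names and the `facilities` table are assumptions made for this sketch and do not reproduce the Pyre interface.

```python
class MeanReducer:
    """Default component: reduces a data set to its mean."""

    def reduce(self, values):
        return sum(values) / len(values)


class MaxReducer:
    """Alternative component that complies with the same interface."""

    def reduce(self, values):
        return max(values)


class ReductionApp:
    # Facility name -> (required method, default component class).
    facilities = {"reducer": ("reduce", MeanReducer)}

    def __init__(self, **bindings):
        unknown = set(bindings) - set(self.facilities)
        if unknown:
            raise KeyError(f"unknown facilities: {sorted(unknown)}")
        self.bound = {}
        for name, (method, default) in self.facilities.items():
            component = bindings.get(name, default)()
            # Check that the chosen component satisfies the facility.
            if not callable(getattr(component, method, None)):
                raise TypeError(f"{name!r} component must provide {method}()")
            self.bound[name] = component

    def run(self, values):
        return self.bound["reducer"].reduce(values)


data = [1.0, 2.0, 6.0]
default_result = ReductionApp().run(data)                   # uses the default
custom_result = ReductionApp(reducer=MaxReducer).run(data)  # user-selected
print(default_result, custom_result)
```

The application code in `run` never names a concrete component, which is what lets the binding be changed at runtime or, in principle, satisfied by a component running elsewhere.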

Visual Programming Interface

• Workflow graphs are a naturally dynamic interface due to the correspondence between logical and physical descriptions of the computation.

• There are multiple views of each computation

– data flow

– control flow

– deployment of distributed components

• Should allow interactive editing of component state

– access to modify component properties

– dynamic interface generation from component-supplied specifications

[Figure: the rebinning workflow graph (NeXusReader, Selector, Bckgrnd, Energy, times, NeXusWriter) with the labels instrument info, raw counts, filename, time interval, and energy bins]
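Generating an interface from component-supplied specifications can be sketched with the standard `argparse` module: the component publishes a list of property descriptions, and the interface is built from that list rather than written by hand. The property list `REBIN_PROPERTIES` and its field names are invented for this illustration; a graphical form could be driven by the same data.

```python
import argparse

# Hypothetical property specification that a rebinning component might publish.
REBIN_PROPERTIES = [
    {"name": "emin", "type": float, "default": 0.0, "help": "lowest bin edge (meV)"},
    {"name": "emax", "type": float, "default": 100.0, "help": "highest bin edge (meV)"},
    {"name": "nbins", "type": int, "default": 50, "help": "number of energy bins"},
]


def build_parser(component_name, properties):
    """Create a command-line interface from a component's property list."""
    parser = argparse.ArgumentParser(prog=component_name)
    for spec in properties:
        parser.add_argument(
            "--" + spec["name"],
            type=spec["type"],
            default=spec["default"],
            help=spec["help"],
        )
    return parser


parser = build_parser("rebin", REBIN_PROPERTIES)
# Only nbins is overridden; the other properties keep their defaults.
settings = vars(parser.parse_args(["--nbins", "20"]))
print(settings)
```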

Distributed/Parallel Computing

• Enabled by design

– component framework utilizing data streams

– requirements for building distributed and parallel computations nearly the same as those for building applications in a visual programming interface

• Pyre originally designed to compose and control parallel applications

– bindings to MPI

– encapsulation of Python interpreter in MPI

• Enable distributed computing with currently available technologies

– initial authentication and deployment based on ssh & scp

– authentication and security using pyre services

– access constrained to user space

• Take advantage of Grid services as they become available…
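A deployment based on ssh and scp might be scripted along the following lines. This sketch only builds the command lines and prints them; it does not contact any machine. The host, user, archive name, and remote directory are placeholders, and the three-step sequence is an assumption rather than the procedure DANSE uses. The remote directory is given relative to the login directory, which keeps the deployment inside the user's own space.

```python
import shlex


def deployment_commands(host, user, local_archive, remote_dir):
    """Return ssh/scp argument lists that copy and unpack an archive remotely."""
    target = f"{user}@{host}"
    archive_name = local_archive.rsplit("/", 1)[-1]
    quoted_dir = shlex.quote(remote_dir)
    return [
        # Create the destination under the remote user's home directory.
        ["ssh", target, f"mkdir -p {quoted_dir}"],
        # Copy the packaged component across.
        ["scp", local_archive, f"{target}:{remote_dir}/"],
        # Unpack it in place.
        ["ssh", target, f"cd {quoted_dir} && tar xzf {shlex.quote(archive_name)}"],
    ]


commands = deployment_commands(
    host="node01.example.org",          # placeholder host
    user="alice",                       # placeholder account
    local_archive="dist/component.tar.gz",
    remote_dir="danse/components",
)
for command in commands:
    print(shlex.join(command))
```

Each list could be handed to `subprocess.run(command, check=True)` once real host and account details are filled in.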

Broad Scientific Scope

• data reduction and experiment simulation

– diffraction, engineering diffraction, and inelastic scattering data reduction

– SANS/USANS and neutron reflectometry data reduction

– instrument and microstructure simulation

• modeling

– full profile modeling in real and reciprocal space (GSAS, FullProf, PDFFIT)

– finite element modeling (ABAQUS); self-consistent modeling

– constrained fitting by use of data from other experimental techniques

– 1D/2D model fitting; model independent peak fitting

– direct modeling of physical systems; ab-initio modeling

– scattering kernel; multiple scattering

– neutron weight correction; separation of nuclear and spin scattering

– micromagnetic simulations (OOMMF); disordered spin dynamics

– chemical spectroscopy dynamics (CLIMAX)

Facilitates New & Better Science

• better data analysis

– FEM calculations of strains in microstructures

– Monte-Carlo inversions of S(Q,E) to obtain parameters of structure and dynamics models

– model refinements with multiple data sets

• integration of theory

– micromechanics using correlations of local strains

– phase diagrams from thermodynamic functions

– ab-initio calculations of spin interactions

– soft matter structure using atomic force fields guided by diffraction

• experiment planning and execution

– single crystals on chopper spectrometers

– feedback control and real-time assessment

– plausibility testing and contingency planning

– assessment of science/data trends from previous data

Goals & Objectives

• The goal of DANSE is to provide a community-supported open-source software environment for scattering research that:

– integrates the basic data reduction, analysis, modeling, and simulation capabilities that are available today

– provides powerful new applications for data reduction, analysis, modeling, and simulation

– enables new types of science in all major subfields of neutron scattering research

– provides a coherent framework onto which software components can easily be added by scientists

– lowers the barrier to software development

– minimizes duplication of effort in the scattering software community

– decreases the time and effort in creating new software applications

– provides a certification and quality assurance process to aid with facility integration

DANSE Project Information

• milestones for the DANSE software

– project start 2006

– beta release 2008

– release 1.0 2009

– transition to community/SNS 2010

• documentation, tutorials, and further information

– the DANSE wiki at http://wiki.cacr.caltech.edu/danse

– the Pyre homepage at http://www.cacr.caltech.edu/projects/pyre

• contacts

– Brent Fultz [email protected]; Michael Aivazis, Ian Anderson

– Simon Billinge, Ersan Üstündag, Paul Butler, Paul Kienzle, Tom Swain

– Michael McKerns [email protected]

End Presentation