Research Design for Collaborative Computational Approaches and Scientific Workflows Deana Pennington January 8, 2007


Page 1: Research Design for Collaborative Computational Approaches and Scientific Workflows

Deana Pennington

January 8, 2007

Page 2: Informatics and the Research Cycle

[Diagram of the research cycle. Labels recovered from the figure:]

• Research cycle stages: Mental Model, Research Design, Collect Data, Conduct Analyses, Publish
• Data Management: data models, metadata, storage
• Inductive, descriptive: statistics
• Deductive, prescriptive: mechanistic
• Data-intensive: data mining, bio-inspired algorithms, exp. data analysis, visualization
• Compute-intensive: parallel processing, high throughput
• Knowledge-intensive: human cognition, ontologies, semantic mediation

Scientific Workflow System

• Automation => replication
• Access to distributed resources
• Reusability & sharing
• Empowered by knowledge-intensive approaches***

Cyberinfrastructure: Sharing data, analyses, mental models

Page 3: Scientific Workflows

• Scientists currently do their analyses by:
– Focusing on data collection and the analytical steps
– Manually coordinating the export and import of data among software systems

• Workflow systems collaborate with the scientist to:
– Discover existing data
– Handle data flow between components
– Document the analytical process

Query EcoGrid to find data

Archive output to EcoGrid with workflow metadata
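The three roles listed above (discovering data, handling data flow, documenting the process) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `CATALOG` is a hypothetical in-memory stand-in for a data grid such as EcoGrid, and `Workflow`, `find_data`, and `mean` are invented names, not part of any workflow system mentioned in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical in-memory catalog standing in for a data grid such as EcoGrid.
CATALOG = {
    "temperature": [11.5, 12.0, 14.5],
    "rainfall": [3.0, 0.0, 1.2],
}


def find_data(keyword: str) -> list:
    """Discover an existing data set by keyword (stand-in for a grid query)."""
    return CATALOG[keyword]


def mean(values: list) -> float:
    """A single analytical step."""
    return sum(values) / len(values)


@dataclass
class Workflow:
    """Chains named steps, passes data between them, and logs each step."""
    steps: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def add(self, name: str, func: Callable) -> "Workflow":
        self.steps.append((name, func))
        return self

    def run(self, data):
        for name, func in self.steps:
            data = func(data)  # data flow: output of one step feeds the next
            self.log.append(f"{name} -> {type(data).__name__}")  # documentation
        return data


workflow = Workflow().add("query catalog", find_data).add("mean", mean)
result = workflow.run("temperature")
```

After the run, `workflow.log` holds one entry per step, which is the kind of record that could be archived alongside the output as workflow metadata.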

Page 4:

– Not linear
– Involve multiple data sets
– Involve multiple analytical steps

Page 5: Automated Workflows

• Scripts => Single platform
• Visual modeling environment => Single environment
• Workflows:
– Cross-platform
– Cross-environment
– Distributed data & analyses

Page 6: Productivity Example

[Diagram relating a mental model to workflows at three levels. Labels recovered from the figure:]

• Mental Model: Biomass = f(Temp, Soil, et al.)
• Concept (C): Climate, Temp, Soil, Biomass
• Conceptual Workflow: Merge, Model, Predict
• Executable Workflow: analysis steps (AS), transformation steps (TS), data steps (DS)
• Abstract Workflow: analysis steps (AS), data steps (DS)
• Dissemination (DS)
• “View1”: Excel, GIS, SAS, GIS
• “View2”: VBScript, R Script, GA, R

Legend: AS = Analysis Step, TS = Transformation Step, DS = Data Step
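One way to read the figure is that a single sequence of abstract steps (Merge, Model, Predict) can be bound to different executable implementations, much as the two "views" name different tools. The sketch below is an illustration under that assumption. The step functions, the toy models, and the data values are all hypothetical; they stand in for whatever Excel, GIS, SAS, or R would actually do.

```python
from statistics import mean

# The abstract workflow: an ordered list of step names, with no tools attached.
ABSTRACT_WORKFLOW = ["merge", "model", "predict"]


def merge(data):
    """Join separate observation lists into (temp, soil, biomass) rows."""
    rows = list(zip(data["temp"], data["soil"], data["biomass"]))
    return {"rows": rows, "new_site": data["new_site"]}


def model_mean(state):
    """Toy model A: predict the average observed biomass everywhere."""
    avg = mean(b for _, _, b in state["rows"])
    return {**state, "f": lambda temp, soil: avg}


def model_linear_temp(state):
    """Toy model B: least-squares line of biomass on temperature only."""
    temps = [t for t, _, _ in state["rows"]]
    bios = [b for _, _, b in state["rows"]]
    t_bar, b_bar = mean(temps), mean(bios)
    slope = (sum((t - t_bar) * (b - b_bar) for t, b in zip(temps, bios))
             / sum((t - t_bar) ** 2 for t in temps))
    return {**state, "f": lambda temp, soil: b_bar + slope * (temp - t_bar)}


def predict(state):
    """Apply the fitted function f to the new site's (temp, soil) values."""
    return state["f"](*state["new_site"])


# Two executable bindings of the same abstract workflow (hypothetical "views").
VIEWS = {
    "view1": {"merge": merge, "model": model_mean, "predict": predict},
    "view2": {"merge": merge, "model": model_linear_temp, "predict": predict},
}


def run(view, data):
    state = data
    for step in ABSTRACT_WORKFLOW:
        state = VIEWS[view][step](state)
    return state


data = {
    "temp": [10.0, 20.0, 30.0],
    "soil": [1.0, 2.0, 3.0],
    "biomass": [2.0, 4.0, 6.0],
    "new_site": (40.0, 4.0),
}
prediction_view1 = run("view1", data)
prediction_view2 = run("view2", data)
```

The abstract step list never changes; only the binding does, so swapping one tool for another leaves the research design itself untouched.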

Page 7: Scientists design their research at the conceptual workflow level

•Often done on the fly over the period of time the research is being conducted

•For automated approaches, this must be well thought out from the beginning

•HOWEVER, because of the automation it is easy to modify the analysis and rerun it many times, so you are not locked into the original design
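The last point, that an automated analysis is cheap to modify and rerun, can be sketched as follows. The function `run_analysis`, its `threshold` and `scale` parameters, and the observation values are hypothetical; the point is only that once the steps are encoded, changing a design decision is a one-argument change.

```python
def run_analysis(values, threshold=0.5, scale=1.0):
    """A tiny automated analysis: rescale, filter, then summarize."""
    scaled = [v * scale for v in values]
    kept = [v for v in scaled if v >= threshold]
    return {
        "n_kept": len(kept),
        "mean": sum(kept) / len(kept) if kept else None,
    }


observations = [0.2, 0.6, 0.9, 1.4]

# Rerun the identical analysis under two different design choices.
runs = {t: run_analysis(observations, threshold=t) for t in (0.5, 1.0)}
```

Each entry in `runs` is a complete result for one variant of the design, so comparing alternatives does not require repeating any manual steps.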

Page 8: Benefits

• Reusable analysis steps, pipelines, and workflows
• Formal documentation of methods (output in report format)
• Reproducibility of methods
• Visual creation and communication of methods
• Versioning
• Automated data typing and transformation
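As an illustration of the last benefit, automated data typing and transformation, the sketch below checks a value's type when it is handed to a step and applies a registered conversion if one is needed. This is an assumption about what such a feature might look like, not a description of any particular workflow system; `CONVERTERS`, `connect`, and `celsius_to_fahrenheit` are hypothetical names.

```python
# Registry of known conversions, keyed by (source type, target type).
CONVERTERS = {
    (str, float): float,
    (int, float): float,
    (float, str): lambda x: f"{x:.2f}",
}


def connect(value, expected_type):
    """Return value as expected_type, converting automatically if possible."""
    if isinstance(value, expected_type):
        return value
    converter = CONVERTERS.get((type(value), expected_type))
    if converter is None:
        raise TypeError(
            f"no conversion from {type(value).__name__} "
            f"to {expected_type.__name__}"
        )
    return converter(value)


def celsius_to_fahrenheit(celsius: float) -> float:
    """An analysis step that requires numeric input."""
    return celsius * 9 / 5 + 32


# The upstream step exported text; the connection converts it before use.
reading = "21.5"
result = celsius_to_fahrenheit(connect(reading, float))
```

A conversion that is not registered raises `TypeError` at connection time, so a mismatch surfaces before the analysis step runs rather than inside it.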

Page 9: Nested workflows

[Diagram of a workflow that contains sub-workflows. Labels recovered from the figure:]

• SW0
• Search for relevant data and analyses (Query)
• Inputs: Field Data, Ground Sensors, Imagery
• Semantically-integrated
• Image Processing Pipeline
• Signal Processing Pipeline
• Steps: ASx, TS1, ASy, ASz, TS2, ASr
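Nesting can be sketched with a composite structure in which a workflow is itself usable as a step of a larger workflow. The class names and the operations attached to the labels (`TS1: rescale`, `ASx: mean`, `ASr: round`) are hypothetical; the slide gives only the labels, not what each step does.

```python
class Step:
    """A single analysis or transformation step."""

    def __init__(self, name, func):
        self.name, self.func = name, func

    def run(self, data):
        return self.func(data)

    def outline(self, depth=0):
        return ["  " * depth + self.name]


class Workflow:
    """A sequence of parts; each part is a Step or another Workflow."""

    def __init__(self, name, parts):
        self.name, self.parts = name, parts

    def run(self, data):
        for part in self.parts:
            data = part.run(data)
        return data

    def outline(self, depth=0):
        lines = ["  " * depth + self.name]
        for part in self.parts:
            lines.extend(part.outline(depth + 1))
        return lines


# A pipeline defined once, then embedded in a top-level workflow.
image_pipeline = Workflow("Image Processing Pipeline", [
    Step("TS1: rescale", lambda pixels: [p / 255 for p in pixels]),
    Step("ASx: mean", lambda pixels: sum(pixels) / len(pixels)),
])
top_level = Workflow("SW0", [
    image_pipeline,
    Step("ASr: round", lambda x: round(x, 2)),
])

result = top_level.run([0, 255, 255])
structure = top_level.outline()
```

Because `Workflow` and `Step` expose the same `run` and `outline` methods, a pipeline built for one study can be dropped into another workflow without modification.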

Page 10: Ecological niche modeling conceptual workflow

[Workflow diagram. Labels recovered from the figure:]

• Data access: EcoGrid Database, EcoGrid Query
• Inputs: Species presence & absence points; Environmental layers
• Processing steps: Transformation, Scaling, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive to EcoGrid
• Intermediate products: Integrated layers, Training sample, Test sample, GARP rule set
• Outputs: Native range prediction map, Model quality parameters, Selected prediction maps
• Other labels: User, +A1, +A2, +A3

Page 11: Ecological niche modeling conceptual workflow

[Same workflow diagram and labels as Page 10, with two added labels: Spatial location, Temporal extent]

Page 12: Generic Workflow

[Workflow diagram. Labels recovered from the figure:]

• Data access: EcoGrid Database, EcoGrid Query
• Inputs: Occurrence Data (Binary, Categorical or Numeric); Environmental layers
• Processing steps: Physical Transformation, Scaling, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive to EcoGrid
• Intermediate products: Integrated layers, Training sample, Test sample, GARP rule set
• Outputs: Prediction map, Model quality parameters, Selected prediction maps
• Other labels: User, +A1, +A2, +A3

Page 13: Temperature Interpolation Workflow

[Workflow diagram. Labels recovered from the figure:]

• Data access: EcoGrid Database, EcoGrid Query
• Inputs: Weather station temperature data; Environmental layers: elevation, aspect, land cover
• Processing steps: Physical Transformation, Scaling, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive to EcoGrid
• Intermediate products: Integrated layers, Training sample, Test sample, GARP rule set
• Outputs: Prediction map: Interpolated temperature grid; Model quality parameters; Selected prediction maps
• Other labels: User, +A1, +A2, +A3

Page 14: Temperature Interpolation Workflow

[Workflow diagram. Labels recovered from the figure:]

• Data access: EcoGrid Database, EcoGrid Query
• Inputs: Sinkhole occurrence; Environmental layers: Groundwater level, chemistry, etc.
• Processing steps: Physical Transformation, Scaling, Layer Integration, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive to EcoGrid
• Intermediate products: Integrated layers, Training sample, Test sample, GARP rule set
• Outputs: Prediction map: Sinkhole distribution; Model quality parameters; Selected prediction maps
• Other labels: User, +A1, +A2, +A3
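The preceding slides reuse one workflow shape (sample the data, calculate a model, validate it) for species occurrence and for sinkhole occurrence. A minimal sketch of that reuse follows. It is heavily simplified and rests on stated assumptions: the model is a single-threshold rule on one environmental layer, swapped in here as a stand-in and NOT the GARP algorithm; the split is a fixed alternating split; and all data values are invented for illustration.

```python
def run_generic_workflow(records):
    """Run sample -> model -> validate on (layer_value, occurrence) pairs.

    occurrence is 1 (present) or 0 (absent). The model is a stand-in
    threshold rule, not GARP.
    """
    # Sample Data: deterministic split into training and test samples.
    training, test = records[0::2], records[1::2]

    # Data Calculation: threshold midway between the class means.
    present = [v for v, occ in training if occ == 1]
    absent = [v for v, occ in training if occ == 0]
    if not present or not absent:
        raise ValueError("training sample needs both presences and absences")
    present_mean = sum(present) / len(present)
    absent_mean = sum(absent) / len(absent)
    threshold = (present_mean + absent_mean) / 2
    presence_is_high = present_mean > absent_mean

    def predict(value):
        return int((value >= threshold) == presence_is_high)

    # Validation: a model quality parameter computed on the test sample.
    correct = sum(predict(v) == occ for v, occ in test)
    return {"threshold": threshold, "accuracy": correct / len(test)}


# Hypothetical species data: presence associated with HIGH layer values.
species = [(5.0, 0), (6.0, 0), (20.0, 1), (22.0, 1),
           (7.0, 0), (21.0, 1), (4.0, 0), (19.0, 1)]

# Hypothetical sinkhole data: presence associated with LOW layer values.
sinkholes = [(1.0, 1), (2.0, 1), (9.0, 0), (8.0, 0)]

species_result = run_generic_workflow(species)
sinkhole_result = run_generic_workflow(sinkholes)
```

Only the input records differ between the two runs. A numeric target such as interpolated temperature would need a different stand-in model in the calculation step, but the surrounding sample and validate steps would keep the same shape.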

Page 15: Exercise

1. Divide into groups of 4 (or so) with similar research interests
2. Pick a research topic to collaborate on
3. Construct a workflow diagram for an analysis that could be conducted
4. Discuss how it could be reused for other related or unrelated analyses