Using R in Kepler Dan Higgins – NCEAS Prepared for: Ecoinformatics Training for Ecologists LTER (Albuquerque) January 8-12, 2007 http://www.kepler-project.org


Using R in Kepler

Dan Higgins – NCEAS

Prepared for:

Ecoinformatics Training for Ecologists

LTER (Albuquerque)

January 8-12, 2007

http://www.kepler-project.org


What is R?

• R is a language and environment for data manipulation, statistical computing, and graphics.

• R is open source, so it can be downloaded and used free of charge

• The R Project for Statistical Computing: http://www.r-project.org/

• NCEAS R Programming Language Resource Center:
  – http://www.nceas.ucsb.edu/scicomp/RProgTutorialsLatest.html
  – http://www.nceas.ucsb.edu/scicomp/RShortCourse.html


RGui


RGui (Windows)


R Example

With only three lines of R, one can read a data table, plot all combinations of column data, and summarize the data
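The three lines themselves appear only in the slide image. A minimal sketch of what they might look like, assuming a comma-separated file with a header row; the column names and values are hypothetical and are written to a temporary file so the example runs on its own. read.csv() is the comma-separated wrapper around read.table(); the original slide may have used either.

```r
# Hypothetical sample data written to a temporary CSV so the sketch is self-contained
csv_file <- tempfile(fileext = ".csv")
writeLines(c("height,weight,age",
             "1.2,30,4",
             "1.5,42,6",
             "1.7,55,9",
             "1.8,61,12"), csv_file)

# The three lines: read the table, plot every pair of columns, summarize each column
dat <- read.csv(csv_file)   # read a data table (header row assumed)
pairs(dat)                  # scatterplot matrix of all column combinations
summary(dat)                # per-column summary statistics
```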


Kepler and R

• The R language has many similarities to the Kepler expression language

• The R language emphasizes operations on vectors, matrices, and tables (‘data frames’) rather than on scalars, which eliminates many explicit loops

• Many detailed statistical operations and data manipulation routines already exist in R

• R can create sophisticated graphic displays

• Being able to call R routines from Kepler greatly simplifies many workflows
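To illustrate the point about vector operations replacing explicit loops, the sketch below computes the same element-wise result twice, once with a scalar-style loop and once with a single vectorized expression. The variable names are arbitrary.

```r
x <- c(2, 4, 6, 8)
y <- c(1, 3, 5, 7)

# Scalar style: explicit loop over indices
z_loop <- numeric(length(x))
for (i in seq_along(x)) {
  z_loop[i] <- x[i] * y[i] + 1
}

# Vector style: one expression operates on whole vectors
z_vec <- x * y + 1   # c(3, 13, 31, 57)

# Summary functions also take a whole vector at once
m <- mean(z_vec)
```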


Simple R Workflow

Drag an RExpression actor onto the workspace, add a director, and connect its outputs to Display and ImageJ actors

The Display output is the same as what one sees when running the R script from the R command line


RExpression Actor Parameters


Arrays and Graphical Output

R Script:
ccc <- aaa + bbb
ccc
plot(aaa, bbb)

Adding ports automatically creates R objects with the port name [e.g. aaa <- c(1,2,3,4)]

Graphics are automatically saved as images, and the image file name is sent to the ‘graphicsFileName’ output port

R text output is automatically sent to the ‘output’ port
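Outside Kepler, this behaviour can be approximated in plain R: the port values become ordinary R objects, the script runs, text output is printed, and the plot is written to an image file. This is a sketch under assumptions, not Kepler's actual implementation; the values given to ‘bbb’ and the temporary file name are hypothetical stand-ins for the port input and the ‘graphicsFileName’ value.

```r
# Objects the RExpression actor would create from input ports 'aaa' and 'bbb'
aaa <- c(1, 2, 3, 4)
bbb <- c(10, 20, 30, 40)   # hypothetical second input

# Send graphics to a file instead of the screen (PNG if supported, else PDF)
if (capabilities("png")) {
  graphics_file <- tempfile(fileext = ".png")
  png(graphics_file)
} else {
  graphics_file <- tempfile(fileext = ".pdf")
  pdf(graphics_file)
}

# The R script from the slide
ccc <- aaa + bbb
print(ccc)        # text output; in Kepler this goes to the 'output' port
plot(aaa, bbb)

invisible(dev.off())   # close the device so the file is written
```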


RExpression – Ports & Parameters

Adding ports creates R objects from Kepler tokens

The R script is a parameter of the RExpression actor and refers to inputs by their port names


Array Records and Data Frames

Tables are represented as ‘data frame’ objects in R

A Ptolemy ‘Record of Arrays’ can also represent a table

R Script: summary(df)

where ‘df’ is the R data frame created automatically when a record of arrays is passed to the input port named ‘df’

AAA    BBB
one    1
two    2
three  3
four   4
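In plain R, the table above and the one-line script can be reproduced as follows; building the data frame by hand stands in for the record of arrays that Kepler would pass to the ‘df’ port.

```r
# Stand-in for the data frame Kepler builds from a record of arrays
df <- data.frame(AAA = c("one", "two", "three", "four"),
                 BBB = c(1, 2, 3, 4))

# The RExpression script from the slide
summary(df)
```

Since R 4.0.0, data.frame() keeps strings as character by default, so summary() reports only the length and class of AAA; earlier versions converted it to a factor and tabulated its levels.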


R Dataframes

AAA    BBB
one    1
two    2
three  3
four   4

In R, a ‘dataframe’ represents a table

A dataframe is a list of column vectors of equal length

All values within a column have the same type (e.g. all numbers or all strings)

Each column can have a name (e.g. ‘AAA’ or ‘BBB’)

Creating a dataframe df:
AAA <- c("one", "two", "three", "four")
BBB <- c(1, 2, 3, 4)
df <- data.frame(AAA, BBB)

Selecting parts of a dataframe:
1st column: df[,1], df[,"AAA"]
2nd row: df[2,], df[df$AAA == "two",]
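The selection rules can be checked directly in R: the index before the comma selects rows, the index after it selects columns. The sketch rebuilds the same dataframe so it runs on its own.

```r
AAA <- c("one", "two", "three", "four")
BBB <- c(1, 2, 3, 4)
df <- data.frame(AAA, BBB)

# Columns: index after the comma, a column name, or $
col1_by_index <- df[, 1]
col1_by_name  <- df[, "AAA"]
col1_dollar   <- df$AAA

# Rows: index before the comma, or a logical condition on a column
row2_by_index <- df[2, ]
row2_by_value <- df[df$AAA == "two", ]

# Both at once: the BBB value in the row where AAA is "three"
bbb_three <- df[df$AAA == "three", "BBB"]
```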


Using Multiple R Actors


Using Multiple R Actors - Result


R Summarize Table By Species
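Only the title of this slide survives in the transcript. As a hypothetical illustration of summarizing a table by species in plain R, aggregate() and tapply() compute a statistic for each group; the species labels and counts below are invented for the example and are not the workflow's data.

```r
# Hypothetical observations: species label and a measured count
obs <- data.frame(species = c("A", "B", "A", "C", "B", "A"),
                  count   = c(3, 5, 1, 4, 7, 2))

# Mean count per species as a data frame (formula interface)
by_species <- aggregate(count ~ species, data = obs, FUN = mean)
print(by_species)

# The same means as a named vector
species_means <- tapply(obs$count, obs$species, mean)
```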


R Pairs Plot


Configuring an EML Datasource for Use in RExpression

Use “As Column Vector” to pass the entire column at one time (i.e. as an R vector)
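When a whole column arrives as one R vector, column-wide calculations need no loop. A minimal sketch, assuming a hypothetical input port named ‘temp’ connected to a temperature column; the values are invented for illustration.

```r
# What the script would see if port 'temp' received an entire column
temp <- c(12.1, 14.3, 13.8, 15.0, 11.7)

# Whole-column operations
temp_f  <- temp * 9 / 5 + 32      # convert every value at once
anomaly <- temp - mean(temp)      # deviation of each value from the column mean
warm    <- which(temp > 14)       # positions of values above a threshold
```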


Custom RExpression Actors

RExpression actors with pre-built R scripts can be added to the Kepler actor list. Examples of current custom actors are shown here.

This provides tools for users who are unfamiliar with R scripting


Acknowledgements

• This material is based upon work supported by:

• The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.

• Collaborators: NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research), University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis

• The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.

• The Andrew W. Mellon Foundation.

• Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON