John Porter Sheng Shan Lu M. Gastil Gastil-Buhl With special thanks to Chau-Chin Lin and Chi-Wen Hsaio

Post on 19-Jan-2016

Page 2

Maximizing the potential of LTER data to be used to make new ecological discoveries

Moving from the era of single datasets to large-scale data integration: tens to hundreds of datasets

A first step to achieving this goal is to automate the mechanical processes associated with data ingestion into analytical software

Page 3

We want to:

Identify a dataset in the LTER Network Information System

Download it

Write an R statistical program to read the data

Produce basic statistical summaries of the ingested data

How long should that process take?

With our tools we can do that in less than 1 minute!

Page 4

Tool: TFRI - R module
Description: Web-form-based system that takes you through a multistep process to ingest data, do a basic quality assurance analysis and run simple analyses
Works with Metacat: Manual data download
Works with PASTA: Manual data download

Tool: StatProg
Description: Web-form-based system that generates R, SAS, SPSS or Matlab programs that can be edited to process data
Works with Metacat: Manual data download
Works with PASTA: Manual data download

Tool: PASTAprog
Description: Web service that returns a ready-to-use R, SAS or Matlab program. Can be run directly from inside R for 1-minute analyses!
Works with Metacat: Variable (some automated, some manual)
Works with PASTA: Fully automated download

Page 5

Note: You do NOT need to have R installed on your PC to use this. It is entirely web-based.

Don’t be worried by the buttons! A fully English version is available at the URL above

Page 6

Page 7

Metadata Display

Statistical Functions

Raw Data Upload

Select the number type of the field

Include the field in R code (select at least one)

EML metadata transformed into HTML by an XSL stylesheet

Page 8

No field header

Upload

Page 9

Data Check Functions (only for numerical attributes!)

Correct domain (real, integer)

Range checks

Action options:

Edit records with bad values

Set all the bad values to missing (NA)

Eliminate all the records with bad values

Ignore all the range check problems (only for value range errors)
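
The checks and action options listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the TFRI R module: the function names, the record layout (a list of dictionaries) and the action keywords "set_na", "drop" and "ignore" are all assumptions made for this example.

```python
import math


def find_bad_values(records, field, domain="real", low=None, high=None):
    """Return (type_errors, range_errors) as lists of row indices.

    Hypothetical helper: 'domain' is "real" or "integer"; 'low' and 'high'
    are optional range limits. Values that are already None are skipped.
    """
    type_errors, range_errors = [], []
    for i, rec in enumerate(records):
        raw = rec.get(field)
        if raw is None:
            continue  # already missing, nothing to check
        try:
            num = float(raw)
        except (TypeError, ValueError):
            type_errors.append(i)  # not a number at all
            continue
        if not math.isfinite(num) or (domain == "integer" and not num.is_integer()):
            type_errors.append(i)  # wrong domain
            continue
        if (low is not None and num < low) or (high is not None and num > high):
            range_errors.append(i)  # numeric, but outside the allowed range
    return type_errors, range_errors


def apply_action(records, field, bad_rows, action):
    """Apply one of the three action options to the rows listed in bad_rows."""
    bad = set(bad_rows)
    if action == "set_na":   # set all the bad values to missing
        return [{**r, field: None} if i in bad else dict(r)
                for i, r in enumerate(records)]
    if action == "drop":     # eliminate all the records with bad values
        return [dict(r) for i, r in enumerate(records) if i not in bad]
    if action == "ignore":   # leave the data untouched
        return [dict(r) for r in records]
    raise ValueError(f"unknown action: {action}")


records = [{"temp": "12.5"}, {"temp": "abc"}, {"temp": "999"}]
type_errors, range_errors = find_bad_values(records, "temp", low=-50, high=60)
cleaned = apply_action(records, "temp", type_errors + range_errors, "set_na")
print(type_errors, range_errors, cleaned)
```

In this sketch "ignore" is offered for any list of rows; restricting it to range errors, as the slide does, would be a matter of passing only the range-error indices.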

Page 10

Data Type Error

Value Range Error: select the 'Set all the bad values to missing (NA)' option

Update

The message for a "no data" error

Page 11

This line cannot be modified

The rest of the R program CAN be modified to reflect your analyses

Page 12

Page 13

Page 14

Select program type

Specify Metadata Document to Use

You can get the Package ID from the LTER Metadata catalog. Download a copy of the data, while you are there!

Or, you can specify a metadata document on a site server by giving the full URL

Page 15

Importantly, you need to edit the program to point to where the data is stored on YOUR computer, so the program can find it!

Page 16

The previous form-based programs have been available for several years

Their performance has improved as metadata has gotten better

But they can still be slower to use than we would like, requiring manual editing and steps

The advent of the LTER PASTA system makes possible truly automated ingestion and analysis using a web service

Page 17

R “source” function specifying the web service URL and that we want to “echo” our commands to the screen

Package ID from the PASTA Data Portal

Page 18

DONE! Our analysis has been run, and basic statistical summaries have been created for each of the attributes.

Page 19

You can now add additional commands to generate graphics etc. or merge to other datasets

Page 20

Base URL: http://www.vcrlter.virginia.edu/webservice/PASTAprog/

Plus: a Package ID (available on the PASTA portal)

E.g., knb-lter-vcr.26.14

Scope: knb-lter-vcr

ID: 26

Revision: 14

Plus: a suffix indicating the type of program you want (e.g., .r, .sas, .spss, .m) for R, SAS, SPSS or Matlab
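
The three parts above (base URL, Package ID, suffix) can be put together mechanically. The Python sketch below only builds the URL string and the matching R command; it does not contact the web service. The function names and the program keywords ("r", "sas", "spss", "matlab") are assumptions for this example, while the base URL, the sample Package ID and the suffixes come from the slides.

```python
BASE_URL = "http://www.vcrlter.virginia.edu/webservice/PASTAprog/"

# Suffixes as listed on the slide; the dictionary keys are illustrative names.
SUFFIXES = {"r": ".r", "sas": ".sas", "spss": ".spss", "matlab": ".m"}


def parse_package_id(package_id):
    """Split 'scope.id.revision' into its three parts (hypothetical helper)."""
    scope, ident, revision = package_id.rsplit(".", 2)
    return scope, int(ident), int(revision)


def pastaprog_url(package_id, program="r"):
    """Build the web service URL for a Package ID and a program type."""
    parse_package_id(package_id)  # raises ValueError if the ID is malformed
    return f"{BASE_URL}{package_id}{SUFFIXES[program]}"


url = pastaprog_url("knb-lter-vcr.26.14", "r")
print(parse_package_id("knb-lter-vcr.26.14"))
print(url)
# The R command that sources the generated program and echoes it to the screen:
print(f'source("{url}", echo=TRUE)')
```

The last line prints an R call of the kind described for the “source” function: the web service URL plus the request to echo commands to the screen.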

Page 21

http://www.vcrlter.virginia.edu/webservice/PASTAprog/knb-lter-vcr.26.14.r

You can also use the web service URL in a web browser to get a text copy of your program

Note: There are other options that will let you use the web service for data OUTSIDE PASTA by specifying the URL of the EML metadata separately

Page 22

Problems with Metadata

Lead to lack of congruency between the description of the data and the data itself*

Bad practices in metadata, e.g., using special characters, spaces or mathematical operations as part of the attribute names

Links to data in the metadata may not properly lead directly to data*

Problems with Data

Inconsistent coding (character data where numbers are expected): causes conversion of numerical data into R “factors”

Dates: often are handled in different ways

?????: these systems need additional testing on a wide array of data, and you can help!

* Much improved by PASTA system over earlier Metacat
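
As an illustration of the attribute-name problem listed above, the Python sketch below flags names containing spaces, special characters or mathematical operators and suggests a cleaned-up alternative. The rule used here (a letter followed by letters, digits or underscores) is an assumption chosen for the example; it is stricter than what R or the other packages actually require, and it is not taken from the tools described in these slides.

```python
import re

# Assumed "safe" pattern: starts with a letter, then letters, digits, underscores.
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def problem_attribute_names(names):
    """Return the attribute names that do not match the assumed safe pattern."""
    return [n for n in names if not SAFE_NAME.fullmatch(n)]


def suggest_name(name):
    """Suggest a replacement: runs of unsafe characters become one underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "v_" + cleaned  # a leading letter keeps the name usable
    return cleaned


names = ["temp_c", "wind speed", "mass/area", "2nd_reading"]
for bad in problem_attribute_names(names):
    print(f"{bad!r} -> {suggest_name(bad)!r}")
```

A check like this could be run against the attribute names in a metadata document before generating a program, so that naming problems are reported up front rather than discovered when the program fails.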