Managing Data with iPlant
Introduction to Uploading, Downloading, Sharing, and Metadata in the Data Store
Background on the iPlant Data Store
• iPlant Learning Center provides quick tutorials, slides, and documentation for everything you will see here
• Backbone of the iPlant CI
• Connects to all iPlant services
• Appropriate for any (research) file types of any size
• Cloud Based, Backed Up, Initial 100 GB expandable to 1 TB
• Built on iRODS
• Folder = Collection
• Other than that, you don’t have to think about this if you don’t want to
• Demo 1 – Navigating the Data Store in the Discovery Environment
Upload and Download In the Discovery Environment
‘Simple’, for small files (~ 5 files, <1.9 GB)
‘Bulk’, for larger files and folders (<10GB)
Import from URL (no size limit)
Advantage +
• Covers most upload/download sharing needs
• Point and Click
Disadvantage -
• Some size/speed limitations
Demo 2 – Transferring Data from the Discovery Environment
Tips
Spaces / Special Characters
• Many software packages are sensitive to spaces in file names and/or the special characters below. Users may wish to rename uploaded files before using them in an analysis. Good advice for any transfer method.
~ ` ! @ # $ % ^ & * ( ) + =
{ } [ ] | \ : ; " ' < > , ? /
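The renaming advice above can be sketched as a small helper. This is only an illustration, not part of any iPlant tool: the function name `safe_name` and the choice of an underscore as the replacement are assumptions.

```python
import re

# The special characters listed on the slide. The forward slash is on the
# list, so this helper expects a bare file name, not a full path.
SPECIAL_CHARS = "~`!@#$%^&*()+={}[]|\\:;\"'<>,?/"

# Match any run of listed characters and/or whitespace.
_UNSAFE = re.compile("[" + re.escape(SPECIAL_CHARS) + r"\s]+")


def safe_name(filename: str, replacement: str = "_") -> str:
    """Replace each run of spaces/special characters with `replacement`."""
    cleaned = _UNSAFE.sub(replacement, filename)
    # Trim replacement characters at either end of the name
    # (this also trims any that were already there).
    return cleaned.strip(replacement)


if __name__ == "__main__":
    print(safe_name("my reads (run #2).fastq"))
```

Periods, hyphens, and underscores are left alone because they are not on the list, so file extensions survive the renaming.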
Bulk Transfers
• Requires Java 6 or later to be enabled in your browser (http://www.java.com)
• Not currently compatible with Google Chrome
• Window and web browser must remain open and active until the transfer is complete
Import from URL
• Monitor Notifications to check that the URL import has been submitted. You will receive a notification when the import is complete.
Tips
iDrop Desktop
• Drag and drop files and folders
• File sizes up to your total allocation
• Fast transfers
• Synchronize folders with Data Store
• Download and installation instructions
Can demo installation
Advantage +
• Upload/download large file sizes and numbers of files
Disadvantage -
• Sharing and permission features more complex
Demo 3 – Transferring Data with iDrop Desktop
iDrop
• Requires Java 6 or later (http://www.java.com)
• At the bottom of the iDrop window you can monitor, pause, and restart transfers. You can also view additional details by clicking Manage.
• When iDrop Desktop is open/running there will be an icon in your system tray or menu along with other background programs (e.g. Wi-Fi, Bluetooth). You need to close iDrop from this icon to completely close the program.
Tips
iCommands
• Ability to script and automate
• Access from terminal/server
• Can resume transfers
• Download and installation instructions
Advantage +
• Customizability
Disadvantage -
• Requires at least some command line expertise
Can demo installation
Demo 4 – Accessing the Data Store with iCommands
Tips
Useful iget / iput options:
-f  force - overwrite local files
-P  output the progress of the download
-r  recursive - retrieve subcollections (directories)
-T  renew socket after 10 min. (use with large files)
-V  Very verbose
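Since scripting is one of the reasons to use iCommands, here is a minimal sketch of assembling an `iget` call from the options above. The helper name `build_iget` and the example paths are hypothetical. The script only builds and prints the command, so it runs without iCommands installed.

```python
import shlex
from typing import List


def build_iget(remote_path: str, local_path: str, *, force: bool = False,
               progress: bool = False, recursive: bool = False,
               renew_socket: bool = False,
               very_verbose: bool = False) -> List[str]:
    """Assemble an iget argument list from the options listed above."""
    args = ["iget"]
    if force:
        args.append("-f")  # overwrite local files
    if progress:
        args.append("-P")  # output the progress of the download
    if recursive:
        args.append("-r")  # retrieve subcollections (directories)
    if renew_socket:
        args.append("-T")  # renew socket after 10 min. (large files)
    if very_verbose:
        args.append("-V")  # very verbose
    args += [remote_path, local_path]
    return args


if __name__ == "__main__":
    # Placeholder paths; substitute your own collection and local folder.
    cmd = build_iget("/iplant/home/username/reads", "reads",
                     recursive=True, progress=True)
    print(shlex.join(cmd))
    # To actually run the transfer (assumes iCommands is installed and
    # you have already logged in with iinit):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when a path contains spaces.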
Sharing Files in the Data Store
2 Easy ways to share data within the Discovery Environment
Discovery Environment Sharing
• Share files/folders instantly
• Control access permissions
• Manage sharing between collaborators
Sharing via Public Link
• No iPlant account required
• Limited to individual files
• URLs are public (less secure, can revoke)
Demo 5 – Data Sharing in the DE
Tips
• When sharing via the Discovery Environment, use the following chart to decide the permissions you wish to grant:
Permission Read Download Metadata Info Types Rename Move Delete
Read Write Own
Need help?
• iPlant Learning Center provides quick tutorials, slides, and documentation for everything you will see here
• http://ask.iplantcollaborative.org/questions/
Analyses in the Discovery Environment
Using Bioinformatics Apps
Background on the iPlant Discovery Environment
• iPlant Learning Center provides quick tutorials, slides, and documentation for everything you will see here
• So far we have mainly looked at the Data Tab
• DE is also a powerful interface for Apps and Analyses
• This is where scalability and extensibility really come into play
• Can customize and create new apps
• Seamlessly integrated with high-throughput computing
• Apps are linked to iPlant wiki for documentation
• Demo 1 – Apps and analyses in the DE
Tips
• Mark an App as a favorite and it will appear in your workspace.
• Use the Apps menu to:
• Rate Apps to provide feedback
• Click the info icon to see the user manual for an App
• Re-launch a job by clicking on the App name in the Analyses menu. The App will re-launch populated with the last parameters used, giving you the option to alter the settings you want.
Tips
• View a file containing metadata (settings, etc.) connected to your analysis. Select a job and click View Parameters.
Tips
• Can customize tools in the DE (simplifying here)
• App installation / modification happens at two levels:
• Apps (and dependencies) are installed on the DE cluster (done by iPlant support)
• DE interface is created by the App integrator/user and published to the DE
• Detailed instructions with videos, manuals, documentation in Learning Center
Viewing and Editing Metadata in the DE
• User metadata stored as AVUs
• Attribute – Value – Unit
• Template-based metadata
• Can also view and edit with iCommands (tomorrow)
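To make the Attribute – Value – Unit idea concrete, here is a minimal sketch of AVUs as a data structure. It is not how the DE or iRODS stores them internally; the class name, file path, and attribute names are all made up for illustration. The unit is optional, as it is in iRODS.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AVU:
    """One metadata entry: an attribute name, its value, an optional unit."""
    attribute: str
    value: str
    unit: Optional[str] = None


def values_for(avus: List[AVU], attribute: str) -> List[str]:
    """Return every value recorded under the given attribute name."""
    return [a.value for a in avus if a.attribute == attribute]


# Hypothetical metadata for one file; path and attribute names are made up.
metadata: Dict[str, List[AVU]] = {
    "/iplant/home/username/sample1.fastq": [
        AVU("organism", "Zea mays"),
        AVU("read_length", "100", "bp"),
    ],
}

if __name__ == "__main__":
    for path, avus in metadata.items():
        print(path, values_for(avus, "read_length"))
```

A file or folder can carry any number of AVUs, which is why the sketch keeps a list per path rather than a single record.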
Demo 2 – Metadata in the DE
Tips
• Currently can only use one template at a time
• Can create custom metadata templates
Metadata in the DE
Syncing Folders With iDrop
iPlant Learning Center provides quick tutorials, slides, and documentation for everything you will see here
iDrop Desktop Synchronization
1. Click on Settings
2. Click on the Synchronization tab
3. Click New to start a new synchronization
4. Enter a name (e.g. Project 1 sequence data)
5. Click Choose Local Folder to select a folder
6. Select a synchronization mode
7. Select a frequency for synchronization
8. Click Choose iRODS Folder to select the location to synch to
9. Click Update to save your synchronization
Demo – Dropbox Sync
Tips
Choose one of three methods to synchronize files:
• Local > iRODS – files will be copied to the Data Store at synch
• Local < iRODS – files in the Data Store will be copied to the local computer at synch
• Local <> iRODS – files in the Data Store and on the local computer are both synched
• If syncing to a Dropbox folder, Dropbox must be paused during setup
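The three synchronization modes above differ only in which direction files are copied. The sketch below illustrates that difference by comparing file names on each side. It is a simplification and not iDrop's actual algorithm, which this material does not describe (a real sync would also have to deal with changed files and conflicts). All names here are assumptions.

```python
from enum import Enum
from typing import Set, Tuple


class SyncMode(Enum):
    LOCAL_TO_IRODS = "Local > iRODS"
    IRODS_TO_LOCAL = "Local < iRODS"
    BIDIRECTIONAL = "Local <> iRODS"


def plan_sync(mode: SyncMode, local: Set[str],
              remote: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_upload, to_download): names missing on the other side."""
    to_upload: Set[str] = set()
    to_download: Set[str] = set()
    if mode in (SyncMode.LOCAL_TO_IRODS, SyncMode.BIDIRECTIONAL):
        to_upload = local - remote    # on this computer, not in the Data Store
    if mode in (SyncMode.IRODS_TO_LOCAL, SyncMode.BIDIRECTIONAL):
        to_download = remote - local  # in the Data Store, not on this computer
    return to_upload, to_download


if __name__ == "__main__":
    local_files = {"a.fastq", "b.fastq"}
    remote_files = {"b.fastq", "c.fastq"}
    for mode in SyncMode:
        print(mode.value, plan_sync(mode, local_files, remote_files))
```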
Advanced Topics
Searching in the DE, iCommands and Metadata, Data Commons Plans and Progress
Searching in the Discovery Environment
Demo 1 – Basic and advanced searching
• Basic search bar lets you search all files and folders where you have permission
• Advanced search allows searching based on metadata, permissions, and share status
iCommands to view and edit metadata
Demo 2 – imeta commands
• Ability to interact with metadata at the command line
• Already installed as part of iCommands
• Documentation is a little wimpy
• Try Here and Here
Tips
• At the moment, metadata added via template is not available from the command line
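As a starting point for scripting `imeta`, here is a minimal sketch that assembles `imeta add` and `imeta ls` calls for a data object (the `-d` flag; collections use `-C` instead). The helper names, path, and attribute are hypothetical. The script only prints the commands, so it runs without iCommands installed.

```python
import shlex
from typing import List, Optional


def imeta_add(data_object: str, attribute: str, value: str,
              unit: Optional[str] = None) -> List[str]:
    """Build: imeta add -d <data object> <attribute> <value> [unit]."""
    cmd = ["imeta", "add", "-d", data_object, attribute, value]
    if unit:
        cmd.append(unit)  # the unit is optional
    return cmd


def imeta_ls(data_object: str) -> List[str]:
    """Build: imeta ls -d <data object> (lists the object's metadata)."""
    return ["imeta", "ls", "-d", data_object]


if __name__ == "__main__":
    # Placeholder path and attribute; substitute your own.
    path = "/iplant/home/username/sample1.fastq"
    print(shlex.join(imeta_add(path, "read_length", "100", "bp")))
    print(shlex.join(imeta_ls(path)))
    # To run for real (assumes iCommands is installed and iinit is done):
    #   import subprocess; subprocess.run(imeta_ls(path), check=True)
```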
Data, Project, Research Management In the Data Commons
• Tools for sharing, managing, and publishing data
• A home for high-value public datasets to be used with iPlant analysis tools
• A way to manage projects
• Metadata templates and workflows for common analysis types
• Working hard to lay the groundwork, with development starting early 2015
Data, Project, Research Management In the Data Commons
• Additional layer on top of the Data Store for Users to publish packages of data and metadata to the Data Commons and supported external repositories with appropriate long-term identifiers and licenses
• Data will be static, searchable, discoverable, and linked to external repositories
• Based on Data Strategy, current CI, Developer input, recognized need for additional components to support Data Commons effort
Data Commons
Data Commons Development Plans
Staging Area
DE Project Interface
Planned Features
• Define ‘Project’ data / metadata
• Faceted view of Data Store based on metadata
• Organize data, enter standardized and free-text metadata, and match file types with suitable analysis options
Metadata Progress
• Data management and Genomic use cases mature and available for development
• Existing tags, metadata collection, and metadata search are critical components
Next Steps
• Development based on use cases and manual walkthrough
• Define CI-wide Project concept
Planned Features
• Interface between Data Commons and rest of the CI
• Select from Project Interface and distill data/metadata into package for publication to the Data Commons and beyond
• Select appropriate licenses and identifiers for data and external repositories
• Metadata “carry through” from other platforms
Planned Features
• Static, searchable, discoverable, licensed data packages with persistent identifiers and links to external repositories
• Data will be available and useful to the community, not buried in Data Store
Next Steps
• Define Developer needs based on use cases and data models
• Define entry beyond DE
• Integrate ontologies and controlled vocabularies with metadata
Next Steps
• Manually shepherd existing use case through all components to better define Development needs
• Develop requirements for additional use cases and data types
• Provide documentation
• Define potential EOT deliverables
Goal: Data models and workflows for the entire data lifecycle
[Figure: data lifecycle diagram with stages labeled specimen collection, analysis, project creation, publication]
Interface Mockups are in Review
Demo 3 – Data Commons Interface Mockups
Metadata Templates for G2F
• Open discussion now that we are all updated on G2F and iPlant infrastructure
• Project Interface needs for G2F?
• Metadata needs for G2F?
• Other needs?
• Perhaps start here?
Thanks!
• iPlant Learning Center provides quick tutorials, slides, and documentation
• http://ask.iplantcollaborative.org/questions/