![Page 1: The Materials Data Facility - Northwestern University · 5 Publication APIs • Identify datasets with persistent identifiers (e.g. DOI) • Describe datasets with appropriate metadata,](https://reader033.vdocuments.site/reader033/viewer/2022050110/5f47b223dc4fc417b2738410/html5/thumbnails/1.jpg)
Ben Blaiszik, Kyle Chard, Ian Foster
materialsdatafacility.org
The Materials Data Facility
What is MDF?
2
We aim to make it simple for materials datasets and resources to be published, identified, described, curated, verifiable, accessible, preserved, discovered, searched, browsed, shared, recommended, and accessed.
Data types (diagram labels): Working Data, Derived Data, Publishable Results, Published Results, Resource Data, Ref Data, SRD
3
What infrastructure do we need to effectively support materials researchers?
Service Infrastructure
4
Components: Publication, Discovery, Data Interaction and Viz, Resource Registration, and APIs
(In the diagram, + marks the initial foci.)
5
Publication APIs
• Identify datasets with persistent identifiers (e.g. DOI)
• Describe datasets with appropriate metadata and provenance
• Curate dataset metadata and data composition
• Verify dataset contents over time
• Preserve critical datasets in a state that increases transparency and replicability and encourages reuse
Deployed Nov. 2015
6
Discovery
• Search and query datasets in modern ways, e.g. via indexed metadata rather than by remembering file paths
• Discover distributed materials resources (more later)
Future...
Spotlight for all data you have access to, regardless of location
Coming late 2016-ish
7
Resource Registration APIs
• Find existing, widely distributed materials resources
• Register new resources into the network
Coming Q1 2016 via NIST
8
Data Interaction and Viz APIs
• Data-driven experiments using HPC resources and workflow technologies
• Real-time interaction with data regardless of data location and size (given appropriate data access)
• (future) Machine learning across datasets and storage locations
• (future) Automated discovery support
[Diagram labels: Experiment, Simulation, Cloud DB, Analytics; links labeled “data, metadata, pointers”]
Understanding Incentives is Critical
9
Meeting Award Requirements
Smoothing Dislocations
Increasing Impact
• Increase paper citations¹
• Add dataset citation capabilities
• Enable simple sharing among collaborators (near and far)
• Ease transitions between students
• Lessen need for ad hoc resource sharing (e.g. via group websites)
• Simplify data management plan (DMP) compliance
¹ Reported citation increases range from 30% (doi:10.7717/peerj.175) to 60% (doi:10.1371/journal.pone.0000308); caveat: both findings come from biology research.
10
So where are we now?
Publication
11
Materials Data Publication/Discovery is Often a Challenge
Data Collection → Data Storage and Processing → Publication
12
Materials Data Publication/Discovery is Often a Challenge
Data Collection → ??? → Data Storage and Processing → Publication
• Need networked storage, sometimes many TB
• Need to uniquely identify data for search/citation
• Need custom metadata descriptions
• Need a data curation workflow
• Need automation capabilities
Want to Discover / Use
Want to Publish
13
Materials Data Publication/Discovery is Often a Challenge
Data Collection → ??? → Data Storage and Processing → Publication
• Need storage, sometimes many TB
• Need to uniquely identify data for search/citation
• Need custom metadata descriptions
• Need a data curation workflow
• Need automation capabilities
Want to Discover / Use
Want to Publish
Collection Model
14
• Collections might be a research group or a research topic...
• Collections have specified:
  § Mapping to a storage endpoint
    § Currently handled as automatically created shared endpoints
  § Metadata schemas
  § Access control policies
  § Licenses
  § Curation workflows
• Collections contain:
  § Datasets
    § Data
    § Metadata
• Metadata persistence:
  § Metadata log file with the dataset
  § Metadata replicated in the search index
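The collection model above can be sketched in a few Python dataclasses. This is a minimal, hypothetical illustration rather than MDF's implementation: the class names, the `add_dataset` method, the default license, and the access and curation labels are all assumptions. It shows policies living on the collection, datasets carrying data plus metadata, and metadata persisted twice (a log entry kept with the dataset and a copy in a search index).

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """A dataset bundles data (here, just file paths) with its metadata."""
    title: str
    files: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    metadata_log: List[str] = field(default_factory=list)  # stands in for the log file stored with the data


@dataclass
class Collection:
    """Policies are fixed per collection and apply to every dataset in it."""
    name: str
    storage_endpoint: str                  # hypothetical shared-endpoint name
    required_metadata: List[str]           # fields every dataset must supply
    license: str = "CC-BY-4.0"             # illustrative default only
    access: str = "private"                # "public", "users", or "private"
    curation: str = "none"                 # "none", "metadata", or "metadata+files"
    datasets: List[Dataset] = field(default_factory=list)
    search_index: List[Dict[str, str]] = field(default_factory=list)

    def add_dataset(self, ds: Dataset) -> None:
        missing = [k for k in self.required_metadata if k not in ds.metadata]
        if missing:
            raise ValueError(f"missing required metadata: {missing}")
        self.datasets.append(ds)
        # Metadata persistence, as on the slide: a log entry kept with the
        # dataset, plus a copy replicated into the search index.
        ds.metadata_log.append(json.dumps(ds.metadata, sort_keys=True))
        self.search_index.append({"collection": self.name, **ds.metadata})
```

For example, a collection created with `required_metadata=["title", "creator"]` rejects any dataset whose metadata lacks a creator, so the policy is enforced once at the collection rather than per submission.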
Publish Large Datasets
15
• Leverages Globus production capabilities for file transfer, user authentication, and groups
• 100 TB of reliable storage at NCSA, and more storage at Argonne
  § Globus endpoint at ncsa#mdf
  § Expandable to PBs as necessary
  § Automated tape backup for reliability (in progress)
• Optionally use your own local or institutional storage
Uniquely Identify Datasets
16
• Associate a unique identifier with a dataset
  § DOI, Handle
• Improve dataset discovery and citability
  § Aligning incentives and understanding the culture will be critical to driving adoption
[Figure: dataset downloads over time]
• Your work has been cited 153 times in the last year
• Researchers from 30 institutions have downloaded your datasets
Future...
Share Data with Flexible ACLs
17
• Share data publicly, with a set of users, or keep data private
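A hedged sketch of the three sharing modes listed above (public, a set of users, private). `SharingPolicy` and its `can_read` check are hypothetical names chosen for illustration; Globus and MDF enforce access control in their own services, not with code like this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SharingPolicy:
    """Hypothetical per-dataset sharing setting: 'public', 'users', or 'private'."""
    mode: str
    owner: str
    allowed_users: FrozenSet[str] = frozenset()

    def __post_init__(self) -> None:
        if self.mode not in {"public", "users", "private"}:
            raise ValueError(f"unknown sharing mode: {self.mode}")

    def can_read(self, user: Optional[str]) -> bool:
        if self.mode == "public":
            return True                      # anyone, even anonymous (user=None)
        if user is None:
            return False                     # non-public data needs a login
        if user == self.owner:
            return True                      # the owner always keeps access
        if self.mode == "users":
            return user in self.allowed_users
        return False                         # "private": owner only
```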
Leverage Curation Workflows
• Collection administrators can specify the level of curation workflow required for a given collection, e.g.:
  § No curation
  § Curation of metadata only
  § Curation of metadata and files
Customize Metadata
18
• Build a custom metadata schema for your specific research data
• Re-use existing metadata schemas
• Working in conjunction with NIST researchers to define these schemas
• Can we build a system that supports schema:
  § Inheritance, e.g. a “polymers” schema might inherit and expand upon NIST's “base material” schema
  § Versioning, e.g. understand contextually how to map fields between versions
  § Dependence, e.g. to help build consensus around schemas
Future...
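The schema questions above (inheritance and versioning) can be made concrete with a small sketch. Everything here is hypothetical: the `Schema` class, the field names, and the rename-map approach to versioning are assumptions, and the real fields of NIST's “base material” schema are not reproduced.

```python
from typing import Dict, Optional


class Schema:
    """Minimal schema with single inheritance and per-version field renames."""

    def __init__(self, name: str, version: int, fields: Dict[str, type],
                 parent: Optional["Schema"] = None,
                 renamed_from_previous: Optional[Dict[str, str]] = None):
        self.name = name
        self.version = version
        self.own_fields = fields
        self.parent = parent
        # Maps an old field name (in version - 1) to its new name in this version.
        self.renamed_from_previous = renamed_from_previous or {}

    def all_fields(self) -> Dict[str, type]:
        """Inherited fields first; a child may add fields or override types."""
        inherited = self.parent.all_fields() if self.parent else {}
        return {**inherited, **self.own_fields}

    def validate(self, record: Dict[str, object]) -> None:
        for key, typ in self.all_fields().items():
            if key not in record:
                raise ValueError(f"{self.name} v{self.version}: missing '{key}'")
            if not isinstance(record[key], typ):
                raise TypeError(f"{self.name} v{self.version}: '{key}' should be {typ.__name__}")

    def upgrade(self, old_record: Dict[str, object]) -> Dict[str, object]:
        """Map a record written against version - 1 onto this version's field names."""
        return {self.renamed_from_previous.get(k, k): v for k, v in old_record.items()}
```

An explicit rename map is only one way to "map fields between versions"; it keeps old records usable without rewriting them, at the cost of maintaining the map by hand.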
Discover Research Datasets
19
• Search on file metadata, custom metadata, and indexed file-level data
• Goal: Intuitive search (e.g. Google-style) with support for more complex range queries and faceting (e.g. Amazon-style)
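To show what combining Google-style free text with Amazon-style range queries and facets means, here is a toy in-memory search. It is an illustrative assumption only: a production service would query a search index instead of scanning every record, free-text matching here is naive substring matching, and the field names in the usage are made up.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]


def search(records: List[Record], text: str = "",
           ranges: Optional[Dict[str, Tuple[float, float]]] = None,
           facet_fields: Tuple[str, ...] = ()) -> Tuple[List[Record], Dict[str, Counter]]:
    """Return matching records plus facet counts computed over the matches."""
    terms = text.lower().split()
    ranges = ranges or {}
    hits = []
    for rec in records:
        # Free text: every term must appear somewhere in the record's values.
        blob = " ".join(str(v) for v in rec.values()).lower()
        if not all(t in blob for t in terms):
            continue
        # Range filters: the field must be numeric and lie in [lo, hi].
        if all(isinstance(rec.get(f), (int, float)) and lo <= rec[f] <= hi
               for f, (lo, hi) in ranges.items()):
            hits.append(rec)
    # Facets: value counts per requested field, like a sidebar of filters.
    facets = {f: Counter(str(r[f]) for r in hits if f in r) for f in facet_fields}
    return hits, facets
```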
MaterialsDataFacility.org
20
21
MDF Submission Walkthrough
Example Use Case
22
Publishing Big, Remote Data
• A group has collected 50 TB of data at the APS and needs to send it back to their home institution for analysis and archiving
• Bundle multiple experimental runs with metadata and provenance
• The PI wants to verify the dataset's data and metadata before publication
• Want a citable DOI to share the raw and derived data with the community
• Want the data to be discoverable by free-text search and custom metadata
MDF Collection Home
23
MDF Collections
24
Recall: Policies Are Set at the Collection Level
• Required metadata, schemas
• Data storage location
• Metadata curation policies
MDF Metadata Entry
25
• Scientist or representative describes the data they are submitting
• For this collection, Dublin Core and a custom metadata template are required
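The metadata-entry step can be sketched as a validation check run before a submission is accepted. The 15 element names are those of the Dublin Core Metadata Element Set, version 1.1; which elements a collection requires, the custom template fields (`beamline`, `sample_id` in the usage), and the `check_submission` function itself are hypothetical.

```python
from typing import Dict, List

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DUBLIN_CORE = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def check_submission(dc: Dict[str, str], custom: Dict[str, str],
                     required_dc: List[str], required_custom: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the submission passes."""
    problems = [f"unknown Dublin Core element: {k}" for k in dc if k not in DUBLIN_CORE]
    problems += [f"missing Dublin Core element: {k}" for k in required_dc if not dc.get(k)]
    problems += [f"missing custom field: {k}" for k in required_custom if not custom.get(k)]
    return problems
```

Returning every problem at once, instead of stopping at the first, lets the submission form show the scientist all missing fields in one pass.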
MDF Custom Metadata
26
Dataset Assembly
27
• A shared endpoint is automatically created on the data store specified by the collection
• The scientist transfers dataset files to a unique publish endpoint
• The dataset may be assembled over any period of time
• When the submission is finished, the dataset is rendered immutable, with checksums recorded to verify its contents
(e.g. APS) (e.g. UC Berkeley)
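One way to make the checksum step concrete: hash every file, then hash the sorted manifest into a single dataset fingerprint. A checksum does not prevent edits by itself; it makes any later change detectable, which also serves the goal of verifying dataset contents over time. The function names and the choice of SHA-256 are assumptions, not MDF's documented scheme.

```python
import hashlib
import os
from typing import Dict


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so very large files never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_sha256(full)
    return manifest


def dataset_fingerprint(manifest: Dict[str, str]) -> str:
    """One digest over the sorted manifest; an added, removed, or edited file changes it."""
    h = hashlib.sha256()
    for rel in sorted(manifest):
        h.update(f"{rel}\0{manifest[rel]}\n".encode())
    return h.hexdigest()


def verify(root: str, expected_fingerprint: str) -> bool:
    """Recompute the fingerprint from disk and compare it with the recorded one."""
    return dataset_fingerprint(build_manifest(root)) == expected_fingerprint
```

The per-file manifest is worth keeping alongside the single fingerprint: when verification fails, comparing manifests shows which file changed.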
Dataset Curation
28
• Optionally specified in collection configuration
• Can be approved or rejected (i.e. sent back to the submitter)
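The approve/reject step can be read as a small state machine. The state names, the action strings, and the rule that collections without curation skip straight to approval are hypothetical choices for this sketch.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    PUBLISHED = "published"


# Allowed transitions; "reject" sends a submission back to the submitter as a draft.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.SUBMITTED,
    (State.SUBMITTED, "approve"): State.APPROVED,
    (State.SUBMITTED, "reject"): State.DRAFT,
    (State.APPROVED, "publish"): State.PUBLISHED,
}


def advance(state: State, action: str, curation_required: bool) -> State:
    """Apply one action, raising ValueError for transitions the workflow forbids."""
    # With no curation configured, submission skips the curator entirely.
    if not curation_required and (state, action) == (State.DRAFT, "submit"):
        return State.APPROVED
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} from {state.value!r}") from None
```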
Mint a Permanent Identifier
29
Can optionally be a DOI or a Handle
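Minting produces an identifier string that resolves through a public resolver. The sketch below only formats the identifier and its resolver URL (`https://doi.org/` for DOIs, `https://hdl.handle.net/` for Handles); it performs no registration with any DOI or Handle service. The prefixes in the usage are placeholders, the `mint` function is hypothetical, and the regular expression checks only the common `10.<registrant>/<suffix>` shape.

```python
import re
import uuid

# Common DOI shape: "10." + 4-9 digit registrant code + "/" + non-blank suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def mint(kind: str, prefix: str, suffix: str = "") -> dict:
    """Build a DOI or Handle string plus its resolver URL (no registration is performed)."""
    suffix = suffix or uuid.uuid4().hex[:10]   # random 10-hex-digit suffix if none supplied
    ident = f"{prefix}/{suffix}"
    if kind == "doi":
        if not DOI_RE.match(ident):
            raise ValueError(f"not a well-formed DOI: {ident}")
        return {"identifier": ident, "url": f"https://doi.org/{ident}"}
    if kind == "handle":
        return {"identifier": ident, "url": f"https://hdl.handle.net/{ident}"}
    raise ValueError(f"unknown identifier kind: {kind}")
```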
Dataset Record
30
Dataset Discovery
31
32
Registering Materials Resources [NIST – Youssef, Dima]
Materials Resource Registry
33
Materials Science Data Challenge: http://acceleratornetwork.org/mse-challenge/
Materials Resource Registry
34
Materials Accelerator Network
Materials Resource Registry
35
Browse Results [NIST – Youssef, Dima]
Service Infrastructure
36
Components: Publication, Discovery, Data Interaction, Resource Registration, and APIs
(In the diagram, + marks the initial foci.)
What’s Available?
37
• Web interface to support data publication via the Globus platform (identity management, user groups, optimized big data transfer)
• 100 TB of storage at NCSA (scalable to >1 PB), with more at Argonne (?)
• Help with developing metadata schemas to describe your research datasets
Yet another publication system?
• Software as a Service
• Self-service management
  § Identifiers, policies, submission and curation workflows, storage, metadata, access control
• Remote storage
• Supports arbitrarily large datasets
• Powerful search
38
Current Interactions
What are we looking for?
40
• Early adopters, willing to get their hands dirty with the service and give honest feedback
• Key datasets of all sizes and shapes, raw or derived, that might help us understand the process better
• Currently working with researchers from UIUC, NWU, UC Berkeley, UW-Madison, UMichigan, Argonne
Next Steps
41
Timeline: Year 1, Year 2, Year 3
Thread 1: Initial publication, search
Thread 3: Expand publication capabilities
Thread 2: Expand search capabilities
Thread 4: Build and operate Materials Data Facility
• Identify datasets to pilot publication pipelines and build schema repository
• Engage with researchers working with materials data to understand use cases and learn friction points
• Please talk to us if you have data you want to share, publish, discover, …
• Globus tutorials (identity, transfer, sharing): https://github.com/globusonline/globus-tutorials
now
Thanks to Our Sponsors!
42
U.S. Department of Energy
43
Globus delivers…
big data transfer, sharing, publication, and discovery…
…directly from your own storage systems…
…via software-as-a-service
Globus is SaaS
44
• Access made easy via Web browser
  § Command line, REST interfaces, and Python clients for flexible automation and integration
• New features automatically available without user updates
• Reduced IT operational costs
  § Small local footprint (Globus Connect)
  § Consolidated support and troubleshooting
User Experience
45
…for your photos
…for your entertainment
…for your research data
…for your office docs
Globus Platform-as-a-Service (PaaS)
46
Identity management
• Create and manage a unique identity linked to external identities for authentication
User groups
• Manage user group creation and administration flows
• Share data with user groups
Data transfer
• High-performance data transfer from a web browser
• Optimize transfer settings and verify transfer integrity
• Add your laptop to the Globus cloud with Globus Connect Personal
Data sharing
• Share directly from your storage device (laptop or cluster)
• File and directory-level ACLs
Data publication
Background
47
Endpoint
• E.g. a laptop or server running a Globus client (analogous to a Dropbox client)
• Enables advanced file transfer and sharing
• Currently GridFTP; in future, GridFTP + HTTP
Some Key Features
• REST API for automation and interoperability
• Web UI for convenience
• Optimizes and verifies transfers
• Handles auto-restarts
• Battle tested with big data
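The features "optimizes and verifies transfers" and "handles auto-restarts" can be illustrated with a local-copy analogue. This is not how Globus or GridFTP work internally; it is a hypothetical sketch in which each file's checksum is compared after copying, a mismatch or I/O error triggers a retry, and files already verified at the destination are skipped when a run restarts. The `transfer` function and its `copy` hook are assumptions made for illustration.

```python
import hashlib
import os
import shutil
from typing import Callable, Dict, List


def sha256_of(path: str) -> str:
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def transfer(files: List[str], src: str, dst: str, max_attempts: int = 3,
             copy: Callable[[str, str], object] = shutil.copyfile) -> Dict[str, int]:
    """Copy files from src to dst, verifying checksums and retrying failures.

    Files already present at dst with a matching checksum are skipped, which lets
    an interrupted run restart without redoing finished work.
    Returns the attempts used per file (0 means it was already done).
    """
    attempts = {}
    for rel in files:
        s, d = os.path.join(src, rel), os.path.join(dst, rel)
        if os.path.exists(d) and sha256_of(d) == sha256_of(s):
            attempts[rel] = 0
            continue
        for n in range(1, max_attempts + 1):
            try:
                os.makedirs(os.path.dirname(d), exist_ok=True)
                copy(s, d)
                if sha256_of(d) == sha256_of(s):   # integrity check after each copy
                    attempts[rel] = n
                    break
            except OSError:
                pass  # treat as a transient fault and retry
        else:
            raise RuntimeError(f"{rel}: failed after {max_attempts} attempts")
    return attempts
```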
Background
48
[Diagram: transfer between secure endpoints A and B, e.g. a laptop and midway]
• You submit a transfer request
• Globus moves the data for you
• Globus notifies you once the transfer is complete
Identity Management
50
Key Features
• Leverage institutional credentials
• Link multiple identities
• Standard OAuth 2.0 flow
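For the standard OAuth 2.0 flow, a native or command-line client typically uses the authorization-code grant with PKCE (RFC 7636). The sketch builds the code verifier, its S256 challenge, and an authorization URL. The Globus Auth endpoint URL is an assumption to check against the Globus Auth documentation, and the client ID, redirect URI, and scope in the usage are placeholders; exchanging the code for tokens is not shown.

```python
import base64
import hashlib
import secrets
from typing import Tuple
from urllib.parse import urlencode

# Assumed Globus Auth authorization endpoint; confirm in the Globus Auth docs.
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"


def s256_challenge(verifier: str) -> str:
    """RFC 7636 S256: BASE64URL(SHA256(ASCII(code_verifier))) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> Tuple[str, str]:
    """Random code_verifier (86 URL-safe chars, inside the 43-128 limit) and its challenge."""
    verifier = secrets.token_urlsafe(64)
    return verifier, s256_challenge(verifier)


def authorization_url(client_id: str, redirect_uri: str, scope: str,
                      challenge: str, state: str) -> str:
    """Step 1 of the authorization-code flow: where to send the user's browser."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "state": state,   # echoed back on redirect; compare it to defeat CSRF
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"
```

After the user approves, the client sends the returned `code` together with the original `code_verifier` to the token endpoint; the server recomputes the challenge to confirm that the same client started the flow.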
User Groups and Sharing
51
• User A selects file(s) to share and sets access permissions for individuals or groups
• Globus tracks shared files, so there is no need to move files to cloud storage
• User B logs in to Globus to access the shared file(s)
[Diagram labels: data source, A, B]
Key Features
• Share without requiring users to have accounts on your systems
• Share data in place, no need to move to the cloud
• Maintain security and access controls as defined by your resource provider.
• Groups: Manage permissions on a group basis rather than on an individual basis
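File- and directory-level ACLs with group-based permissions might be modelled as below. This is a hypothetical sketch: the `Rule` format, the `user:`/`group:` principal prefixes, and the union-of-grants semantics (a directory rule covers everything beneath it, and the broadest matching grant wins) are assumptions, not Globus's documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Rule:
    path: str        # a directory ending in "/" or a single file path
    principal: str   # "user:<name>" or "group:<name>"
    permission: str  # "r" or "rw"


def effective_permission(path: str, user: str, groups: Dict[str, Set[str]],
                         rules: List[Rule]) -> str:
    """Return "", "r", or "rw" for user on path, as the union of all matching grants."""
    # A user acts as themselves plus every group they belong to.
    principals = {f"user:{user}"} | {f"group:{g}" for g, members in groups.items() if user in members}
    granted = {
        rule.permission
        for rule in rules
        if rule.principal in principals
        and (path == rule.path or (rule.path.endswith("/") and path.startswith(rule.path)))
    }
    if "rw" in granted:
        return "rw"
    return "r" if "r" in granted else ""
```

Granting to groups means adding a collaborator is a single membership change rather than a new rule on every shared directory.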
Globus PaaS Jupyter Notebooks
52
https://github.com/globusonline/globus-tutorials
Identity management
User groups
Data sharing
Data transfer
Create an Account
53
1. Go to: www.globus.org/signup
2. Create your Globus account
3. Validate your e-mail address
4. Optional: Log in with your campus/InCommon identity
5. Install Globus Connect Personal
6. Clone the repo
   § git clone https://github.com/globusonline/globus-tutorials
7. Move files from the kyle#ncsa-tutorial endpoint to your laptop