The Data eXaCell (DXC) pilot project enables new application architectures on Bridges and convenient, high-performance data movement between Bridges and users, campuses, and instruments.
TRANSCRIPT
The Data eXaCell – DXC
J. Ray Scott, DXC PI
May 17, 2016
© 2010 Pittsburgh Supercomputing Center
© 2016 Pittsburgh Supercomputing Center
DXC Leadership
• Mike Levine, Co-Scientific Director, Co-PI
• Nick Nystrom, Senior Director of Research, Co-PI
• Ralph Roskies, Co-Scientific Director, Co-PI
• Robin Scibek, Project Manager, PM
• J. Ray Scott, Senior Director of Facilities Technology, PI
Pittsburgh Supercomputing Center
• The Pittsburgh Supercomputing Center:
  – Joint effort of Carnegie Mellon University and the University of Pittsburgh
  – 30 years of national leadership in:
    • High-performance and data-intensive computing
    • Data management technologies
    • Software architecture, implementation, and optimization
    • Enabling researchers nationwide
    • Networking and network optimization
  – Supported by: NSF, NIH, the Commonwealth of Pennsylvania, DOE, DoD, foundations, and industry
DXC/DIBBs
• Project in a nutshell:
  – DXC/DIBBs: an accelerated development pilot project
  – Creating, deploying, and testing relevant software and hardware building blocks
  – Functionalities designed to support data-analytic capabilities for data-intensive scientific research
• Guided by selected collaborating research groups
  – A diverse set of emerging and existing data-intensive and data-analytic applications
  – Not well served by local resources or existing HPC systems
  – Learn what they (and presumably others) need
Radio Astronomy at Green Bank (NRAO) PI: David Halstead, National Radio Astronomy Observatory
The Robert C. Byrd Green Bank Telescope (GBT) has a dish diameter of 100 meters and wavelength sensitivity from 3 m down to 2.6 mm. Thanks to new focal plane receivers and back-end equipment, the volume of data produced by the GBT is rising rapidly.
The GBT Mapping Pipeline is a new software tool intended to ease the production of sky maps from this massive data stream. Mapping of large patches of sky is one of the main uses of the GBT, and is complementary to the highly focused studies from facilities like the EVLA.
NRAO and PSC are collaborating to leverage coupled storage and analytics on the DXC (and later, Bridges) for the Mapping Pipeline.
Galaxy: DXC Pilot
[Diagram: Galaxy instances at TACC, Penn State, and PSC each move workflows and data through shared Data Exacell storage (SLASH2), backed by PSC compute resources.]
SLASH2
• SLASH2 is designed from the ground up to be:
  – wide-area
  – portable
  – scalable
• Features:
  – files are managed as chunks
  – system-managed replication
  – error checking
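The three features above can be sketched together. This is not SLASH2 code; the chunk size, data structures, and placement policy below are illustrative assumptions, meant only to show how chunking, system-managed replication, and per-chunk error checking fit together.

```python
# Illustrative sketch (NOT SLASH2 itself): files split into chunks, each
# chunk checksummed for error checking and placed on multiple I/O servers.
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed chunk granularity

def chunk_file(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file into chunks, each tagged with a checksum."""
    chunks = []
    for off in range(0, len(data), chunk_size):
        payload = data[off:off + chunk_size]
        chunks.append({
            "offset": off,
            "payload": payload,
            "checksum": hashlib.sha256(payload).hexdigest(),
        })
    return chunks

def replicate(chunks, io_servers, copies=2):
    """System-managed replication: place each chunk on `copies` distinct servers."""
    placement = {}
    for i, chunk in enumerate(chunks):
        targets = [io_servers[(i + r) % len(io_servers)] for r in range(copies)]
        placement[chunk["offset"]] = targets
    return placement

def verify(chunk) -> bool:
    """Error checking: recompute the checksum before trusting a read."""
    return hashlib.sha256(chunk["payload"]).hexdigest() == chunk["checksum"]
```

Because each chunk carries its own checksum and placement record, a reader can detect a corrupted replica and fall back to another server holding the same chunk.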
DXC SLASH2 Schematic
[Diagram: DXC SLASH2 command and control spans metadata servers (MDS) and storage building blocks (SBBs, 7x and growing), serving clients over FDR InfiniBand (56 Gbps, 7.5 GB/s). Each DXC SBB (512 TB usable) is a DSC hardware building block: a server attached via SAS-3 (12 Gbps) and PCIe-3 to four half-JBODs ("hJB" = half of a JBOD), 4 × 44 drives of 4 TB (raw), at roughly 8x to 15x DSC.]
File Systems Development Support
• DXC will involve development of advanced file system support
• Initial effort to revamp support tools used in SLASH2
• Portable File system Libraries (PFL)
• Weldable Overlay Knack File System (WOKFS)
ADAPT-FS: Active Data Processing and Transformation File System
• On-the-fly CPU/GPU computation
• Replaces explicit storage of processed images
• Enables collaborative processing and sharing of large image data sets with minimal data duplication
  – 3D electron microscopy data of brain tissue (currently in the 100 TB range; petabyte scales forthcoming)
• Portable File system Library (PFL) module with a flexible interface
• Per-dataset specification of data interpretation, preparation, and transform as submodule drivers
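The core idea — serve transformed data at read time instead of storing processed copies — can be sketched as follows. This is not the ADAPT-FS implementation; the driver registry, function names, and the toy transform are assumptions made for illustration.

```python
# Hypothetical sketch of the ADAPT-FS idea: register a per-dataset transform
# driver, then apply it on the fly when raw bytes are read, so processed
# versions never need to be stored explicitly.
from typing import Callable, Dict

# Per-dataset transform drivers (the slide calls these submodule drivers).
DRIVERS: Dict[str, Callable[[bytes], bytes]] = {}

def register_driver(dataset: str, transform: Callable[[bytes], bytes]) -> None:
    """Attach a transform to a dataset name."""
    DRIVERS[dataset] = transform

def read(dataset: str, raw: bytes) -> bytes:
    """Serve a read by transforming the raw stored bytes on the fly."""
    transform = DRIVERS.get(dataset, lambda b: b)  # identity if no driver
    return transform(raw)

# Toy example: contrast inversion for 8-bit microscopy data (invented here).
register_driver("em_brain_tissue", lambda b: bytes(255 - v for v in b))
```

A call such as `read("em_brain_tissue", raw_bytes)` returns the inverted image without any processed copy ever being written, which is how minimal data duplication is achieved.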
Multi-site Support
• Multiple metadata servers (MDS)
  – global mount support
  – foundation for further multi-MDS development
• SLASH2 ↔ local file system multi-site file import/export
• Workflow integration
  – XSEDE Extended Support for Science Gateways
• Cross-site UID mapping
  – security
  – federated authentication
• Enhanced access controls
  – e.g., read/delete-only file access
  – building block: SCAMPI file system
• Public cloud support
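Cross-site UID mapping, mentioned above, can be sketched minimally: the same researcher typically holds different numeric UIDs at each site, so a gateway translates a remote (site, uid) pair into the local UID before applying access controls. The table contents and function names here are invented for illustration, not taken from SLASH2.

```python
# Minimal sketch of cross-site UID mapping, assuming a static translation
# table; real deployments would back this with federated authentication.
SITE_UID_MAP = {
    ("psc", 1001): 5001,
    ("tacc", 2002): 5001,   # same researcher, different UID at TACC
    ("psc", 1002): 5002,
}

def map_uid(site: str, remote_uid: int) -> int:
    """Translate a remote UID to the local UID; deny by default if unmapped."""
    try:
        return SITE_UID_MAP[(site, remote_uid)]
    except KeyError:
        raise PermissionError(f"no mapping for uid {remote_uid} at site {site}")
```

Denying unmapped users by default keeps the security property: access is only granted to identities a site has explicitly federated.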
Pittsburgh Genome Resource Repository (PGRR) pgrr.pitt.edu
Collaborative effort to address challenges with TCGA data:
– University of Pittsburgh: Institute for Personalized Medicine (IPM), U. Pitt. Cancer Institute (UPCI), Department of Biomedical Informatics (DBMI), Center for Simulation and Modeling (SaM)
– University of Pittsburgh Medical Center (UPMC)
– Pittsburgh Supercomputing Center (PSC)
PGRR DXC Architecture
[Diagram: At PSC, an analytics cluster and gateway service nodes access SLASH2 storage and its MDS; data is replicated from the PGRR data source over a high-speed wide-area network, with UID mapping between the PSC and PGRR sides.]
DXC Hardware To Support Research Collaborators
• Equipment in place
  – 41 servers
    • 128 GB, GPU, 3 TB, and 12 TB configurations
  – 5 PB of SLASH2-managed shared storage
• Being used by both developers and collaborators
DXC Shared File System
• Available for DXC applications and development efforts
• Mixed capability components to facilitate optimization testing
• Subsets withheld to allow invasive testing without harm to collaborators’ data
• DXC shared file system is a Building Block deliverable – commodity components – testing and optimization in progress – a prototype for research groups requiring inexpensive, large-scale storage
XSEDE Service Provider Support
• Blacklight
  – SGI UV 1000
  – 16 TB × 2 coherent shared memory
• Greenfield
  – HP Superdome X
    • 12 TB coherent shared memory
  – HP DL580
    • 3 TB coherent shared memory
  – Shared file system
    • 800 TB usable
    • SLASH2
Bridges and the Data Exacell: A Valuable Engineering Lifecycle
• Hardware and software "building blocks" developed through the Data Exacell (DXC) pilot project enable new application architectures on Bridges and convenient, high-performance data movement between Bridges and users, campuses, and instruments.
• Bridges and the DXC will play complementary roles in production and application prototyping.
[Timeline, 2013–2019: the Data Exacell pilot project (data infrastructure building blocks) runs alongside Bridges. Upward arrows denote software development, selection, and configuration, plus certain elements of hardware configuration, flowing from the DXC into Bridges; downward arrows denote new science and application requirements from nontraditional HPC researchers flowing back. Bridges moves through its acquisition and production target dates.]
Summary
• First phase
  – Building new storage and analytic facility
  – Gathering users
  – Understanding needs
  – Prototype solutions in place
• Next phase
  – Larger user experience
  – Multi-site support
    • Authentication
    • Metadata services
  – Distributed MDS
  – Data tagging