Big Data: Advanced Research Computing at Michigan
Seth Meyer
Research Computing [email protected]
www.arc-ts.umich.edu [email protected]
Advancing Computational Productivity
• Providing systems, infrastructure, and consulting
• Delivering services at scale (HPC, Storage, BigData)
• Responding to cutting-edge research needs at all stages of the research lifecycle
• Partnering with faculty, researchers, students, administrators, and central and unit IT experts to meet campus needs
Flux - High Performance Computing Cluster
Key Features
Comparable to 1,500-5,000 typical workstations within ISR
Cost-recovery-based service
Free for undergraduates
Will be replaced by Great Lakes this summer
IA sensitive data guide defines allowed data types
What software does it support? Over 100 titles: TensorFlow, Torch, MATLAB, Mathematica, FEA/CAE (Abaqus, Ansys), SAS, STATA/MP, R, compilers, debuggers, libraries
Data Science Using ML on Medical Records: Popular on Flux, ConFlux, FluxOE
Cross-discipline faculty using TensorFlow, Torch
Faculty from Medical School, Engineering, and others
Exploring graphs/social networks is also possible, accelerated by CUDA
R is also popular on Flux for data science
Armis - High Performance Computing Clusters
Key Features
About 1% the size of Flux (mention not restricting)
HIPAA/PHI compliant
IA sensitive data guide defines allowed data types
Same software support as Flux
Armis-2 coming with Great Lakes later this year
Cavium Hadoop Cluster: Big Data/Data Science/ML
Key Features
First production aarch64/ARMv8 Hadoop cluster in the world
Currently seeking input on phase 2 of the proposal with donor Cavium/Marvell for the next Big Data AI/ML data science cluster (possible directions include more CUDA GPU acceleration and the Kubeflow ML framework)
Distributed in-memory analytics with Spark in Java, Python, R, and Scala
Free
Not HIPAA/PHI compliant at the moment, but this may change in the future
Cavium Hadoop Cluster: Big Data/Data Science/ML: Real World Example
Shared public data sets available, such as Reddit, so researchers can run Spark Queries such as:
SELECT subreddit, COUNT(body) as count FROM reddit_table WHERE body rlike 'father' AND body rlike 'depressed' GROUP BY subreddit ORDER BY count desc
Also used for teaching in graduate and undergraduate courses in Engin/EECS, LSA, UMSI
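To make the logic of the Reddit query concrete, here is a minimal sketch in plain Python of what it computes: for each subreddit, the number of comments whose body matches both patterns, sorted by count in descending order. The sample rows and the function name count_matches_by_subreddit are invented for illustration and are not part of the Cavium cluster or its data sets; Python's re module stands in for Spark's rlike, which behaves the same way for simple literal patterns like these.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for reddit_table: (subreddit, body).
SAMPLE_COMMENTS = [
    ("parenting", "My father was depressed after he retired."),
    ("parenting", "Any tips for a new father?"),
    ("depression", "I feel depressed and my father does not understand."),
    ("depression", "My grandfather seemed depressed last winter."),
    ("movies", "The father character was well written."),
]


def count_matches_by_subreddit(rows, patterns):
    """Count rows per subreddit whose body matches every regex in patterns."""
    compiled = [re.compile(p) for p in patterns]
    counts = Counter(
        subreddit
        for subreddit, body in rows
        # Like rlike, search() matches anywhere in the text; rows with a
        # missing body are skipped, as a NULL body would be in the query.
        if body is not None and all(rx.search(body) for rx in compiled)
    )
    # most_common() orders by count descending, like ORDER BY count desc.
    return counts.most_common()


result = count_matches_by_subreddit(SAMPLE_COMMENTS, ["father", "depressed"])
print(result)
```

On this sample, "grandfather" also counts as a match for 'father' because the pattern is an unanchored regular expression. A researcher who wants whole-word matches would need a stricter pattern with word boundaries.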
Yottabyte Research Cloud - Database, Enclave, and Services
Key Features
Software-defined infrastructure to build a sensitive data enclave in the on-premises cloud
Provides a secure, trusted environment with a breadth of research services (Windows remote desktop, Server, Big Data, Compute) to flexibly meet needs
Collaborate with PIs to tailor the setup as needed and provide a personalized environment based on data access controls (HIPAA/PHI/CUI and more)
IA sensitive data guide defines allowed data types
Turbo - High Speed Research Storage
Key Features
Fast access to data from desktops, clusters, or servers
IA sensitive data guide defines allowed data types
HIPAA/PHI allowed with NFSv4 and Kerberos
Data publishing/sharing with Globus
Some schools subsidize costs
High speed cross campus data storage
Grant Consultation and Collaboration
Eg: ConFlux - CDDCP
2015 NSF MRI Award. $3.5M budget with several novel technologies.
ARC staff handle technical and administrative issues.
Eg: OSiRIS - CNSCCS
2015 NSF DATANET Award. $4.9M budget.
Joint project between 4 universities around storage for scientific collaboration.
ARC can help design technical solutions in your grant
Key Features
ARC-TS staff contribute solutions and writing for grants.
We bring in our partners in Unit and Central IT, as well as the Library or others.
Most Recent Grant: SLATE
Key Features
SLATE: Service Layers At The Edge, a collaborative project between the University of Michigan, the University of Chicago, and the University of Utah. The SLATE team is working to deliver a new platform for scientific cyberinfrastructure (see http://slateci.io/). Building upon container-based technologies (e.g. Docker, Kubernetes, Helm, …), we aim to create a distributed environment where scientific collaborations can create, deploy, and operate the tools they need to manage their data collection, distribution, and processing.
Service Layers at the Edge
Cloud and Off Campus Services/Consultation
Cloud
Consultation support for AWS, GCP, Azure
Data Transfer and Sharing National Providers
Connecting university units to systems and data, on site or in the cloud
Thank You - Contact
• http://arc-ts.umich.edu/
• @ARCTS_UM
• Research Computing Lead