Big Data: Advanced Research Computing at Michigan
Seth Meyer, Research Computing Lead, smeyer@umich.edu, www.arc-ts.umich.edu

Post on 29-May-2020

Page 1

Big Data: Advanced Research Computing at Michigan

Seth Meyer
Research Computing Lead
smeyer@umich.edu
www.arc-ts.umich.edu

Page 2

Advancing Computational Productivity

• Providing systems, infrastructure, and consulting

• Delivering services at scale (HPC, storage, Big Data)

• Responding to cutting-edge research needs at all stages of the research lifecycle

• Partnering with faculty, researchers, students, administrators, and central and unit IT experts to meet campus needs

Page 4

Flux - High Performance Computing Cluster

Key Features

• Comparable to 1,500-5,000 typical workstations within ISR
• Cost-recovery-based service
• Free for undergraduates
• Will be replaced by Great Lakes this summer
• IA sensitive data guide defines allowed data types
• What software does it support? Over 100 titles: TensorFlow, Torch, MATLAB, Mathematica, FEA/CAE (Abaqus, Ansys), SAS, Stata/MP, R, compilers, debuggers, libraries

Page 5

Data Science using ML on Medical Records

Popular on Flux, ConFlux, FluxOE

• Cross-discipline faculty using TensorFlow, Torch
• Faculty from the Medical School, Engineering, and others
• Exploring graphs/social networks is also possible, accelerated by CUDA
• R is also popular on Flux for data science

Page 6

Armis - High Performance Computing Clusters

Key Features

• About 1% the size of Flux (mention not restricting)
• HIPAA/PHI compliant
• IA sensitive data guide defines allowed data types
• Same software support as Flux
• Armis-2 coming with Great Lakes later this year

Page 7

Cavium Hadoop Cluster: Big Data/Data Science/ML

Key Features

• First production aarch64/ARMv8 Hadoop cluster in the world
• Currently seeking input on phase 2 of a proposal with donor Cavium/Marvell for the next Big Data AI/ML data science cluster (possible directions include more CUDA GPU acceleration and the Kubeflow ML framework)
• Distributed in-memory analytics with Spark in Java, Python, R, and Scala
• Free
• Not HIPAA/PHI compliant at the moment, but this may change in the future

Page 8

Cavium Hadoop Cluster: Big Data/Data Science/ML: Real World Example

Shared public data sets are available, such as Reddit, so researchers can run Spark queries such as:

SELECT subreddit, COUNT(body) AS count
FROM reddit_table
WHERE body RLIKE 'father' AND body RLIKE 'depressed'
GROUP BY subreddit
ORDER BY count DESC

Also used for teaching in graduate and undergraduate courses in Engin/EECS, LSA, and UMSI
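
To make the logic of the Spark query on this slide concrete, here is a minimal sketch in plain Python of what it computes: keep rows whose body matches both regular expressions, count them per subreddit, and sort by count in descending order. The sample rows and the helper name count_matches are hypothetical, invented for illustration only; on the cluster the query would run through Spark SQL against the real reddit_table rather than through this code.

```python
import re
from collections import Counter

# Hypothetical sample rows standing in for reddit_table: (subreddit, body).
rows = [
    ("parenting", "My father has been depressed since he retired."),
    ("parenting", "Feeling depressed after arguing with my father."),
    ("advice", "My father seems depressed lately."),
    ("advice", "Any tips for a job interview?"),
    ("movies", "The father in that film was great."),
]

def count_matches(rows, patterns):
    """Count rows per subreddit whose body matches every regex pattern.

    Mirrors: WHERE body RLIKE p1 AND body RLIKE p2
             GROUP BY subreddit ORDER BY count DESC
    """
    compiled = [re.compile(p) for p in patterns]
    counts = Counter(
        subreddit
        for subreddit, body in rows
        # Rows with a missing body never match, as with NULL in SQL.
        if body is not None and all(c.search(body) for c in compiled)
    )
    # most_common() returns (subreddit, count) pairs, largest count first.
    return counts.most_common()

result = count_matches(rows, ["father", "depressed"])
print(result)
```

Like RLIKE, re.search matches anywhere in the text and is case-sensitive, so 'Father' at the start of a sentence would not be counted by either version.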

Page 9

Yottabyte Research Cloud - Database, Enclave, and Services

Key Features

• Software-defined infrastructure to build a sensitive data enclave in the on-premises cloud
• Provides a secure, trusted environment with a breadth of research services (Windows remote desktop, server, Big Data, compute) to flexibly meet needs
• Collaborate with PIs to tailor the setup as needed to provide a personalized environment based on data access controls (HIPAA/PHI/CUI and more)
• IA sensitive data guide defines allowed data types

Page 10

Turbo - High Speed Research Storage

Key Features

• Fast access to data from desktops, clusters, or servers
• IA sensitive data guide defines allowed data types
• HIPAA/PHI allowed with NFSv4 and Kerberos
• Data publishing/sharing with Globus
• Some schools subsidize costs
• High-speed cross-campus data storage

Page 11

Grant Consultation and Collaboration

E.g., ConFlux - CDDCP

2015 NSF MRI Award. $3.5M budget with several novel technologies.

ARC staff handle technical and administrative issues.

E.g., OSiRIS - CNSCCS

2015 NSF DATANET Award. $4.9M budget.

Joint project among four universities on storage for scientific collaboration.

ARC can help design technical solutions in your grant

Key Features

• ARC-TS staff contribute solutions and writing for grants.
• We bring in our partners in unit and central IT, as well as the Library or others.

Page 12

Most Recent Grant: SLATE

Key Features

SLATE: Service Layers At The Edge, a collaborative project between the University of Michigan, the University of Chicago, and the University of Utah. The SLATE team is working to deliver a new platform for scientific cyberinfrastructure (see http://slateci.io/). Building upon container-based technologies (e.g. Docker, Kubernetes, Helm, …), we aim to create a distributed environment where scientific collaborations can create, deploy, and operate the tools they need to manage their data collection, distribution, and processing.

Service Layers at the Edge

Page 13

Cloud and Off Campus Services/Consultation

Cloud

Consultation support for AWS, GCP, Azure

Data Transfer and Sharing National Providers

Connecting university units to systems and data, on site or in the cloud

Page 14

Thank You - Contact

[email protected]

• http://arc-ts.umich.edu/

• @ARCTS_UM

smeyer@umich.edu

• Research Computing Lead
