big data meets hpc: getting better acquainted · big data meets hpc: getting better acquainted dan...

Post on 28-Sep-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Big Data Meets HPC:Getting better acquainted

Dan Reed

Vice President for Research and Economic Development

University Computational Science and Bioinformatics Chair

Computer Science, Electrical Engineering & Computer Engineering, and Medicine

dan-reed@uiowa.edu www.hpcdan.org

2

Other forces matter Higgs

Quantum mechanics spacetime gravity

http://www.preposterousuniverse.com

3

4

Technical application complexity is rising

… along with multiple optimization axes

C, Fortran,

C++, MPI,

OpenMP

Python, Ruby, R

Cloud/Web

Services

Technical and mainstream software development have diverged

5

6

The talent in data analytics have shifted from science to companies. We can’t compete.

Astronomy researcher

Vectors(1980s)

Mainframes MPPs(1990s)

Clusters & Grids(2000s)

Clouds, Big Dataand Devices

7

Mostly similar technology issues With substantial differences• SAN/local storage models

• Virtualization and scheduling strategies

• Software development tools

• Culture and expectations

8

Programming efficiency

Network optimization

Supply chain optimization

Generic server design

Energy optimization

Systemic resilience

9

Requested 200 nodes and 2 PB for four years?

Logged onto a node and killed processes just to see what would happen?

Wished you could load containers rather than just applications?

Found your code performance limited by the I/O bandwidth of a Raspberry Pi?

Thought SAN was just a typo in a message meant for Sam?

Asked your system for recommendations?

Wondered why R came after S and C doesn’t matter?

10

Ethernet

Switches

Local Node

Storage

x86 +GPUs or

Accelerators

IB+ Enet

Switches

SAN+Local

StorageCommodity

X86 Racks

PFS

(e.g, Lustre)

Batch

SchedulerHDFS (Hadoop File System)System

Monitoring

MPI-OpenMP

CUDA/OpenCL

Perf &

Debug

(e.g.,

PAPI) NA Libs

Applications and Community Codes

Hbase BigTable

(key-value store)

AV

RO

Zo

okeep

er (co

ord

inatio

n)

Map-Reduce Storm

Hive Pig Sqoop Flume

Mahout, R and Applications

Domain-specific Libraries

FORTRAN, C, C++ and IDEs

Clo

ud

Serv

ices (e

.g., A

WS) VMs, Containers and Cloud Services

11

SAN vs. node-local temporary data

Software-defined networks (SDNs)

Content distribution networks (CDNs)

12

User allocations

Software models

Job scheduling

13

Chaos Monkey

Latency Monkey

Janitor Monkey

Conformity Monkey

Doctor Monkey

Security Monkey

14

Virtual machines (VMs)

Containers

Implications

15

UnstructuredSemi-structured

Next-generation architecture

(mostly open source and cloud based)

16

Known questions, the traditional approach

Unknown questions, the big data approach

What information consumes is rather obvious: it consumes the attention of its

recipients. Hence a wealth of information creates a poverty of attention, and a

need to allocate that attention efficiently among the overabundance of

information sources that might consume it.

Herbert Simon

17

Item hierarchy (Amazon)

Attributes (Pandora)

Item similarity (Netflix)

User similarity (Walmart)

Social network (Linkedin)

Model based (HPC challenges and needs)

18

Clustering (grouping similar items)

Hidden Markov models (HMMs)

Blind signal separation/feature extraction for dimensionality reduction

Artificial neural networks (ANNs)

19

Materials

Astronomy

Sensing

Network

science

IoT

Bioscience

Environment

Social

media

Scheduling

Monitoring &

adaptationVMs, containers &

stacksStorage & file

systems Parallel algorithms

Libraries & tools

DSLs

Multicore/GPUs Networks & QoS

Data Analytics

Ecosystem

HPC

EcosystemSpark

Mahout/MLlib

R Python

Storm

PIG

HBase

Zookeeper

MPI

LustreSLURM

OpenMP

PAPI

OpenCL

Graphs StreamingDeep

learning

Core machine

learning

20

21

Discussion

top related