1
Today’s Emerging Platforms, Tomorrow’s Cyber Infrastructure
Digging Deeper, Seeing Farther: Supercomputers Alter Science
J. Markoff, NY Times, 2011
The country that wants to out-compete must out-compute. HPC is an innovation accelerator… shrinks “time-to-insight” and “time-to-solution”
Suzy Tichenor, Council on Competitiveness
OBAMA ADMINISTRATION UNVEILS $200 Million “BIG DATA” INITIATIVE (March 2012): “By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some of the Nation’s most pressing challenges.”
2
Future Computing Platforms
Future computing platforms (FCP) include:
– accelerators like Field Programmable Gate Arrays (FPGAs) and General Purpose Graphics Processing Units (GPGPUs)
– multi-core and multi-threaded processors
– cloud computing platforms
FCP provide better:
– energy efficiency
– performance
– accessibility
Progress in Algorithms Beats Moore’s Law: “performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.”
Report to the President and Congress by the President’s Council of Advisors on Science and Technology, Dec 2010
3
Speed and power not keeping pace
Limitations of a single processor
Microprocessor Power Density Growth
If scaling continues at the present (2001) pace, high-speed processors would have the power density of:
– a nuclear reactor by 2005
– a rocket nozzle by 2010
– the surface of the sun by 2015
“Business as usual will not work in the future.” Intel VP Patrick Gelsinger (ISSCC 2001)
4
Multi-core Processors
Blaise Barney, Lawrence Livermore National Laboratory
A problem is broken into discrete parts that can be solved concurrently
Each part is further broken down to a series of instructions
Instructions from each part execute simultaneously on different CPUs
5
Graphics Processing Unit (GPU) Co-processors
• GPUs are going beyond games and into general high-throughput computing
• GPUs are low-cost accelerators tuned to highly data-parallel fine-grained tasks
• GPUs require rewriting code and may perform poorly on very irregular tasks or tasks with large sequential parts
• GPUs are speeding up a variety of image, biological sequence, graph, and string processing computations
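The “highly data-parallel fine-grained tasks” above mean applying the same operation to many elements at once. A sketch using NumPy on the CPU as a stand-in (real GPU code would be written in CUDA or with a GPU array library that mirrors this style):

```python
# Illustration of the data-parallel style GPUs are tuned for. One operation
# is applied to every element simultaneously, in contrast to the sequential
# element-at-a-time loop, which is the kind of code that must be rewritten
# to benefit from an accelerator. The toy "image" data is invented.
import numpy as np

pixels = np.arange(12, dtype=np.float32).reshape(3, 4)  # toy "image"

# Fine-grained data-parallel task: brighten every pixel independently.
brightened = pixels * 1.5 + 10.0

# Equivalent sequential version, one element at a time.
slow = pixels.copy()
for i in range(slow.shape[0]):
    for j in range(slow.shape[1]):
        slow[i, j] = slow[i, j] * 1.5 + 10.0

print(np.allclose(brightened, slow))  # True
```

A task with a large sequential part would gain little from this rewrite, which is the limitation the slide notes.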
6
FPGA Abstraction
• An FPGA consists of:
– A matrix of programmable logic cells
• Each cell can implement any logic function: AND, OR, NOT, etc.
• Cells can also implement storage: registers or small SRAMs
• Groups of cells make up higher-level structures: adders, multipliers, etc.
– Programmable interconnects
• Connect the logic cells to one another
– Embedded features
• ASICs within the FPGA fabric for specific functions: hardware multipliers, dedicated memory, microprocessors
• FPGAs are SRAM-based
– The device is configured by writing to configuration memory
(Figure: matrix of logic cells and interconnects)
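A toy model of how a logic cell implements “any logic function”: a look-up table (LUT) stores the function’s truth table in configuration memory. The 3-input full-adder example is invented for illustration:

```python
# A toy software model of an FPGA look-up table (LUT). "Configuring" the
# device corresponds to writing the truth table into SRAM; evaluating the
# cell is just an indexed read of that table.

def make_lut(truth_table):
    """truth_table[i] is the output when the inputs, read as a binary
    number (first input = most significant bit), equal i."""
    def lut(*inputs):
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return truth_table[index]
    return lut

# Configure a 3-input LUT as the sum bit of a full adder: a XOR b XOR cin.
sum_lut = make_lut([0, 1, 1, 0, 1, 0, 0, 1])

print(sum_lut(1, 0, 1))  # 1 XOR 0 XOR 1 = 0
```

Any other 3-input Boolean function needs only a different 8-entry table, which is why a fixed cell can implement arbitrary logic.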
7
FPGA Device Trend
*Logic cell = LUT + FF + connection to adjacent cells
– 1997: XC4000, 0.25–0.35 µ process, 2.5–3.3 V core, ~7.5K logic cells, Configurable Logic Blocks
– 1998–2000: 0.18–0.22 µ, 1.8–2.5 V core, ~20K–100K logic cells, + embedded RAMs
– 2001–2002: 0.13 µ, 1.5 V core, + embedded multipliers; + PowerPC cores
– 2005: 90 nm, 1.2 V core, ~200K logic cells, + integrated MAC, + 500 MHz
– 2006: 65 nm, 1.0 V core, ~250K logic cells, + 6-input LUT, + 550 MHz
– 2009: 40 nm, 1.0 V core, + 600 MHz
– 2011: 28 nm, 1.0 V core, ~2M logic cells, + SSI technology, + ~70 Mbit on-chip RAM
8
Example: Virtex 7 capabilities
• On-chip BRAM bandwidth
– 1292 x 36 Kbit BRAM blocks
– BRAM bandwidth: 1292 x 36 x 500 MHz = ~23 Tbps
• Distributed memory bandwidth
– Total of 21 Mbits
– For a 1K x 36-bit configuration, ~600 blocks
– Distributed memory bandwidth: 600 x 36 x 500 MHz = ~11 Tbps
• Logic operation capabilities
– Total possible inputs: 1220K x 6 inputs / 6 = 1220K
– Logic operations: 1220K x 500 MHz = 610 GOPs/s
• Input/Output pins
– Number of I/O pins: 1200
– Total I/O bandwidth: 1200 x 500 MHz = 600 Gbps
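The bandwidth arithmetic above can be written out directly. All figures come from the slide (1292 BRAM blocks reading 36 bits per cycle, ~600 distributed-memory blocks, 1200 I/O pins, a 500 MHz clock); Tbps/Gbps are decimal units:

```python
# Back-of-envelope bandwidth checks for the slide's Virtex-7 numbers.
MHZ = 1e6

# BRAM: every block delivers 36 bits per 500 MHz cycle.
bram_bw = 1292 * 36 * 500 * MHZ
print(round(bram_bw / 1e12, 1))  # 23.3 -- the slide rounds to ~23 Tbps

# Distributed memory: ~600 blocks of 36 bits per cycle.
dist_bw = 600 * 36 * 500 * MHZ
print(round(dist_bw / 1e12, 1))  # 10.8 -- the slide rounds to ~11 Tbps

# I/O: one bit per pin per cycle across 1200 pins.
io_bw = 1200 * 500 * MHZ
print(round(io_bw / 1e9))        # 600 Gbps
```

These are peak figures (every resource active every cycle); achieved bandwidth depends on the design.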
9
Cloud Computing
» Manageability
» On-demand resources
» “Pay as you go”
» Simple services
» Reduce time to science
Traditional Software: you manage the entire stack (apps, data, middleware, O/S, virtualization, servers, storage, networking)
Infrastructure (as a Service): the vendor manages virtualization, servers, storage, and networking; you manage the apps, data, middleware, and O/S
Platform (as a Service): the vendor manages everything except your apps and data
Scientific Software (as a Service): the vendor manages the entire stack
Windows Azure Training Kit - January Refresh
10
Cloud Computing
• Data & compute collocated. Ease of data sharing & collaboration.
• Well suited for loosely coupled distributed applications on 100s of VMs
• Data parallel frameworks (e.g. Hadoop, Pregel, Workflows)
• Science applications: genome sequencing pipelines, graph analytics, …
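The data-parallel model behind frameworks like Hadoop can be sketched in a few lines. Independent map tasks emit key/value pairs, a shuffle groups them by key, and reduce tasks combine each group; real frameworks distribute these phases across many VMs, while this toy word count runs in-process:

```python
# A minimal in-process sketch of the MapReduce-style data-parallel model.
from collections import defaultdict

def map_phase(document):
    # Each mapper processes one input split independently.
    return [(word, 1) for word in document.split()]

def reduce_phase(key, values):
    # Each reducer combines all values that share a key.
    return key, sum(values)

documents = ["the cloud", "the graph the cloud"]

# Shuffle: group all mapper outputs by key.
grouped = defaultdict(list)
for doc in documents:                  # mappers could run on separate VMs
    for key, value in map_phase(doc):
        grouped[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'the': 3, 'cloud': 2, 'graph': 1}
```

Because mappers share no state, the framework can scale this pattern to hundreds of machines, which is what makes it a fit for the loosely coupled applications above.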
11
FCP Vision: data-intensive grand challenge problems → accelerated applications and analyses
Center for Sustainable Software on Future Computing Platforms
Platforms: FPGA, GPGPU, Cloud, Multi-core
Graph Algorithms, Tools, Libraries, Frameworks
Graphs are pervasive in large-scale data analysis
• Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.
• New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.
Cybersecurity. Problem: detecting anomalies and bad actors. Challenges: scale, real time. Graph problems: belief propagation, path and community analysis.
Bioinformatics. Problem: genome and haplotype assembly. Challenges: data quality. Graph problems: Eulerian paths, MaxCut.
Social Informatics. Problem: discover emergent communities, model the spread of information. Challenges: new analytics routines, uncertainty in data. Graph problems: clustering, shortest paths, flows.
Image sources: (1) Mihai Pop, Carl Kingsford, www.cbcb.umd.edu/ (2) Chau et al., In SIAM Data Mining (2011) (3) www.visualComplexity.com
12
13
Next Generation Sequencing (NGS) enabled applications
High-throughput DNA sequencing, or Next Generation Sequencing (NGS), technologies are used in a variety of applications:
– resequencing → genome mapping
– de novo sequencing → genome assembly
– gene expression analysis → transcriptome assembly
– metagenomic sampling → metagenomic clustering and/or assembly
In each case, longer original sequences must be recovered from the ~100 bp fragments produced by NGS
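The recovery task can be illustrated with a toy greedy merge of overlapping fragments. Real assemblers build overlap or de Bruijn graphs over millions of ~100 bp reads; the three tiny reads and the simple suffix/prefix overlap rule here are invented for illustration:

```python
# Toy sketch of sequence assembly: recover a longer sequence from short
# overlapping fragments by repeatedly merging on the longest overlap.

def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads):
    # Assumes the reads are already in left-to-right order along the genome.
    seq = reads[0]
    for read in reads[1:]:
        n = overlap(seq, read)
        seq += read[n:]          # append only the non-overlapping tail
    return seq

reads = ["ACGTAC", "TACGGA", "GGATTC"]
print(greedy_assemble(reads))  # ACGTACGGATTC
```

Sequencing errors and repeats break this naive scheme, which is why the graph formulations (Eulerian paths, MaxCut) on the following slides are needed.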
14
Example NGS-enabled Application: Genome Assembly
Nature Biotechnology 29, 987–991 (2011)
15
Example NGS Enabled Application: Haplotype Assembly
• The human genome contains a pair of DNA sequences, one from each parent, called haploid sequences or haplotypes
– Haplotypes differ at SNP positions
– SNPs (single nucleotide polymorphisms) are single base-pair mutations (~0.1%; non-uniform)
– SNP positions are “well known” and contain one of two possible alleles
• Haplotypes are useful for disease prediction, organ transplants, drug response, …
• The goal of phasing (haplotype assembly) is to use aligned sequence fragments to reconstruct the two haplotypes
16
Haplotype Assembly from Next-Gen Sequencing (NGS) Data
From: An MCMC algorithm for haplotype assembly from whole-genome sequence data, V. Bansal et al., Genome Res. (2008), 18: 1336-1346
17
Haplotype assembly algorithm from UCSD
• Construct a consensus sequence consistent with the NGS reads
• Convert the consensus & reads to binary
– Assume the consensus is a haplotype; map it to binary
– Convert the sequence reads to binary
• Convert the consensus & reads to a graph
– SNP positions are nodes
– Edges connect nodes covered by the same read
– Edge weight = Σ (reads inconsistent with the haplotype or its complement) − Σ (reads consistent with the haplotype or its complement)
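The graph construction above can be sketched on toy data. The three binary reads and SNP positions are invented; the edge-weight rule follows the slide (inconsistent reads add +1, consistent reads add −1):

```python
# Sketch of building the SNP graph for haplotype assembly. Reads are
# already reduced to binary alleles at SNP positions (0 = allele matching
# the assumed haplotype, 1 = the alternate allele).
from collections import defaultdict
from itertools import combinations

haplotype = {1: 0, 2: 0, 3: 0}   # assume the consensus is one haplotype

# Each read: {snp_position: observed_binary_allele}
reads = [
    {1: 0, 2: 0},   # consistent with the haplotype
    {1: 0, 2: 1},   # inconsistent: mixes alleles from both haplotypes
    {2: 1, 3: 1},   # consistent with the complement haplotype (1, 1)
]

weights = defaultdict(int)
for read in reads:
    for i, j in combinations(sorted(read), 2):   # SNP pairs in the same read
        # A read is consistent if it agrees with the haplotype (or its
        # complement) at both positions, i.e. matches both or neither.
        consistent = (read[i] == haplotype[i]) == (read[j] == haplotype[j])
        weights[(i, j)] += 1 if not consistent else -1

print(dict(weights))  # {(1, 2): 0, (2, 3): -1}
```

A positive edge weight thus flags a pair of SNPs where the reads disagree with the current haplotype, which is what the MaxCut step on the next slide exploits.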
18
Note: Switch signs of edge weights
An MCMC algorithm for haplotype assembly from whole-genome sequence data, V. Bansal, et al, Genome Res., 2008 18: 1336-1346
Map Reads to Graph
19
Phasing using HapCUT
• Phasing is done by finding the graph MaxCut– Partition graph into two such that edge weights
between the two partitions is max
• MaxCut determines the reads with the “weakest” link to haplotype
• Flip the binary digit of the haplotype at the MaxCut edges– Nudging the haplotype to be consistent with one of the
partitions & hence the reads
• Rebuild graph and repeatHapCUT: an efficient and accurate algorithm for the haplotype assembly problemBansal and Bafna, Bioinformatics (2008), 24 (16): 153-159
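One iteration of the loop above can be sketched under toy assumptions. HapCUT itself uses heuristics because MaxCut is NP-hard; the brute-force search below is only feasible for a handful of nodes, and the edge weights are invented:

```python
# Sketch of one phasing iteration: find the MaxCut of a small SNP graph by
# brute force, then flip the haplotype bits on one side of the cut.
from itertools import combinations

nodes = [1, 2, 3]
weights = {(1, 2): 2, (2, 3): -1, (1, 3): 1}   # invented edge weights

def cut_value(side_a):
    # Sum of weights of edges crossing between the two partitions.
    return sum(w for (u, v), w in weights.items()
               if (u in side_a) != (v in side_a))

# Brute-force MaxCut: try every nonempty proper subset as one side.
best_side, best_value = None, float("-inf")
for r in range(1, len(nodes)):
    for subset in combinations(nodes, r):
        value = cut_value(set(subset))
        if value > best_value:
            best_side, best_value = set(subset), value

# Flip the haplotype's binary digits on one side of the cut, nudging it
# toward consistency with the reads; then the graph would be rebuilt.
haplotype = {1: 0, 2: 0, 3: 0}
for snp in best_side:
    haplotype[snp] ^= 1

print(best_side, best_value, haplotype)
```

Here the heaviest cut isolates SNP 1, so its haplotype bit is flipped; repeating the build/cut/flip loop drives the total weight of “inconsistent” edges down.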
20
Graph analytics for secure and trustworthy computing
Monitoring and securing cyberspace involves many challenging connectivity-analysis problems:
– Identifying bad actors in a social network → community detection, betweenness centrality
– Detecting malware → graph analytics + machine learning
– Diagnosing connectivity problems in a computer network → path analysis
– Modeling influence propagation → graph diffusion
These problems may all involve “internet scale” and beyond, e.g., many billions of entities and trillions of relational edges
A specific growth area is combining graph analytics and machine learning
21
Example: Malware detection
The Polonium system combines graph analysis and machine learning to detect malware.→ Chau et al. “Polonium: Tera-scale graph mining and inference for malware detection.” In SIAM Data Mining (2011).
Driving Forces in Social Network Analysis
• Facebook has more than 1 billion active users
• Note the graph is changing as well as growing
• What are this graph’s properties? How do they change?
• Traditional graph partitioning often fails:
– Topology: the interaction graph is low-diameter and has no good separators
– Irregularity: communities are not uniform in size
– Overlap: individuals are members of one or more communities
• Sample queries:
– Allegiance switching: identify entities that switch communities
– Community structure: identify the genesis and dissipation of communities
– Phase change: identify significant changes in the network structure
3 orders of magnitude growth in 3 years!
22
Image Source: Nexus (Facebook application)
Graph-theoretic problems in social networks
– Community identification: clustering
– Targeted advertising: centrality
– Information spreading: modeling
23
Graph Analytics for Social Networks
• Are there new graph techniques? Do they parallelize? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy data, massive data, streaming data, etc.?
• Communities may overlap, exhibit different properties and sizes, and be driven by different models
– Detect communities (static or emerging)
– Identify important individuals
– Detect anomalous behavior
– Given a community, find a representative member of the community
– Given a set of individuals, find the best community that includes them
24
Characterizing Graph-theoretic computations
• graph sparsity (m/n ratio)
• static/dynamic nature
• weighted/unweighted, weight distribution
• vertex degree distribution
• directed/undirected
• simple/multi/hyper graph
• problem size
• granularity of computation at nodes/edges
• domain-specific characteristics
• paths
• clusters
• partitions
• matchings
• patterns
• orderings
Input: Graph abstraction
Problem: Find ***
Factors that influence the choice of algorithm
Graph algorithms
• traversal
• shortest path algorithms
• flow algorithms
• spanning tree algorithms
• topological sort
…
Graph problems are often recast as sparse linear algebra (e.g., partitioning) or linear programming (e.g., matching) computations
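As an example of the sparse-linear-algebra view, one breadth-first-search frontier expansion corresponds to a sparse matrix-vector product over a Boolean semiring. A dependency-free sketch (a real implementation would use a sparse-matrix library; the toy graph is invented):

```python
# BFS expressed in the style of repeated sparse matrix-vector products:
# the frontier is a Boolean vector, and expanding it one level is the
# "product" of the adjacency structure with that vector, masked by the
# set of not-yet-visited vertices.
adjacency = {0: [1, 2], 1: [3], 2: [3], 3: []}   # toy directed graph

def bfs_levels(source):
    frontier = {source}            # Boolean vector x (as a set)
    visited = {source: 0}          # vertex -> BFS level
    level = 0
    while frontier:
        level += 1
        # "Matrix-vector product": union of the neighbor lists of the
        # frontier vertices, keeping only unvisited vertices.
        frontier = {v for u in frontier for v in adjacency[u]
                    if v not in visited}
        for v in frontier:
            visited[v] = level
    return visited

print(bfs_levels(0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

This recasting is what lets graph traversals inherit the tuned sparse-matrix kernels available on the accelerators discussed earlier.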
25
Massive data analytics in Informatics networks
• Graphs arising in Informatics are very different from topologies in scientific computing.
• We need new data representations and parallel algorithms that exploit topological characteristics of informatics networks.
(Figure: from static networks and Euclidean topologies to emerging applications with dynamic, high-dimensional data)
26
What we’d like to infer from Information networks
• What are the degree distributions, clustering coefficients, diameters, etc.?
– Heavy-tailed, small-world, expander, geometry+rewiring, local-global decompositions, ...
• How do networks grow, evolve, respond to perturbations, etc.?– Preferential attachment, copying, HOT, shrinking diameters, ..
• Are there natural clusters, communities, partitions, etc.?– Concept-based clusters, link-based clusters, density-based clusters, ...
• How do dynamic processes – search, diffusion, etc. – behave on networks?
– Decentralized search, undirected diffusion, cascading epidemics, ...
• How best to do learning, e.g., classification, regression, ranking, etc.?
– Information retrieval, machine learning, ...
Slide credit: Michael Mahoney, Stanford
27
28
Graph Libraries
Graphs offer a natural representation for unstructured data from a variety of application areas such as biology, social informatics, and security.
Queries on these graphs are often challenging due to:
– the response time needed (cyber security)
– ingestion of massive volumes of data (high-throughput genome sequencing)
– dynamic updates to the graph (online social networks)
Common categories of graph problems:
– traversal (e.g., breadth-first search)
– optimization (e.g., shortest paths)
– detection problems (e.g., clustering, centrality)
29
Potential Role of FCP Center
Bring together:
– leaders in the graph algorithms community (from academia, industry, national labs, international labs, and government)
– liaisons from tool developers in the domain sciences
to develop:
– an API for graph algorithms
– graph algorithm libraries for accelerators
– reference implementations
– open source frameworks that others can optimize and plug into
– standards, benchmarking, and best practices
30
Conceptualization Goals
• Obtain a comprehensive understanding of:
– the common graph problems in data-intensive grand challenges that best map to future computing platforms
– the infrastructure needed to support development of critical scientific software on these platforms
• Develop reference implementations, open source frameworks, best practices, etc. that enable a few grand challenge problems
• Prioritize appropriate research, development, and outreach activities of the FCP Center