1
Today’s Emerging Platforms, Tomorrow’s Cyber Infrastructure
Digging Deeper, Seeing Farther: Supercomputers Alter Science
J. Markoff, NY Times, 2011
The country that wants to out-compete must out-compute. HPC is an innovation accelerator… shrinks “time-to-insight” and “time-to-solution”
Suzy Tichenor, Council on Competitiveness
OBAMA ADMINISTRATION UNVEILS $200 Million “BIG DATA” INITIATIVE (March 2012): “By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help solve some of the Nation’s most pressing challenges.”
2
Future Computing Platforms
Future computing platforms (FCP) include:
– accelerators like Field Programmable Gate Arrays (FPGAs) and General Purpose Graphics Processing Units (GPGPUs)
– multi-core and multi-threaded processors
– cloud computing platforms
FCP provide better:
– energy efficiency
– performance
– accessibility
Progress in Algorithms Beats Moore’s Law: “performance gains due to improvements in algorithms have vastly exceeded even the dramatic performance gains due to increased processor speed.”
Report to the President and Congress by the President’s Council of Advisors on Science and Technology, Dec 2010
3
Speed and power not keeping pace
Limitations of a single processor
Microprocessor Power Density Growth
If scaling continues at the present (2001) pace, high-speed processors would have the power density of:
– a nuclear reactor by 2005
– a rocket nozzle by 2010
– the surface of the sun by 2015
“Business as usual will not work in the future.” Intel VP Patrick Gelsinger (ISSCC 2001)
4
Multi-core Processors
Blaise Barney, Lawrence Livermore National Laboratory
A problem is broken into discrete parts that can be solved concurrently
Each part is further broken down to a series of instructions
Instructions from each part execute simultaneously on different CPUs
5
Graphics Processing Unit (GPU) Co-processors
• GPUs are going beyond games and into general high-throughput computing
• GPUs are low-cost accelerators tuned to highly data-parallel fine-grained tasks
• GPUs require rewriting code and may perform poorly on very irregular tasks or tasks with large sequential parts
• GPUs are speeding up a variety of image, biological sequence, graph, and string processing computations
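The “highly data-parallel fine-grained tasks” above mean applying the same operation to many elements at once. A sketch using NumPy on the CPU as a stand-in (real GPU code would be written in CUDA or with a GPU array library that mirrors this style):

```python
# Illustration of the data-parallel style GPUs are tuned for. One operation
# is applied to every element simultaneously, in contrast to the sequential
# element-at-a-time loop, which is the kind of code that must be rewritten
# to benefit from an accelerator. The toy "image" data is invented.
import numpy as np

pixels = np.arange(12, dtype=np.float32).reshape(3, 4)  # toy "image"

# Fine-grained data-parallel task: brighten every pixel independently.
brightened = pixels * 1.5 + 10.0

# Equivalent sequential version, one element at a time.
slow = pixels.copy()
for i in range(slow.shape[0]):
    for j in range(slow.shape[1]):
        slow[i, j] = slow[i, j] * 1.5 + 10.0

print(np.allclose(brightened, slow))  # True
```

A task with a large sequential part would gain little from this rewrite, which is the limitation the slide notes.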
6
FPGA Abstraction
• An FPGA consists of:
– A matrix of programmable logic cells
• Each cell can implement any logic function: AND, OR, NOT, etc.
• Cells can also implement storage: registers or small SRAMs
• Groups of cells make up higher-level structures: adders, multipliers, etc.
– Programmable interconnects
• Connect the logic cells to one another
– Embedded features
• ASICs within the FPGA fabric for specific functions: hardware multipliers, dedicated memory, microprocessors
• FPGAs are SRAM-based
– The device is configured by writing to configuration memory
(Figure: matrix of logic cells and interconnects)
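A toy model of how a logic cell implements “any logic function”: a look-up table (LUT) stores the function’s truth table in configuration memory. The 3-input full-adder example is invented for illustration:

```python
# A toy software model of an FPGA look-up table (LUT). "Configuring" the
# device corresponds to writing the truth table into SRAM; evaluating the
# cell is just an indexed read of that table.

def make_lut(truth_table):
    """truth_table[i] is the output when the inputs, read as a binary
    number (first input = most significant bit), equal i."""
    def lut(*inputs):
        index = 0
        for bit in inputs:
            index = (index << 1) | bit
        return truth_table[index]
    return lut

# Configure a 3-input LUT as the sum bit of a full adder: a XOR b XOR cin.
sum_lut = make_lut([0, 1, 1, 0, 1, 0, 0, 1])

print(sum_lut(1, 0, 1))  # 1 XOR 0 XOR 1 = 0
```

Any other 3-input Boolean function needs only a different 8-entry table, which is why a fixed cell can implement arbitrary logic.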
7
FPGA Device Trend
*Logic cell = LUT + FF + connection to adjacent cells
– 1997: XC4000, 0.25–0.35 µ process, 2.5–3.3 V core, ~7.5K logic cells, Configurable Logic Blocks
– 1998–2000: 0.18–0.22 µ, 1.8–2.5 V core, ~20K–100K logic cells, + embedded RAMs
– 2001–2002: 0.13 µ, 1.5 V core, + embedded multipliers; + PowerPC cores
– 2005: 90 nm, 1.2 V core, ~200K logic cells, + integrated MAC, + 500 MHz
– 2006: 65 nm, 1.0 V core, ~250K logic cells, + 6-input LUT, + 550 MHz
– 2009: 40 nm, 1.0 V core, + 600 MHz
– 2011: 28 nm, 1.0 V core, ~2M logic cells, + SSI technology, + ~70 Mbit on-chip RAM
8
Example: Virtex 7 capabilities
• On-chip BRAM bandwidth
– 1292 x 36 Kbit BRAM blocks
– BRAM bandwidth: 1292 x 36 x 500 MHz = ~23 Tbps
• Distributed memory bandwidth
– Total of 21 Mbits
– For a 1K x 36-bit configuration, ~600 blocks
– Distributed memory bandwidth: 600 x 36 x 500 MHz = ~11 Tbps
• Logic operation capabilities
– Total possible inputs: 1220K x 6 inputs / 6 = 1220K
– Logic operations: 1220K x 500 MHz = 610 GOPs/s
• Input/Output pins
– Number of I/O pins: 1200
– Total I/O bandwidth: 1200 x 500 MHz = 600 Gbps
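The bandwidth arithmetic above can be written out directly. All figures come from the slide (1292 BRAM blocks reading 36 bits per cycle, ~600 distributed-memory blocks, 1200 I/O pins, a 500 MHz clock); Tbps/Gbps are decimal units:

```python
# Back-of-envelope bandwidth checks for the slide's Virtex-7 numbers.
MHZ = 1e6

# BRAM: every block delivers 36 bits per 500 MHz cycle.
bram_bw = 1292 * 36 * 500 * MHZ
print(round(bram_bw / 1e12, 1))  # 23.3 -- the slide rounds to ~23 Tbps

# Distributed memory: ~600 blocks of 36 bits per cycle.
dist_bw = 600 * 36 * 500 * MHZ
print(round(dist_bw / 1e12, 1))  # 10.8 -- the slide rounds to ~11 Tbps

# I/O: one bit per pin per cycle across 1200 pins.
io_bw = 1200 * 500 * MHZ
print(round(io_bw / 1e9))        # 600 Gbps
```

These are peak figures (every resource active every cycle); achieved bandwidth depends on the design.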
9
Cloud Computing
» Manageability
» On-demand resources
» “Pay as you go”
» Simple services
» Reduce time to science
Traditional Software: you manage the entire stack (apps, data, middleware, O/S, virtualization, servers, storage, networking)
Infrastructure (as a Service): the vendor manages virtualization, servers, storage, and networking; you manage the apps, data, middleware, and O/S
Platform (as a Service): the vendor manages everything except your apps and data
Scientific Software (as a Service): the vendor manages the entire stack
Windows Azure Training Kit - January Refresh
10
Cloud Computing
• Data & compute collocated. Ease of data sharing & collaboration.
• Well suited for loosely coupled distributed applications on 100s of VMs
• Data parallel frameworks (e.g. Hadoop, Pregel, Workflows)
• Science applications: genome sequencing pipelines, graph analytics, …
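The data-parallel model behind frameworks like Hadoop can be sketched in a few lines. Independent map tasks emit key/value pairs, a shuffle groups them by key, and reduce tasks combine each group; real frameworks distribute these phases across many VMs, while this toy word count runs in-process:

```python
# A minimal in-process sketch of the MapReduce-style data-parallel model.
from collections import defaultdict

def map_phase(document):
    # Each mapper processes one input split independently.
    return [(word, 1) for word in document.split()]

def reduce_phase(key, values):
    # Each reducer combines all values that share a key.
    return key, sum(values)

documents = ["the cloud", "the graph the cloud"]

# Shuffle: group all mapper outputs by key.
grouped = defaultdict(list)
for doc in documents:                  # mappers could run on separate VMs
    for key, value in map_phase(doc):
        grouped[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)  # {'the': 3, 'cloud': 2, 'graph': 1}
```

Because mappers share no state, the framework can scale this pattern to hundreds of machines, which is what makes it a fit for the loosely coupled applications above.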
11
FCP Vision: data-intensive grand challenge problems → accelerated applications and analyses
Center for Sustainable Software on Future Computing Platforms
Platforms: FPGA, GPGPU, Cloud, Multi-core
Graph Algorithms, Tools, Libraries, Frameworks
Graphs are pervasive in large-scale data analysis
• Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.
• New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.
Cybersecurity. Problem: detecting anomalies and bad actors. Challenges: scale, real time. Graph problems: belief propagation, path and community analysis.
Bioinformatics. Problem: genome and haplotype assembly. Challenges: data quality. Graph problems: Eulerian paths, MaxCut.
Social Informatics. Problem: discover emergent communities, model the spread of information. Challenges: new analytics routines, uncertainty in data. Graph problems: clustering, shortest paths, flows.
Image sources: (1) Mihai Pop, Carl Kingsford, www.cbcb.umd.edu/ (2) Chau et al., In SIAM Data Mining (2011) (3) www.visualComplexity.com
12
13
Next Generation Sequencing (NGS) enabled applications
High-throughput DNA sequencing, or Next Generation Sequencing (NGS), technologies are used in a variety of applications:
– resequencing → genome mapping
– de novo sequencing → genome assembly
– gene expression analysis → transcriptome assembly
– metagenomic sampling → metagenomic clustering and/or assembly
In each case, longer original sequences must be recovered from the ~100 bp fragments produced by NGS
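The recovery task can be illustrated with a toy greedy merge of overlapping fragments. Real assemblers build overlap or de Bruijn graphs over millions of ~100 bp reads; the three tiny reads and the simple suffix/prefix overlap rule here are invented for illustration:

```python
# Toy sketch of sequence assembly: recover a longer sequence from short
# overlapping fragments by repeatedly merging on the longest overlap.

def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads):
    # Assumes the reads are already in left-to-right order along the genome.
    seq = reads[0]
    for read in reads[1:]:
        n = overlap(seq, read)
        seq += read[n:]          # append only the non-overlapping tail
    return seq

reads = ["ACGTAC", "TACGGA", "GGATTC"]
print(greedy_assemble(reads))  # ACGTACGGATTC
```

Sequencing errors and repeats break this naive scheme, which is why the graph formulations (Eulerian paths, MaxCut) on the following slides are needed.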
14
Example NGS-enabled Application: Genome Assembly
Nature Biotechnology 29, 987–991 (2011)
15
Example NGS Enabled Application: Haplotype Assembly
• The human genome contains a pair of DNA sequences, one from each parent, called haploid sequences or haplotypes
– Haplotypes differ at SNP positions
– SNPs (single nucleotide polymorphisms) are single base-pair mutations (~0.1%; non-uniform)
– SNP positions are “well known” and contain one of two possible alleles
• Haplotypes are useful for disease prediction, organ transplants, drug response, …
• The goal of phasing (haplotype assembly) is to use aligned sequence fragments to reconstruct the two haplotypes
16
Haplotype Assembly from Next-Gen Sequencing (NGS) Data
From: An MCMC algorithm for haplotype assembly from whole-genome sequence data, V. Bansal et al., Genome Res. (2008), 18: 1336-1346
17
Haplotype assembly algorithm from UCSD
• Construct a consensus sequence consistent with the NGS reads
• Convert the consensus & reads to binary
– Assume the consensus is a haplotype; map it to binary
– Convert the sequence reads to binary
• Convert the consensus & reads to a graph
– SNP positions are nodes
– Edges connect nodes covered by the same read
– Edge weight = Σ (reads inconsistent with the haplotype or its complement) − Σ (reads consistent with the haplotype or its complement)
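The graph construction above can be sketched on toy data. The three binary reads and SNP positions are invented; the edge-weight rule follows the slide (inconsistent reads add +1, consistent reads add −1):

```python
# Sketch of building the SNP graph for haplotype assembly. Reads are
# already reduced to binary alleles at SNP positions (0 = allele matching
# the assumed haplotype, 1 = the alternate allele).
from collections import defaultdict
from itertools import combinations

haplotype = {1: 0, 2: 0, 3: 0}   # assume the consensus is one haplotype

# Each read: {snp_position: observed_binary_allele}
reads = [
    {1: 0, 2: 0},   # consistent with the haplotype
    {1: 0, 2: 1},   # inconsistent: mixes alleles from both haplotypes
    {2: 1, 3: 1},   # consistent with the complement haplotype (1, 1)
]

weights = defaultdict(int)
for read in reads:
    for i, j in combinations(sorted(read), 2):   # SNP pairs in the same read
        # A read is consistent if it agrees with the haplotype (or its
        # complement) at both positions, i.e. matches both or neither.
        consistent = (read[i] == haplotype[i]) == (read[j] == haplotype[j])
        weights[(i, j)] += 1 if not consistent else -1

print(dict(weights))  # {(1, 2): 0, (2, 3): -1}
```

A positive edge weight thus flags a pair of SNPs where the reads disagree with the current haplotype, which is what the MaxCut step on the next slide exploits.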
18
Note: Switch signs of edge weights
An MCMC algorithm for haplotype assembly from whole-genome sequence data, V. Bansal, et al, Genome Res., 2008 18: 1336-1346
Map Reads to Graph
19
Phasing using HapCUT
• Phasing is done by finding the graph MaxCut– Partition graph into two such that edge weights
between the two partitions is max
• MaxCut determines the reads with the “weakest” link to haplotype
• Flip the binary digit of the haplotype at the MaxCut edges– Nudging the haplotype to be consistent with one of the
partitions & hence the reads
• Rebuild graph and repeatHapCUT: an efficient and accurate algorithm for the haplotype assembly problemBansal and Bafna, Bioinformatics (2008), 24 (16): 153-159
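One iteration of the loop above can be sketched under toy assumptions. HapCUT itself uses heuristics because MaxCut is NP-hard; the brute-force search below is only feasible for a handful of nodes, and the edge weights are invented:

```python
# Sketch of one phasing iteration: find the MaxCut of a small SNP graph by
# brute force, then flip the haplotype bits on one side of the cut.
from itertools import combinations

nodes = [1, 2, 3]
weights = {(1, 2): 2, (2, 3): -1, (1, 3): 1}   # invented edge weights

def cut_value(side_a):
    # Sum of weights of edges crossing between the two partitions.
    return sum(w for (u, v), w in weights.items()
               if (u in side_a) != (v in side_a))

# Brute-force MaxCut: try every nonempty proper subset as one side.
best_side, best_value = None, float("-inf")
for r in range(1, len(nodes)):
    for subset in combinations(nodes, r):
        value = cut_value(set(subset))
        if value > best_value:
            best_side, best_value = set(subset), value

# Flip the haplotype's binary digits on one side of the cut, nudging it
# toward consistency with the reads; then the graph would be rebuilt.
haplotype = {1: 0, 2: 0, 3: 0}
for snp in best_side:
    haplotype[snp] ^= 1

print(best_side, best_value, haplotype)
```

Here the heaviest cut isolates SNP 1, so its haplotype bit is flipped; repeating the build/cut/flip loop drives the total weight of “inconsistent” edges down.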
20
Graph analytics for secure and trustworthy computing
Monitoring and securing cyberspace involves many challenging connectivity-analysis problems:
– Identifying bad actors in a social network → community detection, betweenness centrality
– Detecting malware → graph analytics + machine learning
– Diagnosing connectivity problems in a computer network → path analysis
– Modeling influence propagation → graph diffusion
These problems may all involve “internet scale” and beyond, e.g., many billions of entities and trillions of relational edges
A specific growth area is combining graph analytics and machine learning
21
Example: Malware detection
The Polonium system combines graph analysis and machine learning to detect malware.→ Chau et al. “Polonium: Tera-scale graph mining and inference for malware detection.” In SIAM Data Mining (2011).
Driving Forces in Social Network Analysis
• Facebook has more than 1 billion active users
• Note the graph is changing as well as growing
• What are this graph’s properties? How do they change?
• Traditional graph partitioning often fails:
– Topology: the interaction graph is low-diameter and has no good separators
– Irregularity: communities are not uniform in size
– Overlap: individuals are members of one or more communities
• Sample queries:
– Allegiance switching: identify entities that switch communities
– Community structure: identify the genesis and dissipation of communities
– Phase change: identify significant changes in the network structure
3 orders of magnitude growth in 3 years!
22
Image Source: Nexus (Facebook application)
Graph-theoretic problems in social networks
– Community identification: clustering
– Targeted advertising: centrality
– Information spreading: modeling
23
Graph Analytics for Social Networks
• Are there new graph techniques? Do they parallelize? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy data, massive data, streaming data, etc.?
• Communities may overlap, exhibit different properties and sizes, and be driven by different models
– Detect communities (static or emerging)
– Identify important individuals
– Detect anomalous behavior
– Given a community, find a representative member of the community
– Given a set of individuals, find the best community that includes them
24
Characterizing Graph-theoretic computations
• graph sparsity (m/n ratio)
• static/dynamic nature
• weighted/unweighted, weight distribution
• vertex degree distribution
• directed/undirected
• simple/multi/hyper graph
• problem size
• granularity of computation at nodes/edges
• domain-specific characteristics
• paths
• clusters
• partitions
• matchings
• patterns
• orderings
Input: Graph abstraction
Problem: Find ***
Factors that influence the choice of algorithm
Graph algorithms
• traversal
• shortest path algorithms
• flow algorithms
• spanning tree algorithms
• topological sort
…
Graph problems are often recast as sparse linear algebra (e.g., partitioning) or linear programming (e.g., matching) computations
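As an example of the sparse-linear-algebra view, one breadth-first-search frontier expansion corresponds to a sparse matrix-vector product over a Boolean semiring. A dependency-free sketch (a real implementation would use a sparse-matrix library; the toy graph is invented):

```python
# BFS expressed in the style of repeated sparse matrix-vector products:
# the frontier is a Boolean vector, and expanding it one level is the
# "product" of the adjacency structure with that vector, masked by the
# set of not-yet-visited vertices.
adjacency = {0: [1, 2], 1: [3], 2: [3], 3: []}   # toy directed graph

def bfs_levels(source):
    frontier = {source}            # Boolean vector x (as a set)
    visited = {source: 0}          # vertex -> BFS level
    level = 0
    while frontier:
        level += 1
        # "Matrix-vector product": union of the neighbor lists of the
        # frontier vertices, keeping only unvisited vertices.
        frontier = {v for u in frontier for v in adjacency[u]
                    if v not in visited}
        for v in frontier:
            visited[v] = level
    return visited

print(bfs_levels(0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

This recasting is what lets graph traversals inherit the tuned sparse-matrix kernels available on the accelerators discussed earlier.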
25
Massive data analytics in Informatics networks
• Graphs arising in Informatics are very different from topologies in scientific computing.
• We need new data representations and parallel algorithms that exploit topological characteristics of informatics networks.
(Figure: from static networks and Euclidean topologies to emerging applications with dynamic, high-dimensional data)
26
What we’d like to infer from Information networks
• What are the degree distributions, clustering coefficients, diameters, etc.?
– Heavy-tailed, small-world, expander, geometry+rewiring, local-global decompositions, ...
• How do networks grow, evolve, respond to perturbations, etc.?– Preferential attachment, copying, HOT, shrinking diameters, ..
• Are there natural clusters, communities, partitions, etc.?– Concept-based clusters, link-based clusters, density-based clusters, ...
• How do dynamic processes – search, diffusion, etc. – behave on networks?
– Decentralized search, undirected diffusion, cascading epidemics, ...
• How best to do learning, e.g., classification, regression, ranking, etc.?
– Information retrieval, machine learning, ...
Slide credit: Michael Mahoney, Stanford
27
28
Graph Libraries
Graphs offer a natural representation for unstructured data from a variety of application areas such as biology, social informatics, and security.
Queries on these graphs are often challenging due to:
– the response time needed (cyber security)
– ingestion of massive volumes of data (high-throughput genome sequencing)
– dynamic updates to the graph (online social networks)
Common categories of graph problems:
– traversal (e.g., breadth-first search)
– optimization (e.g., shortest paths)
– detection problems (e.g., clustering, centrality)
29
Potential Role of FCP Center
Bring together:
– leaders in the graph algorithms community (from academia, industry, national labs, international labs, and government)
– liaisons from tool developers in the domain sciences
to develop:
– an API for graph algorithms
– graph algorithm libraries for accelerators
– reference implementations
– open source frameworks that others can optimize and plug into
– standards, benchmarking, and best practices
30
Conceptualization Goals
• Obtain a comprehensive understanding of:
– the common graph problems in data-intensive grand challenges that best map to future computing platforms
– the infrastructure needed to support development of critical scientific software on these platforms
• Develop reference implementations, open source frameworks, best practices, etc. that enable a few grand challenge problems
• Prioritize appropriate research, development, and outreach activities of the FCP Center