National University of Defense Technology
TianHe
天河
High Performance Computational Biology and Drug Design on Tianhe Supercomputers
School of Computer Science, National University of Defense Technology (NUDT)
Presenter: Shaoliang Peng
Email: [email protected]
About us
• School of Computer Science of NUDT: the largest School of Computer Science
  - 10 institutes, 400+ faculty members, and 3,000+ students
• Hometown of supercomputers in China: Tianhe-1 and Tianhe-2
  - No. 1 in TOP500 (2010.10, 2013.6, 2013.11, 2014.6, 2014.11, 2015.7, 2015.11)
  - TH-2: 33.86 PFLOPS, 32,000 CPUs + 48,000 MICs
About me: Dr. Shaoliang Peng
• National University of Defense Technology (NUDT, Changsha, China); adjunct professor at BGI.
• Research areas: high performance computing, bioinformatics, virtual screening, and biological simulation.
• Won the Gold Award of PAC (Parallel Application Challenge Competition) twice, in 2014 and 2015, and an IEEE SCALE Challenge Finalist Award
  - Human Whole Genome Re-sequencing Analysis Software Pipeline
  - mD3DOCKxb: the largest high-throughput molecular docking platform
Outline
• Overview of TianHe Supercomputers (1, 2 & 3)
• Applications on TianHe
• Bio-medical applications on TianHe
• Summary
Overview of TH-1
• Hybrid architecture: CPU & GPU
• Custom system software stack
Items | Configuration
Processors | 14,336 Intel CPUs + 7,168 NVIDIA GPUs + 2,048 FT CPUs
Peak performance | 4.7 PF (Linpack 2.57 PF)
Interconnect | Proprietary high-speed interconnection network TH-net
Memory | 262 TB in total
Storage | Global shared parallel storage system, 2 PB
Cabinets | 140 compute/communication/storage cabinets
Power consumption | 4.04 MW (635.15 MF/W)
Cooling | Water cooling system
Overview of TH-2
• Neo-heterogeneous architecture: Xeon CPU & Xeon Phi
Items | Configuration
Processors | 32,000 Intel Xeon CPUs + 48,000 Xeon Phis + 4,096 FT CPUs
Peak performance | 54.9 PFlops
Interconnect | Proprietary high-speed interconnection network TH Express-2
Memory | 2 PB in total
Storage | Global shared parallel storage system, 12.4 PB
Cabinets | 125 + 13 + 24 = 162 compute/communication/storage cabinets
Power | 17.8 MW
Cooling | Closed air cooling system
Roadmap of Tianhe System
System | Tianhe-1A | Tianhe-2 | Tianhe-2A
System peak (PF) | 4.7 | 54.9 | ~100
Peak power (MW) | 4.04 | 17.6 | ~18
Total system memory | 262 TB | 1.4 PB | ~3 PB
Node performance (TF) | 0.655 | 3.431 | ~6
Node processors | Xeon X5670, Nvidia M2050 | Xeon E5 2692, Xeon Phi | China CPU + GPDSP
System size (nodes) | 7,168 | 16,000 | ~18,000
System interconnect | TH Express-1 | TH Express-2 | TH Express-2+
File system | 2 PB Lustre | 12.4 PB H2FS+Lustre | ~30 PB H2FS+TDM
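As a quick consistency check on the roadmap figures, the system peak should be roughly the node count times the per-node performance. The sketch below uses only the Tianhe-1A and Tianhe-2 values from the table; the dictionary layout and the function name are illustrative choices, not part of the original slides.

```python
# Roadmap values copied from the table above (nodes, node TF, system peak PF).
SYSTEMS = {
    "Tianhe-1A": {"nodes": 7168, "node_tf": 0.655, "peak_pf": 4.7},
    "Tianhe-2": {"nodes": 16000, "node_tf": 3.431, "peak_pf": 54.9},
}


def aggregate_peak_pf(nodes, node_tf):
    """System peak in PFLOPS implied by node count and per-node TFLOPS."""
    return nodes * node_tf / 1000.0


for name, s in SYSTEMS.items():
    implied = aggregate_peak_pf(s["nodes"], s["node_tf"])
    print(f"{name}: implied {implied:.3f} PF, listed {s['peak_pf']} PF")
```

The implied values agree with the listed peaks to within rounding. The Tianhe-2A column is given only approximately, so it is left out of the check.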
Application scale in next 5 years
Applications | Current scale in China | Scale in next 5 years
Seismic Exploration | 2,600 km², 5 km depth; 217,900 shots; 2.2 TB data | Millions of shots
Genomics Research | 2 PB bioinformatics data | 100 PB bio data
New Energy (Magnetic Confinement Fusion) | 2 billion ions, 0.83 billion electrons | 100 billion atoms
Drug Design | 200-300 ns molecular dynamics simulations | 10 million molecules; 1,000 ns/day
CFD (Aircraft Design) | 3.5 billion mesh points | 100 billion mesh points
Universal Evolution (neutrinos) | 110 billion particles | Trillion particles
Smart City (Urban Electromagnetic Spectrum Monitoring System) | Area (Guangzhou city): 200 km²; grid size: 1.0 km × 1.0 km | Grid size: 100 m × 100 m
System Architecture
[Figure: co-design eco-system. Application layer: domain models, proxy apps, algorithms, benchmarks. Software layer: domain framework, data management, tools, data analysis; programming interfaces (MPI, OpenMP, GA, CUDA/OpenACC, Spark, newly emerged interfaces); hybrid runtime; OS, compiler, library, file system. Hardware layer: CPU/accelerator hybrid node, memory, interconnection, storage device. The layers are bridged by requirements, constraints, tradeoffs, and solutions.]
Application areas
• National University of Defense Technology (NUDT), Changsha
• Sites: NUDT and NSCC-CS (Changsha), NSCC-TJ (Tianjin), NSCC-GZ (Guangzhou)
Resources and Users on TH Supercomputer
• NSCC-TJ TH-1 (Nov. 2010 - May 2011)
• NSCC-GZ TH-2: bio-medical users > 30%
Bio-medical Big Data Needs Big Computer
"Extremely powerful computers are needed to help biologists to handle big-data traffic jams."
Nature 498, 255–260 (13 June 2013), Biology: The big challenges of big data
[Figure: decreasing trend of the cost of DNA sequencing (http://www.genome.gov/sequencingcosts/)]
Biological big data is growing far faster than the growth in compute power described by Moore's Law.
Solving 3 Bio-medical Big Data Problems using TH
3 kinds of bio-medical big data problems:
• Computation-intensive
  - Large-scale sequence alignment/assembly
  - Virtual drug screening
• Data-intensive
  - Large memory (2nd-3rd de novo genome assembly)
  - Intensive I/O (NGS genome data and text mining)
• Communication-intensive
  - Bio-networks (gene networks, protein interactions, ...)
Design characteristics of TH-2:
  - 32,000 CPUs + 48,000 MICs (neo-heterogeneous architecture)
  - 1.4 PB memory + >12.4 PB storage (big and fast)
  - Proprietary high-speed interconnection network
Bio-Software developed on Tianhe
• SOAPdenovo2 (TH-1 and TH-2)
• SOAP3-dp & MICA (TH-1 and TH-2)
• mBWA (TH-2)
• mSOAPsnp (TH-2)
• SOAPfuse (TH-2)
• GAMA (TH-1)
• SGA (TH-1)
• ...
Bio-applications on TianHe (TH-1)
• MPI-SGA: string graph based de novo assembly
• GAMA: high-precision population genotype analysis software
• ABySS: a de novo, parallel, paired-end sequence assembler designed for short reads
• ParMETIS: an MPI-based parallel library that implements a variety of algorithms for graph partitioning, mesh partitioning, and matrix reordering
• ...
Deep Parallelized Optimization of Genome Big Data Analysis Software Pipeline
• Applications: clinical studies (cancer, Ebola, SARS), population genetics, evolutionary analysis, etc.
• Challenge: 30X deep-sequencing data from 2,000 humans (300 TB)
• Aim: find personalised genomic variations (SNP, CNV, indel, etc.) as soon as possible.
Genome Big Data Analysis Software Pipeline on TH-2
http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism
MICA: Parallel short sequence alignment (large-scale approximate string matching)
• MICA is an optimized version of SOAP3-dp, implemented to be accelerated by MIC on TH-2
• Optimization efforts:
  - Three-channel I/O latency hiding
  - Introduction of 512-bit SIMD code
  - Parallelized construction of the Smith-Waterman matrix
  - Prefetching of index data
  - Use of inline function calls
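To illustrate why the Smith-Waterman matrix can be built in parallel, here is a minimal sketch that fills the matrix one anti-diagonal at a time. Cells on the same anti-diagonal depend only on earlier diagonals, so they could be computed concurrently (for example by SIMD lanes). This is a generic textbook formulation with a linear gap penalty, not MICA's actual kernel; the function name and scoring parameters are assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Sweep anti-diagonals d = i + j. Every cell on one diagonal reads only
    # cells from diagonals d-1 and d-2, so the inner loop has no dependencies
    # between its iterations and is the part that can be vectorized.
    for d in range(2, rows + cols - 1):
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

A call such as smith_waterman_score(read, reference_window) returns only the score; a production aligner would also keep traceback information and typically uses affine gap penalties.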
mSOAPsnp: Massively parallel SNP detection
Core: Bayesian probability computation
• Algorithm improvements
  - Compression of the 4-D sparse matrix
  - Elimination of redundant computation by building a fast lookup table
  - Consistency sorting of the gradient
• MIC-specific optimizations
  - Loop unrolling, space padding to improve spatial data locality, SIMD code
• CPU + MIC collaborative computation
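The idea of replacing repeated probability computations with a precomputed table can be sketched with a deliberately simplified Bayesian genotype caller. This is not the SOAPsnp/mSOAPsnp model: the error model (Phred quality only), the flat prior, and all names below are assumptions chosen for illustration.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, ...
GENOTYPES = ["".join(g) for g in combinations_with_replacement(BASES, 2)]
MAX_Q = 60


def _read_loglik(q, copies):
    """log P(observed base | genotype) when `copies` (0-2) alleles equal the base."""
    e = min(10 ** (-q / 10), 0.75)  # Phred error probability, capped at random guessing
    p = (copies / 2) * (1 - e) + (1 - copies / 2) * (e / 3)
    return math.log(p)


# Lookup table built once: the per-read term depends only on (quality, copies),
# so it never has to be recomputed inside the per-site loop.
LOGLIK = [[_read_loglik(q, c) for c in range(3)] for q in range(MAX_Q + 1)]


def call_genotype(pileup):
    """pileup: list of (base, phred) pairs at one site. Returns (genotype, posterior)."""
    scores = {}
    for g in GENOTYPES:
        scores[g] = sum(LOGLIK[min(q, MAX_Q)][g.count(base)] for base, q in pileup)
    top = max(scores.values())
    total = sum(math.exp(s - top) for s in scores.values())  # flat prior assumed
    best = max(scores, key=scores.get)
    return best, 1.0 / total
```

For example, call_genotype([("A", 30)] * 5 + [("G", 30)] * 5) selects the heterozygous genotype "AG".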
Large-scale deployment on TH-2
• Data: 30X deep-sequencing data from 2,000 humans, 300 TB in total
• Scale: 8,192 nodes (each with 2 CPUs + 3 MICs)
• Processing time: reduced from 8 months to 8.37 hours (700X speedup)
• mSOAPsnp scales up to 8,192 nodes (196,608 CPU cores and 1,376,256 MIC cores) with parallel efficiency > 60.7%; published at ISC 2015
• MICA: http://sourceforge.net/projects/mica-aligner
• mSOAPsnp: http://sourceforge.net/projects/msnp
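The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below derives the cores used per device from the stated totals and estimates the speedup; the 30-day month used to convert "8 months" into hours is an assumption, which is why the result lands near, not exactly on, the quoted 700X.

```python
nodes = 8192
cpus_per_node, mics_per_node = 2, 3
cpu_cores_total = 196_608
mic_cores_total = 1_376_256

# Cores per device implied by the totals on the slide.
cores_per_cpu = cpu_cores_total / (nodes * cpus_per_node)
cores_per_mic = mic_cores_total / (nodes * mics_per_node)

# Speedup estimate; the length of a month (30 days) is an assumption.
baseline_hours = 8 * 30 * 24
speedup = baseline_hours / 8.37

print(cores_per_cpu, cores_per_mic, round(speedup))
```

The totals correspond to 12 cores per CPU and 56 cores per MIC, and the estimated speedup is a little under 700X.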
Drug Design on Tianhe supercomputers
The 3 most-used software frameworks:
1. A CPU/MIC Collaborated Parallel Framework for GROMACS on TH-2 (GIW 2016)
2. mAMBER: A CPU/MIC Collaborated Parallel Framework for AMBER on TH-2 (BIBM 2016)
3. mD3DOCKxb: An Ultra-Scalable CPU-MIC Coordinated Virtual Screening Framework
A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer
Wenhe Su, Shaoliang PENG, Shunyun Yang, Xiaoyu Zhang, Tenglilang Zhang, Weiguo Liu, Xingming Zhao
Supported by: NSFC Grant 61272056, U1435222, and 1133005
School of Computer Science National University of Defense Technology
Changsha, China
The 27th International Conference on Genome Informatics 2016 Shanghai, China
mAMBER: A CPU/MIC Collaborated Parallel Framework for AMBER on Tianhe-2 Supercomputer
Shaoliang Peng, Xiaoyu Zhang, Yutong Lu, Xiangke Liao, Weiliang Zhu, Dongqing Wei
School of Computer Science
National University of Defense Technology Changsha, China
IEEE BIBM 2016 Shenzhen, China
High Throughput Virtual Screening
• Applications: computer-aided drug design, molecular docking, and virtual screening
• Challenges: when a sudden illness or unknown virus appears, as many molecules as possible must be screened to find the effective ones, but there are more than 35 million drug molecules on earth.
• Aim: finish docking all the drug molecules on earth within one day.
• Workflow: find the best 100 molecules -> do experiments -> clinical
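The "find the best 100 molecules" step is a top-k selection over a very large stream of docking results. A minimal sketch, assuming each result is a (molecule_id, score) pair and that a lower score (binding energy) is better; both the data layout and the function name are illustrative, not taken from the slides.

```python
import heapq


def best_hits(scored_molecules, k=100):
    """Return the k (molecule_id, score) pairs with the lowest docking score.

    Works on any iterable, including a generator, so the full result set
    never has to be held in memory at once.
    """
    return heapq.nsmallest(k, scored_molecules, key=lambda pair: pair[1])
```

With tens of millions of results, streaming the scores through a bounded selection like this avoids a full sort of the whole library.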
mD3DOCKxb: multi-level parallel, high-throughput molecular docking software with MIC + CPU collaboration
• Algorithm: Lamarckian Genetic Algorithm
• Data scale: 40 million molecules, 800 TB, 20 × 40 million small files
• Parallel mode: MPI + OpenMP
• Bottlenecks: I/O bandwidth, communication bandwidth
• Accelerator: MIC (offload mode) and CPU
• Components:
  - Communication engine:
    · Multi-layer control: task partitioning, load balancing
    · Tasks are divided into two batches to prevent repeated calculation
    · Sleep by groups: avoids too much I/O and message passing at the same instant
  - Execution engine: decides whether a job runs on the CPU or on the MIC
  - Collaborated accelerator: multithreading is implemented on both CPU and MIC, which handle tasks independently and jointly accelerate the software
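The way CPU and MIC workers can pull tasks independently from a shared pool is sketched below as a small deterministic simulation. It is not the mD3DOCKxb scheduler: the greedy "first free worker takes the next task" policy, the worker names, and the relative speeds are all hypothetical.

```python
import heapq


def simulate_pool(task_costs, worker_speeds):
    """Simulate a shared task pool served by workers of different speeds.

    task_costs: work units per task, in queue order.
    worker_speeds: mapping worker name -> work units processed per time unit.
    Returns (makespan, tasks_done_per_worker).
    """
    free_at = [(0.0, name) for name in sorted(worker_speeds)]
    heapq.heapify(free_at)
    done = {name: 0 for name in worker_speeds}
    for cost in task_costs:
        # The worker that becomes idle first takes the next task.
        t, name = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + cost / worker_speeds[name], name))
        done[name] += 1
    makespan = max(t for t, _ in free_at)
    return makespan, done


# Hypothetical node: 2 CPUs and 3 MICs, each MIC assumed twice as fast as a CPU.
speeds = {"cpu0": 1.0, "cpu1": 1.0, "mic0": 2.0, "mic1": 2.0, "mic2": 2.0}
makespan, done = simulate_pool([1.0] * 80, speeds)
print(makespan, done)
```

Because faster workers return to the pool sooner, they naturally take a larger share of the tasks, which is the load-balancing effect a dynamic pool provides without any static partitioning.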
mD3DOCKxb
• 2 CPUs + 3 MICs, offload mode
• Massive small-file access (40 million files)
  - Congestion control
  - Multi-stage task scheduling
  - Task pool management
  - Asynchronous I/O & communication
• I/O performance improved 10x (> 800 thousand hybrid cores)
• MPI latency reduced from one hour to several seconds
42 million real drug molecules docked against the Ebola virus in one day on TH-2
• We completed 42 million dockings, scaling from 500 to 8,000 nodes on TH-2. The parallel efficiency of mD3DOCKxb is over 84%.
These 42 million drug molecules are all those currently known on earth, so finishing docking against an unknown virus within one day is possible!
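To make the scaling claim concrete, the sketch below applies the usual definition of relative parallel efficiency and derives what the quoted numbers imply. Only figures from the slide are used (500 and 8,000 nodes, 84%, 42 million dockings in one day); the function names are illustrative, and no measured run times are assumed.

```python
def parallel_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Relative efficiency when scaling from n_base to n_scaled nodes."""
    return (t_base * n_base) / (t_scaled * n_scaled)


def implied_speedup(efficiency, n_base, n_scaled):
    """Speedup over the base run implied by a given parallel efficiency."""
    return efficiency * n_scaled / n_base


# 16x more nodes at 84% efficiency.
speedup = implied_speedup(0.84, 500, 8000)

# Throughput needed to dock 42 million molecules in 24 hours.
dockings_per_second = 42_000_000 / 86_400
seconds_per_docking_per_node = 8000 / dockings_per_second

print(speedup, dockings_per_second, seconds_per_docking_per_node)
```

At 84% efficiency, 16 times as many nodes give a speedup of about 13.4X, and the one-day target corresponds to roughly 486 dockings per second across the machine, or about 16.5 seconds per docking on each of 8,000 nodes.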
Data compression using GPU
• Goal: improve compression efficiency for mainstream biological data storage formats.
• Test results show that the column-major block compression method improves compression efficiency.
[Figure: compression speed (MB/s) on FASTQ format for gzip, bzip2, and column-major block compression]
GPU Accelerated Adaptive Compression Framework for Genomics Data, Guixin Guo, Shuang Qiu, Bingqiang Wang, Mian Lu, Simon See, IEEE BigData’13
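The intuition behind column-major compression is that FASTQ interleaves fields with very different statistics (read IDs, bases, quality strings), and grouping like fields together gives the compressor more homogeneous input. The sketch below shows only that regrouping idea, on the CPU with zlib. It is not the GPU framework from the cited paper, and it assumes four-line records with a bare "+" separator line.

```python
import zlib


def compress_column_major(fastq_text):
    """Compress IDs, bases, and qualities as three separate streams."""
    lines = fastq_text.strip().split("\n")
    # Every 4th line starting at 0, 1, 3: IDs, bases, qualities.
    # The '+' separator lines carry no information here and are dropped.
    columns = [lines[0::4], lines[1::4], lines[3::4]]
    return [zlib.compress("\n".join(col).encode("ascii"), 6) for col in columns]


def decompress_column_major(blobs):
    """Rebuild the original four-line FASTQ records from the three streams."""
    ids, seqs, quals = (zlib.decompress(b).decode("ascii").split("\n") for b in blobs)
    return "".join(f"{i}\n{s}\n+\n{q}\n" for i, s, q in zip(ids, seqs, quals))
```

Whether the regrouped streams compress better or faster than the row-major file depends on the data and the codec, so no ratio is claimed here; the sketch only demonstrates a lossless round trip.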
Biology Sequence big data analysis I
• Mass data: with the rapid development of genome sequencing technology, data is accumulating at an exponential rate
• Hyper-scale computational requirements: computations are becoming larger and more complicated, which challenges the architecture
• Analysis of difference and relevance in population genomics
  - TB or even PB of data
  - Complicated computational models
  - Data-intensive computing, which requires high performance from storage, communication, and other subsystems
Bioinformatics and computational biology will be the main application domains for supercomputing
Biology Sequence big data analysis II
• The "TH-2" platform can provide higher-efficiency and higher-precision solutions for bioinformatics.
  - The high-speed storage system solves the mass data input/output problem of bioinformatics analysis
  - Heterogeneous computation can handle complicated computational models
  - The high-speed communication network removes the scalability problem of parallel computation
• Application achievements based on "TH"
  - Designed and completed high-resolution analysis software for population genomics
  - Established a software environment for GPU-accelerated bioinformatics
The "BT+IT" application model will lead the development of bioinformatics in the future
Drug Design on TH
• Virtual drug screening
  - The use of computational resources to find compounds that may act as drugs more effectively and efficiently.
  - A computational technique used in drug discovery to search libraries of small molecules in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme.
Application of this software in drug discovery
• Prediction
  - Docked more than 300,000 drugs/natural products/commercial compounds against 1,100 drug targets.
• Experimental validation
  - Evaluated more than 600 drugs/natural products/commercial compounds in vitro and in vivo; yielded 513 active compounds in vitro and 7 active compounds in vivo.
• Significance
  - Provides lead compounds for cancer, HBV, diabetes, etc.
Paper List
1. Fang X, ..., Xiangke Liao, Xiaoqian Zhu, Shaoliang Peng, et al. Genome-wide adaptive complexes to underground stresses in blind mole rats, Spalax: adaptive complexes to stressful life underground. Nature Communications.
2. Luo R, Heng Wang, ..., Xiaoqian Zhu, Shaoliang Peng, et al. MICA: A fast short-read aligner that takes full advantage of Intel Many Integrated Core Architecture (MIC). BMC Bioinformatics.
3. Jia W, ..., Xiangke Liao, Shaoliang Peng, et al. SOAPfuse: an algorithm for identifying fusion transcripts from paired-end RNA-Seq data. Genome Biology, 2013, 14(2): R12.
4. Luo R, ..., Xiaoqian Zhu, Shaoliang Peng, et al. SOAP3-dp: Fast, Accurate and Sensitive GPU-Based Short Read Aligner. PLoS ONE, 2013, 8(5): e65632.
5. Wang J, Peng S, Cossins B P, Xiaoqian Zhu, et al. Mapping Central α-Helix Linker Mediated Conformational Transition Pathway of Calmodulin via Simple Computational Approach. The Journal of Physical Chemistry B, 2014, 118(32): 9677-9685.
6. Luo R, ..., Xiangke Liao, Xiaoqian Zhu, Shaoliang Peng, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 2012, 1(1): 18.
7. Feng Zhang, Xiangke Liao, Shaoliang Peng, Bingqiang Wang, Xiaoqian Zhu. MPISGA: A Program for Speeding up String Graph Based Assembly on Tianhe Supercomputer. ICG-7 & BioIT 2012, Hong Kong, 2012.
8. Yingbo Cui, Xiangke Liao, Shaoliang Peng. mBWA: a Massively Parallel Sequence Reads Aligner. PACBB 2014, Spain.
Patents, Software Copyright, and Awards
Patents
• A three-stage pipeline based parallel alignment algorithm by CPU cooperating with MIC
• Task model building method based on biological gene sequencing log
• Strategy-based deployment method of computing tasks on virtual machine
Software copyrights
• Gene sequence assembly software based on string graph theory
• High-throughput computing system for bioinformatics analysis V1.0
Awards (for "The Scaling Genome Big Data Analysis Software on TH-2 Supercomputer")
• The Eighth IEEE International Scalable Computing Challenge (SCALE 2015): Finalist Award
• Parallel Application Challenge 2014: Best Application Golden Award (1/85)
Summary
Which bio-applications should move to the Tianhe supercomputer? (Those whose running time is too long to tolerate)
• Computation-intensive problems
• Data-intensive problems (big data ...)
• Network-intensive problems
The Tianhe-2 supercomputer has:
• 32,000 CPUs + 48,000 MICs
• 2 PB memory in total + 40 PB storage
• Proprietary high-speed interconnection network
Summary
Bio-applications on TH
• SOAPdenovo2, SOAP3-dp, SOAPfuse, PacBio, GWAS: PERMORY-MPI, GAMA-GPU, MICA-BWA, TH-Cloud Computing ...
Friends' friends are good friends ...
• TH series supercomputers are open to all friends, not only for life and bio research
• Big computer + big bio-medical data = big science
Thanks!
Welcome to visit us and use TianHe supercomputer! Email: [email protected]