Introduction to SeqAn, an Open-Source C++ Template Library
DESCRIPTION
SeqAn (www.seqan.de) is an open-source C++ template library (BSD license) that implements many efficient and generic data structures and algorithms for Next-Generation Sequencing (NGS) analysis. It contains gapped k-mer indices, enhanced suffix arrays (ESA), and an FM-index, as well as algorithms for fast and accurate alignment and read mapping. Based on these data types and fast I/O routines, users can easily develop tools that are extremely efficient and easy to maintain. Besides multi-core support, the research team at Freie Universität Berlin has started adding generic support for accelerators such as NVIDIA GPUs. Go through the slides to learn more. For your own bioinformatics development you can try GPUs for free here: www.nvidia.com/GPUTestDrive

TRANSCRIPT
Sign up for FREE GPU Test Drive on remotely hosted clusters
Develop your codes on latest GPUs today
Test Drive NVIDIA GPUs! Experience The Acceleration
www.nvidia.com/GPUTestDrive
Prof. Dr. Knut Reinert Algorithmische Bioinformatik, FB Mathematik und Informatik
Intro to SeqAn An Open-Source C++ template library for biological sequence analysis Knut Reinert, David Weese Freie Universität Berlin Berlin Institute for Computer Science
This talk
Why SeqAn?
SeqAn as SDK
Generic Parallelization
SeqAn concept/content
4 Nvidia Webinar, 22.10.2013
~ 15 years ago...
Data volume and cost: in 2000, the 3 billion base pairs of the human genome were sequenced for about 3 billion US dollars, at a throughput of roughly 100 million bp per day
Sequencing today...
Within roughly ten years sequencing has become about 10 million times cheaper
Illumina HiSeq: 100 billion bp per day
Future of NGS data analysis
Software libraries bridge gap
[Diagram] Computer scientists move from theoretical considerations and algorithm design to prototype implementations; experimentalists need maintainable tools and analysis pipelines; algorithm libraries connect the two. Application areas: RNA-Seq, ChIP-Seq, structural variants, metagenomics abundance, sequence assembly, cancer genomics. Library building blocks: FM-index, suffix arrays, multicore, hardware acceleration, k-mer filters, fast I/O, secondary memory.
SeqAn now
SeqAn and SeqAn-based tools have been cited more than 360 times. Among the citing institutions are (omitting German institutes): Department of Genetics, Harvard Medical School, Boston; European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton; J. Craig Venter Institute, Rockville MD, USA; Department of Molecular Biology, Princeton University; Applied Mathematics Program, Yale University, New Haven; IBM T.J. Watson Research Center, Yorktown Heights; The Ohio State University, Columbus; University of Minnesota; Australian National University, Canberra; Department of Statistics, University of Oxford; Swedish University of Agricultural Sciences (SLU), Uppsala; Graduate School of Life Sciences, University of Cambridge; Broad Institute, Cambridge, USA; EMBL-EBI; University of California; University of Chicago; Iowa State University, Ames; The Pennsylvania State University; Peking University, Beijing; University of Science and Technology of China; BGI-Shenzhen, China; Beijing Institute of Genomics; …
SeqAn is under the BSD license and hence free for academic AND commercial use.
SeqAn developers
[Chart] Number of SeqAn developers per year, 2003-2012 (ranging from 0 to 16), broken down by funding source: External, CSC, BMBF, DFG, IMPRS, FU.
SeqAn main concepts
length(str)
Value<T>::Type
String<Subclass>
void swap(string & str)
{
    char help = str[1];
    str[1] = str[0];
    str[0] = help;
}
template <typename T>
void swap(T & str)
{
    char help = str[1];
    str[1] = str[0];
    str[0] = help;
}
template <typename T>
void swap(String<T> & str)
{
    T help = str[1];
    str[1] = str[0];
    str[0] = help;
}
template <typename T>
void swap(T & str)
{
    typename T::value_type help = str[1];
    str[1] = str[0];
    str[0] = help;
}
template <typename T>
void swap(T & str)
{
    typename Value<T>::Type help = str[1];
    str[1] = str[0];
    str[0] = help;
}
template <typename T>
struct Value
{
    typedef T Type;
};
Metafunction
template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};

template <typename T>
struct Value
{
    typedef T Type;
};
template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};

template <typename T>
struct Value
{
    typedef T Type;
};

template <>
struct Value<char *>
{
    typedef char Type;
};
template <>
struct Value<char *>
{
    typedef char Type;
};

template <typename T>
struct Value< String<T> >
{
    typedef T Type;
};

template <size_t N>
struct Value<char [N]>
{
    typedef char Type;
};
template <typename T>
void swap(T & str)
{
    typename Value<T>::Type help = str[1];
    str[1] = str[0];
    str[0] = help;
}
template <typename T>
void swap(T & str)
{
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}
template <typename T>
typename Value<T>::Type & value(T & str, int i)
{
    return str[i];
}
Shim Function
template <typename T>
void swap(T & str)
{
    typename Value<T>::Type help = value(str, 1);
    value(str, 1) = value(str, 0);
    value(str, 0) = help;
}
Generic Algorithm
SeqAn Content - SDK
SeqAn SDK Components - Tutorials
SeqAn SDK Components – Reference Manual
SeqAn SDK Components
• CDash/CTest to automatically compile and test across platforms
• Review Board to ensure code quality
• Code coverage reports
SeqAn Content algorithms & data structures
Standard DP algorithms: global & semi-global alignments, local alignments
Modified DP algorithms: split breakpoint detection, banded chain alignment
Unified Alignment Algorithms
For Example ...
Versatile & Extensible DP-Interface
Unified Alignment Algorithms, for example ...

Banded Smith-Waterman with affine gap costs:
DPBand<BandOn>(lowerDiag, upperDiag),
DPProfile<LocalAlignment<>, AffineGaps, TracebackOn<> >

Semi-global Gotoh without traceback:
DPProfile<GlobalAlignment<FreeEndGaps<True, False, True, False> >, AffineGaps, TracebackOff>

Split-breakpoint detection for right anchor:
DPProfile<SplitAlignment<>, AffineGaps, TracebackOn<GapsRight> >

Needleman-Wunsch with traceback:
DPProfile<GlobalAlignment<>, LinearGaps, TracebackOn<> >
Support for Common File Formats
Important file formats for HTS analysis:
• Sequences: FASTA, FASTQ; indexed FASTA (FAI) for random access
• Genomic features: GFF 2, GFF 3, GTF, BED
• Read mapping: SAM, BAM (plus BAM indices)
• Variants: VCF
… or write your own parser
Tutorials and helper routines for writing your own parsers.
SequenceStream ss("file.fa.gz");
while (!atEnd(ss))
{
    readRecord(id, seq, ss);
    cout << id << '\t' << seq << '\n';
}

BamStream bs("file.bam");
while (!atEnd(bs))
{
    readRecord(record, bs);
    cout << record.qName << '\t' << record.pos << '\n';
}
Journaled Sequences
Store multiple genomes, save storage capacity
String<Dna, Journaled<Alloc<> > >

StringSet<TJournaled, Owner<JournalSet> > set;
setGlobalReference(set, refSeq);
appendValue(set, seq1);
join(set, idx, JoinConfig<>());

[Diagram] Genomes G1 ... GN are stored as journals of differences against a common reference Ref.
Fragment Store (Multi) Read Alignments
Read alignments can be easily imported:

std::ifstream file("ex1.sam");
read(file, store, Sam());

... and accessed as a multiple alignment, e.g. for visualization:

AlignedReadLayout layout;
layoutAlignment(layout, store);
printAlignment(svgFile, Raw(), layout, store, 1, 0, 150, 0, 36);
Unified Full-Text Indexing Framework: Available Indices
All indices support multiple strings and external memory construction/usage.
Index<TSeq, IndexEsa<> > Index<StringSet<TSeq>, FMIndex<> >
Suffix trees:
• suffix array
• enhanced suffix array
• lazy suffix tree

Prefix trie:
• FM-index

q-gram indices:
• direct addressing
• open addressing
• gapped
All indices support the (sequential) find interface:
Finder<TIndex> finder(index);
while (find(finder, "TATAA"))
    cout << "Hit at position " << position(finder) << endl;
Index Lookup Interface
SeqAn Performance
Masai read mapper
The algorithm is based on the simultaneous traversal of two string indices (e.g., FM-index, enhanced suffix array, lazy suffix tree)
[Diagram] An index of the reads (radix tree of seeds, e.g. over ACGCTTCATCGCCCT...) is traversed simultaneously with an index of the genome (e.g. FM-index over Chr. 1, Chr. 2, Chr. X).
Masai read mapper
Read Mapping: Masai
Faster and more accurate than BWA and Bowtie 2. Timings on a single core.
Easily exchange the index ...
Collaboration to parallelize indices and verification algorithms in SeqAn, to speed up any applications making use of indices
What about multi-core implementation?
SeqAn going parallel
GOAL: parallelize the finder interface of SeqAn so that it works on the CPU and on accelerators like GPUs
Will be replaced by hg18 and 10 million 20-mers
SeqAn going parallel
Construct FM-index on the reverse genome
Set # OMP threads, call generic count function
SeqAn going parallel: NVIDIA GPUs
SAME count function as on CPU!
Copy needles and index to GPU
SeqAn going parallel
Count occurrences of 10 million 20-mers in the human genome using an FM-index:
• Intel i7, 3.2 GHz, 1 thread: 18.6 sec (1x)
• Intel i7, 3.2 GHz, 12 threads: 2.66 sec (7x)
• Intel Xeon Phi 7120, 244 threads: 2.18 sec (8.5x)
• NVIDIA Tesla K20: 0.4 sec (47x)
SeqAn going parallel
Approx. count occurrences of 1.2 million 33-mers in the human genome using an FM-index:
• Intel i7, 3.2 GHz, 1 thread: 66.1 s (1x)
• Intel i7, 3.2 GHz, 12 threads: 9.0 s (7.3x)
• Intel Xeon Phi 7120, 244 threads: 3.9 s (16.9x)
• NVIDIA Tesla K20: 3.2 s (20.7x)
Part II: The details
Parallelization on the GPU
CUDA preliminaries
In order to use CUDA we first had to adapt some parts of SeqAn:
• CUDA requires each function to be prefixed with the domain qualifiers __host__ or __device__ in order to generate CPU/GPU code
• We prefixed all basic template functions with a SEQAN_HOST_DEVICE macro
• Static const arrays are not allowed in the way SeqAn defines them
• We replaced alphabet conversion lookup tables (e.g. Dna <-> char) by conversion functions
#ifdef __CUDACC__
#define SEQAN_HOST_DEVICE inline __device__ __host__
#else
#define SEQAN_HOST_DEVICE inline
#endif
• Instead of defining a new CUDA string we simply use the Thrust library:
• Provides host_vector and device_vector classes, which are vectors with buffers in host or device memory
• However, Thrust functions are callable only from the host side
• We made both vectors accessible from SeqAn:
• SeqAn strings have to provide a set of global (meta-)functions, e.g. Value<>, resize(), ...
• We simply defined the required wrapper functions for these two vectors
Strings
Standard Strings
• Up to here, all strings can only be used on the side (host or device) that owns their buffer
[Diagram] Host memory: seqan::String and thrust::host_vector, each owning a buffer. Device memory: thrust::device_vector, owning a buffer.
• How to access a device_vector from the device side?
• We could pass (POD) iterators to the kernel
• However, many SeqAn algorithms work on more complex containers
• We need the same interface of the container on the device side
• For strings we developed a so-called ContainerView (POD type)
• Provides a container interface given the begin/end pointers of a vector buffer
• The view() function creates the ContainerView object for a given device_vector
Host-Device String
[Diagram] On the host, view() creates a seqan::ContainerView over the thrust::device_vector buffer; the kernel launch passes it by value to the device, where it provides the same interface.
• How to use a device_vector on the device?
• For generic GPU programming:
• The Device metafunction returns the device-memory equivalent of a class
• The View metafunction returns the (POD) view type of a class
// Replaces String with thrust::device_vector.
template <typename TValue, typename TSpec>
struct Device<String<TValue, TSpec> >
{
    typedef thrust::device_vector<TValue> Type;
};

// Returns a view type that can be passed to a CUDA kernel.
template <typename TValue, typename TAlloc>
struct View<thrust::device_vector<TValue, TAlloc> >
{
    typedef ContainerView<thrust::device_vector<TValue, TAlloc> > Type;
};
Device and View metafunctions
• A simple example to reverse a string on the GPU
// A standard SeqAn string over the Dna alphabet.
String<Dna> myString = "ACGT";

// A Dna string in device global memory.
typename Device<String<Dna> >::Type myDeviceString;

// Copy the string to global memory.
assign(myDeviceString, myString);

// Pass a view of the device string to the CUDA kernel.
myKernel<<<1,1>>>(view(myDeviceString));

// TString is ContainerView<device_vector<Dna> >.
template <typename TString>
__global__ void myKernel(TString string)
{
    printf("length(string) = %d\n", length(string));
    reverse(string);
}
Hello world
• More complex structures (e.g. Index, Graph) can only be ported to the GPU if they ...
• don't use pointers
• use only strings of POD types (String<Dna>, but not String<String<...> >)
• use only 1-dimensional StringSets (ConcatDirect)
• Nested classes are no problem:
• The View metafunction converts all member types into their view types
• The view() function is called recursively on all members
Porting complex data structures
Example: FM Index
The FM-index (BWT, LF-mapping)
The FM-index (search ssi)
a3 = C('i') + Occ('i', 0) + 1 = 1 + 0 + 1
b3 = C('i') + Occ('i', 12)    = 1 + 4
The FM-index (backwards search)
a1 = C('s') + Occ('s', 8) + 1 = 8 + 2 + 1
b1 = C('s') + Occ('s', 10)    = 8 + 4
• The FM-index can be implemented using a number of string-based lookup tables
• ... as well as other indices, e.g. enhanced suffix array, q-gram index
• There is a space-time tradeoff between all these indices
• The FM-index has the minimal memory requirements
The FM-index in SeqAn
• SeqAn's FM-index consists of some nested classes storing Strings
FM-index (host-only)
A generic FM-index
• The Device type of the FM-index uses device_vector instead of String
• The view of this object (= device part) is the same tree, where leaves are replaced by ContainerViews of device_vectors
GPU FM-index (host part)
A generic FM-index
CPU vs. GPU
• Invoking an FM-index based search on CPU and GPU:
// Select the index type.
typedef Index<DnaString, FMIndex<> > TIndex;

// Type is Index<device_vector<Dna>, FMIndex<> >.
typedef typename Device<TIndex>::Type TDeviceIndex;

// ======== On CPU ========

// Create an index.
TIndex index("ACGTTGCAA");

// Use the FM-index on CPU.
findCPU(index, …);

template <typename TIndex>
void findCPU(TIndex & index, …);

// ======== On GPU ========

// Create a device index.
TIndex index("ACGTTGCAA");
TDeviceIndex deviceIndex;
assign(deviceIndex, index);

// Use the FM-index in a CUDA kernel.
findGPU<<<...>>>(view(deviceIndex), …);

template <typename TIndex>
__global__ void findGPU(TIndex index, …);
The findGPU kernel AND the findCPU function will invoke many instances of the SAME generic function, which performs a backtracking algorithm on our generic index interface:
do
{
    if (finder.score == finder.scoreThreshold)
    {
        if (goDown(textIt, suffix(pattern, patternIt))) delegate(finder);
        goUp(textIt);
        if (isRoot(textIt)) break;
    }
    else if (finder.score < finder.scoreThreshold)
    {
        if (atEnd(patternIt)) delegate(finder);
        else if (goDown(textIt))
        {
            finder.score += parentEdgeLabel(textIt) != value(patternIt);
            goNext(patternIt);
            continue;
        }
    }

    do
    {
        goPrevious(patternIt);
        finder.score -= parentEdgeLabel(textIt) != value(patternIt);
    } while (!goRight(textIt) && goUp(textIt));

    if (isRoot(textIt)) break;
    finder.score += parentEdgeLabel(textIt) != value(patternIt);
    goNext(patternIt);
} while (true);
Approximate search via backtracking
Outlook for GPU support
• Our next steps are:
• Provide parallelFor() to hide the CUDA kernel call / OpenMP for-loop
• Develop classes for concurrent access (strings, job queues)
• Port more indices and index iterators to be used with CUDA
• Port SeqAn's alignment module
• Develop a CPU/GPU version of the FM-index based read mapper Masai
• ...
• Follow our development:
• Sources: https://github.com/seqan/seqan/tree/develop
• Code examples: http://trac.seqan.de/wiki/HowTo/DevelopCUDA
Generic Parallelization
Multicore parallelization
struct Serial_;
typedef Tag<Serial_> Serial;

struct Parallel_;
typedef Tag<Parallel_> Parallel;
• We first introduced Tags to switch between serial and parallel algorithms:
template <typename T>
inline T atomicInc(T & x, Serial)
{
    return ++x;
}

template <typename T>
inline T atomicInc(volatile T & x, Parallel)
{
    return __sync_add_and_fetch(&x, 1);
}
• Then we defined the basic atomic operations required for thread safety:
• To this end, we developed the Splitter<TValue, TSpec> to compute a partition into subintervals of (almost) equal length ...
Splitter<unsigned> splitter(10, 20, 3);
for (unsigned i = 0; i < length(splitter); ++i)
    cout << '[' << splitter[i] << ',' << splitter[i+1] << ')' << endl;

// [10,14)
// [14,17)
// [17,20)
Splitter
• The Splitter can also be used with iterators directly
• The Serial / Parallel tag divides an interval range into 1 / #threads many intervals
• The Serial tag can thus be used to switch off the parallel behaviour
template <typename TIter, typename TVal, typename TParallelTag>
inline void arrayFill(TIter begin_, TIter end_,
                      TVal const & value, Tag<TParallelTag> parallelTag)
{
    Splitter<TIter> splitter(begin_, end_, parallelTag);

    SEQAN_OMP_PRAGMA(parallel for)
    for (int job = 0; job < (int)length(splitter); ++job)
        arrayFill(splitter[job], splitter[job + 1], value, Serial());
}
Splitter
Thank you for your attention