2014 09-29 2nd monday overview

Genome Bioinformatics [email protected]

Uploaded by: yannick-wurm

Posted on 02-Jul-2015

DESCRIPTION

Bioinformatics MSc - Genome bioinformatics

TRANSCRIPT

Page 1

Genome Bioinformatics

[email protected]

Page 2

Genomics?

Page 3

Genomics - Wikipedia

Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism).[1][2] Advances in genomics have triggered a revolution in discovery-based research to understand even the most complex biological systems such as the brain.[3] The field includes efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping. The field also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome.[4]

In contrast, the investigation of the roles and functions of single genes is a primary focus of molecular biology or genetics and is a common topic of modern medical and biological research. Research of single genes does not fall into the definition of genomics unless the aim of this genetic, pathway, and functional information analysis is to elucidate its effect on, place in, and response to the entire genome's networks.[5][6]

Page 4

[Figure: number of sequenced prokaryotic genomes and sequencing costs over time. Image: Estevezj, CC3, Wikimedia Commons, http://upload.wikimedia.org/wikipedia/commons/7/73/Number_of_prokaryotic_genomes_and_sequencing_costs.svg]

Page 5

• Genomics

• Biodiversity assessments

• Stool microbiome sequencing

• Personalized medicine

• Cancer genomics

Page 6

Challenges

1. Getting up and running with Unix

2. Algorithms in Bioinformatics: strengths & weaknesses

3. Bioinformatics databases

4. DIY: genome assembly & identifying variants.

Page 7

Getting up and running with Unix & High Performance Computing (HPC)

ITS Research Team (Lukasz Zalewski):

1. Install VirtualBox & Bio-Linux.

2. Introduction to Unix.

3. Using Apocrita HPC (“the cluster”).

Page 8

Algorithms for sequence alignment.

- dotplots
- the concept of distance: Euclidean, Hamming, Levenshtein
- dynamic programming and the Smith-Waterman algorithm
- local, global and semiglobal alignments
- gap penalty models
- basics of approximate methods (BLAST)
- scoring matrices (PAM, BLOSUM)
- profiles and PSI-BLAST
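Two of the distances listed above and the Smith-Waterman recurrence fit in a few lines of Python. This is a minimal sketch, assuming simple DNA-style match/mismatch scores instead of a PAM or BLOSUM matrix, and a linear gap penalty rather than an affine one; `hamming`, `levenshtein` and `smith_waterman` are hypothetical helper names, not a library API. Smith-Waterman fills a dynamic-programming matrix in O(mn) time and clamps every cell at zero, which is what makes the alignment local.

```python
from typing import Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # match or substitute
        prev = curr
    return prev[-1]


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment (score, row a, row b) with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + score:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2)` returns a score of 13 for the local alignment GTT-AC / GTTGAC; the flanking mismatched bases are simply left out, which a global alignment would not allow.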

Page 9

Take home message?

• Algorithms are approximate

• Results aren’t perfect

• Computers can get it wrong

Algorithms for sequence alignment.

Page 10

BLAST is unable to detect any similarity between these 2 sequences:

Gp-9     1 ATGAAGACGTTCGTATTGCATATTTTTATTTTTGCTCTCGTGGCTTTCGCTTCTGCATCT 60
           ||||||||||| |||||||||| |||||||||   |||||||| |||||||||| |||||
K2000    1 ATGAAGACGTTGGTATTGCATAATTTTATTTT---TCTCGTGGATTTCGCTTCTCCATCT 57

Gp-9    61 CGTGATAGCGCGAGGAAGATAGGATCCCAATATGACAATTACGCGACTTGCTTAGCCGAA 120
           ||||| ||||||| || ||| ||||||||| |||||| |||||| ||||||||| |||||
K2000   58 CGTGAGAGCGCGAAGACGATGGGATCCCAACATGACATTTACGCCACTTGCTTACCCGAA 117

Gp-9   121 CATAGTCTAACAGAGGATGACATCTTCTCGATTGGTGAAGTATCAAGTGGCCAGCACAAA 180
           |||| ||||| || |||| || | ||||||||| ||||||||| |||||||||| |||||
K2000  118 CATAATCTAAGAGGGGATAACGTTTTCTCGATTCGTGAAGTATAAAGTGGCCAGGACAAA 177

Gp-9   181 ACCAATCATGAAGATACCGAACTACACAAAAATGGTTGCGTCATGCAATGTTTGTTAGAA 240
           |||| ||||||||| |||||||| ||||||||| || ||||||| |||||||| ||||||
K2000  178 ACCAGTCATGAAGAAACCGAACTCCACAAAAATCGTCGCGTCATACAATGTTTATTAGAA 237

Gp-9   241 AAAGATGGACTGATGTCTGGAGCTGATTATGATGAAGAGAAAATGCGTGAGGACTATATC 300
            |||||||| |||||| ||| ||| ||||||||| ||| ||||||||||  |||||||||
K2000  238 TAAGATGGAATGATGTGTGGGGCTAATTATGATGGAGAAAAAATGCGTGCTGACTATATC 297

Gp-9   301 AAGGAA------ACAGGTGCTCAACCAGGAGATCAAAGGATAGAAGCTCTGAATGCCTGC 354
           | ||||      || |||| |||||||||| |||| |||| |||| |||||||||| | |
K2000  298 AGGGAATCAGGTACCGGTGGTCAACCAGGACATCAGAGGAGAGAACCTCTGAATGCGTAC 357

Gp-9   355 ATGCAAGAAACAAAAGACATGGAGGATAAATGTGACAAAAGCTTGCTCCTTGTAGCATGT 414
           ||||||||| ||||||| ||| ||| ||||||  |||||||||   | || ||| |||||
K2000  358 ATGCAAGAATCAAAAGATATGCAGGTTAAATGGCACAAAAGCT---TTCTAGTAACATGT 414

Gp-9   415 GTCTTAGCAGCTGAAGCTGTGCTCGCCGATTCTAACGAAGGAGCATAA 462
            | |||||||| | |||||| ||||| |||||| ||||||||| ||||
K2000  415 ATTTTAGCAGCGGGAGCTGTTCTCGCGGATTCTCACGAAGGAGAATAA 462
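A quick way to summarise a pairwise alignment like the Gp-9 / K2000 one is percent identity plus the longest run of consecutive identical columns. Heuristic searches such as BLAST start from short exact word matches (seeds), so the longest identical run is worth comparing with the word size used; this sketch is an illustration only, not a diagnosis of why the search failed. `alignment_stats` and `percent_identity` are hypothetical helper names, and counting gap columns in the denominator is one convention among several.

```python
from typing import NamedTuple


class AlignmentStats(NamedTuple):
    columns: int                 # alignment length, gap columns included
    identical: int               # columns where both rows carry the same residue
    gap_columns: int             # columns with a '-' in either row
    longest_identical_run: int   # longest stretch of consecutive identical columns


def alignment_stats(top: str, bottom: str) -> AlignmentStats:
    """Summarise two rows of a pairwise alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    identical = gaps = run = longest = 0
    for x, y in zip(top.upper(), bottom.upper()):
        if x == "-" or y == "-":
            gaps += 1
            run = 0              # a gap breaks any exact word match
        elif x == y:
            identical += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0              # a mismatch breaks it too
    return AlignmentStats(len(top), identical, gaps, longest)


def percent_identity(stats: AlignmentStats) -> float:
    """Identical columns over all alignment columns, as a percentage."""
    return 100.0 * stats.identical / stats.columns if stats.columns else 0.0
```

Feeding it each pair of rows above (or the concatenated rows) gives the identity figures without counting bars by hand.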

Page 11

Take home message?

• Algorithms are approximate

• Results depend on:
  • underlying biology
  • approximations made by algorithms
  • search and database size

Algorithms for sequence alignment.
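The dependence on search and database size can be made concrete with the Karlin-Altschul statistics behind BLAST E-values: the expected number of chance hits scoring at least S grows linearly with the query length m and the database length n (in residues), while K and λ depend on the scoring system. This sketch ignores the edge-effect length corrections that BLAST applies; S' is the bit score.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad\Longleftrightarrow\qquad
E = m\,n\,2^{-S'}, \quad S' = \frac{\lambda S - \ln K}{\ln 2}
```

For the same raw score, searching a database twice as large doubles E, so a genuine homologue can drift above the reporting threshold as databases grow.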

Page 12

Databases for Bioinformatics

• Biological databases & access to annotated genomes
  • NCBI
  • Ensembl
  • UCSC
  • Entrez & BioMart
  • GenBank/UniProt

• Cancer resources and data portals
  • TCGA, ICGC and COSMIC
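Sequences retrieved from GenBank or UniProt are commonly exported as FASTA: a `>` header line followed by one or more sequence lines. A minimal reader, assuming plain uncompressed FASTA; `read_fasta` is a hypothetical helper name, and in practice Biopython's `SeqIO.parse(handle, "fasta")` does the same job with more checks.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text given any iterable of lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # tolerate blank lines
        if line.startswith(">"):
            if header is not None:        # finish the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)           # sequences may be wrapped over many lines
    if header is not None:                # flush the last record
        yield header, "".join(chunks)
```

Because an open file is an iterable of lines, `for header, seq in read_fasta(open("sequences.fasta")): ...` works directly; the file name is illustrative.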

Page 13

Take home message?

Databases for Bioinformatics

Page 14

Genome Assembly & variant calling

• Processing raw data

• Genome assembly algorithms

• Read mapping

• Quality Assurance processes

• Calling & visualising variants

• Automated gene prediction

• Doing things on the command line
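Processing raw data and quality assurance usually start from the per-base quality strings in FASTQ files, where a Phred score Q means an error probability of 10^(-Q/10). A minimal sketch, assuming the Phred+33 encoding of Sanger and Illumina 1.8+ FASTQ (older Illumina pipelines used an offset of 64); the 3'-end trimming rule at Q < 20 and the helper names are hypothetical illustrations, not the course's pipeline.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def trim_low_quality_tail(seq: str, quality: str, min_q: int = 20,
                          offset: int = 33) -> Tuple[str, str]:
    """Drop bases from the 3' end for as long as their score is below min_q."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings must have the same length")
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]
```

Dedicated QC and trimming tools use more careful rules (sliding windows, adapter removal), but they rest on the same score decoding.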

Page 15

Bruno Vieira

Rodrigo Pracana

Page 16

Old & modern assembly algorithms

• Overlap-layout-consensus

• De Bruijn graph-based
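The de Bruijn approach can be sketched on error-free toy reads: every distinct k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and the sequence is read off an Eulerian path (a walk using every edge once, found here with Hierholzer's algorithm). This is a minimal sketch under strong assumptions: no sequencing errors, no reverse complements, and a genome whose (k-1)-mers are all unique, so the path is unambiguous; `de_bruijn`, `eulerian_path` and `assemble` are hypothetical names. Overlap-layout-consensus, by contrast, compares whole reads pairwise for suffix-prefix overlaps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds one edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer in sorted(kmers):            # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph: Dict[str, List[str]]) -> List[str]:
    """Hierholzer's algorithm: a walk that uses every edge exactly once."""
    if not graph:
        return []
    in_deg: Dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one (a path); otherwise anywhere (a cycle).
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1), next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())   # follow an unused edge
        else:
            path.append(stack.pop())           # dead end: node is final in this sub-walk
    return path[::-1]


def assemble(reads: Iterable[str], k: int) -> str:
    """Spell the path: first node in full, then the last base of each later node."""
    path = eulerian_path(de_bruijn(reads, k))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `assemble(["ATGGCG", "GGCGTG", "GTGCA"], 4)` returns "ATGGCGTGCA". With real data, repeats longer than k create branches and several possible paths, which is where actual assemblers do most of their work.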

Page 17