MAPPING MUTATIONS FROM ZEBRAFISH MUTAGENIC SCREENS USING WHOLE GENOME SEQUENCING
Greg Baillie | IMB Sequencing Facility | Winter School | 8 July 2015
CARDIOVASCULAR AND LYMPHATIC SYSTEMS
circulatory system. Art. Britannica Online for Kids. Web. 1 July 2015. <http://kids.britannica.com/elementary/art-171939>.
Herbert and Stainier, Nat. Rev. Mol. Cell Biol. (2011)
CARDIOVASCULAR SYSTEM
Roles
• Transport oxygen and nutrients to tissues, remove carbon dioxide and waste from tissues
• Immune cell circulation and surveillance
• Regulation of body temperature, pH, water content
Disorders
• Congenital heart defects
• Cardiomyopathies
• Cardiovascular disease
Known genes
• Bone morphogenetic protein (BMP), Notch, WNT, sonic hedgehog (SHH)
• Heart of glass (HEG), titin (TTN)
Xin et al., Nat. Rev. Mol. Cell Biol. (2013)
LYMPHATIC SYSTEM
Roles
• Removal of interstitial fluids from tissues
• Absorbs fats from digestive system
• Transports immune cells to/from lymph nodes
Disorders
• Lymphadenopathy
• Lymphedema
• Cancer
Known genes
• LYVE1, VEGFC, SOX18, CCBE1, etc
Stacker et al., Nat. Rev. Cancer (2014)
ZEBRAFISH (Danio rerio)
• Phylum: Chordata
• Subphylum: Vertebrata
• Superclass: Osteichthyes
• Class: Actinopterygii
• Order: Cypriniformes
• Family: Cyprinidae
• Genus: Danio
• Species: D. rerio
"Fish evolution" by Epipelagic - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons. http://commons.wikimedia.org/wiki/File:Fish_evolution.png#/media/File:Fish_evolution.png
ZEBRAFISH MODEL
• Easier to house and care for than rodents
• Lots of offspring
  – 200-300 per pairing (cf. 5-10 for mouse)
• Embryos develop outside body, optically transparent
• Genome sequenced
• Genetic similarity to humans
• Genetic manipulation technologies
www.nichd.nih.gov
http://www.dailyemerald.com/
www.nc3rs.org.uk
ZEBRAFISH MODEL
Karlstrom and Kane, Development, 1996
http://www.imb.uq.edu.au/ben-hogan
ZEBRAFISH MODEL
Genome Reference Consortium, Ensembl
                   Zebrafish                               Human
Chromosomes        1-25                                    1-22, X, Y
Total bases        1,412,464,843 bp                        3,234,834,689 bp
Scaffolds          4,559 (3,452 placed, 1,107 unplaced)    249 (190 placed, 59 unplaced)
Scaffold N50       1,551,602 bp                            46,395,641 bp
Coding genes       26,241                                  20,774
Non-coding genes   6,097                                   22,493
ZEBRAFISH MODEL
Howe et al., Nature (2013)
ZEBRAFISH MODEL
• Targeted gene alteration
  – CRISPR, TALEN, Morpholino, etc
  – Rapid confirmation of candidate genes
FORWARD GENETIC SCREEN
• Trying to find genes involved in development or disease
• Induce random mutations
  – Insertional mutagenesis (e.g. transposons)
  – Radiation-induced mutagenesis
  – Chemical mutagenesis (e.g. ENU)
• Screen mutants for phenotype of interest
• Map mutation to region of genome
• Identify mutated gene
  – Sequencing
• Confirm role of gene
  – KO
  – Complementation
ENU MUTAGENESIS
• N-Ethyl-N-nitrosourea
• Alkylating agent
• Preference for A→T base transversions and AT→GC transitions
  – Also causes GC→AT transitions
• Missense (64%), splice site (26%), nonsense (10%) [a]
• Bath fish in ENU solution
• ~1 mutation every 100,000-150,000 bp
[a] Justice et al. Hum. Mol. Genet. (1999)
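A rough worked example of what the ENU mutation density implies per genome. This is only a sketch: it assumes the ~1 per 100,000-150,000 bp rate applies uniformly across the zebrafish assembly size quoted on the genome slide (1,412,464,843 bp), which is an illustrative simplification rather than a measured result.

```python
# Zebrafish total bases, as quoted on the genome comparison slide.
GENOME_BP = 1_412_464_843


def expected_mutations(genome_bp, bp_per_mutation):
    """Expected number of induced mutations at a uniform density."""
    return genome_bp / bp_per_mutation


# The slide gives a range of one mutation every 100,000-150,000 bp.
low = expected_mutations(GENOME_BP, 150_000)
high = expected_mutations(GENOME_BP, 100_000)

print(f"{low:,.0f} to {high:,.0f} induced mutations per genome copy")
```

Under these assumptions each mutagenised genome copy carries on the order of ten thousand induced changes, which is why mapping is needed before a causative mutation can be picked out.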
ZEBRAFISH FORWARD GENETIC SCREEN
[Figure: crossing scheme for the lyve1 strain across generations F0, F1 and F2. ENU mutagenesis; carriers (a*/A) and wild-type fish (A/A); 25% a*/a* offspring show the mutant phenotype.]
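The 25% a*/a* figure is what a cross between two a*/A carriers gives under simple Mendelian segregation. A minimal sketch, assuming a single recessive locus and equal transmission of both alleles (the function name `cross` is illustrative only):

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return offspring genotype fractions for two diploid parents."""
    # Each offspring receives one allele from each parent.
    # reverse=True puts "a*" before "A", matching the slide notation a*/A.
    genotypes = Counter(
        "/".join(sorted(pair, reverse=True)) for pair in product(parent1, parent2)
    )
    total = sum(genotypes.values())
    return {genotype: n / total for genotype, n in genotypes.items()}


carrier = ("a*", "A")
offspring = cross(carrier, carrier)
for genotype, fraction in sorted(offspring.items()):
    print(f"{genotype}: {fraction:.0%}")
```

Only the a*/a* quarter is expected to show a recessive mutant phenotype; the remaining fish are carriers or wild type.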
HOGAN/SMITH SCREENS AT IMB
• 420 families screened from incrossed F1 fish from heavily mutagenised founder population
• ~50 lymphatic development mutants (Hogan lab)
• ~40 heart development mutants (Smith lab)
• Largest zebrafish screen completed in Australia
• Now in mapping, gene characterisation, and confirmation phase
  – 24 lymph mutants sequenced
  – 16 heart mutants sequenced
  – 6 heart/lymph mutants sequenced
ZEBRAFISH FORWARD GENETIC SCREEN
[Figure: crossing scheme for the qWIK strain across generations F0, F1 and F2. Carriers (a*/A) and wild-type fish (A/A); 25% a*/a* offspring show the mutant phenotype.]
TRADITIONAL MAPPING
• Microsatellites
• Coarse mapping - to identify chromosome (weeks/months)
  – Pools of fish
• Fine mapping - to identify region of chromosome (weeks/months)
  – On individual embryos (100s-1000s)
SEQUENCING-BASED MAPPING
• Sequence pools of "mutant" and "reference" fish (3 days)
  – ~8 genomes per NextSeq 500 run
• Analyse (1 day)
  – Map against reference genome (Zv9) [BWA]
  – Call variants [GATK]
  – Identify mutant region by homozygosity mapping
ZEBRAFISH FORWARD GENETIC SCREEN
[Figure: crossing schemes labelled lyve1 strain and qWIK strain, each with a*/A carriers and A/A wild-type fish and 25% a*/a* offspring showing the mutant phenotype.]
ZEBRAFISH FORWARD GENETIC SCREEN
[Figure: lyve1 strain, qWIK strain and the 25% a*/a* mutant-phenotype pool, annotated with sequencing depths of 40X, 13X, 20X and 8-20X.]
ANALYSIS PIPELINE
• Lanes 1-4, per lane: fastqs → Map [BWA MEM] against danRer7.fa → Sort [Picard] → sorted.1.bam, sorted.2.bam, sorted.3.bam, sorted.4.bam
• Merge [Picard] → MarkDuplicates [Picard] → Realign [GATK] → realign.bam
• Genotype [GATK] → mutant.vcf
• Genotype [GATK] with lyve1.bam, qWIK.F.bam, qWIK.M.bam and danRer7.fa → mutant.ref.vcf
• coverage.py → coverage.json
• homozygosity.bedgraph
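The per-lane mapping and merge steps could be scripted roughly as follows. This is a sketch, not the facility's pipeline: the FASTQ and SAM file names, the paired-end layout, the `picard.jar` location and the read group fields are all assumptions, and the functions only build command lines (nothing runs unless `run` is called with the tools installed).

```python
import subprocess

PICARD = ["java", "-jar", "picard.jar"]  # assumed location of the Picard jar


def lane_commands(lane, sample="mutant", reference="danRer7.fa"):
    """Build (argv, stdout_path) pairs that map and sort one lane."""
    # Hypothetical paired-end FASTQ names for this lane.
    r1 = f"lane{lane}_R1.fastq.gz"
    r2 = f"lane{lane}_R2.fastq.gz"
    sam = f"lane{lane}.sam"
    bam = f"sorted.{lane}.bam"
    # bwa mem expects literal "\t" separators in the read group string.
    read_group = f"@RG\\tID:lane{lane}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # bwa mem writes SAM to stdout, so it is redirected to a file.
        (["bwa", "mem", "-R", read_group, reference, r1, r2], sam),
        (PICARD + ["SortSam", f"I={sam}", f"O={bam}", "SORT_ORDER=coordinate"], None),
    ]


def merge_command(lanes, output="merged.bam"):
    """Build the Picard MergeSamFiles call for the sorted lane BAMs."""
    inputs = [f"I=sorted.{lane}.bam" for lane in lanes]
    return PICARD + ["MergeSamFiles", *inputs, f"O={output}"]


def run(argv, stdout_path=None):
    """Execute one command, optionally redirecting stdout to a file."""
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)


lanes = [1, 2, 3, 4]
plan = [cmd for lane in lanes for cmd in lane_commands(lane)]
plan.append((merge_command(lanes), None))
```

Keeping command construction separate from execution makes it easy to print and inspect the plan before committing cluster time; duplicate marking, realignment and genotyping would follow the merge.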
ANALYSIS PIPELINE
Steps and outputs
• SNP Impact [SnpEff] → mutant.snpeff.vcf
• homozygosity_calculator.py → homozygosity.bedgraph
• find_candidate_snps.py → candidates.vcf
• Annotate [SnpSift] → candidates.snpeff.vcf
• snpeff_to_html.py → Summary.html
Input files
• mutant.vcf, mutant.ref.vcf
• homozygosity.bedgraph, coverage.json
• fish_conservation.bedgraph, vertebrate_conservation.bedgraph
• ME1.bed, ME3.bed
Python libraries
• PyVCF, PySam
HOMOZYGOSITY MAPPING
Adapted from Leshchiner et al., Genome Res. 2012
[Figure: mutant pool compared with lyve1 and qWIK; the mutation lies in a homozygous lyve1 region.]
HOMOZYGOSITY MAPPING
[Figure: mutant pool against the danRer7 reference, showing the mutation, a reference gap, window size and step size.]
HOMOZYGOSITY SCORE
(Henke et al)
score = (# mutant hom / # mutant het) × (# qWIK not in mutant / # qWIK in mutant)
score = (# mutant hom lyve / # mutant het lyve/qWIK) × (# lyve informative and qWIK informative)
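The first score on this slide multiplies two ratios: homozygous over heterozygous calls in the mutant pool, and qWIK variants absent from the mutant over qWIK variants present in it. A minimal per-window sketch follows; the function name, argument names and the pseudocount (added here only to avoid division by zero) are assumptions, not part of the published score.

```python
def henke_style_score(n_hom, n_het, n_qwik_absent, n_qwik_present, pseudocount=1.0):
    """Score one window: high when the mutant pool is homozygous and
    mapping-strain (qWIK) alleles are missing from it."""
    # Ratio of homozygous to heterozygous variant calls in the mutant pool.
    hom_het = (n_hom + pseudocount) / (n_het + pseudocount)
    # Ratio of qWIK variants not seen in the mutant to those seen in it.
    qwik_ratio = (n_qwik_absent + pseudocount) / (n_qwik_present + pseudocount)
    return hom_het * qwik_ratio


# Illustrative counts only, not data from the screen.
linked = henke_style_score(n_hom=40, n_het=0, n_qwik_absent=30, n_qwik_present=0)
unlinked = henke_style_score(n_hom=10, n_het=30, n_qwik_absent=5, n_qwik_present=25)
print(linked, unlinked)
```

A window linked to the mutation should score far above the genome-wide background, which is what produces a peak when the scores are plotted along each chromosome.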
HOMOZYGOSITY PLOTS
HOMOZYGOSITY PEAKS
Screen        Defined Peak   Broad Peak   No peak
Heart         8              1            7
Lymph         15             4            5
Heart/Lymph   2              2            2
TRADITIONAL MAPPING
• Re-sequence genes in region
  – PCR, Sanger sequencing
SEQUENCING-BASED MAPPING
• Use sequence data
  – Find candidate mutations
  – Estimate mutation impact
MUTATION DETECTION
• Find candidate mutations
  – Python
  – Relaxed
    • Mutant covered
    • At least one of reference samples is covered
    • Majority allele in mutant not majority in reference samples
  – Strict
    • Mutant is ‘homozygous’
    • All reference samples covered
    • Mutant allele absent in all reference samples
• Assess impact
  – SnpEff (vs Ensembl annotation)
    • HIGH, e.g. coding start lost, coding stop gained, splicing, frameshift, etc
    • MODERATE, e.g. non-synonymous coding change, etc
    • LOW, e.g. synonymous coding change, etc
    • MODIFIER, e.g. intron, upstream, downstream
  – Conservation in fish, vertebrates
• HTML output
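The relaxed and strict candidate filters can be sketched on per-sample allele read counts. This is an illustration under stated assumptions, not find_candidate_snps.py itself: the depth cut-off `MIN_DEPTH`, the 'homozygous' threshold `HOM_FRACTION`, the dict-of-counts input and the reading of "not majority in reference samples" as "not the majority in any covered reference" are all hypothetical choices.

```python
MIN_DEPTH = 1        # assumed minimum read depth to call a sample "covered"
HOM_FRACTION = 0.9   # assumed allele fraction for calling the pool 'homozygous'


def depth(counts):
    """Total reads at the site for one sample."""
    return sum(counts.values())


def majority(counts):
    """Most frequent allele in one sample's read counts."""
    return max(counts, key=counts.get)


def passes_relaxed(mutant, references):
    if depth(mutant) < MIN_DEPTH:
        return False
    covered = [ref for ref in references if depth(ref) >= MIN_DEPTH]
    if not covered:
        return False
    allele = majority(mutant)
    # Mutant majority allele must differ from every covered reference majority.
    return all(majority(ref) != allele for ref in covered)


def passes_strict(mutant, references):
    if depth(mutant) < MIN_DEPTH:
        return False
    allele = majority(mutant)
    if mutant[allele] / depth(mutant) < HOM_FRACTION:
        return False
    if any(depth(ref) < MIN_DEPTH for ref in references):
        return False
    # Mutant allele must be completely absent from all reference samples.
    return all(ref.get(allele, 0) == 0 for ref in references)


# Illustrative sites: allele -> read count, mutant pool then three references.
clean_site = ({"T": 12}, [{"C": 20}, {"C": 9}, {"C": 11}])
noisy_site = ({"T": 10, "C": 3}, [{"C": 20}, {}, {"C": 11, "T": 1}])
```

The noisy site passes the relaxed filter but fails the strict one, which matches the intent of keeping two tiers: strict hits are reviewed first, relaxed hits are a fallback when coverage is uneven.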
PREMATURE STOP CODON
SPLICE SITE DONOR/ACCEPTOR
MUTATION DETECTION
NON-SYNONYMOUS SUBSTITUTION
MUTATION DETECTION
NON-SYNONYMOUS or NONSENSE?
MUTATION TYPES SUMMARY
                                   Known   Unknown   Count
Clear homozygosity peak                              33
  Single candidate                 13      14        27
    Nonsense                       7       8         16
    Non-synonymous                 1       5         6
    Splice donor/acceptor          4       1         4
    Intron/upstream/downstream     1       0         1
  Multiple candidates                                5
  No candidates                                      1
No homozygosity peak                                 13
VALIDATION (mafba)
v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog Ba
VALIDATION (mafba)
Koltowska et al. 2015 (In press)
SUMMARY
• Comprehensive screen of genes involved in heart and lymphatic development ongoing
• Whole genome sequencing enables rapid mapping
  – Days/weeks (cf. months/years)
• Can also identify causative mutation (sometimes)
• Sequencing parental fish decreases per-sample cost (for large screens)
• Combination of mapping techniques (traditional and sequencing) and new in vivo techniques (e.g. TALENs) allows rapid identification of genes involved in vertebrate development
ACKNOWLEDGEMENTS
TAFT LAB
Ryan Taft, Christine Ender, Ke-Lin Ru, Michael Clark, Darya Vanichkina, Anupma Choudhary, Andrew Calcino, Hyun Jae ‘Josh’ Lee
HOGAN LAB
Ben Hogan, Kaska Koltowska, Scott Paterson, Neil Bower, Christine Neyt, Anne Lagendijk, Sungmin Baek, Baptiste Coxam, Joelle Kartopawiro
SMITH LAB
Kelly Smith, Sam Capon, Daniela Grassini, Jessica De Angelis
QCMG SEQUENCING
Dave Miller, Ivon Harliwong, Senel Idrisoglu, Suzanne Manning, Ehsan Nourbakhsh, Craig Nourse
QCMG IT
John Pearson, Lynn Fink, Scott Wood, Darrin Taylor, Conrad Leonard
QCMG
Sean Grimmond, Peter Wilson
SIMONS LAB
Cas Simons, Jo Crawford, Jason da Silva, Doug Stetner
ISF
Angelika Christ, Tim Bruxner
IMB Sequencing Facility
Let our team help you with library preparation and sequencing on Illumina NextSeq 500 and MiSeq platforms.
Our standard sequencing services include:
• Whole transcriptome
• Gene Expression Profiling
• Whole Human Exome
• Whole Genome
Our custom sequencing services include:
• Custom capture
• Illumina sequencing panel
• Amplicon sequencing
• Sequencing customer-prepared libraries
HOMOZYGOSITY MAPPING
[Figure: mutant pool against the danRer7 reference, showing the mutation, a reference gap, window size and step size.]
homozygosity_calculator.py
• Arguments: input_vcf, mutant_id, output_prefix, [gaps_file]
• parse_vcf_number_of_snps()
• parse_vcf_physical_distance()
WindowController
• list windows
• int start_index
• add_window()
• create_windows_for_contig(contig, contig_coords, window_size=10000, step_size=1000, window_type=None)
• update_count_in_windows_for_keys(pos, keys)
• clear_windows()
SnpWindow
• string chromosome
• int start
• int end
• int segment_start
• int segment_end
• int window_size
• int step_size
• Counter counts
• wiggle_start(prev_window=None)
• wiggle_end(next_window=None)
ContigCoordinates
• dict contig_lengths
• dict gap_coords
• get_contig_lengths_from_vcf(vcf_reader)
• get_gap_coords_from_bed_file(bed_file)
• gapped_contig_coords_for_contig(contig)
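The window size and step size idea can be sketched independently of the classes listed here. The following is a simplified stand-in, not the homozygosity_calculator.py code: `make_windows` and `count_in_windows` are hypothetical helpers, only the defaults window_size=10000 and step_size=1000 come from the slide, and reference gaps and trailing partial windows are ignored.

```python
from collections import Counter


def make_windows(contig_length, window_size=10000, step_size=1000):
    """Return overlapping half-open (start, end) windows along a contig."""
    last_start = max(contig_length - window_size, 0)
    return [
        (start, min(start + window_size, contig_length))
        for start in range(0, last_start + 1, step_size)
    ]


def count_in_windows(windows, calls):
    """Tally call types per window; calls are (position, key) pairs."""
    counts = [Counter() for _ in windows]
    for pos, key in calls:
        # Simple scan for clarity; a real implementation would index windows.
        for i, (start, end) in enumerate(windows):
            if start <= pos < end:
                counts[i][key] += 1
    return counts


# Illustrative contig and genotype calls, not real data.
windows = make_windows(25000, window_size=10000, step_size=5000)
calls = [(100, "hom"), (7000, "het"), (12000, "hom")]
per_window = count_in_windows(windows, calls)
```

Because the step is smaller than the window, each variant contributes to several neighbouring windows, which smooths the resulting homozygosity track.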