unc computational systems biology - lecture01lecture01.ppt author leonard mcmillan created date...
TRANSCRIPT
Comp 790-‐087
• Administra*ve Details
• Course Overview • Simple Gene*cs
1/13/14 1 Comp 790– Introduc*on & Administriva
Computational Genetics
Course overview
• Synopsis – Graduate-‐level project course – Guided readings, discussions,
in-‐class exercises, final projects and write-‐up
– We will probably move to SN325 (will meet here next week)
• Website – To appear at:
hUp://www.csbio.unc.edu/mcmillan/index.py?run=Courses.Comp790S14
• Course Grading – Class Par*cipa*on 10% – Research Paper Presenta*on 20% – Project Proposal 20% – Final Project & Write-‐up 50%
1/13/14 2 Comp 790– Introduc*on & Administriva
Syllabus
• Part 1. Background Topics (1/3) – each student will contribute to a shared s/w library (bring your laptops, preloaded with Python 2.7)
– Genotyping and Sequencing
– Recombina*on, phasing, genome mapping
– Popula*on structure and coalescent theory
– Selec*on, evolu*on, and phylogene*c trees
– Epigene*cs, imprin*ng, chroma*n structure, and X inac*va*on
• Part 2. Paper Presenta*ons (1/3) – each student will be assigned two papers– One as a presenter, the second as a discussant
– Haplotype assembly from sequence data
– De novo genome and transcriptome assembly
– Sequence archival, compara*ve genomic analysis, genomic compression, and query
– Detec*ng structural varia*on using sequence and genotype data
– Inferring chroma*n structure using sequence and methyla*on data
• Part 3. Project Proposals (1/3)
1/13/14 Comp 790– Introduc*on & Administriva 3
Do you want to be genotyped?
• I have 10 donated kits and will get more if needed
• Analyze yourself using the tools we develop
• Contribute your genotypes if you want others to use
• You must be signed up (not audi*ng) to be genotyped
• Who in this class is your closest rela*ve?
1/13/14 Comp 665 – Introduc*on & Signals 4
It’s about genes
• Gene*cs is most clearly understood by considering its subject to be genes rather than organisms
• Organisms are merely vessels for assuring the survival of genes
• The central objec*ve of a gene is to propagate itself
• Successful genes live on long afer their host organism
1/13/14 Comp 790– Introduc*on & Administriva 5
“[Genes] that survived were the ones that built survival machines for themselves to live in. But making a living got steadily harder as new rivals arose with better and more effective survivial machines. Survival machines got bigger and more eloborate, and the process was cumulative and progressive…”
-- Dawkins, The Selfish Gene
Top-‐down and boBom-‐up
• Gene*cs results from inheritance
• Ancestral proper*es can be inferred from extant popula*ons
1/13/14 Comp 790– Introduc*on & Administriva 6
Exercise
• Answer the following – How many dis*nct Y chromosomes are in this
pedigree and who shares them?
– How many dis*nct Mitochondrial DNAs are in this pedigree and who shares them?
– Explain how you might reconstruct the paternal X chromosome of siblings B, C, and D?
– What frac*on of DNA is shared between G and H?
– What frac*on of DNA is shared by H and I?
– Suppose that the samples, P, A, D, G, and I have a common phenotype not present in the remainder. Which two samples are most helpful for localizing the gene*c component?
1/13/14 Comp 665 – Introduc*on & Signals 7
P
C
A
B D E
G
I
F
H
PopulaGon geneGcs
• Popula*on gene*cs differs from classical gene*cs – Analysis rather than synthesis – Depends on models, which aUempt to explain observa*ons
– Less emphasis on Darwin’s natural selec*on
• Considers popula*on dynamics – Isola*on – BoUlenecks
1/13/14 Comp 790– Introduc*on & Administriva 8
Why computer science
• Classically, gene*cs, both top-‐down (classical) and boUom-‐up (popula*on) have focused on mathema*cal/sta*s*cal models
• Access to gene*cs data has recently outpaced our capability to process, interpret, and analyze it
• As model complexity increases, it becomes harder to find closed-‐form solu*ons
• Relies more and more on computa*onal modeling to infer structure, account for noise etc.
1/13/14 Comp 790– Introduc*on & Administriva 9
Wright-‐Fisher model
• One of the first, and simplest models of popula*on genealogies was introduced by Wright (1931) and Fisher (1930).
• Model emphasizes transmission of genes from one genera*on to the next
• For simplicity we’ll first focus on a fixed popula*on size, each with a dis*nct gene variant
1/13/14 Comp 790– Introduc*on & Administriva 10
Simple haploid model
• Rules – Antecedent genes are chosen randomly, with replacement, from their parental genera*on
– No selec*on – Fixed popula*on size
1/13/14 Comp 790– Introduc*on & Administriva 11
G0: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
G1: ['J', 'A', 'H', 'B', 'I', 'E', 'D', 'G', 'A', 'B']
G2: ['A', 'J', 'E', 'G', 'D', 'E', 'B', 'I', 'A', 'A']
G3: ['A', 'A', 'E', 'J', 'I', 'A', 'I', 'A', 'J', 'B']
G4: ['E', 'A', 'B', 'B', 'A', 'E', 'A', 'A', 'A', 'A']
G5: ['A', 'A', 'B', 'A', 'A', 'E', 'A', 'A', 'A', 'B']
G6: ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A', 'A']
G7: ['B', 'A', 'A', 'B', 'A', 'A', 'A', 'A', 'A', 'A']
What will this population eventually look like?
AssumpGons of Wright/Fisher
• Discrete and non-‐overlapping genera*ons • Haploid individuals • Popula*ons size is constant • All individuals are equally fit • No popula*on of social structure • Genes segregate independently
1/13/14 Comp 790– Introduc*on & Administriva 12
Some Graphical AbstracGons
• Replace leUers with colors
• Draw lineages • Sort topologically
1/13/14 13 Comp 790– Introduc*on & Administriva
Repeats
1/13/14 Comp 790– Introduc*on & Administriva 14
Every population results in just one gene
Onset of uniformity
• 10000 trials • Mode = 11 (616)
• Mean = 17.5
1/13/14 Comp 790– Introduc*on & Administriva 15
Next Time
• Gene*cs Background – Inbreeding – Coefficient of relatedness
– Diploidy – Recombina*on
– Phasing • Bring your laptops loaded with Python 2.7
– We will analyze real genotypes
• A list of recent papers to choose from
1/13/14 Comp 665 – Introduc*on & Signals 16