unc computational systems biology - lecture01lecture01.ppt author leonard mcmillan created date...

16
Comp 790087 Administra*ve Details Course Overview Simple Gene*cs 1/13/14 1 Comp 790– Introduc*on & Administriva Computational Genetics

Upload: others

Post on 07-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Comp  790-­‐087  

•  Administra*ve  Details  

•  Course  Overview  •  Simple  Gene*cs  

1/13/14   1  Comp  790–  Introduc*on  &  Administriva  

Computational Genetics

Page 2: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Course  overview  

•  Synopsis  –  Graduate-­‐level  project  course  –  Guided  readings,  discussions,    

in-­‐class  exercises,  final  projects  and  write-­‐up    

–  We  will  probably  move  to  SN325  (will  meet  here  next  week)  

•  Website  –  To  appear  at:    

hUp://www.csbio.unc.edu/mcmillan/index.py?run=Courses.Comp790S14  

•  Course  Grading  –  Class  Par*cipa*on              10%  –  Research  Paper  Presenta*on        20%  –  Project  Proposal              20%  –  Final  Project  &  Write-­‐up          50%  

1/13/14   2  Comp  790–  Introduc*on  &  Administriva  

Page 3: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Syllabus  

•  Part  1.  Background  Topics  (1/3)  –  each  student  will  contribute  to  a  shared  s/w  library  (bring  your  laptops,  preloaded  with  Python  2.7)  

–  Genotyping    and  Sequencing  

–  Recombina*on,  phasing,  genome  mapping  

–  Popula*on  structure  and  coalescent  theory  

–  Selec*on,  evolu*on,  and  phylogene*c  trees  

–  Epigene*cs,  imprin*ng,  chroma*n  structure,  and  X  inac*va*on  

•  Part  2.  Paper  Presenta*ons  (1/3)  –  each  student  will  be  assigned  two  papers–  One  as  a  presenter,  the  second  as  a  discussant  

–  Haplotype  assembly  from  sequence  data  

–  De  novo  genome  and  transcriptome  assembly  

–  Sequence  archival,  compara*ve  genomic  analysis,  genomic  compression,  and  query  

–  Detec*ng  structural  varia*on  using  sequence  and  genotype  data  

–  Inferring  chroma*n  structure  using  sequence  and  methyla*on  data  

•  Part  3.  Project  Proposals  (1/3)  

1/13/14   Comp  790–  Introduc*on  &  Administriva   3  

Page 4: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Do  you  want  to  be  genotyped?  

•  I  have  10  donated  kits  and  will  get  more  if  needed  

•  Analyze  yourself  using  the  tools  we  develop  

•  Contribute  your  genotypes  if  you  want  others  to  use  

•  You  must  be  signed  up  (not  audi*ng)  to  be  genotyped  

•  Who  in  this  class  is  your  closest  rela*ve?  

1/13/14   Comp  665  –  Introduc*on  &  Signals   4  

Page 5: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

It’s  about  genes  

•  Gene*cs  is  most  clearly  understood  by  considering  its  subject  to  be  genes  rather  than  organisms  

•  Organisms  are  merely  vessels  for  assuring  the  survival  of  genes  

•  The  central  objec*ve  of  a  gene  is  to  propagate  itself  

•  Successful  genes  live  on  long  afer  their  host  organism    

1/13/14   Comp  790–  Introduc*on  &  Administriva   5  

“[Genes] that survived were the ones that built survival machines for themselves to live in. But making a living got steadily harder as new rivals arose with better and more effective survivial machines. Survival machines got bigger and more eloborate, and the process was cumulative and progressive…”

-- Dawkins, The Selfish Gene

Page 6: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Top-­‐down  and  boBom-­‐up  

•  Gene*cs  results  from  inheritance  

•  Ancestral  proper*es  can  be  inferred  from  extant  popula*ons  

1/13/14   Comp  790–  Introduc*on  &  Administriva   6  

Page 7: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Exercise  

•  Answer  the  following  –  How  many  dis*nct  Y  chromosomes  are  in  this

 pedigree  and  who  shares  them?  

–  How  many  dis*nct  Mitochondrial  DNAs  are  in  this  pedigree  and  who  shares  them?  

–  Explain  how  you  might  reconstruct  the  paternal  X  chromosome  of  siblings  B,  C,  and  D?  

–  What  frac*on  of  DNA  is  shared  between  G  and  H?  

–  What  frac*on  of  DNA  is  shared  by  H  and  I?  

–  Suppose  that  the  samples,  P,  A,  D,  G,  and  I  have  a  common  phenotype  not  present  in  the  remainder.  Which  two  samples  are  most  helpful  for  localizing  the  gene*c  component?  

1/13/14   Comp  665  –  Introduc*on  &  Signals   7  

P  

C  

A  

B   D   E  

G  

I  

F  

H  

Page 8: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

PopulaGon  geneGcs  

•  Popula*on  gene*cs  differs  from  classical  gene*cs  – Analysis  rather  than  synthesis  – Depends  on  models,  which  aUempt  to  explain  observa*ons  

– Less  emphasis  on  Darwin’s  natural  selec*on  

•  Considers  popula*on  dynamics  –  Isola*on  – BoUlenecks  

1/13/14   Comp  790–  Introduc*on  &  Administriva   8  

Page 9: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Why  computer  science  

•  Classically,  gene*cs,  both  top-­‐down  (classical)  and  boUom-­‐up  (popula*on)  have  focused  on  mathema*cal/sta*s*cal  models  

•  Access  to  gene*cs  data  has  recently  outpaced  our  capability  to  process,  interpret,  and  analyze  it  

•  As  model  complexity  increases,  it  becomes  harder  to  find  closed-­‐form  solu*ons  

•  Relies  more  and  more  on  computa*onal  modeling  to  infer  structure,  account  for  noise  etc.  

1/13/14   Comp  790–  Introduc*on  &  Administriva   9  

Page 10: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Wright-­‐Fisher  model  

•  One  of  the  first,  and  simplest  models  of  popula*on  genealogies  was  introduced  by  Wright  (1931)  and  Fisher  (1930).    

•  Model  emphasizes  transmission  of  genes  from  one  genera*on  to  the  next  

•  For  simplicity  we’ll  first  focus  on  a  fixed  popula*on  size,  each  with  a  dis*nct  gene  variant  

1/13/14   Comp  790–  Introduc*on  &  Administriva   10  

Page 11: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Simple  haploid  model  

•  Rules  –  Antecedent  genes  are  chosen  randomly,  with  replacement,  from  their  parental  genera*on  

–  No  selec*on  –  Fixed  popula*on  size  

1/13/14   Comp  790–  Introduc*on  &  Administriva   11  

G0: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

G1: ['J', 'A', 'H', 'B', 'I', 'E', 'D', 'G', 'A', 'B']

G2: ['A', 'J', 'E', 'G', 'D', 'E', 'B', 'I', 'A', 'A']

G3: ['A', 'A', 'E', 'J', 'I', 'A', 'I', 'A', 'J', 'B']

G4: ['E', 'A', 'B', 'B', 'A', 'E', 'A', 'A', 'A', 'A']

G5: ['A', 'A', 'B', 'A', 'A', 'E', 'A', 'A', 'A', 'B']

G6: ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'A', 'A']

G7: ['B', 'A', 'A', 'B', 'A', 'A', 'A', 'A', 'A', 'A']

What will this population eventually look like?

Page 12: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

AssumpGons  of  Wright/Fisher  

•  Discrete  and  non-­‐overlapping  genera*ons  •  Haploid  individuals  •  Popula*ons  size  is  constant  •  All  individuals  are  equally  fit  •  No  popula*on  of  social  structure  •  Genes  segregate  independently  

1/13/14   Comp  790–  Introduc*on  &  Administriva   12  

Page 13: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Some  Graphical  AbstracGons  

•  Replace  leUers  with  colors  

•  Draw  lineages  •  Sort  topologically  

1/13/14   13  Comp  790–  Introduc*on  &  Administriva  

Page 14: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Repeats  

1/13/14   Comp  790–  Introduc*on  &  Administriva   14  

Every population results in just one gene

Page 15: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Onset  of  uniformity  

•  10000  trials  •  Mode  =  11  (616)  

•  Mean  =  17.5  

1/13/14   Comp  790–  Introduc*on  &  Administriva   15  

Page 16: UNC Computational Systems Biology - Lecture01Lecture01.ppt Author Leonard McMillan Created Date 1/13/2014 5:23:51 PM

Next  Time  

•  Gene*cs  Background  –  Inbreeding  –  Coefficient  of  relatedness  

–  Diploidy  –  Recombina*on  

–  Phasing  •  Bring  your  laptops  loaded  with  Python  2.7  

– We  will  analyze  real  genotypes  

•  A  list  of  recent  papers  to  choose  from  

1/13/14   Comp  665  –  Introduc*on  &  Signals   16