lecture’3:’biology’basics’con4nued’cs425/spring17/slides/lecture_3...– human’3.2gbp’...

30
Lecture 3: Biology Basics Con4nued Spring 2017 January 24, 2017

Upload: vanduong

Post on 19-Apr-2018

228 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Lecture  3:  Biology  Basics  Con4nued  

Spring  2017  January  24,  2017  

Page 2: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Genotype/Phenotype  Phenotype:    

Blue  eyes   Brown  eyes  

Genotype:    

Recessive:  bb   Dominant:  Bb  or  BB  

Page 3: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

•  Genes  are  shown  in  rela%ve  order  and  distance  from  each  other  based  on  pedigree  studies.  

•  The  chance  of  the  chromosome  breaking  between  A  &  C  is  higher  than  the  chance  of  the  chromosome  breaking  between  A  &  B  during  meiosis  

•  Similarly,  the  chance  of  the  chromosome  breaking  between  E  &  F  is  higher  than  the  chance  of  the  chromosome  breaking  between  F  &  G  

•  The  closer  two  genes  are,  the  more  likely  they  are  to  be  inherited  together  (co-­‐occurrence)  

•  If  pedigree  studies  show  a  high  incidence  of  co-­‐occurrence,  those  genes  will  be  located  close  together  on  a  gene4c  map  

Page 4: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

•  Pleiotropy:  when  one  gene  affects  many  different  traits.  

•  Polygenic  traits:  when  one  trait  is  governed  by  mul4ple  genes,  which  maybe  on  the  same  chromosome  or  on  different  chromosomes.    – The  addi4ve  effects  of  numerous  genes  on  a  single  phenotype  create  a  con4nuum  of  possible  outcomes.    

– Polygenic  traits  are  also  most  suscep4ble  to  environmental  influences.    

Page 5: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Pleiotropy  in  humans:  Phenylketonuria    A  disorder  that  is  caused  by  a  deficiency  of  the  enzyme  phenylalanine  hydroxylase,  which  is  necessary  to  convert  the  essen4al  amino  acid  phenylalanine  to  tyrosine.    A  defect  in  the  single  gene  that  codes  for  this  enzyme  therefore  results  in  the  mul4ple  phenotypes  associated  with  PKU,  including  mental  retarda4on,  eczema,  and  pigment  defects  that  make  affected  individuals  lighter  skinned    

Page 6: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Polygenic  Inheritance  in  Humans  •  Height  is  controlled  by  polygenes  for  skeleton  height,  but  their  

effect  may  be  affected  by  malnutri4on,  injury,  and  disease.  •  Weight,  skin  color,  and  intelligence.  •  Birth  defects  like  clubfoot,  cle_  palate,  or  neural  tube  defects  are  

also  the  result  of  mul4ple  gene  interac4ons.  •  Complex  diseases  and  traits  have  a  tendency  to  have  low  

heritability  (tendency  to  be  inherited)  compared  to  single  gene  disorders  (i.e.  sickle-­‐cell  anemia,  cys4c  fibrosis,  PKU,  Hemophelia,  many  extremely  rare  gene4c  disorders).  

 

Page 7: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Selec4on  •  Some  genes  may  be  subject  to  selec%on,  where  individuals  with  advantages  or  “adap4ve”  traits  tend  to  be  more  successful  than  their  peers  reproduc4vely.  

•  When  these  traits  have  a  gene4c  basis,  selec4on  can  increase  the  prevalence  of  those  traits,  because  the  offspring  will  inherit  those  traits.  This  may  correlate  with  the  organism's  ability  to  survive  in  its  environment.  

•  Several  different  genotypes  (and  possibly  phenotypes)  may  then  coexist  in  a  popula4on.  In  this  case,  their  gene4c  differences  are  called  polymorphisms.  

Page 8: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Gene4c  Muta4on  •  The  simplest  is  the  point  muta4on  or  subs4tu4on;  here,  a  single  

nucleo4de  in  the  genome  is  changed  (single  nucleo%de  polymorphisms  (SNPs))  

•  Other  types  of  muta4ons  include  the  following:  –  Inser%on.  A  piece  of  DNA  is  inserted  into  the  genome  at  a  certain  posi4on  

–  Dele%on.  A  piece  of  DNA  is  cut  from  the  genome  at  a  certain  posi4on  

–  Inversion.  A  piece  of  DNA  is  cut,  flipped  around  and  then  re-­‐inserted,  thereby  conver4ng  it  into  its  complement  

–  Transloca%on.  A  piece  of  DNA  is  moved  to  a  different  posi4on.  –  Duplica%on.  A  copy  of  a  piece  of  DNA  is  inserted  into  the  genome  

Page 9: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Muta4ons  and  Selec4on  

•  While  muta4ons  can  be  detrimental  to  the  affected  individual,  they  can  also,  in  rare  cases,  be  beneficial;  more  frequently,  neutral.  

•  O_en  muta4ons  have  no  or  negligible  impact  on  survival  and  reproduc4on.  

•  Thereby  muta4ons  can  increase  the  gene%c  diversity  of  a  popula4on,  that  is,  the  number  of  present  polymorphisms.    

•  In  combina4on  with  selec4on,  this  allow  a  species  to  adapt  to  changing  environmental  condi4ons  and  to  survive  in  the  long  term.  

Page 10: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Raw  Sequence  Data  •  4  bases:  A,  C,  G,  T  +  other  (i.e.  N  =  any,  etc.)  

– kb  (=  kbp)  =  kilo  base  pairs  =  1,000  bp  – Mb  =  mega  base  pairs  =  1,000,000  bp    – Gb  =  giga  base  pairs  =  1,000,000,000  bp.  

•  Size:    – E.  Coli  4.6Mbp  (4,600,000)  – Fish  130  Gbp  (130,000,000,000)  – Paris  japonica  (Plant)  150  Gbp  – Human  3.2Gbp      

Page 11: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Fasta  File  •  A  sequence  in  FASTA  format  begins  with  a  single-­‐line  

descrip4on,  followed  by  lines  of  sequence  data  (file  extension  is  .fa).    

•  It  is  recommended  that  all  lines  of  text  be  shorter  than  80  characters  in  length.  

Page 12: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Fastq  File  •  Typically  contain  4  lines:  

–  Line  1  begins  with  a  '@'  character  and  is  followed  by  a  sequence  iden4fier  and  an  op#onal  descrip4on.  

–  Line  2  is  the  sequence.  –  Line  3  is  the  delimiter  ‘+’,  with  an  op4onal  descrip4on.  –  Line  4  is  the  quality  score.  –  file  extension  is  .fq  

@SEQ_IDGATTTGGGGTTCAAAGCTTCAAAGCTTCAAAGC+!''*((((***+))%%%++++++++!!!++***

Page 13: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Central  Dogma  

Page 14: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Discovery  of  DNA  •  DNA Sequences

–  Chargaff and Vischer, 1949 •  DNA consisting of A, T, G, C

–  Adenine, Guanine, Cytosine, Thymine –  Chargaff Rule

•  Noticing #A≈#T and #G≈#C –  A “strange but possibly meaningless”

phenomenon. •  Wow!! A Double Helix

–  Watson and Crick, Nature, April 25, 1953 – 

–  Rich, 1973 •  Structural biologist at MIT. •  DNA’s structure in atomic resolution.

Crick Watson

1 Biologist 1 Physics Ph.D. Student 900 words Nobel Prize

Page 15: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Watson  &  Crick  –  “…the  secret  of  life”  •  Watson: a zoologist, Crick: a physicist •  “In 1947 Crick knew no biology and

practically no organic chemistry or crystallography..” – www.nobel.se

•  Applying Chagraff’s rules and the X-ray

image from Rosalind Franklin, they constructed a “tinkertoy” model showing the double helix.

•  Their 1953 Nature paper: “It has not

escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”

Watson & Crick with DNA model

Rosalind Franklin with X-ray image of DNA

Page 16: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Superstructure  

Lodish et al. Molecular Biology of the Cell (5th ed.). W.H. Freeman & Co., 2003.

Page 17: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Superstructure  implica4ons  •  DNA in a living cell is in a highly compacted and

structured state. •  Transcription factors and RNA polymerase need

ACCESS to do their work. •  Transcription is dependent on the structural state

– SEQUENCE alone does not tell the whole story.

Page 18: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

RNA  •  RNA is similar to DNA chemically. It is usually only

a single strand. T(hyamine) is replaced by U(racil)

•  RNA can form secondary structures by “pairing up”

http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif tRNA linear and 3D view:

Page 19: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

RNA,  con4nued •  Several types exist, classified by function •  mRNA – carries a gene’s message out of

the nucleus. •  tRNA – transfers genetic information from

mRNA to an amino acid sequence •  rRNA – ribosomal RNA. Part of the

ribosome machine.

Page 20: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Protein  

•  A polymer composed of amino acids. •  There are 20 naturally occurring amino

acids.

•  Usually functions through molecular motion or binding with other molecules.

Page 21: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Proteins:  Primary  Structure  

•  Pep4de  sequence:  – Sequence  of  amino  acids  =  sequences  from  a  20  leqer  alphabet  (i.e.  ACDEFGHIKLMNPQRSTVWY)

– Average  protein  has  ~300  amino  acids  – Typically  stored  as  fasta  files  

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVILGLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGXIENY

Page 22: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Naturally  Occurring  Amino  Acids  

Page 23: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Proteins:  Secondary  Structure  

•  Polypep4de  chains  fold  into  regular  local  structures  – Common  types:  alpha  helix,  beta  sheet,  turn,  loop  – Defined  by  the  crea4on  of  hydrogen  bonds  

Page 24: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Proteins:  Ter4ary  Structure  

•  3D  structure  of  a  polypep4de  sequence  –  interac4ons  between  non-­‐local  and    foreign  atoms  

Page 25: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Proteins:  Quaternary  Structure  

•  Arrangement  of  protein  subunits  

Page 26: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Conclusions  

Page 27: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Challenges  in  Bioinforma4cs  

•  Need  to  feel  comfortable  in  interdisciplinary  area  

•  Depend  on  others  for  primary  data  •  Need  to  address  important  biological  and  computer  science  problems  

Page 28: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Basic  Steps  in  Bioinforma4cs  Research  

1.  Data  management  problem:  storage,  transfer,  transforma4on  (Informa4on  Technology)  

2.  Data  analysis  problem:  mapping,  assembly  –  algorithm  scaling  (Computer  Science)  

3.  Sta4s4cal  challenges:  tradi4onal  sta4s4cs  is  not  well  suited  for  modeling  systema4c  errors  over  large  number  of  observa4ons  (Biosta4s4cs)  

4.  Biological  hypothesis  tes4ng  –  data  interpreta4on  (Life  Science)  

Page 29: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Basic  Skills    

•  Ar4ficial  intelligence  and  machine  learning  •  Sta4s4cs  and  probability  •  Algorithms  •  Databases  •  Programming  •  Biology/Chemistry  knowledge  

Page 30: Lecture’3:’Biology’Basics’Con4nued’cs425/spring17/slides/Lecture_3...– Human’3.2Gbp’ ’ ’ FastaFile’ • A’sequence’in’FASTA’formatbegins’with’asingleRline’

Genomics:  -­‐  Assembly    -­‐  Detec4on  of  varia4on  -­‐  GWAS  

RNA:  -­‐  Gene  expression  -­‐  Transcriptome  assembly    -­‐  Pathway  analysis  -­‐  RNA-­‐RNA  interac4on  

Protein:  -­‐  Mass  spectrometry  -­‐  Structure  predic4on    -­‐  Protein-­‐Protein  

interac4on