Introduction to genomes

Celia van Gelder, CMBI, UMC Radboud, June 2009

Page 1: Introduction to genomes

Introduction to genomes

Content

• the human genome

• CNVs

• SNPs

• Alternative splicing

• genome projects

Celia van Gelder

CMBI, UMC Radboud

June 2009

[email protected]

Page 2: Introduction to genomes

The human genome

• Genome: the entire sequence of DNA in a cell

• 3 billion basepairs (3Gb)

• 22 chromosome pairs + X and Y chromosomes

• Chromosome length varies from ~50Mb to ~250Mb

• About 22,000 protein-coding genes

• Human genome is 99.9% identical among individuals
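
A quick back-of-the-envelope check of what "99.9% identical" means in absolute terms. This is only a sketch: it takes the rounded figures from this slide (3 Gb, 99.9%) at face value, and the function name is made up for illustration.

```python
# Rounded figures from the slide; real values differ per genome assembly.
GENOME_SIZE_BP = 3_000_000_000   # about 3 billion base pairs (3 Gb)
IDENTITY = 0.999                 # 99.9% identical among individuals


def expected_differences(genome_size_bp: int, identity: float) -> int:
    """Number of positions expected to differ between two individuals."""
    return round(genome_size_bp * (1 - identity))


differing = expected_differences(GENOME_SIZE_BP, IDENTITY)
print(f"Expected differing positions: {differing:,}")
```

So even a 0.1% difference still corresponds to roughly three million positions.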

Page 3: Introduction to genomes

Eukaryotic Genomes: more than collections of genes

• Protein coding genes

• RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)

• Structural DNA (centromeres, telomeres)

• Regulation-related sequences (promoters, enhancers, silencers, insulators)

• Parasite sequences (transposons)

• Pseudogenes (non-functional gene-like sequences)

• Simple sequence repeats

Page 4: Introduction to genomes

Annotating the genome

• Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:

1. identifying elements on the genome, a process called Gene Finding, and

2. attaching biological information to these elements.

• Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.
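
To give a feel for step 1, here is a toy open reading frame (ORF) scanner. It is a deliberately naive sketch, not how real annotation pipelines work: it only looks at the forward strand, ignores introns (which matter a lot in eukaryotes), and the function name, the `min_codons` parameter and the example sequence are all made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for simple forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons from ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # close it and keep scanning
    return orfs


example = "CCATGAAATTTGGGTAACC"   # invented sequence
print(find_orfs(example))
```

Step 2 (attaching biological information) would then compare each candidate against known genes and proteins, which is outside the scope of this sketch.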

Page 5: Introduction to genomes

The human genome (continued)

From: Molecular Biology of the Cell (4th edition) (Alberts et al., 2002)

• Only 1.2% codes for proteins, 3.5-5% is under selection

• Long introns, short exons

• Large spaces between genes

• More than half consists of repetitive DNA

Page 6: Introduction to genomes

Eukaryotic Genomes: High fraction non-coding DNA

Blue: Prokaryotes

Black: Unicellular eukaryotes

Other colors: Multicellular eukaryotes (red = vertebrates)

From: Mattick, NRG, 2004

Page 7: Introduction to genomes

Variation along genome sequence

• Nucleotide usage varies along chromosomes

– Protein coding regions tend to have high GC levels

• Genes are not equally distributed across the chromosomes

– Housekeeping genes are generally found in gene-dense areas

– Gene-poor areas tend to have many tissue specific genes

From: Ensembl
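
Variation in nucleotide usage is usually summarised as GC content per window along the sequence. A minimal sketch, assuming a plain DNA string as input; the function names, window size and example sequence are illustrative choices, not taken from Ensembl.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """GC content for each full window; returns (window_start, gc) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Invented sequence: a GC-rich stretch followed by an AT-rich stretch.
for start, gc in windowed_gc("GGCCATAT", window=4, step=4):
    print(start, gc)
```

On real chromosomes the window would be thousands of bases wide rather than four.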

Page 8: Introduction to genomes

Chromosome organisation (1)

From: Lodish (4th edition)

Page 9: Introduction to genomes

Chromosome organisation (2)

From: Lodish (4th edition)

• DNA packed in chromatin

• Non-active genes often in densely packed chromatin (30-nm fiber)

• Active genes in less dense chromatin (beads-on-a-string)

• Gene regulation by changing chromatin density, methylation/acetylation of the histones

Genes that are OFF

Genes that are ON

Page 10: Introduction to genomes

Today’s focus

1. Copy number variations (CNV)

2. Single Nucleotide Polymorphisms (SNPs)

3. Alternative transcripts

Page 11: Introduction to genomes

Copy Number Variation

• People do not only vary at the nucleotide level (SNPs)

• Copy Number Variations (CNVs): duplications and deletions of chromosome segments

• When there are genes in the CNV areas, this can lead to variations in the number of gene copies between individuals

• CNVs may either be inherited or caused by de novo mutation
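
How a CNV changes the number of gene copies can be sketched with simple intervals. The model below is a strong simplification and rests on assumptions that are not from the slides: a baseline of two copies, each CNV event affecting one chromosome copy, and only CNVs that fully contain the gene being counted. The `CNV` class and `gene_copies` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    start: int   # first base of the affected segment
    end: int     # last base of the affected segment
    kind: str    # "duplication" or "deletion"


def gene_copies(gene_start: int, gene_end: int, cnvs: list[CNV],
                baseline: int = 2) -> int:
    """Gene copy count after applying CNVs that fully contain the gene."""
    copies = baseline
    for cnv in cnvs:
        if not (cnv.start <= gene_start and gene_end <= cnv.end):
            continue                      # partial overlap: ignored here
        if cnv.kind == "duplication":
            copies += 1
        elif cnv.kind == "deletion":
            copies = max(0, copies - 1)
    return copies


# Invented coordinates for illustration.
print(gene_copies(1000, 2000, [CNV(500, 3000, "duplication")]))
print(gene_copies(1000, 2000, [CNV(500, 3000, "deletion")]))
```

Two individuals with different CNV lists would therefore carry different numbers of copies of the same gene.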

Page 12: Introduction to genomes

Why study CNVs?

• CNVs are common in cancer and other diseases.

• CNVs are also common in normal individuals and contribute to our uniqueness. These changes can also influence the susceptibility to disease.

• Since CNVs often encompass genes, they can have important roles both in characterizing human disease and discovering drug response targets.

• Understanding the mechanisms of CNV formation may also help us better understand human genome evolution.

Page 13: Introduction to genomes

CNV & disease, examples

CNVs have been implicated in

• Cancer: higher EGFR copy number in non-small cell lung cancer

• Low copy number of FCGR3B can increase susceptibility to systemic lupus erythematosus (SLE) & other autoimmune disorders

• Autism

• Schizophrenia (dept. human genetics)

• Mental retardation (dept. human genetics)

Page 14: Introduction to genomes

Single Nucleotide Polymorphisms (SNPs)

• SNPs are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered.

• Similar to mutations, but the different variants are present in the population at the same time and generally have little effect

• Used as genetic markers (e.g. a genetic disease may be associated with a SNP)

[Figure: Single Nucleotide Polymorphism (SNP)]
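
A SNP shows up as a single mismatching column when two sequences of the same region are lined up. The sketch below assumes the two sequences are already aligned and equally long; the function name and the example sequences are invented for illustration.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (index, base_in_a, base_in_b) for every mismatching position.

    Assumes both sequences are pre-aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Invented example: the two sequences differ only at index 3 (C versus T).
print(snp_positions("AAGCTTA", "AAGTTTA"))
```

Whether such a difference counts as a SNP also depends on how common it is in the population.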

Page 15: Introduction to genomes

SNP fact sheet

• For a variation to be considered a SNP, it must occur in at least 1% of the population.

• SNPs, which make up about 90% of all human genetic variation, occur every 100 to 300 bases along the 3-billion-base human genome.

• Two of every three SNPs involve the replacement of cytosine (C) with thymine (T).

• SNPs can occur in coding (gene) and non-coding regions of the genome.
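
The 1% rule and the "every 100 to 300 bases" figure can both be turned into small calculations. This is a sketch under assumptions: the 1% threshold is applied to the frequency of the second most common allele in a sample, and the sample itself is invented. The function names are illustrative.

```python
from collections import Counter

GENOME_SIZE_BP = 3_000_000_000  # rounded figure from the slides


def minor_allele_frequency(alleles: list[str]) -> float:
    """Frequency of the second most common allele (0.0 if only one allele)."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    second_count = counts.most_common(2)[1][1]
    return second_count / sum(counts.values())


def is_snp(alleles: list[str], threshold: float = 0.01) -> bool:
    """Apply the 'at least 1%' rule to a sample of observed alleles."""
    return minor_allele_frequency(alleles) >= threshold


# Invented sample: 200 chromosomes, 3 of which carry T instead of C.
sample = ["C"] * 197 + ["T"] * 3
print(minor_allele_frequency(sample), is_snp(sample))

# One SNP every 300 to 100 bases gives a rough genome-wide range.
low_estimate = GENOME_SIZE_BP // 300
high_estimate = GENOME_SIZE_BP // 100
print(f"{low_estimate:,} to {high_estimate:,} SNPs")
```

With these rounded numbers the genome would carry on the order of 10 to 30 million SNPs.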

Page 16: Introduction to genomes

SNPs & medicine

• Although more than 99% of human DNA sequences are the same, variations in DNA sequence can have a major impact on how humans respond to:

– disease;

– environmental factors such as bacteria, viruses, toxins, and chemicals;

– drugs and other therapies.

• This makes SNPs valuable for biomedical research and for developing pharmaceutical products or medical diagnostics.

• SNPs are also evolutionarily stable (they do not change much from generation to generation), making them easier to follow in population studies.

Page 17: Introduction to genomes

SNP & disease, example

Alzheimer's disease & apolipoprotein E

• ApoE contains two SNPs that result in three possible alleles for this gene: E2, E3, and E4.

• Each allele differs by one DNA base, and the protein product of each allele differs by one amino acid.

• Each individual inherits one maternal copy of ApoE and one paternal copy of ApoE.

• Research has shown that a person who inherits at least one E4 allele will have a greater chance of developing Alzheimer's disease.
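
Since everyone carries two copies of ApoE and there are three alleles, the possible genotypes can simply be enumerated. A small sketch; it only lists the combinations and flags those containing E4, and says nothing about the size of the risk.

```python
from itertools import combinations_with_replacement

ALLELES = ("E2", "E3", "E4")

# Unordered pairs: one maternal and one paternal copy, order ignored.
genotypes = list(combinations_with_replacement(ALLELES, 2))

# Genotypes with at least one E4 allele.
with_e4 = [g for g in genotypes if "E4" in g]

for genotype in genotypes:
    flag = "carries E4" if genotype in with_e4 else ""
    print("/".join(genotype), flag)
```

Three alleles give six possible genotypes, three of which include at least one E4 allele.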

Page 18: Introduction to genomes

HapMap

• The HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings.

• Using HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors.

• HapMap is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States.

• All of the information generated will be released into the public domain.

• www.hapmap.org

Page 19: Introduction to genomes

Alternative splicing

Page 20: Introduction to genomes

Alternative splicing (2)

~15% of the mutations that cause genetic diseases affect pre-mRNA splicing

Page 21: Introduction to genomes

Genome projects, a bit of history

http://www.genomesonline.org/

Page 22: Introduction to genomes
Page 23: Introduction to genomes

Sequenced genomes

• 1995 Haemophilus influenzae 1.8 Mb

• 1996 Yeast 12 Mb

• 1998 C. elegans 100 Mb

• 1999 Fruit fly 125 Mb

• 2000 Arabidopsis 115 Mb

• 2001 Human (draft)

• 2002 Mouse 2.6 Gb

• 2002 Rice

• 2004 Human (“finished”) 3 Gb

• 2006 Sea urchin

• 2007 Grapevine

• 2008 Platypus (draft)

• 2009 Cow

Page 24: Introduction to genomes

Some genome sizes

Organism                             Genome size (base pairs)

Virus, Phage Φ-X174                  5387 (first sequenced genome)

Virus, Phage λ                       5×10^4

Bacterium, Escherichia coli          4×10^6

Plant, Fritillaria assyriaca         13×10^10 (largest known genome)

Fungus, Saccharomyces cerevisiae     2×10^7

Nematode, Caenorhabditis elegans     8×10^7

Insect, Drosophila melanogaster      2×10^8

Mammal, Homo sapiens                 3×10^9
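
To put these sizes in perspective, each genome can be expressed relative to the human one. The sketch below just copies the rounded values from the table above into a dictionary; the variable names are illustrative.

```python
# Rounded genome sizes in base pairs, copied from the table above.
GENOME_SIZES_BP = {
    "Phage Phi-X174": 5387,
    "Phage lambda": 5 * 10**4,
    "Escherichia coli": 4 * 10**6,
    "Saccharomyces cerevisiae": 2 * 10**7,
    "Caenorhabditis elegans": 8 * 10**7,
    "Drosophila melanogaster": 2 * 10**8,
    "Homo sapiens": 3 * 10**9,
    "Fritillaria assyriaca": 13 * 10**10,
}

human = GENOME_SIZES_BP["Homo sapiens"]

# Print genomes from smallest to largest with their size relative to human.
for name, size in sorted(GENOME_SIZES_BP.items(), key=lambda item: item[1]):
    print(f"{name:28s} {size:>15,d} bp   {size / human:.6g} x human")

plant_ratio = GENOME_SIZES_BP["Fritillaria assyriaca"] / human
```

With these numbers the plant genome comes out more than 40 times larger than the human genome, so genome size clearly does not track organism complexity.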

Page 25: Introduction to genomes

Genome browsers can be used to examine ….

–Genomic sequence conservation

–Duplications and deletions of chromosome segments (Copy Number Variations, CNVs)

–Single Nucleotide Polymorphisms (SNPs)

–Alternative splicing

–And much more….

LET’S GO BROWSE GENOMES!

Page 26: Introduction to genomes

Alternative Transcripts

Source: Wikipedia (http://www.wikipedia.org/)