

© Wiley Publishing. 2007. All Rights Reserved.

Working with a Single DNA Sequence

Learning Objectives

• Discover how to manipulate your DNA sequence on a computer, analyze its composition, predict its restriction map, and amplify it with PCR
• Find out about gene-prediction methods, their potential, and their limitations
• Understand how genomes and sequences are assembled

Outline

1. Cleaning your DNA of contaminants

2. Digesting your DNA in the computer

3. Finding protein-coding genes in your DNA sequence

4. Assembling a genome

Cleaning DNA Sequences

To sequence a genome, DNA fragments are often cloned in a vector (plasmid, YAC, or cosmid)
Vector sequence can end up mixed with your DNA sequence
Before working with your DNA sequence, you should always clean it with VecScreen

Computing a Restriction Map

DNA sequences can be cut with restriction enzymes
Each type of restriction enzyme recognizes and cuts a different sequence:
• EcoRI: GAATTC
• BamHI: GGATCC

There are more than 900 different restriction enzymes, each with a different specificity

The restriction map is the list of all potential cleavage sites in a DNA molecule

You can compile a restriction map with www.firtsmarket.com/cutter
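The idea of a restriction map can be sketched in a few lines of Python. This is a minimal illustration, not how the online tool works: the function name `restriction_map` and the example sequence are made up here, the enzyme table holds only the two recognition sequences listed above, and positions are reported as 0-based starts of the recognition site rather than exact cut positions. Both sites are palindromic, so scanning one strand is enough for these two enzymes.

```python
import re

# Recognition sequences taken from the slide; a real table would hold hundreds.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def restriction_map(dna, enzymes=ENZYMES):
    """Return {enzyme name: [0-based start of each recognition site]}."""
    dna = dna.upper()
    sites = {}
    for name, site in enzymes.items():
        # A lookahead pattern reports overlapping occurrences as well.
        sites[name] = [m.start() for m in re.finditer(f"(?={site})", dna)]
    return sites


# Hypothetical example sequence with two EcoRI sites and one BamHI site.
example = "AAGAATTCTTGGATCCAAGAATTC"
print(restriction_map(example))
```

Each list entry marks where an enzyme's recognition sequence begins, so the fragment lengths after digestion can be derived from the differences between neighboring positions.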

Making PCR with a Computer

Polymerase Chain Reaction (PCR) is a method for amplifying DNA
PCR is used for many applications, including
• Gene cloning
• Forensic analysis
• Paternity tests
PCR amplifies the DNA between two anchors
These anchors are called the PCR primers

Designing PCR Primers

PCR primers are typically 20 nucleotides long
The primers must hybridize well with the DNA
On biotools.umassmed.edu, find the best location for the primers:
• Most stable
• Longest extension
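To make the notion of a primer pair concrete, here is a rough sketch. It does not reproduce what the online tool does: it simply takes the two ends of a template as 20-nucleotide primers and scores each with the Wallace rule (2 °C per A or T, 4 °C per G or C), a common rule of thumb for short oligonucleotides that is swapped in here as a stand-in for "stability". The function names and the template are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_pair(template, length=20):
    """Pick primers at the two ends of the template (an illustrative choice).

    The forward primer copies the first `length` bases; the reverse primer
    is the reverse complement of the last `length` bases.
    """
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    return template[:length], reverse_complement(template[-length:])


# Hypothetical template, used only to exercise the functions.
template = "ATGCGTACGTTAGCCTAGGA" + "N" * 30 + "TTGACCGTAGGCTAACGTCA"
forward, reverse = primer_pair(template)
print(forward, wallace_tm(forward))
print(reverse, wallace_tm(reverse))
```

A real design tool would slide candidate primers along the sequence and also check for hairpins, primer dimers, and matched melting temperatures; this sketch only shows the basic bookkeeping.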

Analyzing DNA Composition

DNA composition varies a lot
Stability of a DNA sequence depends on its G+C content (total guanine and cytosine)
High G+C makes very stable DNA molecules
Online resources are available to measure the GC content of your DNA sequence
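G+C content is simple enough to compute yourself. The sketch below is one possible way to do it, with hypothetical function names: `gc_content` returns the G+C fraction of a whole sequence, and `gc_windows` repeats the calculation over sliding windows to show how composition varies along the sequence. The window and step sizes are arbitrary defaults.

```python
def gc_content(dna):
    """Fraction of bases that are G or C (0.0 to 1.0)."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return (dna.count("G") + dna.count("C")) / len(dna)


def gc_windows(dna, window=100, step=50):
    """G+C fraction of each full-length window along the sequence."""
    return [
        gc_content(dna[i:i + window])
        for i in range(0, len(dna) - window + 1, step)
    ]


print(gc_content("GGCCAATT"))
print(gc_windows("GGGGAAAA", window=4, step=4))
```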

Predicting Genes

The most important analysis carried out on DNA sequences is gene prediction

Gene prediction requires different methods for eukaryotes and prokaryotes

Most gene-prediction methods use hidden Markov models (HMMs)

Predicting Genes in Prokaryotic Genomes

In prokaryotes, protein-coding genes are uninterrupted
• No introns
Predicting protein-coding genes in prokaryotes is considered a solved problem
• You can expect 99% accuracy
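Because prokaryotic genes have no introns, a candidate gene appears as one uninterrupted open reading frame (ORF). The sketch below scans for ORFs in the simplest possible way; it is not the hidden Markov model approach used by real gene finders, and it makes several stated assumptions: only the forward strand is scanned, ATG is the only start codon, and the function name `find_orfs` and the `min_codons` threshold are invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    Each ORF runs from an ATG to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the last stop opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


example = "CCATGAAATTTTAGCC"
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

A complete scan would also run the same function on the reverse complement of the sequence, since genes can lie on either strand.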

Finding Prokaryotic Genes with GeneMark

GeneMark is the state of the art for microbial genomes
GeneMark can
• Find short proteins
• Resolve overlapping genes
• Identify the best start codon
GeneMark uses hidden Markov models
Use exon.gatech.edu/GeneMark

Predicting Eukaryotic Genes

Eukaryotic genes (human, for example) are very hard to predict

Precise and accurate eukaryotic gene prediction is still an open problem
• ENSEMBL contains 21,662 genes for the human genome
• There may well be more genes than that in the genome, as yet unpredicted

You can expect 70% accuracy on the human genome with automatic methods

Experimental information is still needed to predict eukaryotic genes

Finding Eukaryotic Genes with GenomeScan

GenomeScan is the state of the art for eukaryotic genes

GenomeScan works best with
• Long exons
• Genes with a low GC content

GenomeScan uses
• Hidden Markov models
• Homology searches

It can incorporate experimental information

Use genes.mit.edu/genomescan

Producing Genomic Data

Until recently, sequencing an entire genome was very expensive and difficult

Only major institutes could do it
Today, scientists estimate that in 10 years, it will cost about $1000 to sequence a human genome
With sequencing so cheap, assembling your own genomes is becoming an option
How could you do it?

Sequencing and Assembling a Genome (I)

To sequence a genome, the first task is to cut it into many small, overlapping pieces
Then clone each piece

Sequencing and Assembling a Genome (II)

Each piece must be sequenced
Sequencing machines cannot read an entire genome at once
• They can only produce short sequences smaller than 1 Kb
• These pieces are called reads

It is necessary to assemble the reads into contigs
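The principle behind assembly, merging reads that overlap at their ends, can be sketched with a toy greedy algorithm. This is an assumption-laden illustration, not the method used by production assemblers: it requires exact overlaps, ignores sequencing errors and the reverse strand, and uses made-up names (`overlap`, `greedy_assemble`) and tiny made-up reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [
            c for k, c in enumerate(contigs) if k not in (best_i, best_j)
        ] + [merged]


# Three toy reads that tile a 14-base sequence.
reads = ["ATGGCGT", "GCGTACA", "ACAGGTT"]
print(greedy_assemble(reads))
```

Reads that share no sufficiently long overlap remain as separate contigs, which mirrors what happens in practice when coverage has gaps.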

Sequencing and Assembling a Genome (III)

The most popular program for assembling reads is PHRAP
• Available at www.phrap.org

Other programs exist for joining smaller datasets
• For example, try CAP3 at pbil.univ-lyon1.fr/cap3.php

Going Farther

Predicting when and how genes are expressed is one of the main challenges of modern biology
• It requires predicting genes
• It also requires predicting promoters

The challenge is to find these regions and to understand the signals they contain

Try the following resources:
• Zhang Lab rulai.cshl.edu
• EPD www.epd.isb-sib.ch