human genome project

HUMAN GENOME PROJECT. Ms. Ruchi Yadav, Lecturer, Amity Institute of Biotechnology, Amity University, Lucknow (UP)

Upload: ruchibioinfo

Post on 09-Jun-2015

Category:

Education


DESCRIPTION

hgpsequencing

TRANSCRIPT

Page 1: Human genome project

HUMAN GENOME PROJECT

Ms. Ruchi Yadav
Lecturer, Amity Institute of Biotechnology
Amity University, Lucknow (UP)

Page 2: Human genome project

HUMAN GENOME PROJECT

• Genome sequencing
• Genome assembly
• Genome annotation

Page 3: Human genome project

Human Genome Project: Background

The idea of sequencing the entire human genome was first proposed in discussions at scientific meetings organized by the US Department of Energy and others from 1984 to 1986.

A broader programme was subsequently recommended, to include:

• The creation of genetic, physical and sequence maps of the human genome;
• Parallel efforts in key model organisms such as bacteria, yeast, worms, flies and mice;
• Development of technology in support of these objectives;
• Research into the ethical, legal and social issues raised by human genome research.

Page 4: Human genome project

HGP BACKGROUND (continued)

The Human Genome Organization (HUGO) and the International Human Genome Sequencing Consortium (IHGSC) were founded to provide a forum for international coordination of genomic research.

The NIH's part of the project was led by the National Human Genome Research Institute (NHGRI).

The collaboration was coordinated through periodic international meetings (referred to as ‘Bermuda meetings’).

Work was shared flexibly among the centres, with some groups focusing on particular chromosomes and others contributing in a genome-wide fashion.

A guiding principle was rapid and unrestricted data release. The centres adopted a policy that all genomic sequence data should be made publicly available without restriction within 24 hours of assembly (the Bermuda Principle).

Page 5: Human genome project

Human Genome Project

Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project originally was planned to last 15 years, but rapid technological advances accelerated the completion date to 2003. Project goals were to:

• Identify all the approximately 20,000-25,000 genes in human DNA,
• Determine the sequences of the 3 billion chemical base pairs that make up human DNA,
• Store this information in databases,
• Improve tools for data analysis,
• Transfer related technologies to the private sector, and
• Address the ethical, legal, and social issues (ELSI) that may arise from the project.

Page 6: Human genome project
Page 7: Human genome project
Page 8: Human genome project

Milestones:

• June 2000: Completion of a working draft of the entire human genome
• February 2001: Analyses of the working draft are published
• April 2003: HGP sequencing is completed and the project is declared finished two years ahead of schedule

Page 9: Human genome project

Timeline of large-scale genomic analyses.

Page 10: Human genome project

HUMAN GENOME

• The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at around 30,000, much lower than previous estimates of 80,000 to 140,000.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
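To make the round numbers on this slide concrete, the short Python sketch below works through two back-of-the-envelope calculations: how many bases the remaining 0.1% represents, and what share of the genome 30,000 genes of average size would span. The function names are illustrative, and the inputs are only the slide's approximate figures, so the outputs are rough orders of magnitude rather than measured values.

```python
# Round figures taken from the slide; they are approximations, not measurements.
GENOME_BASES = 3_000_000_000   # about 3 billion bases
AVG_GENE_BASES = 3_000         # average gene size quoted on the slide
GENE_COUNT = 30_000            # gene estimate quoted on the slide
IDENTITY = 0.999               # fraction of bases that are the same in all people


def differing_bases(genome_bases: int, identity: float) -> int:
    """Approximate number of base positions that are not shared."""
    return round(genome_bases * (1 - identity))


def genic_fraction(gene_count: int, avg_gene_bases: int, genome_bases: int) -> float:
    """Share of the genome spanned by gene_count genes of average size."""
    return gene_count * avg_gene_bases / genome_bases


if __name__ == "__main__":
    print("variable positions:", differing_bases(GENOME_BASES, IDENTITY))
    print("genic fraction:", genic_fraction(GENE_COUNT, AVG_GENE_BASES, GENOME_BASES))
```

With these inputs, the 0.1% that varies corresponds to roughly three million positions, and genes of average size would span only a few percent of the genome.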

Page 11: Human genome project

HUMAN GENOME PROJECT

PUBLIC AND PRIVATE SECTOR

Page 12: Human genome project

Two Different Groups Worked to Obtain the DNA Sequence of the Human Genome

The HGP is a publicly funded multinational consortium established by government research agencies; it was led by Francis Collins.

Celera Genomics is a private company whose former CEO, J. Craig Venter, ran an independent sequencing project.

Differences arose regarding who should receive the credit for this scientific milestone.

On June 26, 2000, the HGP and Celera Genomics held a joint press conference to announce that TOGETHER they had completed ~97% of the human genome.

Page 13: Human genome project

PUBLISHED

The International Human Genome Sequencing Consortium published their results in Nature, 409 (6822): 860-921, 2001.

“Initial Sequencing and Analysis of the Human Genome”

Celera Genomics published their results in Science, Vol 291(5507): 1304-1351, 2001.

“The Sequence of the Human Genome”

Page 14: Human genome project

HGP SEQUENCING STRATEGIES

LARGE SCALE SEQUENCING TECHNOLOGY

Page 15: Human genome project

Genome Glossary

Page 16: Human genome project

Genome Glossary

Page 17: Human genome project

Genome Glossary

Page 18: Human genome project

HGP SEQUENCING STRATEGIES

The HGP had three stages:

• Genetic (or linkage) mapping
• Physical mapping
• DNA sequencing

Page 19: Human genome project

Three-Stage Approach to Genome Sequencing

Page 20: Human genome project

Strategic Issues

There are two approaches for sequencing large repeat-rich genomes.

The first is a whole-genome shotgun sequencing approach, as has been used for the repeat-poor genomes of viruses, bacteria and flies, using linking information and computational analysis.

The second is the ‘hierarchical shotgun sequencing’ approach, also referred to as ‘map-based’, ‘BAC-based’ or ‘clone-by-clone’.

Page 21: Human genome project

‘HIERARCHICAL SHOTGUN SEQUENCING’

‘MAP-BASED’, ‘BAC-BASED’ OR ‘CLONE-BY-CLONE’

Technology for large-scale sequencing

US HGP

Page 22: Human genome project

Hierarchical shotgun sequencing

Page 23: Human genome project

Clone-by-clone or hierarchical sequencing strategy

Advantages:
• Ability to fill gaps and re-sequence uncertain regions
• Ability to distribute the clones to other labs
• Ability to check the produced sequence by restriction enzymes

Disadvantages:
• Construction of the physical map is expensive and time-consuming
• Experienced personnel are required

Page 24: Human genome project

HIERARCHICAL ASSEMBLY OF SEQUENCE CONTIG SCAFFOLD

Page 25: Human genome project

Assembly of the draft genome sequence

The key steps in assembling individual sequenced clones into the draft genome sequence.

Page 26: Human genome project

Levels of clone and sequence coverage.
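As a rough illustration of what "sequence coverage" means, the Python sketch below computes average coverage as total sequenced bases divided by genome (or clone) size, and then estimates the fraction of bases left unsequenced. The second step assumes reads land uniformly at random, so per-base depth is approximately Poisson distributed; that model is an assumption added here for illustration and does not come from the slides. The function names and example numbers are hypothetical.

```python
import math


def sequence_coverage(num_reads: int, read_length: int, target_size: int) -> float:
    """Average depth: total sequenced bases divided by the size of the target."""
    return num_reads * read_length / target_size


def expected_uncovered_fraction(coverage: float) -> float:
    """Fraction of bases with zero reads, assuming Poisson-distributed depth."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical example: 1,000 reads of 500 bases over a 100,000-base clone.
    c = sequence_coverage(1_000, 500, 100_000)
    print(f"coverage: {c:.1f}x")
    print(f"expected uncovered fraction: {expected_uncovered_fraction(c):.4f}")
```

Under this simplified model, higher coverage shrinks the unsequenced fraction exponentially, which is one reason draft-level and finished-level coverage differ so much in cost.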

Page 27: Human genome project

WHOLE-GENOME SHOTGUN

Developed by J. Craig Venter

Page 28: Human genome project

Whole-Genome Shotgun Approach to Genome Sequencing

The whole-genome shotgun approach was developed by J. Craig Venter in 1992.

This approach skips genetic and physical mapping and sequences random DNA fragments directly.

Powerful computer programs are used to order fragments into a continuous sequence.
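The idea of ordering random fragments into a continuous sequence can be sketched with a toy greedy overlap assembler in Python. This is only a minimal illustration under simplifying assumptions (error-free reads, a single strand, no repeats); it is not the algorithm used by Celera or the HGP, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    # Drop exact duplicates and fragments fully contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return frags  # nothing left to merge; each entry is a contig
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)


if __name__ == "__main__":
    reads = ["GTACGTTA", "ATGCGTAC", "GTTAGC"]
    print(greedy_assemble(reads))
```

Because the merge decision relies only on overlaps, repeated sequences longer than a read can be joined incorrectly, which is why repeat-rich genomes are harder for this strategy.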

Page 29: Human genome project

Whole-Genome Shotgun Sequencing

Page 30: Human genome project

Shotgun Sequencing Strategy

Advantages:
• No physical map construction
• Less risk of recombinant clones
• Cost effective and fast
• Ideal for small genome sequencing

Disadvantages:
• Difficult to fill gaps and re-track all the sequenced plasmids
• Data less useful for positional cloning

Page 31: Human genome project

Whole-Genome Assembly

Page 32: Human genome project

Hierarchical vs. Shotgun Sequencing

Page 33: Human genome project

Assembly of a mapped scaffold

Page 34: Human genome project

Generating the draft genome sequence

Generating a draft sequence of the human genome involved three steps:

• Selecting the BAC clones to be sequenced,
• Sequencing them, and
• Assembling the individual sequenced clones into an overall draft genome sequence.

Page 35: Human genome project

Assembly of the draft genome sequence

This process involved three steps: filtering, layout and merging.

The entire data set was filtered uniformly to eliminate contamination from nonhuman sequences and other artefacts that had not already been removed by the individual centres.
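The filtering step can be pictured with a small Python sketch that discards reads sharing many k-mers with known contaminant sequences (for example, cloning vector). This k-mer screen is a stand-in chosen for brevity; the slides do not say how the consortium's filter worked, and the function names, the value of `k`, and the `max_shared` threshold are all hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_contaminated(
    reads: dict[str, str],
    contaminants: list[str],
    k: int = 8,
    max_shared: float = 0.5,
) -> dict[str, str]:
    """Keep reads whose share of k-mers found in any contaminant is below max_shared."""
    contaminant_kmers: set[str] = set()
    for c in contaminants:
        contaminant_kmers |= kmers(c, k)
    kept = {}
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k: cannot be judged, so it is dropped
        shared = len(read_kmers & contaminant_kmers) / len(read_kmers)
        if shared < max_shared:
            kept[name] = seq
    return kept


if __name__ == "__main__":
    vector = "GATTACAGATTACAGGCCTTAAGG"  # made-up contaminant sequence
    reads = {
        "human_read": "ACGTTGCAACGTAGCTAGCTAGGATC",
        "vector_read": vector[2:20],
    }
    print(sorted(filter_contaminated(reads, [vector])))
```

Applying one filter uniformly to the whole data set, as the slide describes, avoids depending on each centre's own screening having caught everything.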

Page 36: Human genome project

Assembly of the draft genome sequence

The sequenced clones were then associated with specific clones on the physical map to produce a ‘layout’.

The fingerprint clone contigs were then mapped to chromosomal locations, using sequence matches to mapped STSs from four human radiation hybrid maps, one YAC map and two genetic maps, together with data from FISH.

Page 37: Human genome project

The human genome assembly and annotation process

• BUILD CYCLE
• DATA FREEZE
• RELEASE

Page 38: Human genome project

The human genome assembly and annotation process : INPUTS

Page 39: Human genome project
Page 40: Human genome project
Page 41: Human genome project

Genome Annotation

Feature Annotation
◦ Clone features
◦ STS features
◦ SNP features
◦ Gene, mRNA (transcript)
◦ misc_RNA (pseudogenes and non-coding transcripts)
◦ Protein features
◦ Repeat features

Page 42: Human genome project

Genome Annotation

Products
◦ Sequence data
◦ Resource support (dbSNP, Entrez Gene, Map Viewer, UniSTS)

Data Access
◦ BLAST
◦ Entrez retrieval (accession number, gene symbol, or protein name)
◦ FTP (genomes FTP site)

Page 43: Human genome project

Links from Map Viewer objects to other NCBI resources

Page 44: Human genome project

UCSC put the human genome sequence on the web July 7, 2000

UCSC put the human genome sequence on CD in October 2000, with varying results

Page 45: Human genome project

HGP ON WEB

Genome browsers were developed and are maintained by the University of California at Santa Cruz (UCSC) and by the EnsEMBL project of the European Bioinformatics Institute and the Sanger Centre.

Additional browsers have been created; URLs are listed at www.nhgri.nih.gov/genome_hub.

These web-based computer tools allow users to view an annotated display of the draft genome sequence, with the ability to scroll along the chromosomes and zoom in or out to different scales.

In addition to using the Genome Browsers, one can download from these sites the entire draft genome sequence together with the annotations in a computer-readable format.

Page 46: Human genome project

UCSC GENOME BROWSER

Page 47: Human genome project
Page 48: Human genome project
Page 49: Human genome project

Broad genomic landscape

The distribution of GC content, CpG islands, recombination rates, repeat content and gene content of the human genome.

Page 50: Human genome project

Long-range variation in GC content

GC-rich and GC-poor regions may have different biological properties:

• Gene density
• Composition of repeat sequences
• Correspondence with cytogenetic bands
• Recombination rate

CpG islands are of particular interest because they are associated with the 5’ ends of genes.
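Both quantities on this slide are simple to compute from raw sequence. The Python sketch below shows GC content over sliding windows and the observed/expected CpG ratio that is commonly used when scanning for CpG islands. It is a minimal illustration: the function names and window sizes are hypothetical, and no island thresholds are hard-coded because the slides do not give any.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def windowed_gc(seq: str, window: int, step: int) -> list[float]:
    """GC content of each full window, moving along the sequence by step."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    example = "ATATATATCGCGCGCGGCGCATATAT"  # made-up sequence
    print(windowed_gc(example, window=8, step=8))
    print(round(cpg_obs_exp(example), 2))
```

Plotting the windowed values along a chromosome gives the kind of long-range GC profile the slide refers to; windows that are both GC-rich and have a high CpG ratio are candidates for CpG islands.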

Page 51: Human genome project

Repeat content of the human genome

Page 52: Human genome project

INTERSPERSED REPEATS

Page 53: Human genome project

Gene content of the human genome

RNA genes and protein-coding genes in the human genome.

Noncoding RNAs

Page 54: Human genome project

There are several major classes of ncRNA:

• tRNAs
• rRNAs
• small nucleolar RNAs (snoRNAs)
• small nuclear RNAs (snRNAs), which are critical components of spliceosomes, the large ribonucleoprotein (RNP) complexes that splice introns out of pre-mRNAs in the nucleus.

ncRNAs do not have translated ORFs, are often small and are not polyadenylated.
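The absence of a translated ORF is one reason ncRNA genes are hard to find computationally. To show what that signal looks like, the Python sketch below scans the three forward reading frames for ATG-to-stop open reading frames. It is a deliberately naive illustration with a hypothetical function name; real gene-prediction software relies on far richer models, and a plain ORF scan is not adequate for intron-containing human genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for forward-strand ATG...stop ORFs.

    Length is counted in codons and includes both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


if __name__ == "__main__":
    example = "CCATGAAATTTTAGGG"  # made-up sequence
    for start, end in find_orfs(example):
        print(start, end, example[start:end])
```

A sequence that yields no ORF of meaningful length under a scan like this gives a protein-oriented gene finder little to work with, which matches the slide's point about ncRNAs.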

Page 55: Human genome project

  Software tools for ab initio gene prediction

Page 56: Human genome project

  Software tools for ab initio gene prediction

Page 57: Human genome project

Distribution of the homologues of the predicted human proteins.

Page 58: Human genome project

Conserved segments in the human and mouse genome.

* Each colour corresponds to a particular mouse chromosome.

Page 59: Human genome project

DISEASE GENES

Page 60: Human genome project

DRUG TARGETS

Page 61: Human genome project

Research challenges in genetics: what we still don't know, even with the full human DNA sequence in hand.

• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs. experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes in organisms
• Correlation of SNPs with health and disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology, including microbial consortia useful for environmental restoration
• Developmental genetics, genomics

Page 62: Human genome project

“The more we learn about the human genome, the more there is to explore”

“We shall not cease from exploration. And the end of all our exploring will be to arrive where we started, and know the place for the first time.” T. S. Eliot