A Gentle Introduction to UCSC Genome Browser
陳任志, 游岳齊

Post on 26-Dec-2015



Options

I. Genome Browser
II. ENCODE
III. Blat
IV. Table Browser
V. Gene Sorter
VI. In Silico PCR
VII. Proteome Browser
VIII. Utilities
IX. Downloads

I. Genome Browser

Human (Homo sapiens) Genome Browser Gateway

Provides access to any section of the entire human genome

Non-Standard Join Certificates
– some sequence joins between adjacent clones in this assembly could not be computationally validated
– the sequencing center responsible for the particular chromosome provides an electronic certificate
– the certificate should state why the submitter thinks the join is valid

Query

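Clade: a group of organisms that share a common ancestor

vertebrate: animals with a backbone

deuterostome: deuterostomes ("second mouth" animals)

insect: insects

nematode: roundworms

Chimp: chimpanzee

Rhesus: rhesus macaque

Opossum: opossum

X. tropicalis: frog

Tetraodon: pufferfish

Fugu: pufferfish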

Assembly date

Display image width

Entire chromosome– chr7 (all of chromosome 7)

Cytological band– 20p13 (region for band p13 on chr 20)

Chromosomal coordinate range– chr3:1-1000000 (first million bases of chr 3, counting from the p-arm telomere)

mRNA, EST, or STS marker

Keywords from the GenBank description of an mRNA (huntington)
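The chromosome and coordinate-range forms of a query can be checked before they are pasted into the browser. The following is a minimal sketch: `parse_position` and its regular expression are hypothetical helpers written for this illustration, not part of any UCSC tool, and they deliberately reject the other query types (cytological bands, markers, keywords).

```python
import re

# Hypothetical pattern for illustration: a chromosome name such as "chr7",
# optionally followed by ":start-end" (commas in the numbers are tolerated).
POSITION_RE = re.compile(r"^(chr[0-9A-Za-z_]+)(?::([\d,]+)-([\d,]+))?$")


def parse_position(text):
    """Return (chrom, start, end); start and end are None for a whole chromosome."""
    match = POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chromosome or coordinate range: {text!r}")
    chrom, start, end = match.groups()
    if start is None:
        return chrom, None, None
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    # Browser positions are 1-based and inclusive, so start must be at least 1.
    if start_i < 1 or end_i < start_i:
        raise ValueError(f"invalid coordinate range: {text!r}")
    return chrom, start_i, end_i


print(parse_position("chr7"))
print(parse_position("chr3:1-1000000"))
```

A band such as 20p13 or a keyword such as huntington raises `ValueError` here, because resolving those requires the browser's own lookup tables.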

Search Result

Position

zoom in/out

Restriction Enzyme

mRNA

Conservation

SNPs

Display option

II. ENCODE

Stands for “Encyclopedia Of DNA Elements”

Public research consortium to carry out a project to identify all functional elements in the human genome sequence

Launched by the National Human Genome Research Institute (NHGRI)

Conducted in three phases:
– pilot project phase (survey existing methods)
– technology development phase (develop new methods)
– planned production phase (…)

ENCODE Formats

Browser Extensible Data Format (BED)– for efficient access to genomic annotations

General Feature Format (GFF)– for data where there are a set of linked features

Gene Transfer Format (GTF)– a refinement of GFF that tightens the specification

Multiple Alignment Format (MAF)– a series of multiple alignments in one format

Wiggle Format (WIG)– for continuous-valued data in track format
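As a small illustration of the first format, a BED line is whitespace-separated text whose first three fields are the chromosome, a 0-based start, and an exclusive end. The sketch below reads those fields plus the optional name; `BedRecord`, `parse_bed_line`, and `to_browser_position` are names invented for this example, and the remaining optional BED columns are ignored.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # empty string when the optional 4th field is absent


def parse_bed_line(line):
    """Parse the first three (or four) fields of one BED line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"BED needs at least 3 fields: {line!r}")
    name = fields[3] if len(fields) > 3 else ""
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


def to_browser_position(record):
    """Convert a BED interval to a 1-based, inclusive position string."""
    return f"{record.chrom}:{record.start + 1}-{record.end}"


record = parse_bed_line("chr3\t0\t1000000\tfirst_megabase")
print(to_browser_position(record))
```

The `+ 1` on the start is the only adjustment needed, since a half-open 0-based end equals a 1-based inclusive end.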

ENCODE Options

Regions (hg16)– old database (+mRNA, EST, & STS markers)

Regions (hg17)– new database (+mRNA, EST, & STS markers)

Data Status– the current status of ENCODE datasets

Downloads– sequence and annotation data downloads

Submission– for the submission of ENCODE-related data

ENCODE Query+Results

ENCODE Details hg16

ENCODE Details hg17

III. Blat

BLAST-Like Alignment Tool, not BLAST

Designed to quickly find sequences of 95% and greater similarity of length 40 bases or more

Use: paste in a query sequence to find its location in the genome

The index takes up just under 1 GB of RAM

Blat Query

Query sequence

Upload file

Blat Results

Browser view

Detail view

IV. Table Browser

To get the data associated with a track in text format, to calculate intersections between tracks, and to retrieve DNA sequence covered by a track
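To make the idea of an intersection between tracks concrete, the sketch below treats each track as a list of (chromosome, start, end) intervals with half-open coordinates and reports the overlapping pieces. The `intersect` function is a hypothetical helper for illustration only, not the Table Browser's implementation, and its pairwise comparison is only suitable for small inputs.

```python
def intersect(track_a, track_b):
    """Return the overlapping parts of two lists of (chrom, start, end) intervals.

    Coordinates are assumed to be half-open: start inclusive, end exclusive.
    """
    overlaps = []
    for chrom_a, start_a, end_a in track_a:
        for chrom_b, start_b, end_b in track_b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # empty overlaps (touching intervals) are skipped
                overlaps.append((chrom_a, start, end))
    return overlaps


genes = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)]
regions = [("chr1", 150, 350), ("chr2", 80, 90)]
print(intersect(genes, regions))
```

For whole-genome tracks, sorting both lists and sweeping through them once would be the usual replacement for the nested loop.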

Table Browser Query

Table Browser Results

Table Browser Options

Describe Table Schema– schema for SQL table format

Filter
– regular expression filter
– range control

Intersection

Correlation

Summary Statistics

Table Browser Schema

Table Browser Filter

Table Browser Summary Statistics

V. Gene Sorter

Displays a sorted table of genes that are related to one another

Expression is color-coded
– a highly expressed gene is colored red
– a less expressed gene is shown in green

Gene Sorter Query

Gene Sorter Results

Gene Sorter Details #1

Gene Sorter Details #2

VI. In Silico PCR

In-Silico PCR searches a sequence database with a pair of PCR primers

Returns: a sequence output file in FASTA format containing all sequences in the database that lie between and include the primer pair
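The search can be sketched on a toy scale as follows. This is a simplified illustration, not the UCSC implementation: `in_silico_pcr` is a hypothetical function that requires exact primer matches, scans only the given strand of a single template string, and uses an assumed default of 4000 bases for the maximum product size.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse, max_product_size=4000):
    """Return every stretch of template that starts with the forward primer
    and ends with the reverse complement of the reverse primer."""
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so on the given
    # strand its binding site reads as its reverse complement.
    reverse_site = reverse_complement(reverse.upper())
    products = []
    start = template.find(forward)
    while start != -1:
        site = template.find(reverse_site, start + len(forward))
        while site != -1:
            end = site + len(reverse_site)
            if end - start > max_product_size:
                break  # later sites only give longer products
            products.append(template[start:end])
            site = template.find(reverse_site, site + 1)
        start = template.find(forward, start + 1)
    return products


template = "TTACGTACGGGGCCCTTTAA"
print(in_silico_pcr(template, forward="ACGTAC", reverse="AAAGGG"))
```

Each returned product includes both primer sites, matching the description of the output as the sequence that lies between and includes the primer pair.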

PCR: polymerase chain reaction, which makes many copies of a specific DNA sequence

http://members.aol.com/BearFlag45/Biology1A/LectureNotes/lec24.html

In Silico PCR Query

Two primer sequences

Max product size

Number of matches

In Silico PCR Results

Melting temperature

Match in uppercase

Mismatch in lowercase

Forward primer

Reverse primer

VII. Protein Browser

UCSC Proteome Browser Gateway provides a wealth of protein information presented in the form of graphical images and links to external internet sites
– SwissProt information
– Proteome browser tracks
– Protein property histograms
– UCSC links / Domain information
– Comparative 3D structures
– Pathways / FASTA format

Protein Browser Query

Swiss-Prot/TrEMBL protein ID

Protein Browser Tracks

polarity

hydrophobicity

cysteines

glycosylation

Protein Browser Histograms

Protein Browser 3D structures

VIII. Utilities

Some tools (for preparing input)

– Batch Coordinate Conversion (liftOver)

converts genome coordinates and genome annotation files between assemblies

WHY?– occasionally, a chunk of sequence may be moved to an entirely different chromosome as the map is refined

– DNA Duster formatting tool

– Protein Duster formatting tool

IX. Downloads

Offers downloads of complete genomes
– Human
– Chimpanzee
– Rhesus
– Dog
– Cow
– Mouse
– Rat
– Opossum
– Chicken
