

Page 1: Xinbin Dai, Ph. D. Affymetrix Probeset Mapping and Medicago Genome Annotation (Mt4.0 RC1)

Xinbin Dai, Ph.D.

Affymetrix Probeset Mapping and Medicago Genome Annotation (Mt4.0 RC1)

Page 2: Agenda

• About the Affymetrix Medicago GeneChip
• Mapping Algorithm and Tool
• Bioinformatics Resources for Medicago truncatula

Page 3: Affymetrix GeneChip Probes

[Figure: an mRNA target transcript (5' UTR, EXON-I, EXON-II, EXON-III, 3' UTR) covered by a probeset of 11 probes. Each probe is a 25-mer (positions 1 to 25), shown in a perfect match (PM) and a mismatch (MM) version.]

Page 4: Probeset Types

• id_at: Designates probe sets that uniquely recognize target transcripts.
• id_a_at: Designates probe sets that recognize alternative transcripts from the same gene.
• id_s_at: Designates probe sets with common probes among multiple transcripts from different genes.
• id_x_at: Designates probe sets where it was not possible to select either a unique probe set or a probe set with identical probes among multiple transcripts. Rules for cross-hybridization were dropped in order to design the _x probe sets. These probe sets share some probes identically with two or more sequences and, therefore, may cross-hybridize in an unpredictable manner.

Source: GeneChip® Expression Analysis Data Analysis Fundamentals.
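The suffix rules can be turned into a small lookup. The following is a minimal sketch, not part of the original slides: the function name `probeset_category` and the category labels are illustrative assumptions, and any ID that does not end in `_at` is simply reported as "unknown".

```python
# Suffix-to-category table derived from the naming rules on this slide.
# The labels ("alternative", "shared", ...) are illustrative names only.
SUFFIX_CATEGORIES = [
    ("_a_at", "alternative"),  # alternative transcripts from the same gene
    ("_s_at", "shared"),       # common probes among transcripts of different genes
    ("_x_at", "other"),        # cross-hybridization rules were dropped
]


def probeset_category(probeset_id: str) -> str:
    """Classify a probeset ID by its suffix (hypothetical helper)."""
    # Check the longer suffixes first, because every one of them also ends in "_at".
    for suffix, category in SUFFIX_CATEGORIES:
        if probeset_id.endswith(suffix):
            return category
    if probeset_id.endswith("_at"):
        return "unique"
    return "unknown"


if __name__ == "__main__":
    for pid in ("Mtr.10097.1.S1_at", "Mtr.10267.1.S1_a_at",
                "Mtr.10146.1.S1_s_at", "Mtr.10093.1.S1_x_at"):
        print(pid, probeset_category(pid))
```

The match is case-sensitive on purpose: the `S1` in `Mtr.10097.1.S1_at` must not be mistaken for the `_s_at` suffix.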

Page 5: About Medicago GeneChip

Type                                          Num of probe sets   Percent in the Mtr. set   Notes
Unique, e.g. Mtr.10097.1.S1_at                44182               86.80                     Unique to one gene
Alternative (_a_), e.g. Mtr.10267.1.S1_a_at   116                 0.23                      Alternative probe sets to one gene
Shared (_s_), e.g. Mtr.10146.1.S1_s_at        4793                9.42                      Common to multiple genes
Others (_x_), e.g. Mtr.10093.1.S1_x_at        1809                3.55                      Other probe sets with complicated mapping
Total                                         50900               100

Reference sequences: an early version of IMGAG, the DFCI Gene Index and alfalfa ESTs
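As a quick consistency check, the percent column can be recomputed from the probe set counts. This is a minimal sketch; the dictionary keys and the function name `percent_shares` are illustrative, while the counts are the ones listed on this slide.

```python
# Probe set counts per type, as listed on this slide.
COUNTS = {
    "unique (_at)": 44182,
    "alternative (_a_at)": 116,
    "shared (_s_at)": 4793,
    "other (_x_at)": 1809,
}


def percent_shares(counts):
    """Return each type's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    print("total:", sum(COUNTS.values()))
    for name, share in percent_shares(COUNTS).items():
        print(f"{name:22s} {share:6.2f}")
```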

Page 6: Mapping Algorithm and Tool

• Gene transcripts were matched to corresponding Affymetrix probe sets using a position-weighted scoring index in which mismatches near the middle of a probe were most heavily penalized, as follows:

Probe position:   1 5 10 15 20 25
Position weights: [1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,2,2,2,2,2,1,1,1,1,1]

A perfect match for a single probe yields a score of 45 (the sum of the 25 position weights).

• Matches were declared when at least 8 of 11 probes had scores of 43 or higher.

Cutoff for matching: 43 x 8 = 344

Originated from Affymetrix, Inc.
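The scoring rule can be sketched in a few lines. The slide gives only the weights and the thresholds, so several details here are assumptions: a probe's score is taken as the sum of the weights at matching positions (45 minus the weights at mismatches), probes and transcript are assumed to be in the same orientation, and the probe is slid along the transcript without gaps. The function names are hypothetical, and this is not the AffyProbeMapping implementation.

```python
# Position weights from the slide: mismatches in the middle cost the most.
POSITION_WEIGHTS = [1] * 5 + [2] * 5 + [3] * 5 + [2] * 5 + [1] * 5
PERFECT_SCORE = sum(POSITION_WEIGHTS)  # 45 for a 25-mer


def probe_score(probe: str, window: str) -> int:
    """Sum the weights of positions where probe and window agree (assumed rule)."""
    if len(probe) != len(POSITION_WEIGHTS) or len(window) != len(POSITION_WEIGHTS):
        raise ValueError("probe and window must both be 25-mers")
    return sum(w for w, p, t in zip(POSITION_WEIGHTS, probe, window) if p == t)


def best_probe_score(probe: str, transcript: str) -> int:
    """Best ungapped score of the probe over all 25-mer windows of the transcript."""
    n = len(probe)
    scores = (probe_score(probe, transcript[i:i + n])
              for i in range(len(transcript) - n + 1))
    return max(scores, default=0)  # 0 if the transcript is shorter than the probe


def probeset_matches(probes, transcript, min_score=43, min_probes=8) -> bool:
    """Declare a match when at least min_probes probes score min_score or higher."""
    hits = sum(1 for probe in probes
               if best_probe_score(probe, transcript) >= min_score)
    return hits >= min_probes


if __name__ == "__main__":
    probe = "ACGTACGTACGTACGTACGTACGTA"
    print(probe_score(probe, probe))                            # identical 25-mers
    print(probe_score(probe[:12] + "N" + probe[13:], probe))    # one central mismatch
```

Under this assumed rule, a probe reaches 43 only if its mismatches carry a total weight of at most 2, so a single mismatch in the central five positions (weight 3) already disqualifies it.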

Page 7: AffyProbeMapping: An Online Affymetrix Probeset Mapping Tool

http://bioinfo3.noble.org/affymap/

Input sequence:

• Transcript

• cDNA

• EST/Unigene

• CDS

Page 8: Output of AffyProbeMapping

AffyProbeMapping also supports Affymetrix chips for other species: Lotus japonicus, Arabidopsis thaliana, rice, soybean, maize, Populus, cotton and tomato.

Page 9: Bioinformatics & Data Resources for Medicago truncatula

Originated from Affymetrix, Inc.

Data Sources:
• Mt3.5v4 (2011, version for Nature paper): optical mapping, 44,124 BAC-based gene loci + 18,264 Illumina (nr) gene models
• Mt3.5v5 (2012, minor changes): 45,859 BAC-based gene loci + 18,264 Illumina gene models
• Mt4 RC1 (2013, PAG 2013 conference): anchored Illumina contigs onto pseudochromosomes. 84,993 gene loci (BAC + Illumina). Chromosome sequences are frozen; some gene models might be removed.
• DFCI Gene Index Release 11: 294k ESTs/ETs, 68,814 Unigenes

Page 10: Statistics on Mt3.5v4 vs. Probesets Mapping Results using AffyProbeMapping

Num of cDNA   Matching probe_set   Percent
37,385        0                    59.92
18,354        1                    29.42
6,649         >=2                  10.66
62,388        Total                100

Page 11: Statistics on Mt4RC1 vs. Probesets Mapping Results using AffyProbeMapping

Num of cDNA   Matching probe_set   Percent
58,660        0                    69.02
20,257        1                    23.83
6,076         >=2                  7.15
84,993        Total                100

Page 12: Statistics on GeneIndex R11 vs. Probesets Mapping Results using AffyProbeMapping

Num of cDNA   Matching probe_set   Percent
29,722        0                    43.2
32,848        1                    47.7
6,244         >=2                  9.1
68,814        Total                100
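The statistics tables group each input sequence by how many probesets it matched (0, 1, or 2 and more). A minimal sketch of that tabulation follows; the input layout (a dictionary from sequence ID to a list of matching probeset IDs), the function name, and the toy IDs are all assumptions made for illustration, not AffyProbeMapping's actual output format.

```python
from collections import Counter

BIN_LABELS = ("0", "1", ">=2")


def bin_by_match_count(matches_per_sequence):
    """Return (label, count, percent) rows for the 0 / 1 / >=2 bins."""
    bins = Counter()
    for probesets in matches_per_sequence.values():
        n = len(set(probesets))  # ignore duplicate probeset IDs
        bins["0" if n == 0 else "1" if n == 1 else ">=2"] += 1
    total = sum(bins.values())
    return [
        (label, bins[label], round(100 * bins[label] / total, 2) if total else 0.0)
        for label in BIN_LABELS
    ]


if __name__ == "__main__":
    # Toy input with made-up sequence IDs.
    toy = {
        "seq1": [],
        "seq2": [],
        "seq3": ["Mtr.10097.1.S1_at"],
        "seq4": ["Mtr.10146.1.S1_s_at", "Mtr.10093.1.S1_x_at"],
    }
    for label, count, percent in bin_by_match_count(toy):
        print(f"{count:6d}  {label:4s}  {percent:6.2f}")
```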

Page 13: Mapping between the Medicago Genome and the Affymetrix Medicago Chip

http://bioinfo3.noble.org/affymap/Dataset.gy

Page 14: Bioinformatics Tools For Medicago

• Sequence Search and Annotation
– DOBLAST --- http://bioinfo3.noble.org/doblast/ , a parallel computing accelerated BLAST search tool

Features:
o Preloads many Medicago data resources
o Capable of handling big datasets
o “Tab-delimited bioparser output format” works well with Excel

Page 15: Bioinformatics Tools For Medicago

• Sequence Download and Cut by Coordinates
– “Sequence Download” page of DOBLAST --- batch download sequences or cut sequences by coordinates
o Preloads many Medicago data resources
o Batch download
o Get a fragment of a sequence by coordinates
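Cutting a fragment by coordinates can be sketched as below. The slides do not state which coordinate convention DOBLAST uses, so this example assumes 1-based, inclusive coordinates and an optional strand argument; `cut_sequence` and `reverse_complement` are hypothetical helpers, not DOBLAST functions.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cut_sequence(seq: str, start: int, end: int, strand: str = "+") -> str:
    """Cut seq[start..end], assuming 1-based inclusive coordinates.

    For strand "-", the fragment is returned reverse-complemented.
    """
    if start < 1 or end > len(seq) or start > end:
        raise ValueError(f"invalid coordinates {start}..{end} for length {len(seq)}")
    fragment = seq[start - 1:end]
    return reverse_complement(fragment) if strand == "-" else fragment


if __name__ == "__main__":
    chrom = "AACCGGTT"
    print(cut_sequence(chrom, 1, 3))        # first three bases, forward strand
    print(cut_sequence(chrom, 1, 3, "-"))   # same region, reverse strand
```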

Page 16: DOBLAST sequence download page

Page 17: Bioinformatics Tools For Medicago

• LegumeIP: An Integrative Platform to Study Gene Function and Genome Evolution in Legumes.
• Features:
– Synteny analysis among model legumes
– Phylogenetic analysis for gene families
– Gene-to-gene association analysis
– Gbrowser
o http://plantgrn.noble.org/LegumeIP/
o We are updating to Version 2

Page 18: LegumeIP: Synteny analysis for Medicago genome

Page 19: LegumeIP: Phylogenetic analysis for Medicago gene family

Page 20: LegumeIP: Gene association network analysis for Medicago gene