FISH: Fast Identification of Segmental Homology

University of North Carolina at Chapel Hill

Department of Computer Science and Information Engineering, National Taiwan University

Shian-Gro Wu

Upload: mandel

Post on 06-Jan-2016



TRANSCRIPT

Page 1: FISH Fast Identification of Segmental Homology

FISH
Fast Identification of Segmental Homology

University of North Carolina at Chapel Hill

Department of Computer Science and Information Engineering, National Taiwan University

Shian-Gro Wu

Page 2: FISH Fast Identification of Segmental Homology

Outline
• Introduction
• Input data
• How it works
  – From markers to features
  – From features to grid
  – From grid to blocks

Page 3: FISH Fast Identification of Segmental Homology

Introduction

• FISH is software for the fast identification and statistical evaluation of segmental homologs.

[Figure: genome, contig, and gene (marker)]

Page 4: FISH Fast Identification of Segmental Homology

Introduction

[Figure: processing stages for a pair of contigs (contigA, contigB): markers, features, points, blocks]

Page 5: FISH Fast Identification of Segmental Homology

Input data

• Each map file lists the names and transcriptional orientation (if known) of all the markers on one contig.

• Example <map1>: one marker per line, giving the gene name and the transcriptional orientation

At1g01010 1
At1g01020 -1
At1g01030 -1
At1g01040 1
At1g01050 -1
...
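A minimal sketch of reading such a map file, assuming whitespace-separated columns and one marker per line. The `Marker` type, the `parse_map` function, and the convention that a missing second column means "orientation unknown" are illustrative assumptions, not part of FISH.

```python
from typing import List, NamedTuple, Optional


class Marker(NamedTuple):
    name: str
    orientation: Optional[int]  # 1 or -1; None when not given (assumed convention)
    position: int               # 0-based rank of the marker on the contig


def parse_map(text: str) -> List[Marker]:
    """Parse map-file text; markers are assumed to be listed in physical order."""
    markers: List[Marker] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        orientation = int(fields[1]) if len(fields) > 1 else None
        markers.append(Marker(fields[0], orientation, len(markers)))
    return markers


MAP1 = """\
At1g01010 1
At1g01020 -1
At1g01030 -1
At1g01040 1
At1g01050 -1
"""

markers = parse_map(MAP1)
```

Storing the rank as `position` keeps the physical order explicit, which later steps rely on.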

Page 6: FISH Fast Identification of Segmental Homology

Input data

• Each match file lists all the homologies between markers in a pair of contigs.

• Example <match1v1>: gene name, gene name, match score

At1g01010 At1g02240 94
At1g01010 At1g02250 91
At1g01010 At1g32870 66
At1g01010 At1g33060 43
At1g01010 At1g52880 42
....
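A match file of this shape could be read as follows. This is only a sketch: the three whitespace-separated columns are taken from the example, while the function name `parse_matches` and the choice to skip malformed lines are assumptions.

```python
from typing import List, Tuple

Match = Tuple[str, str, float]  # (marker on first contig, marker on second contig, score)


def parse_matches(text: str) -> List[Match]:
    """Parse match-file text with three columns: name, name, score."""
    matches: List[Match] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines (assumed behaviour)
        matches.append((fields[0], fields[1], float(fields[2])))
    return matches


MATCH1V1 = """\
At1g01010 At1g02240 94
At1g01010 At1g02250 91
At1g01010 At1g32870 66
At1g01010 At1g33060 43
At1g01010 At1g52880 42
"""

matches = parse_matches(MATCH1V1)
```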

Page 7: FISH Fast Identification of Segmental Homology

From markers to features

[Figure: processing stages for a pair of contigs (contigA, contigB): markers, features, points, blocks]

Page 8: FISH Fast Identification of Segmental Homology

From markers to features

• Step 1
  – Positions and transcriptional orientations (when known) of the markers are read from a set of map files, one map file per contig. Markers within each map file must be ordered according to their physical positions on the contig.
  – Individual homologies between markers are read from a set of match files. There is at least one, and no more than two, such files for each pair of contigs.
    Contigs: A, B, C
    Pairs: A&A, A&B, A&C, B&A, B&B, ...

Page 9: FISH Fast Identification of Segmental Homology

From markers to features

• Step 2
  – FISH performs detandemization, in which multiple markers may be collapsed into single features.
  – MIN Score and MAX Dist.

markers:  a b c d e f g h
features: A B (B) C D (C) E F

Page 10: FISH Fast Identification of Segmental Homology

From markers to features

1. ScoreAB > MIN Score
[Figure: markA and markB, labelled ScoreAB]

2. ScoreAC > MIN Score and ScoreBC > MIN Score
[Figure: markA, markB and markC, labelled ScoreAB, ScoreAC and ScoreBC]

[Figure: markA and markB, labelled ScoreAB and MAX Dist Range]
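The slides do not spell out the detandemization procedure, so the following is a simplified sketch under stated assumptions rather than FISH's actual algorithm. Here a marker joins the feature of the nearest earlier marker on the same contig if their match score exceeds MIN Score and they lie at most MAX Dist positions apart. Distance is measured in marker ranks, and the three-marker condition (ScoreAC and ScoreBC) is not modelled. The names `detandemize`, `min_score`, and `max_dist` are hypothetical.

```python
from typing import Dict, FrozenSet, List


def detandemize(
    markers: List[str],
    scores: Dict[FrozenSet[str], float],
    min_score: float,
    max_dist: int,
) -> List[int]:
    """Return a feature index for each marker (markers given in physical order)."""
    feature_of: List[int] = []
    next_feature = 0
    for i, name in enumerate(markers):
        assigned = None
        # Look back at most max_dist positions, nearest marker first.
        for j in range(i - 1, max(i - max_dist, 0) - 1, -1):
            if scores.get(frozenset((name, markers[j])), 0.0) > min_score:
                assigned = feature_of[j]  # collapse into the earlier feature
                break
        if assigned is None:
            assigned = next_feature  # start a new feature
            next_feature += 1
        feature_of.append(assigned)
    return feature_of


# Toy data chosen to mirror the a..h example: c matches b, f matches d.
toy_markers = list("abcdefgh")
toy_scores = {frozenset("bc"): 90.0, frozenset("df"): 80.0}
features = detandemize(toy_markers, toy_scores, min_score=50.0, max_dist=2)
```

With these toy scores the result is `[0, 1, 1, 2, 3, 2, 4, 5]`, which corresponds to the labelling A B (B) C D (C) E F.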

Page 11: FISH Fast Identification of Segmental Homology

From features to grid

[Figure: processing stages for a pair of contigs (contigA, contigB): markers, features, points, blocks]

Page 12: FISH Fast Identification of Segmental Homology

From features to grid

• In order to identify segmental homologies, FISH computes a grid for each pair of contigs.

• Points in the grid represent matches between pairs of features.

[Figure: grid with features fA1 fA2 fA3 fA4 of contigA on one axis and fB1 fB2 fB3 fB4 of contigB on the other; example points PointA1B2 and PointB2A4]
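As an illustration of how points could be derived, the sketch below maps each matched marker pair to the pair of feature indices it belongs to. The function `grid_points` and the toy marker names are assumptions made for this example. Matches between markers that were collapsed into the same feature yield a single point.

```python
from typing import Dict, Iterable, Set, Tuple


def grid_points(
    feature_a: Dict[str, int],
    feature_b: Dict[str, int],
    matches: Iterable[Tuple[str, str, float]],
) -> Set[Tuple[int, int]]:
    """Return grid points (feature index on contig A, feature index on contig B)."""
    points: Set[Tuple[int, int]] = set()
    for a, b, _score in matches:
        if a in feature_a and b in feature_b:
            points.add((feature_a[a], feature_b[b]))
    return points


# Toy data: markers a2 and a3 were collapsed into the same feature (index 1).
toy_feature_a = {"a1": 0, "a2": 1, "a3": 1, "a4": 2}
toy_feature_b = {"b1": 0, "b2": 1}
toy_matches = [
    ("a1", "b2", 94.0),
    ("a2", "b1", 80.0),
    ("a3", "b1", 75.0),  # same cell as the previous match
    ("a4", "zz", 60.0),  # unknown marker, ignored
]
toy_points = grid_points(toy_feature_a, toy_feature_b, toy_matches)
```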

Page 13: FISH Fast Identification of Segmental Homology

From features to grid

• Each position in the grid, whether or not a point is present, is called a cell.

cells(contigA, contigB) = features(contigA) × features(contigB)

cells(contigC, contigC) = features(contigC) × [features(contigC) − 1] / 2

where features(X) is the number of features on contig X.

[Figure: one grid for contigs A and B, and one grid for contig C against itself]

Page 14: FISH Fast Identification of Segmental Homology

From features to grid

contig  markers  features
1       6494     5913
2       4038     3711
3       5221     4777

contig1  contig2  points  cells
1        1        2143    17478828
1        2        2018    21943144
1        3        2088    28246400
2        2        751     6883905
….
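The cell-count formulas can be checked against the feature counts in the table with a few lines of arithmetic. The function names below are illustrative. The within-contig counts reproduce the table exactly (17478828 and 6883905), whereas the exact between-contig products come out as 21943143 and 28246401, each differing by one from the listed values. The slides do not explain the difference; rounding in the reported output is one possible cause, but that is a guess.

```python
def cells_between(features_a: int, features_b: int) -> int:
    """Number of cells in the grid for two different contigs."""
    return features_a * features_b


def cells_within(features: int) -> int:
    """Number of cells when a contig is compared with itself."""
    return features * (features - 1) // 2


# Feature counts per contig, taken from the table.
feature_count = {1: 5913, 2: 3711, 3: 4777}

cells_1v1 = cells_within(feature_count[1])
cells_2v2 = cells_within(feature_count[2])
cells_1v2 = cells_between(feature_count[1], feature_count[2])
cells_1v3 = cells_between(feature_count[1], feature_count[3])

# Point density m/n for contig 1 against itself (2143 points in the table).
density_1v1 = 2143 / cells_1v1
```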

Page 15: FISH Fast Identification of Segmental Homology

From features to grid

[Figure: processing stages for a pair of contigs (contigA, contigB): markers, features, points, blocks]

Page 16: FISH Fast Identification of Segmental Homology

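From grid to blocks

• Defining the neighborhood size
  – FISH measures distance between two points (Xi,Yi) and (Xj,Yj) using the Manhattan distance

    $d = |X_i - X_j| + |Y_i - Y_j|$

  – In order to be considered neighbors, two points must be closer than a threshold $d_T$

    [Equation not recoverable from the transcript: $d_T$ as a function of T, m and n; the surviving fragments are two "log(1 ...)" terms and the constants 3 and 4]

m: number of points
n: number of cells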
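The neighbor test itself is easy to sketch once a threshold is available. Because the threshold formula did not survive extraction, the sketch below takes it as a plain input, `d_t`. The function names are assumptions, and the strict comparison follows the wording "closer than".

```python
from typing import Iterable, List, Tuple

Point = Tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    """Manhattan distance |Xi - Xj| + |Yi - Yj| between two grid points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def neighbors(point: Point, points: Iterable[Point], d_t: float) -> List[Point]:
    """Points other than `point` whose Manhattan distance is strictly below d_t."""
    return [q for q in points if q != point and manhattan(point, q) < d_t]


toy_grid = [(0, 0), (1, 1), (3, 3), (2, 0)]
near_origin = neighbors((0, 0), toy_grid, d_t=3)
```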

Page 17: FISH Fast Identification of Segmental Homology

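From grid to blocks

[Equation not recoverable from the transcript: the threshold $d_T$ as a function of T, m and n; the surviving fragments are two "log(1 ...)" terms and the constants 3 and 4]

m: number of points
n: number of cells

If T = 0.05

[Figure: plot of $d_T$ against m/n; surviving labels 0.05, 0.75, 0.0001 and 23]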

Page 18: FISH Fast Identification of Segmental Homology

Result

Page 19: FISH Fast Identification of Segmental Homology

From grid to blocks

• Choosing among multiple neighbors
  – A point may lie in the neighborhood of more than one other point.
  – FISH ranks the cells within each neighborhood and chooses the neighbor with the highest rank

    $r = w \, \frac{1/d_c}{\sum_{c=1}^{n} (1/d_c)}$

where n is the number of cells in the point's neighborhood, $d_c$ is the distance of the cell from the point under consideration, and w is the weight.

Page 20: FISH Fast Identification of Segmental Homology

Reference

• User's Manual for Fast Identification of Segmental Homology
  http://www.bio.unc.edu/faculty/vision/lab/FISH/

• Fast identification and statistical evaluation of segmental homologies in comparative maps
  http://bioinformatics.oxfordjournals.org/cgi/content/abstract/19/suppl_1/i74

Page 21: FISH Fast Identification of Segmental Homology

Thank You