Jia-Ming Chang 0508 Graph Algorithms and Their Applications to Bioinformatics

Upload: brittney-roberts

Post on 05-Jan-2016

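Determine Protein Structure

X-ray
Wavelength of about 1 Å, close to the distance between atoms. Used to study the behavior of molecules in the crystalline state and to determine crystal structures, including protein structures.

X-ray and structural biology: X-ray diffraction is used to determine the spatial position of every group and atom in a highly purified, crystallized protein.

Nuclear magnetic resonance (NMR)
NMR involves the absorption of radiation by atomic nuclei. Certain nuclei have spin and a magnetic moment, so when exposed to a strong magnetic field they absorb electromagnetic radiation; this results from the energy-level splitting induced by the field. Scientists also found that the molecular environment affects how nuclei in a magnetic field absorb radio waves, and this property is used to analyze molecular structure.

AVANCE 800 AV, IBMS, Sinica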

NMR – Nuclear Spin (1/5)

NMR – Nuclear Spin (2/5)

NMR - Magnetic Field (3/5)

NMR – Resonance (4/5)

NMR – Chemical Shift (5/5)

Chemical Shift Assignment (1/2)

Find out the chemical shift for each atom
• Backbone: Ca, Cb, C’, N, NH

HSQC, CBCANH, CBCA(CO)NH

[Figure: one amino acid, with its C, N and H atoms labelled]

Chemical Shift Assignment (2/2)

[Figure: backbone chain -N-C-C-N-C-C-N-C-C-N-C-C- with side chains (H-C-H, CH3), annotated with chemical shift ranges in ppm: 18-23, 19-24, 16-20, 17-23, 31-34, 55-60; CH3: 30-35]

HSQC Spectra

HSQC peaks (1 chemical shift for an amino acid)

H      N       Intensity
8.109  118.60  65920032

CBCA(CO)NH Spectra

CBCA(CO)NH peaks (2 chemical shifts for one amino acid)

H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  65920032

CBCANH Spectra

CBCANH peaks (4 chemical shifts for one amino acid)

Ca (+), Cb (-)

H      N       C      Intensity
8.116  118.25  16.37   79238811
8.109  118.60  36.52  -65920032
8.117  118.90  61.58  -51223894
8.119  117.25  57.42  109928374

A Dataset Example

[Figure: peaks from the HSQC, HNCACB and CBCA(CO)NH spectra, with N and H axes]

A Perfect Spin System Group

CBCA(CO)NH
N        H      C       Intensity
113.293  7.897  56.294  1.64325e+008
113.293  7.897  27.853  1.08099e+008

CBCANH
N        H     C       Intensity
113.293  7.92  62.544   8.52851e+007
113.293  7.92  56.294   4.71331e+007
113.293  7.92  68.483  -8.54121e+007
113.293  7.92  28.165  -3.49346e+007

Resulting spin system
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483
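One way to read the perfect group above as a procedure: CBCA(CO)NH only reports the carbons of residue i-1, while CBCANH reports both residues with Ca peaks positive and Cb peaks negative. The following is a minimal sketch under that reading, not the author's implementation; the function name `build_spin_system` and the 0.5 ppm matching tolerance are assumptions made for illustration.

```python
# Hypothetical tolerance (ppm) for deciding that two carbon shifts belong
# to the same atom; the slides do not state the value actually used.
TOL = 0.5


def build_spin_system(cbcaconh, cbcanh, tol=TOL):
    """Combine two peak lists sharing one (N, H) pair into a spin system.

    Each peak is (carbon_shift, intensity).
    CBCA(CO)NH peaks belong to residue i-1 only.
    CBCANH peaks belong to residue i-1 or i; positive = Ca, negative = Cb.
    """
    prev_shifts = [shift for shift, _ in cbcaconh]
    result = {}
    for shift, intensity in cbcanh:
        atom = "Ca" if intensity > 0 else "Cb"
        # A CBCANH carbon that also shows up in CBCA(CO)NH is from residue i-1.
        is_prev = any(abs(shift - p) <= tol for p in prev_shifts)
        key = f"{atom}(i-1)" if is_prev else f"{atom}(i)"
        if key in result:
            # More than one candidate for the same atom: the group is ambiguous.
            raise ValueError(f"ambiguous group: two candidates for {key}")
        result[key] = shift
    return result


# Peak values taken from the slide above.
cbcaconh = [(56.294, 1.64325e8), (27.853, 1.08099e8)]
cbcanh = [(62.544, 8.52851e7), (56.294, 4.71331e7),
          (68.483, -8.54121e7), (28.165, -3.49346e7)]
spin = build_spin_system(cbcaconh, cbcanh)
print(spin)
```

With the slide's peaks this reproduces the four carbon shifts listed in the resulting spin system. A group with extra or missing peaks would raise or return fewer than four entries, which is where the ambiguities discussed later come in.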

Coding

Translate the target protein sequence and spin systems into coding sequences based on the following table.

Atreya, H.S., K.V.R. Chary, and G. Govil, Automated NMR assignments of proteins for high throughput structure determination: TATAPRO II. Current Science, 2002. 83(11): p. 1372-1376.

Backbone Assignment

Goal
• Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.

General approaches
• Generate spin systems
  ○ A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
• Link spin systems

Ambiguities

All 4 point experiments are mixed together

All 2 point experiments are mixed together

Each spin system can be mapped to several amino acids in the protein sequence

False positives, false negatives

Ambiguous Spin System

N      H     C      Intensity
106.9  8.87  54.92  423879
106.9  8.87  40.35  524522

N       H     C      Intensity
106.91  8.85  59.7    235673
106.92  8.86  54.93   346234
106.91  8.86  61.5    432432
106.91  8.85  40.31  -335759
106.92  8.86  30.5   -483759

Two possible spin systems

N      H     Ca(i-1)  Cb(i-1)  Ca(i)  Cb(i)
106.1  8.85  54.93    40.31    59.7   30.5
106.1  8.85  61.5     40.31    59.7   30.5

Multiple Candidates

One spin system may be assigned to many places in a protein sequence.

Spin system (SS)
N      H     Ca(i-1)  Cb(i-1)  Ca(i)  Cb(i)
119.7  8.84  58.4     32.7     56.3   40.8

Protein sequence: AKFERQHMDSSTSRNLTKDR

[Figure: four possible places for the spin system (SS) marked along the sequence]

False Positives and False Negatives

False positives
• Noise with high intensity
• Produce fake spin systems

False negatives
• Peaks with low intensity
• Missing peaks

In real wet-lab data, nearly 50% is noise (false positives).

Spin System Group

Perfect
False Negative
False Positive

[Figure: example groups in the HSQC, HNCACB and CBCA(CO)NH spectra, with N and H axes]

Spin System Linking

Goal
• Link spin systems into segments that are as long as possible.

Constraints
• Each spin system is uniquely assigned to a position of the target protein sequence.
• Two spin systems are linked only if the chemical shift differences of their intra- and inter-residues are less than the predefined thresholds.
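The second constraint can be sketched as a simple predicate: the intra-residue Ca and Cb of the left spin system are compared with the inter-residue (i-1) Ca and Cb of the right one. This is a sketch under stated assumptions, not the method's actual code: the names `SpinSystem` and `can_link` and the 0.5 ppm thresholds are hypothetical (the slides only say "predefined"), and a Cb of 0 is treated as an ordinary value. The shift values are taken from the Spin System Positioning slide.

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    ca_prev: float  # Ca of residue i-1 (inter-residue)
    cb_prev: float  # Cb of residue i-1 (inter-residue)
    ca: float       # Ca of residue i (intra-residue)
    cb: float       # Cb of residue i (intra-residue)


# Hypothetical thresholds in ppm.
CA_THRESHOLD = 0.5
CB_THRESHOLD = 0.5


def can_link(left, right, ca_tol=CA_THRESHOLD, cb_tol=CB_THRESHOLD):
    """True if `right` may directly follow `left` along the backbone."""
    return (abs(left.ca - right.ca_prev) < ca_tol
            and abs(left.cb - right.cb_prev) < cb_tol)


a = SpinSystem(55.266, 38.675, 44.555, 0.0)
b = SpinSystem(44.417, 0.0, 55.043, 30.04)
c = SpinSystem(44.417, 0.0, 30.665, 28.72)

print(can_link(a, b))  # a's own Ca/Cb agree with b's i-1 Ca/Cb
print(can_link(a, c))  # c has the same i-1 shifts as b, so it also fits
print(can_link(b, c))  # b's own Ca is far from c's i-1 Ca
```

Both `b` and `c` can follow `a`, which is exactly the kind of ambiguous link the later slides have to resolve.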

Previous Approaches

Constrained bipartite matching problem*
• Can’t deal with ambiguous links

[Figure: a legal matching and an illegal matching under constraints]

*Xu Y, Xu D, Kim D, Olman V, Razumovskaya J, Jiang T. Automated assignment of backbone NMR peaks using constrained bipartite matching. Computing in Science & Engineering 2002;4(1):50-62.

Natural Language Processing – Noise or Ambiguity?

Speech recognition: homophone selection

台北市一位小孩走失了 (A child has gone missing in Taipei City)

Candidate words: 台北市 (Taipei City), 小孩 (child), 台北 (Taipei), 適宜 (suitable), 走失 (to go missing), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (to shift position)

An Error-Tolerant Algorithm

Phrase, Sentence Combination

Spin System Positioning

We assign spin system groups to a protein sequence according to their codes.

Sequence codes: D 50, G 10, R 40, I 50|51

Spin systems and their codes
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
55.266   38.675   44.555  0       => 50 10
44.417   0        55.043  30.04   => 10 40
44.417   0        30.665  28.72   => 10 40
55.356   29.782   60.044  37.541  => 40 50
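Positioning by code can be sketched as a lookup: a spin system carrying the code pair (code of residue i-1, code of residue i) is a candidate for every sequence position whose two residues carry those codes. The sketch below assumes that "I 50|51" means isoleucine matches either code; only the four residues shown on the slide are included, and the names `RESIDUE_CODES` and `candidate_positions` are hypothetical.

```python
# Codes for the four residues shown on the slide. "I 50|51" is read here as
# "I matches code 50 or 51" (an assumption). Other residues are not listed
# in this transcript, so they match nothing in this sketch.
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(sequence, prev_code, own_code, codes=RESIDUE_CODES):
    """Return 0-based indices i such that residue i-1 matches prev_code
    and residue i matches own_code."""
    hits = []
    for i in range(1, len(sequence)):
        prev_ok = prev_code in codes.get(sequence[i - 1], set())
        own_ok = own_code in codes.get(sequence[i], set())
        if prev_ok and own_ok:
            hits.append(i)
    return hits


sequence = "DGRI"
# Code pairs of the four spin systems, in the order shown on the slide.
spin_system_codes = [(50, 10), (10, 40), (10, 40), (40, 50)]
positions = [candidate_positions(sequence, p, o) for p, o in spin_system_codes]
print(positions)
```

On the four-residue fragment each spin system gets exactly one candidate position, and the second and third spin systems compete for the same one. On a full sequence the lists would typically be longer.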

Link Spin System Groups

[Figure: the four spin systems positioned on D G R I and linked into Segment 1, Segment 2 and Segment 3]

Iterative Concatenation

Sequence: DGRI….FKJJREKL

Step 1: Spin systems 1, 2, …, 56
Step 2: Segment 1, Segment 2, Segment 3, …
…
Step n-1: Segment 78, Segment 79, …
Step n: Segment 99
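The step-by-step growth of segments can be sketched as follows: step 1 holds single spin systems at their candidate positions, and each later step extends every segment by one linkable spin system at the next position. This is only one plausible reading of the slide, not the author's code. The function name `concatenate`, the segment representation (start position, tuple of spin system ids) and the example `candidates` and `links` are all hypothetical.

```python
def concatenate(candidates, can_link):
    """Grow segments one spin system at a time.

    candidates: dict mapping spin system id -> set of sequence positions.
    can_link:   function (id_a, id_b) -> bool for consecutive residues.
    Returns a list of steps; step k (0-based) holds every segment of
    length k+1 as (start_position, tuple_of_spin_system_ids).
    """
    step = [(pos, (ss,))
            for ss, places in sorted(candidates.items())
            for pos in sorted(places)]
    steps = []
    while step:
        steps.append(step)
        longer = []
        for start, ids in step:
            next_pos = start + len(ids)
            for ss, places in sorted(candidates.items()):
                # Extend only with an unused spin system that fits the next
                # position and links to the current last spin system.
                if next_pos in places and ss not in ids and can_link(ids[-1], ss):
                    longer.append((start, ids + (ss,)))
        step = longer
    return steps


# Hypothetical input: four spin systems, their candidate positions, and
# the pairs that pass the linking thresholds.
candidates = {1: {1}, 2: {2}, 3: {2}, 4: {3}}
links = {(1, 2), (1, 3), (2, 4)}
steps = concatenate(candidates, lambda a, b: (a, b) in links)
for number, segments in enumerate(steps, start=1):
    print(f"Step {number}: {segments}")
```

The loop stops when no segment can be extended; because a spin system is never reused inside a segment, that happens after at most as many steps as there are spin systems.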

Conflict Segments

Sequence: DGRIGEIKGRKTLATPAVRRLAMENNIKLS

[Figure: Segments 71, 78, 79, 97, 98 and 99 placed along the sequence]

Two kinds of conflict segments
• Overlap (e.g. segment 71 and segment 99)
• Use of the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)
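The two kinds of conflict translate directly into a test on a pair of segments. In this minimal sketch a segment is assumed to be a pair (start position, tuple of spin system ids); that representation, the function names and the example segments are hypothetical and not taken from the slides.

```python
def positions(segment):
    """Sequence positions covered by a segment (start, spin_system_ids)."""
    start, ids = segment
    return set(range(start, start + len(ids)))


def in_conflict(seg_a, seg_b):
    """Two segments conflict if they overlap on the sequence
    or if they use the same spin system."""
    overlap = bool(positions(seg_a) & positions(seg_b))
    shared = bool(set(seg_a[1]) & set(seg_b[1]))
    return overlap or shared


# Hypothetical segments for illustration.
seg_x = (0, (5, 6, 7))   # covers positions 0, 1, 2
seg_y = (2, (8, 9))      # covers positions 2, 3: overlaps seg_x
seg_p = (10, (1, 2))     # uses spin system 1
seg_q = (20, (1, 3))     # also uses spin system 1
seg_r = (30, (4,))       # disjoint from everything above

print(in_conflict(seg_x, seg_y))  # overlap
print(in_conflict(seg_p, seg_q))  # same spin system
print(in_conflict(seg_x, seg_r))  # no conflict
```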

Independent Set

A subset S of vertices such that no two vertices in S are connected.

www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt

A Graph Model for Spin System Linking

G(V, E)
• V: a set of nodes (segments).
• E: the pairs (u, v) with u, v ∈ V such that u and v are in conflict.

Goal
• Assign as many non-conflicting segments as possible => find the maximum independent set of G.

An Example of G

Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE

Segment1: SP12->SP13->SP14
Segment2: SP9->SP13->SP20->SP4
Segment3: SP8->SP15->SP21
Segment4: SP7->SP1->SP15->SP3

[Figure: the four segments placed along the sequence, and the conflict graph on Seg1, Seg2, Seg3 and Seg4 with edges labelled SP13, SP15, Overlap and Overlap]
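Part of this example can be rebuilt from the segment lists alone. The sketch below derives the edges caused by shared spin systems (SP13 and SP15) and then finds all maximum independent sets by exhaustive search. The two "Overlap" edges depend on sequence positions that are not recoverable from this transcript, so they are left out here; with them added the answer would be narrower. Exhaustive search is a stand-in chosen for clarity, not the algorithm the slides use, and the function names are hypothetical.

```python
from itertools import combinations

# Segments as listed on the slide.
segments = {
    "Seg1": ["SP12", "SP13", "SP14"],
    "Seg2": ["SP9", "SP13", "SP20", "SP4"],
    "Seg3": ["SP8", "SP15", "SP21"],
    "Seg4": ["SP7", "SP1", "SP15", "SP3"],
}


def shared_spin_system_edges(segs):
    """Edges between segments that use at least one common spin system."""
    return {frozenset((a, b))
            for a, b in combinations(sorted(segs), 2)
            if set(segs[a]) & set(segs[b])}


def maximum_independent_sets(nodes, edges):
    """All independent sets of maximum size, by exhaustive search.

    Exponential in general; only meant for a handful of segments.
    """
    for size in range(len(nodes), 0, -1):
        found = [set(combo)
                 for combo in combinations(sorted(nodes), size)
                 if not any(frozenset(pair) in edges
                            for pair in combinations(combo, 2))]
        if found:
            return found
    return [set()]


edges = shared_spin_system_edges(segments)
best = maximum_independent_sets(segments, edges)
print(sorted(sorted(e) for e in edges))
print(best)
```

With only the shared-spin-system edges, every maximum independent set picks one of Seg1/Seg2 and one of Seg3/Seg4.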

Segment Weight

• The longer a segment is, the higher its weight.
• The lower the frequency of a segment, the lower its weight.

Find Maximum Weight Independent Set of G (1/2)

Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).

[Figure: vertex set V, neighborhood N(v), and Head_N(v)]

Find Maximum Weight Independent Set of G (2/2)

Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).

[Figure: vertex set V]
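The cited paper describes a recursive procedure, usually called Ramsey, that picks a pivot vertex v, recurses on its neighbors N(v) and on its non-neighbors, and returns both a clique and an independent set. The sketch below is the plain unweighted version of that procedure; how the slides adapt it to segment weights is not recoverable from this transcript, so no weights are used here. The example graph contains only the shared-spin-system edges from the example of G.

```python
def ramsey(graph, nodes=None):
    """Return (clique, independent_set) found by pivoting on one vertex.

    graph: dict mapping each vertex to the set of its neighbors.
    nodes: the subset of vertices to work on (all vertices by default).
    """
    if nodes is None:
        nodes = set(graph)
    if not nodes:
        return set(), set()
    v = min(nodes)  # any pivot works; min() keeps runs repeatable
    neighbors = nodes & graph[v]
    non_neighbors = nodes - neighbors - {v}
    clique_n, indep_n = ramsey(graph, neighbors)
    clique_o, indep_o = ramsey(graph, non_neighbors)
    # v is adjacent to everything in clique_n, and to nothing in indep_o.
    clique = max(clique_n | {v}, clique_o, key=len)
    indep = max(indep_n, indep_o | {v}, key=len)
    return clique, indep


def is_independent(graph, subset):
    """True if no two vertices of `subset` are adjacent."""
    return all(not (graph[u] & subset) for u in subset)


# Conflict graph with the shared-spin-system edges only (SP13, SP15).
graph = {
    "Seg1": {"Seg2"},
    "Seg2": {"Seg1"},
    "Seg3": {"Seg4"},
    "Seg4": {"Seg3"},
}
clique, indep = ramsey(graph)
print(indep)
```

Ramsey is a heuristic: the set it returns is always independent, but it is not guaranteed to be a maximum one on every graph.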

An Iterative Approach

We perform spin system generation and linking iteratively.

Three stages.
