
Post on 19-Jan-2016


An Iterative Relaxation Technique for the NMR Backbone Assignment Problem

Wen-Lian Hsu
Institute of Information Science
Academia Sinica

Characteristics of Our Method

• Model this as a constraint satisfaction problem
• Solve it using natural language parsing techniques
  - Both top-down and bottom-up
• An iterative approach
  - Create spin systems based on noisy data.
  - Link spin systems by using maximum independent set finding techniques.

Outline

• Introduction
• Method
• Experiment Results
• Conclusion

Blind Man’s Elephant

• We cannot directly “see” the positions of these atoms (the structure)
• But we can measure a set of parameters (with constraints) on these atoms
  - Which can help us infer their coordinates
• Each experiment can only determine a subset of the parameters (with noise)
• To combine the parameters of different experiments we need to stitch them together

The Flow of NMR Experiments

Get protein samples → Collect NMR spectra → Resonance assignment → Structure constraints → Calculation and simulation
- Energy minimization
- Fitness of structure constraints

Chemical Shift Assignment

Find out the chemical shift for each atom
• Backbone atoms: Ca, Cb, C’, N, NH
  - Various experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
• Side chain: all others (especially CHs)
  - TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY

[Figure: one amino acid, with its backbone N, H and C atoms and side-chain CH2 and CH3 groups]

Some Relevant Parameters

[Figure: a peptide backbone (-N-C-C-N-C-C-N-C-C-N-C-C-) with its side chains, annotated with chemical shift ranges in ppm: 18-23, 19-24, 16-20, 17-23, 31-34, 55-60, and 30-35 (CH3)]


• Backbone: Ca, Cb, C’, N, NH

HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA

sequential assignment

chemical shifts of Ca, Cb, NH

HSQC

Three important experiments

Our NMR spectra:
• HSQC
• CBCA(CO)NH (2 peaks)
• HNCACB (4 peaks)

[Figure: CBCANH and CBCA(CO)NH spectra]

HSQC Spectra

HSQC peaks (1 chemical shift for an amino acid)

H      N       Intensity
8.109  118.60  65920032

[Figure: HSQC spectrum]

CBCA(CO)NH Spectra

CBCA(CO)NH peaks (2 chemical shifts for one amino acid)

H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  65920032

CBCANH Spectra

CBCANH peaks (4 chemical shifts for one amino acid)
Ca (+), Cb (-)

H      N       C      Intensity
8.116  118.25  16.37    79238811
8.109  118.60  36.52   -65920032
8.117  118.90  61.58   -51223894
8.119  117.25  57.42   109928374

A Dataset Example

[Figure: peaks plotted on the H and N axes, showing HSQC peaks together with HNCACB (4 peaks) and CBCA(CO)NH (2 peaks)]

Backbone Assignment

• Goal
  - Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.
• General approaches
  - Generate spin systems
    A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
  - Link spin systems

Ambiguities

• All 4-point experiments are mixed together
• All 2-point experiments are mixed together
• Each spin system can be mapped to several amino acids in the protein sequence
• False positives, false negatives

Previous Approaches

• Constrained bipartite matching problem
  - The spin system might be ambiguous
  - Can’t deal with ambiguous links

[Figure: a legal matching and an illegal matching under constraints]

Natural Language Processing: Signal or Noise?

Speech recognition: homophone selection

Sentence: 台北市一位小孩走失了 ("A child has gone missing in Taipei City")

Candidate words for the same sounds: 台北市 (Taipei City), 小孩 (child), 台北 (Taipei), 適宜 (suitable), 走失 (go missing), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (shift position)

An Error-Tolerant Algorithm

Phrase, Sentence Combination

Hierarchical Analysis

• Sentence-meaning templates (句意模版)
• Sentence-pattern templates (句型模版)
• Phrase templates (片語模版)
• Word templates (字詞模版)

Perfect Group

Each spin group contains 6 points, in which
• 4 points are from the first experiment
• 2 points are from the second experiment

[Figure: two adjacent residues on the backbone with their N, H, O and C atoms]


A Perfect Spin System Group

CBCA(CO)NH
N        H      C       Intensity
113.293  7.897  56.294  1.64325e+008
113.293  7.897  27.853  1.08099e+008

CBCANH
N        H     C       Intensity
113.293  7.92  62.544   8.52851e+007
113.293  7.92  56.294   4.71331e+007
113.293  7.92  68.483  -8.54121e+007
113.293  7.92  28.165  -3.49346e+007

Resulting spin system
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483

False Positives and False Negatives

• False positives
  - Noise with high intensity
  - Produce fake spin systems
• False negatives
  - Peaks with low intensity
  - Missing peaks
• In real wet-lab data, nearly 50% of the peaks are noise (false positives).

Spin System Group

[Figure: peaks on the H and N axes, showing a perfect group, a false negative, and a false positive]

Outline

• Introduction
• Method
• Experiment Results
• Conclusion

Main Idea

• Deal with false negatives in the spin system generation procedure.
• Eliminate false positives in the spin system linking procedure.
• Perform the spin system generation and linking procedures in an iterative fashion.

Spin System Group Generation

Three types of spin system groups are generated based on the quality of the CBCANH data:
• Perfect
• Weak false negative
• Severe false negative

Perfect Spin Systems

A spin system is determined without any added pseudo peak.

CBCA(CO)NH
N        H      C       Intensity
113.293  7.897  56.294  1.64325e+008
113.293  7.897  27.853  1.08099e+008

CBCANH
N        H     C       Intensity
113.293  7.92  62.544   8.52851e+007
113.293  7.92  56.294   4.71331e+007
113.293  7.92  68.483  -8.54121e+007
113.293  7.92  28.165  -3.49346e+007

Resulting spin system
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483
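A minimal sketch of how a perfect group like this one could be turned into a spin system. The 0.5 ppm tolerance, the function name and the nearest-match rule are illustrative assumptions, not values given on the slides; only the sign convention (Ca positive, Cb negative) comes from the CBCANH slide.

```python
# Sketch only: the 0.5 ppm tolerance and the nearest-match rule are assumptions.
TOL = 0.5  # hypothetical carbon-shift tolerance in ppm

def perfect_spin_system(cbcaconh, cbcanh, tol=TOL):
    """Return (Ca[i-1], Cb[i-1], Ca[i], Cb[i]), or None if the group is not perfect.

    cbcaconh: the 2 carbon shifts seen in CBCA(CO)NH (residue i-1 only).
    cbcanh:   the 4 (carbon shift, intensity) pairs seen in CBCANH.
    """
    if len(cbcaconh) != 2 or len(cbcanh) != 4:
        return None
    prev, remaining = [], list(cbcanh)
    for shift in cbcaconh:
        # Closest still-unmatched CBCANH peak to this CBCA(CO)NH peak.
        best = min(remaining, key=lambda p: abs(p[0] - shift))
        if abs(best[0] - shift) > tol:
            return None
        prev.append(best)
        remaining.remove(best)

    def split(peaks):
        # Positive intensity -> Ca, negative intensity -> Cb.
        ca = [s for s, inten in peaks if inten > 0]
        cb = [s for s, inten in peaks if inten < 0]
        return (ca[0], cb[0]) if len(ca) == 1 and len(cb) == 1 else None

    prev_pair, own_pair = split(prev), split(remaining)
    if prev_pair is None or own_pair is None:
        return None
    return prev_pair + own_pair

# Carbon shifts and intensities from the perfect example.
cbcaconh = [56.294, 27.853]
cbcanh = [(62.544, 8.52851e7), (56.294, 4.71331e7),
          (68.483, -8.54121e7), (28.165, -3.49346e7)]
spin_system = perfect_spin_system(cbcaconh, cbcanh)
```

With these inputs the two CBCA(CO)NH peaks pair up with the CBCANH peaks at 56.294 and 28.165, which leaves 62.544 and 68.483 for residue i.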

Weak False Negative Spin System Group

A spin system is determined with an added pseudo peak.

CBCA(CO)NH
N        H      C       Intensity
115.481  9.604  60.044  1.30407e+008
115.481  9.604  30.66   6.93923e+007

CBCANH
N        H      C       Intensity
115.481  9.616  59.419   2.25295e+008
115.481  9.616  31.291  -4.82097e+007
115.481  9.616  27.853  -1.33326e+008
115.481  9.604  60.044   1.30407e+008  (added pseudo peak, Ca)

Resulting spin system
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
60.044   31.291   59.419  27.583

Severe False Negative Spin System Group

A spin system is determined with two added pseudo peaks.

CBCA(CO)NH
N        H      C       Intensity
119.857  8.435  28.166  3.36293e+007
119.857  8.435  59.419  1.56434e+008

CBCANH
N        H      C       Intensity
119.856  8.477  58.481   3.7353e+008
119.856  8.477  28.79   -2.55735e+008
119.857  8.435  28.166   3.36293e+007  (added pseudo peak)
119.857  8.435  59.419   1.56434e+008  (added pseudo peak)

Resulting spin system
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
59.419   28.166   58.481  28.79

Note: it is also possible that Ca(i-1) = 28.166 and Cb(i-1) = 59.419.
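The three group types can be told apart by how many of the four expected CBCANH peaks are present. The sketch below assumes exactly that reading (4, 3 and 2 observed peaks, as in the three examples); the function name and the "unusable" fallback are hypothetical.

```python
# Sketch only: classifying by CBCANH peak count is an assumption drawn from
# the three example slides (4, 3 and 2 CBCANH peaks respectively).
EXPECTED_CBCANH_PEAKS = 4

def classify_group(n_cbcanh_peaks):
    """Return (group type, number of pseudo peaks to add)."""
    missing = EXPECTED_CBCANH_PEAKS - n_cbcanh_peaks
    if missing == 0:
        return ("perfect", 0)
    if missing == 1:
        return ("weak false negative", 1)
    if missing == 2:
        return ("severe false negative", 2)
    return ("unusable", None)  # hypothetical fallback, not on the slides

# Number of CBCANH peaks observed in each of the three examples.
examples = {"perfect": 4, "weak": 3, "severe": 2}
labels = {name: classify_group(n) for name, n in examples.items()}
```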

A note on spin system generation

• To generate *ALL* possible spin systems, a peak can be included in more than one spin system.
  - False positives are eliminated in the spin system linking procedure.
  - False negatives are treated by adding pseudo peaks.
• A rule-based mechanism is used to filter out incompatible spin systems (false positives).
  - Adopt a maximum weight independent set algorithm

Spin System Linking

• Goal
  - Link spin systems into segments that are as long as possible.
• Constraints
  - Each spin system is uniquely assigned to a position of the target protein sequence.
  - Two spin systems are linked only if the chemical shift differences of their intra- and inter-residues are less than the predefined thresholds.

A Peculiar Parking Lot (valet parking)

Information you have: the make of your car and, approximately, the car parked in front of you. Together with others, try to identify as many cars in the right order as possible (maximizing the overall satisfaction).

Backbone Assignment

DGRIGEIKGRKTLATPAVRRLAMENNIKLS

Spin System Positioning

Residue codes: D = 50, G = 10, R = 40, I = 50|51

Spin system (Ca(i-1)  Cb(i-1)  Ca(i)  Cb(i))    Codes (i-1, i)
55.266  38.675  44.555  0                       50  10
44.417  0       55.043  30.04                   10  40
44.417  0       30.665  28.72                   10  40
55.356  29.782  60.044  37.541                  40  50

We assign spin system groups to a protein sequence according to their codes.
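A small sketch of the positioning step, assuming each spin system has already been reduced to a pair of codes for residues i-1 and i. How the codes are derived from the chemical shifts is not given on the slide and is not modeled; `candidate_positions` is a hypothetical helper name.

```python
# Sketch only: the code table is the one on the slide (I carries two codes,
# 50|51); the derivation of codes from chemical shifts is not modeled.
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}

def candidate_positions(sequence, code_pair, codes=RESIDUE_CODES):
    """Return the 0-based indices i such that residue i-1 and residue i
    carry the two codes of the spin system's (i-1, i) pair."""
    prev_code, own_code = code_pair
    return [
        i for i in range(1, len(sequence))
        if prev_code in codes.get(sequence[i - 1], set())
        and own_code in codes.get(sequence[i], set())
    ]

fragment = "DGRI"
positions = {pair: candidate_positions(fragment, pair)
             for pair in [(50, 10), (10, 40), (40, 50)]}
```

On the fragment DGRI each code pair has a single candidate position; on a full sequence a pair would normally match several positions, which is the ambiguity the linking step has to resolve.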

Link Spin System Groups

[Figure: the spin systems (55.266 38.675 44.555 0), (44.417 0 55.043 30.04), (44.417 0 30.665 28.72) and (55.356 29.782 60.044 37.541) are placed on the sequence fragment D G R I and linked into Segment 1, Segment 2 and Segment 3]
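The linking constraint can be sketched as a comparison between the residue-i shifts of one spin system and the residue-(i-1) shifts of the next. The 0.5 ppm threshold is a hypothetical value (the slides only mention "predefined thresholds"), and the value printed as 55356 is read here as 55.356.

```python
# Sketch only: the 0.5 ppm threshold is hypothetical; the slides only say
# that "predefined thresholds" are used.
LINK_TOL = 0.5

def can_link(left, right, tol=LINK_TOL):
    """left and right are (Ca[i-1], Cb[i-1], Ca[i], Cb[i]) tuples.
    They link when left's own residue matches right's preceding residue."""
    return (abs(left[2] - right[0]) < tol and
            abs(left[3] - right[1]) < tol)

# Spin systems from the example; a Cb of 0 marks a residue without Cb (glycine).
s1 = (55.266, 38.675, 44.555, 0.0)
s2 = (44.417, 0.0, 55.043, 30.04)
s3 = (44.417, 0.0, 30.665, 28.72)
s4 = (55.356, 29.782, 60.044, 37.541)  # printed as "55356"; decimal point assumed

links = [(a, b)
         for a, x in [("s1", s1), ("s2", s2), ("s3", s3), ("s4", s4)]
         for b, y in [("s1", s1), ("s2", s2), ("s3", s3), ("s4", s4)]
         if a != b and can_link(x, y)]
```

Under this threshold s1 can be followed by either s2 or s3, but only s2 can be followed by s4, so s1 -> s2 -> s4 is the only chain of three.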

Iterative Concatenation

[Figure: along the sequence DGRI….FKJJREKL, spin systems (1, 2, …, 56) are concatenated into segments over successive steps: Step 1, Step 2 (Segment 1, Segment 2, …, Segment 31), …, Step n-1 (Segment 78, Segment 79, …), Step n (Segment 99)]

Conflict Segments

[Figure: Segments 71, 78, 79, 97, 98 and 99 placed on the sequence DGRIGEIKGRKTLATPAVRRLAMENNIKLS]

Two kinds of conflict segments:
• Overlap (e.g. segment 71, segment 99)
• Use the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)

A Graph Model for Spin System Linking

• G(V, E)
  - V: a set of nodes (segments).
  - E: (u, v), u, v ∈ V, where u and v are in conflict.
• Goal
  - Assign as many non-conflicting segments as possible => find the maximum independent set of G.

An Example of G

Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE

Segment1: SP12->SP13->SP14

Segment2: SP9->SP13->SP20->SP4

Segment3: SP8->SP15->SP21

Segment4: SP7->SP1->SP15->SP3

[Figure: Seg1 to Seg4 placed on the sequence, and the conflict graph G over them; edges are labeled with the shared spin system (SP13, SP15) or with "Overlap"]
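A sketch of the graph model on the four example segments. Only the "same spin system" conflicts are built, because the overlap conflicts depend on sequence positions that the transcript does not give; the exhaustive search is a stand-in that is practical only for very small graphs.

```python
from itertools import combinations

# Segments from the example, as lists of spin system ids.
segments = {
    "Seg1": [12, 13, 14],
    "Seg2": [9, 13, 20, 4],
    "Seg3": [8, 15, 21],
    "Seg4": [7, 1, 15, 3],
}

def shared_spin_system_conflicts(segs):
    """Edges between segments that reuse a spin system."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(segs), 2)
        if set(segs[a]) & set(segs[b])
    }

def maximum_independent_set(nodes, edges):
    """Exhaustive search, largest subsets first (tiny graphs only)."""
    nodes = sorted(nodes)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if not any(frozenset(p) in edges for p in combinations(subset, 2)):
                return list(subset)
    return []

edges = shared_spin_system_conflicts(segments)
chosen = maximum_independent_set(segments, edges)
```

Seg1 and Seg2 conflict through SP13 and Seg3 and Seg4 through SP15, so any independent set takes at most one segment from each pair.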

Segment weight

• The longer a segment is, the higher its weight.
• The less frequent a segment is, the higher its weight.

Find Maximum Weight Independent Set of G

Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
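For illustration, a weighted selection can be sketched with a plain greedy heuristic. This is not the cited approximation algorithm, and the weight formula (length divided by frequency) and the frequencies used below are hypothetical; the slides only state that longer and less frequent segments weigh more.

```python
# Sketch only: weight = length / frequency is a hypothetical formula, and the
# greedy rule is a simple stand-in, not the cited approximation algorithm.
def segment_weight(length, frequency):
    return length / frequency

def greedy_weighted_independent_set(weights, edges):
    """weights: {segment: weight}; edges: set of frozenset({u, v}) conflicts."""
    chosen, blocked = [], set()
    # Heaviest first; ties broken by name so the result is deterministic.
    for seg in sorted(weights, key=lambda s: (-weights[s], s)):
        if seg in blocked:
            continue
        chosen.append(seg)
        # Block the chosen segment and every segment in conflict with it.
        blocked.update(v for e in edges if seg in e for v in e)
    return chosen

# Lengths follow the four example segments; the frequencies are made up.
weights = {
    "Seg1": segment_weight(3, 1),
    "Seg2": segment_weight(4, 1),
    "Seg3": segment_weight(3, 1),
    "Seg4": segment_weight(4, 2),
}
edges = {frozenset(("Seg1", "Seg2")), frozenset(("Seg3", "Seg4"))}
selected = greedy_weighted_independent_set(weights, edges)
```

A greedy pass gives no optimality guarantee in general, which is one reason an approximation algorithm with a proven bound is the more careful choice.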

An Iterative Approach

• We perform spin system generation and linking iteratively.
• Three stages.

First Stage

• Generate perfect spin systems;
• Perform spin system concatenation on spin systems (newly generated perfect) to generate segments;
• Retain segments that contain at least 3 spin systems;
• Perform MaxIndSet on the segments;
• Drop spin systems (and related peaks) that are used in the resulting segments.

Second Stage

• Generate weak false negative spin systems.
• Perform segment extension on the resulting segments of the first iteration (using unused perfect and newly generated weak false negative);
• Perform spin system concatenation on the unused spin systems (perfect + weak false negative) to generate longer segments;
• Retain segments that contain at least 3 spin systems;
• Perform MaxIndSet on the segments;
• Drop spin systems (and related peaks) that are used in the resulting segments.

Third Stage

• Generate severe false negative spin systems.
• Perform segment extension on the resulting segments of the second iteration (using unused perfect and weak false negative, as well as newly generated severe false negative);
• Perform spin system concatenation on the unused spin systems (perfect + weak false negative + severe false negative) to generate longer segments;
• Retain segments that contain at least 3 spin systems;
• Perform MaxIndSet on the segments.
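The control flow of the three stages can be sketched as one loop. All four callables are hypothetical stand-ins for the procedures named on the slides, peaks are not modeled, and the toy stand-ins at the end exist only to make the sketch runnable.

```python
# Sketch only: the four callables are hypothetical stand-ins for the
# procedures named on the slides; peaks are not modeled at all.
STAGES = ("perfect", "weak false negative", "severe false negative")
MIN_SEGMENT_LENGTH = 3  # "at least 3 spin systems"

def unused(pool, segments):
    """Spin systems in pool that no segment uses yet."""
    used = {sp for seg in segments for sp in seg}
    return [sp for sp in pool if sp not in used]

def iterative_assignment(generate, extend, concatenate, max_ind_set):
    pool, accepted = [], []
    for stage in STAGES:
        pool = pool + generate(stage)          # new spin systems of this type
        if accepted:                           # stages 2 and 3 only
            accepted = extend(accepted, pool)
            pool = unused(pool, accepted)
        candidates = accepted + concatenate(pool)
        candidates = [s for s in candidates if len(s) >= MIN_SEGMENT_LENGTH]
        accepted = max_ind_set(candidates)
        pool = unused(pool, accepted)          # drop used spin systems
    return accepted

# Toy stand-ins, only to show the control flow.
new_systems = {STAGES[0]: [1, 2, 3], STAGES[1]: [4, 5], STAGES[2]: [6, 7, 8]}
result = iterative_assignment(
    generate=lambda stage: new_systems[stage],
    extend=lambda segments, pool: segments,            # no extension
    concatenate=lambda pool: [list(pool)] if pool else [],
    max_ind_set=lambda segments: segments,             # keep everything
)
```

In the toy run the two weak false negative spin systems are too few to form a segment in the second stage, so they stay in the pool and are only used in the third stage.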

Segment Extension

[Figure: segment extension along the sequence ….FKJJREKL….: 109 is extended with new spin systems to give "New 109"]

Segment Extension

[Figure: segments (71, 77, 78, 97, 99) and extended segments (97', 99') on the target sequence, before and after MaxIndSet]

Outline

• Introduction
• Method
• Experimental Results
• Conclusion

Experimental Results

• Two datasets obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica:
  - Average precision: 87.5%
  - Average recall: 73.1%
• Perfect data from BMRB: 99.1%

Real Wet-Lab Datasets

The two datasets are obtained from our collaborator Dr. Tai-Huang Huang in IBMS at Academia Sinica, Taiwan.

Datasets                                                    sbd     lbd
# of amino acids                                            53      85
# of amino acids that are assigned manually by biologists   42      80
# of HSQC peaks                                             58      78
# of CBCA(CO)NH peaks                                       258     271
# of HNCACB peaks                                           224     620
# of expected CBCA(CO)NH                                    84      160
# of expected HNCACB                                        168     320
false positive of CBCA(CO)NH                                67.4%   41.0%
false positive of HNCACB                                    25.0%   48.4%
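The expected counts and false positive rates in this table are consistent with a simple reading: 2 expected CBCA(CO)NH peaks and 4 expected HNCACB peaks per manually assigned amino acid, with every observed peak beyond that counted as a false positive. The sketch below checks that reading; it is an inference from the numbers, not a definition stated on the slide.

```python
# Observed peak counts and manually assigned residue counts from the table.
datasets = {
    "sbd": {"assigned": 42, "CBCA(CO)NH": 258, "HNCACB": 224},
    "lbd": {"assigned": 80, "CBCA(CO)NH": 271, "HNCACB": 620},
}
# Assumed number of expected peaks per assigned residue in each experiment.
PEAKS_PER_RESIDUE = {"CBCA(CO)NH": 2, "HNCACB": 4}

def false_positive_rate(observed, expected):
    """Share of observed peaks in excess of the expected count, in percent."""
    return round(100 * (observed - expected) / observed, 1)

rates = {
    (name, exp): false_positive_rate(d[exp], PEAKS_PER_RESIDUE[exp] * d["assigned"])
    for name, d in datasets.items()
    for exp in PEAKS_PER_RESIDUE
}
```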

Experimental Results on Real Data

datasets                     sbd    lbd
# of amino acids             53     85
# of assigned amino acids    42     80
# of HSQC peaks              58     78
# of CBCANH peaks            224    620
# of CBCA(CO)NH peaks        258    271

                 # of correctly assigned   # of assigned   accuracy   recall
Method on sbd    32                        35              91.4%      76.2%
Method on lbd    56                        67              83.6%      70.0%
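The accuracy and recall figures can be reproduced from the counts, taking accuracy as correct / assigned by the method and recall as correct / manually assigned (42 and 80 in the wet-lab dataset table). The variable names are illustrative.

```python
# Counts from the results table; "manual" is the number of amino acids
# assigned manually by biologists in the wet-lab dataset table.
results = {
    "sbd": {"correct": 32, "assigned": 35, "manual": 42},
    "lbd": {"correct": 56, "assigned": 67, "manual": 80},
}

def percent(numerator, denominator):
    return 100 * numerator / denominator

accuracy = {k: percent(r["correct"], r["assigned"]) for k, r in results.items()}
recall = {k: percent(r["correct"], r["manual"]) for k, r in results.items()}

# Unweighted averages over the two datasets.
avg_accuracy = sum(accuracy.values()) / len(accuracy)
avg_recall = sum(recall.values()) / len(recall)
```

Rounded to one decimal, the two averages match the 87.5% precision and 73.1% recall quoted for the wet-lab datasets.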

Outline

• Introduction
• Method
• Experiment Results
• Conclusion

Conclusion

• We model the backbone assignment problem as a constraint satisfaction problem
• This problem is solved using a natural language parsing technique (both bottom-up and top-down approaches)
• The same approach seems to work for a large class of noise reduction problems that are discrete in nature
