1/60 an iterative relaxation technique for the nmr backbone assignment problem wen-lian hsu...
Post on 19-Jan-2016
An Iterative Relaxation Technique for the NMR Backbone Assignment Problem
Wen-Lian Hsu
Institute of Information Science
Academia Sinica
Characteristics of Our Method
• Model this as a constraint satisfaction problem
• Solve it using natural language parsing techniques
- Both top-down and bottom-up
• An iterative approach
- Create spin systems based on noisy data.
- Link spin systems by using maximum independent set finding techniques.
Outline
• Introduction
• Method
• Experiment Results
• Conclusion
Blind Man's Elephant
• We cannot directly "see" the positions of these atoms (the structure)
• But we can measure a set of parameters (with constraints) on these atoms
- These can help us infer their coordinates
• Each experiment can only determine a subset of the parameters (with noise)
• To combine the parameters from different experiments, we need to stitch them together
The Flow of NMR Experiments
• Get protein samples
• Collect NMR spectra
• Resonance assignment
• Structure constraints
• Calculation and simulation
- Energy minimization
- Fitness of structure constraints
Chemical Shift Assignment
Find out the chemical shift for each atom
• Backbone atoms: Ca, Cb, C', N, NH
- Various experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
• Side chain: all others (especially CHs)
- TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY
[Figure: one amino acid, with its N, H, C, H2 and H3 atoms labeled]
Some Relevant Parameters
[Figure: polypeptide backbone (-N-C-C-N-C-C-N-C-C-N-C-C-) with its side chains, annotated with ranges in ppm: 18-23, 19-24, 16-20, 17-23, 31-34, 55-60; CH3 30-35]
Three important experiments
• Backbone: Ca, Cb, C', N, NH
- HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
• Sequential assignment uses the chemical shifts of Ca, Cb, NH
• Our NMR spectra: HSQC, CBCA(CO)NH (2 peaks), CBCANH (4 peaks; also written HNCACB in these slides)
HSQC Spectra
HSQC peaks (1 chemical shift for an amino acid)
H      N       Intensity
8.109  118.60  65920032
CBCA(CO)NH Spectra
CBCA(CO)NH peaks (2 chemical shifts for one amino acid)
H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  65920032
CBCANH Spectra
CBCANH peaks (4 chemical shifts for one amino acid)
Ca (+), Cb (-)
H      N       C      Intensity
8.116  118.25  16.37    79238811
8.109  118.60  36.52   -65920032
8.117  118.90  61.58   -51223894
8.119  117.25  57.42   109928374
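The sign convention on this slide (Ca positive, Cb negative) can be sketched in a few lines of Python. The `Peak` type and the `carbon_type` helper are hypothetical names introduced for illustration, and the two dash-marked intensities in the table are read as negative, which is an assumption about the slide's notation.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    h: float          # amide proton shift (ppm)
    n: float          # amide nitrogen shift (ppm)
    c: float          # carbon shift (ppm)
    intensity: float  # signed peak intensity


def carbon_type(peak: Peak) -> str:
    """Label a CBCANH peak as Ca (positive intensity) or Cb (negative)."""
    return "Ca" if peak.intensity > 0 else "Cb"


# The four CBCANH peaks listed on the slide; dash-marked intensities
# are taken to be negative (assumption).
peaks = [
    Peak(8.116, 118.25, 16.37, 79238811),
    Peak(8.109, 118.60, 36.52, -65920032),
    Peak(8.117, 118.90, 61.58, -51223894),
    Peak(8.119, 117.25, 57.42, 109928374),
]
labels = [carbon_type(p) for p in peaks]
print(labels)
```

Real data would also need a noise threshold on the absolute intensity, which this sketch leaves out.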
A Dataset Example
[Figure: HSQC spectrum (axes N and H); each HSQC peak is shown with 4 HNCACB peaks and 2 CBCA(CO)NH peaks]
Backbone Assignment
• Goal
- Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.
• General approaches
- Generate spin systems. A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
- Link spin systems
Ambiguities
• All 4-point experiments are mixed together
• All 2-point experiments are mixed together
• Each spin system can be mapped to several amino acids in the protein sequence
• False positives, false negatives
Previous Approaches
• Constrained bipartite matching problem
- The spin system might be ambiguous
- Can't deal with ambiguous links
[Figure: a legal matching and an illegal matching under constraints]
Natural Language Processing: Signal or Noise?
• Speech recognition: homophone selection
- Spoken sentence: 台北市一位小孩走失了 ("A child has gone missing in Taipei City")
- Candidate words that sound alike: 台北市 (Taipei City), 台北 (Taipei), 適宜 (suitable), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (to shift), 小孩 (child), 走失 (to go missing)
An Error-Tolerant Algorithm
Phrase, Sentence Combination
Hierarchical Analysis
- Sentence-meaning templates (句意模版)
- Sentence-pattern templates (句型模版)
- Phrase templates (片語模版)
- Word templates (字詞模版)
Perfect Group
Each spin group contains 6 points, of which
- 4 points are from the first experiment
- 2 points are from the second experiment
[Figure: two adjacent residues on the backbone with their N, H, C and O atoms]
A Perfect Spin System Group
CBCA(CO)NH (residue i-1):
N        H      C       Intensity
113.293  7.897  56.294   1.64325e+008
113.293  7.897  27.853   1.08099e+008
CBCANH (residues i-1 and i):
N        H      C       Intensity
113.293  7.92   62.544   8.52851e+007
113.293  7.92   56.294   4.71331e+007
113.293  7.92   68.483  -8.54121e+007
113.293  7.92   28.165  -3.49346e+007
Resulting spin system:
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483
False Positives and False Negatives
• False positives
- Noise with high intensity
- Produce fake spin systems
• False negatives
- Peaks with low intensity
- Missing peaks
• In real wet-lab data, nearly 50% of the peaks are noise (false positives).
Spin System Group
[Figure: perfect, false-negative and false-positive spin system groups, drawn on the N-H plane]
Main Idea
• Deal with false negatives in the spin system generation procedure.
• Eliminate false positives in the spin system linking procedure.
• Perform the spin system generation and linking procedures in an iterative fashion.
Spin System Group Generation
Three types of spin system group are generated based on the quality of the CBCANH data:
- Perfect
- Weak false negative
- Severe false negative
Perfect Spin Systems
A spin system is determined without any added pseudo peak.
CBCA(CO)NH (residue i-1):
N        H      C       Intensity
113.293  7.897  56.294   1.64325e+008
113.293  7.897  27.853   1.08099e+008
CBCANH (residues i-1 and i):
N        H      C       Intensity
113.293  7.92   62.544   8.52851e+007
113.293  7.92   56.294   4.71331e+007
113.293  7.92   68.483  -8.54121e+007
113.293  7.92   28.165  -3.49346e+007
Resulting spin system:
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483
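One way the perfect case could be assembled is sketched below. It assumes the six peaks have already been grouped by their N and H shifts. The function name `build_perfect_group` and the 0.5 ppm carbon tolerance are illustrative assumptions and do not come from the slides. CBCANH carbons that match a CBCA(CO)NH carbon are taken to belong to residue i-1, the others to residue i, and the intensity sign separates Ca from Cb.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[float, float, float, float]  # (N, H, C, intensity)


def build_perfect_group(cbcaconh: List[Row], cbcanh: List[Row],
                        c_tol: float = 0.5) -> Optional[Dict[str, float]]:
    """Return the four carbon shifts of a perfect group, or None.

    c_tol is an assumed carbon matching tolerance in ppm.
    """
    prev_carbons = [c for (_, _, c, _) in cbcaconh]
    group: Dict[str, float] = {}
    for (_, _, c, intensity) in cbcanh:
        atom = "Ca" if intensity > 0 else "Cb"
        is_prev = any(abs(c - p) <= c_tol for p in prev_carbons)
        key = f"{atom}({'i-1' if is_prev else 'i'})"
        if key in group:      # two peaks claim the same slot: not perfect
            return None
        group[key] = c
    return group if len(group) == 4 else None


# Peaks from the slide.
cbcaconh = [(113.293, 7.897, 56.294, 1.64325e8),
            (113.293, 7.897, 27.853, 1.08099e8)]
cbcanh = [(113.293, 7.92, 62.544, 8.52851e7),
          (113.293, 7.92, 56.294, 4.71331e7),
          (113.293, 7.92, 68.483, -8.54121e7),
          (113.293, 7.92, 28.165, -3.49346e7)]
group = build_perfect_group(cbcaconh, cbcanh)
print(group)
```

With the slide's data this reproduces the spin system 56.294, 28.165, 62.544, 68.483.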
Weak False Negative Spin System Group
A spin system is determined with an added pseudo peak.
CBCA(CO)NH (residue i-1):
N        H      C       Intensity
115.481  9.604  60.044   1.30407e+008
115.481  9.604  30.66    6.93923e+007
CBCANH (residues i-1 and i):
N        H      C       Intensity
115.481  9.616  59.419   2.25295e+008
115.481  9.616  31.291  -4.82097e+007
115.481  9.616  27.853  -1.33326e+008
Added pseudo peak (Ca of residue i-1, copied from CBCA(CO)NH):
115.481  9.604  60.044   1.30407e+008
Resulting spin system:
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
60.044   31.291   59.419  27.583
Severe False Negative Spin System Group
A spin system is determined with two added pseudo peaks.
CBCA(CO)NH (residue i-1):
N        H      C       Intensity
119.857  8.435  28.166   3.36293e+007
119.857  8.435  59.419   1.56434e+008
CBCANH (residue i only):
N        H      C       Intensity
119.856  8.477  58.481   3.7353e+008
119.856  8.477  28.79   -2.55735e+008
Added pseudo peaks (Ca and Cb of residue i-1, copied from CBCA(CO)NH):
119.857  8.435  28.166   3.36293e+007
119.857  8.435  59.419   1.56434e+008
Resulting spin system:
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
59.419   28.166   58.481  28.79
Note: it is also possible that Ca(i-1) = 28.166 and Cb(i-1) = 59.419.
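The three group types can be told apart by how many pseudo peaks are needed to bring the CBCANH group up to four peaks. The sketch below is a hedged illustration: `classify_group` and `severe_candidates` are hypothetical helpers, not the authors' code. For the severe case it returns both Ca/Cb orderings of the two CBCA(CO)NH carbons, following the slide's note that either ordering is possible.

```python
from typing import List, Optional, Tuple

SpinSystem = Tuple[float, float, float, float]  # Ca(i-1), Cb(i-1), Ca(i), Cb(i)


def classify_group(n_cbcanh_peaks: int) -> Optional[str]:
    """Name the group type from the number of observed CBCANH peaks."""
    missing = 4 - n_cbcanh_peaks  # pseudo peaks that must be added
    return {0: "perfect",
            1: "weak false negative",
            2: "severe false negative"}.get(missing)


def severe_candidates(cbcaconh_carbons: Tuple[float, float],
                      cbcanh: List[Tuple[float, float]]) -> List[SpinSystem]:
    """Both candidate spin systems when the two i-1 peaks are missing.

    cbcanh holds (carbon shift, signed intensity) pairs for residue i.
    """
    ca_i = next(c for c, inten in cbcanh if inten > 0)
    cb_i = next(c for c, inten in cbcanh if inten < 0)
    a, b = cbcaconh_carbons
    # The CBCA(CO)NH intensities carry no sign information here, so
    # keep both assignments of (Ca, Cb) for residue i-1.
    return [(a, b, ca_i, cb_i), (b, a, ca_i, cb_i)]


# Data from the severe false negative slide.
candidates = severe_candidates((28.166, 59.419),
                               [(58.481, 3.7353e8), (28.79, -2.55735e8)])
print(classify_group(2), candidates)
```

Keeping both candidates matches the stated policy of generating all possible spin systems and leaving the elimination of wrong ones to the linking step.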
A note on spin system generation
• To generate *ALL* possible spin systems, a peak can be included in more than one spin system.
- False positives are eliminated in the spin system linking procedure.
- False negatives are treated by adding pseudo peaks.
• A rule-based mechanism is used to filter out incompatible spin systems (false positives).
- Adopt a maximum weight independent set algorithm
Spin System Linking
• Goal
- Link spin systems into chains that are as long as possible.
• Constraints
- Each spin system is uniquely assigned to a position of the target protein sequence.
- Two spin systems are linked only if the chemical shift differences of their intra- and inter-residues are less than the predefined thresholds.
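The linking constraint can be sketched as a predicate on two spin systems: the intra-residue Ca/Cb of the first must agree with the inter-residue (i-1) Ca/Cb of the second. The `SpinSystem` type, the `can_link` name and the 0.5 ppm thresholds are assumptions for illustration, since the slides do not give the actual threshold values. The sample values are the four spin systems from the positioning slide; the last one is printed there as 55356 and is read here as 55.356 (an assumption).

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    ca_prev: float  # Ca of residue i-1
    cb_prev: float  # Cb of residue i-1 (0 when absent, as for glycine)
    ca: float       # Ca of residue i
    cb: float       # Cb of residue i


def can_link(first: SpinSystem, second: SpinSystem,
             ca_tol: float = 0.5, cb_tol: float = 0.5) -> bool:
    """True if `second` can directly follow `first` along the backbone."""
    return (abs(first.ca - second.ca_prev) <= ca_tol
            and abs(first.cb - second.cb_prev) <= cb_tol)


s1 = SpinSystem(55.266, 38.675, 44.555, 0.0)
s2 = SpinSystem(44.417, 0.0, 55.043, 30.04)
s3 = SpinSystem(44.417, 0.0, 30.665, 28.72)
s4 = SpinSystem(55.356, 29.782, 60.044, 37.541)  # 55356 on the slide

print(can_link(s1, s2), can_link(s2, s4), can_link(s3, s4))
```

Note that `s1` links to both `s2` and `s3` under these thresholds, which is the kind of ambiguity the later independent set step has to resolve.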
A Peculiar Parking Lot (valet parking)
• Information you have: the make of your car and (approximately) the make of the car parked in front of you.
• Together with others, try to identify as many cars in the right order as possible (maximizing the overall satisfaction).
Backbone Assignment
DGRIGEIKGRKTLATPAVRRLAMENNIKLS
Spin System Positioning
We assign spin system groups to a protein sequence according to their codes.
Residue codes: D 50, G 10, R 40, I 50|51
Spin system                       => codes
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
55.266   38.675   44.555  0       => 50 10
44.417   0        55.043  30.04   => 10 40
44.417   0        30.665  28.72   => 10 40
55.356   29.782   60.044  37.541  => 40 50
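Given the code pairs, finding where a spin system may sit on the sequence is a simple scan. The sketch below uses only the four residue codes shown on the slide (D 50, G 10, R 40, I 50|51). How a code is derived from the chemical shifts is not stated in the slides, so the code pairs are taken as given, and `candidate_positions` is a hypothetical helper name.

```python
from typing import Dict, List, Set, Tuple

# Residue codes as shown on the slide; "I 50|51" is read as "50 or 51".
CODES: Dict[str, Set[int]] = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(code_pair: Tuple[int, int], sequence: str) -> List[int]:
    """0-based indices i where residue i-1 and residue i match the pair."""
    prev_code, cur_code = code_pair
    return [i for i in range(1, len(sequence))
            if prev_code in CODES[sequence[i - 1]]
            and cur_code in CODES[sequence[i]]]


sequence = "DGRI"
positions = {pair: candidate_positions(pair, sequence)
             for pair in [(50, 10), (10, 40), (40, 50)]}
print(positions)
```

On a full-length sequence each pair would usually match several positions, so positioning only narrows the candidates.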
Link Spin System Groups
[Figure: the four spin systems below are placed under the residues D G R I and chained into Segment 1, Segment 2 and Segment 3]
55.266   38.675   44.555  0
44.417   0        55.043  30.04
44.417   0        30.665  28.72
55.356   29.782   60.044  37.541
Iterative Concatenation
DGRI….FKJJREKL
[Figure: spin systems (1, 2, …, 56) are concatenated step by step along the sequence. Labels in the figure: Step 1; Step 2 with Segment 1, Segment 2, Segment 31, …; Step n-1 with Segment 78, Segment 79, …; Step n with Segment 99]
Conflict Segments
DGRIGEIKGRKTLATPAVRRLAMENNIKLS
[Figure: segments 71, 78, 79, 97, 98 and 99 placed along the sequence]
Two kinds of conflict segments:
• Overlap (e.g. segment 71 and segment 99)
• Use of the same spin system (e.g. segment 78 and segment 79 both contain spin system 1)
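Both kinds of conflict reduce to simple set and interval tests. In the sketch below a segment is modeled as a start position on the sequence plus the ordered spin-system ids it contains. This representation, the `in_conflict` name and the sample positions are assumptions for illustration, because the figure's actual positions are not recoverable from the transcript.

```python
from typing import NamedTuple, Tuple


class Segment(NamedTuple):
    start: int                     # 0-based position of the first residue
    spin_systems: Tuple[int, ...]  # spin-system ids, one per residue


def in_conflict(a: Segment, b: Segment) -> bool:
    """True if the segments overlap on the sequence or share a spin system."""
    a_end = a.start + len(a.spin_systems)  # exclusive end
    b_end = b.start + len(b.spin_systems)
    overlap = a.start < b_end and b.start < a_end
    shared = bool(set(a.spin_systems) & set(b.spin_systems))
    return overlap or shared


# Hypothetical placements that mimic the two examples on the slide.
seg71 = Segment(4, (2, 3, 4))
seg99 = Segment(5, (9, 10, 11))    # overlaps seg71 on positions 5-6
seg78 = Segment(0, (1, 5, 6))
seg79 = Segment(10, (1, 7, 8))     # shares spin system 1 with seg78

print(in_conflict(seg71, seg99), in_conflict(seg78, seg79),
      in_conflict(seg78, seg71))
```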
A Graph Model for Spin System Linking
• G(V, E)
- V: a set of nodes (segments).
- E: the pairs (u, v) with u, v ∈ V such that u and v are in conflict.
• Goal
- Assign as many non-conflicting segments as possible => find the maximum independent set of G.
An Example of G
Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE
Segment1: SP12->SP13->SP14
Segment2: SP9->SP13->SP20->SP4
Segment3: SP8->SP15->SP21
Segment4: SP7->SP1->SP15->SP3
[Figure: the four segments placed on the sequence, and the conflict graph on Seg1 to Seg4 with edges labeled SP13, SP15, Overlap and Overlap]
Segment weight
• The longer a segment is, the higher its weight.
• The lower the frequency of a segment, the higher its weight.
Find Maximum Weight Independent Set of G
Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
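For intuition, the maximum weight independent set of a very small conflict graph can be found by exhaustive search. The sketch below is a plain brute-force stand-in, not the Boppana and Halldórsson approximation cited above, and it is only practical for a handful of nodes. The graph uses the four example segments with only the two shared-spin-system edges (SP13 and SP15), because the overlap edges cannot be recovered from the figure. Weights are set to segment length as an assumption.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def max_weight_independent_set(
        weights: Dict[str, float],
        edges: Iterable[Tuple[str, str]]) -> Tuple[Set[str], float]:
    """Exhaustively search all node subsets (exponential time)."""
    nodes = list(weights)
    edge_set = {frozenset(e) for e in edges}
    best: Set[str] = set()
    best_weight = 0.0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            # Skip subsets containing two conflicting segments.
            if any(frozenset(p) in edge_set for p in combinations(subset, 2)):
                continue
            w = sum(weights[n] for n in subset)
            if w > best_weight:
                best, best_weight = set(subset), w
    return best, best_weight


# Weight = number of spin systems in the segment (assumed weighting).
weights = {"Seg1": 3, "Seg2": 4, "Seg3": 3, "Seg4": 4}
edges = [("Seg1", "Seg2"),   # both contain SP13
         ("Seg3", "Seg4")]   # both contain SP15
chosen, total = max_weight_independent_set(weights, edges)
print(chosen, total)
```

Adding the two overlap edges from the figure would shrink the feasible set further; the search itself would not change.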
An Iterative Approach
• We perform spin system generation and linking iteratively.
• Three stages.
First Stage
• Generate perfect spin systems;
• Perform spin system concatenation on spin systems (newly generated perfect) to generate segments;
• Retain segments that contain at least 3 spin systems;
• Perform MaxIndSet on the segments;
• Drop spin systems (and related peaks) that are used in the resulting segments.
Second Stage
• Generate weak false negative spin systems;
• Perform segment extension on the resulting segments of the first iteration (using unused perfect and newly generated weak false negative);
• Perform spin system concatenation on the unused spin systems (perfect + weak false negative) to generate longer segments;
• Retain segments that contain at least 3 spin systems;
• Perform MaxIndSet on the segments;
• Drop spin systems (and related peaks) that are used in the resulting segments.
Third Stage
• Generate severe false negative spin systems;
• Perform segment extension on the resulting segments of the second iteration (using unused perfect and weak false negative, as well as newly generated severe false negative);
• Perform spin system concatenation on the unused spin systems (perfect + weak false negative + severe false negative) to generate longer segments;
• Retain segments that contain at least 3 spin systems;
• Perform MaxIndSet on the segments.
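The control flow shared by the three stages can be sketched as one loop with pluggable steps. Everything here is a hedged skeleton: `run_stages` and its callable parameters are hypothetical, segment extension is left out for brevity, and the stub functions at the bottom exist only to exercise the loop.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]  # a chain of spin-system ids


def run_stages(stage_generators: Sequence[Callable[[], List[str]]],
               concatenate: Callable[[List[str]], List[Segment]],
               max_ind_set: Callable[[List[Segment]], List[Segment]],
               min_len: int = 3) -> Tuple[List[Segment], List[str]]:
    """Generate, link, select and drop, once per stage."""
    pool: List[str] = []          # spin systems not yet used
    accepted: List[Segment] = []  # segments kept so far
    for generate in stage_generators:
        pool = pool + list(generate())               # new spin systems
        candidates = [s for s in concatenate(pool)   # build segments
                      if len(s) >= min_len]          # keep 3 or more
        chosen = max_ind_set(candidates)             # resolve conflicts
        accepted.extend(chosen)
        used = {sp for seg in chosen for sp in seg}
        pool = [sp for sp in pool if sp not in used]  # drop used ones
    return accepted, pool


# Stubs for demonstration only.
def chunk3(pool: List[str]) -> List[Segment]:
    return [tuple(pool[i:i + 3]) for i in range(0, len(pool), 3)]


accepted, leftover = run_stages(
    [lambda: ["a", "b", "c", "d"],   # "perfect"
     lambda: ["e", "f"],             # "weak false negative"
     lambda: ["g"]],                 # "severe false negative"
    concatenate=chunk3,
    max_ind_set=lambda segs: list(segs))
print(accepted, leftover)
```

Because unused spin systems stay in the pool, a spin system rejected in the first stage can still be linked in a later stage once weaker candidates become available.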
Segment Extension
…FKJJREKL…
[Figure: existing segments along the sequence are extended with new spin systems. Labels in the figure: 109, 1, 2, …, 45, 12, 29, New 109, New spin systems]
Segment Extension
DGRGEKGRKTLATPAVRRLAMENNIKLS
[Figure: after extension, segments 77, 71, 78, 99, 97, 99' and 97' (with spin-system labels 23 to 33 and 45) are passed to MaxIndSet]
Experimental Results
• Two datasets obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica:
- Average precision: 87.5%
- Average recall: 73.1%
• Perfect data from BMRB: 99.1%
Real Wet-Lab Datasets
The two datasets were obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica, Taiwan.
Datasets                                                    sbd     lbd
# of amino acids                                            53      85
# of amino acids that are assigned manually by biologists   42      80
# of HSQC peaks                                             58      78
# of CBCA(CO)NH peaks                                       258     271
# of HNCACB peaks                                           224     620
# of expected CBCA(CO)NH                                    84      160
# of expected HNCACB                                        168     320
false positive of CBCA(CO)NH                                67.4%   41.0%
false positive of HNCACB                                    25.0%   48.4%
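The false positive percentages in this table are consistent with (observed peaks - expected peaks) / observed peaks. The check below recomputes them from the peak counts; the helper name `false_positive_rate` is introduced here for illustration.

```python
def false_positive_rate(observed: int, expected: int) -> float:
    """Share of observed peaks in excess of the expected count, in percent."""
    return round(100.0 * (observed - expected) / observed, 1)


rates = {
    ("sbd", "CBCA(CO)NH"): false_positive_rate(258, 84),
    ("lbd", "CBCA(CO)NH"): false_positive_rate(271, 160),
    ("sbd", "HNCACB"): false_positive_rate(224, 168),
    ("lbd", "HNCACB"): false_positive_rate(620, 320),
}
print(rates)
```

This treats every expected peak as present, so it is a lower bound on the noise when some true peaks are missing.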
Experimental Results on Real Data
datasets                  sbd   lbd
# of amino acid           53    85
# of assigned amino acid  42    81
# of HSQC                 58    78
# of CBCANH peaks         224   620
# of CBCA(CO)NH peaks     258   271

               # of correctly assigned   # of assigned   accuracy   recall
Method on sbd  32                        35              91.4%      76.2%
Method on lbd  56                        67              83.6%      70.0%
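The accuracy and recall columns can be reproduced from the counts: accuracy is correctly assigned / assigned, and recall is correctly assigned / manually assigned residues. The reported 70.0% recall for lbd matches a denominator of 80 (the manual count in the dataset table), not the 81 shown on this slide. The sketch below uses 80 for that reason, which is an inference and not something the slides state.

```python
from typing import Tuple


def accuracy_and_recall(correct: int, assigned: int,
                        manual: int) -> Tuple[float, float]:
    """Return (accuracy, recall) in percent, rounded to one decimal."""
    return (round(100.0 * correct / assigned, 1),
            round(100.0 * correct / manual, 1))


sbd = accuracy_and_recall(correct=32, assigned=35, manual=42)
lbd = accuracy_and_recall(correct=56, assigned=67, manual=80)

# Averages over the two datasets, as quoted in the results summary.
avg_accuracy = (sbd[0] + lbd[0]) / 2
avg_recall = (sbd[1] + lbd[1]) / 2
print(sbd, lbd, avg_accuracy, avg_recall)
```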
Conclusion
• We model the backbone assignment problem as a constraint satisfaction problem
• This problem is solved using a natural language parsing technique (both bottom-up and top-down approaches)
• The same approach seems to work for a large class of noise reduction problems that are discrete in nature