![Page 1: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/1.jpg)
Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature
Deyu Zhou, Yulan He and Chee Keong Kwoh
School of Computer Engineering
Nanyang Technological University, Singapore
30 August 2007
![Page 2: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/2.jpg)
Outline
• Protein-protein interactions (PPIs) extraction
• Hidden Vector State (HVS) model for PPIs extraction
• Reranking approaches
• Experimental results
• Conclusions
![Page 3: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/3.jpg)
Protein-Protein Interactions Extraction
Spc97p interacts with Spc98 and Tub4 in the two-hybrid system
Protein Interact Protein
• Spc97p interact Spc98
• Spc97p interact Tub4
![Page 4: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/4.jpg)
Existing Approaches
Simple to Complicated:
• Statistics Methods
• Pattern Matching
• Parsing-Based
![Page 5: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/5.jpg)
An example
However, unlike another tumor suppressor protein, p53, Rb did not have any significant effect on basal levels of transcription, suggesting that Rb specifically interacts with IE2 rather ...
Part-of-speech tagging
However/RB ,/, unlike/IN another/DT tumor/NN suppressor/NN protein/NN ,/, p53/NN ,/, Rb/NN did/VBD not/RB have/VB any/DT significant/JJ effect/NN on/IN basal/JJ levels/NNS of/IN transcription/NN ,/, suggesting/VBG that/IN Rb/NN specifically/RB interacts/VBZ with/IN IE2/NN rather/RB ...
Protein name identification
However/RB ,/, unlike/IN another/DT tumor/NN suppressor/NN protein/NN ,/, PROTEIN(p53/NN) ,/, PROTEIN(Rb/NN) did/VBD not/RB have/VB any/DT significant/JJ effect/NN on/IN basal/JJ levels/NNS of/IN transcription/NN ,/, suggesting/VBG that/IN PROTEIN(Rb/NN) specifically/RB interacts/VBZ with/IN PROTEIN(IE2/NN) rather/RB ...
![Page 6: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/6.jpg)
Statistics-Based Approaches
Sentence level statistic
Relation Occurrence
(p53, Rb) +1
(p53, IE2) +1
(Rb, IE2) +1
Corpus level statistic
Relation Occurrence
(p53, IE2) ...81
... 6
Relation Confidence
(p53, IE2) 75%
... ...
Predefined threshold a = 7
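The co-occurrence counting on this slide can be sketched in a few lines. This is a minimal illustration, not the method of any specific system: the function names, the toy corpus, and the choice of "greater than or equal to" for the threshold test are all assumptions, since the slide only states that a predefined threshold exists.

```python
from collections import Counter
from itertools import combinations


def cooccurring_pairs(proteins):
    """Return every unordered pair of distinct protein names in one sentence."""
    unique = sorted(set(proteins))
    return list(combinations(unique, 2))


def corpus_counts(sentences):
    """Add +1 per sentence for each protein pair that co-occurs in it."""
    counts = Counter()
    for proteins in sentences:
        counts.update(cooccurring_pairs(proteins))
    return counts


def accept(counts, threshold):
    """Keep pairs whose corpus-level occurrence reaches the threshold (assumed >=)."""
    return {pair for pair, n in counts.items() if n >= threshold}


# Toy corpus (hypothetical): each inner list holds the protein names of one sentence.
sentences = [["p53", "Rb", "Rb", "IE2"], ["Rb", "IE2"], ["p53", "Rb"]]
counts = corpus_counts(sentences)
accepted = accept(counts, threshold=2)
```

Note that every co-occurring pair is counted, including (p53, IE2), which the example sentence does not actually assert as an interaction. That weakness is what the threshold is meant to filter out at corpus level.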
![Page 7: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/7.jpg)
Pattern Matching Approaches
Pattern 1: Protein [*] interact[s] with protein
• Rb interact IE2
• p53 interact IE2
Pattern 2: protein RB VBZ WITH protein
• Rb interact IE2
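The two patterns can be approximated with regular expressions over the protein-tagged sentence from the earlier example. This is a sketch under assumptions: the regex encodings, the `PROTEIN(name/NN)` input format, and the helper name `extract` are illustrative choices, and the reading that the loose wildcard pattern also fires for p53 while the POS-based pattern does not is an interpretation of the slide.

```python
import re

# Protein-tagged, POS-tagged example sentence from the earlier slide.
TAGGED = (
    "However/RB ,/, unlike/IN another/DT tumor/NN suppressor/NN protein/NN ,/, "
    "PROTEIN(p53/NN) ,/, PROTEIN(Rb/NN) did/VBD not/RB have/VB any/DT "
    "significant/JJ effect/NN on/IN basal/JJ levels/NNS of/IN transcription/NN "
    ",/, suggesting/VBG that/IN PROTEIN(Rb/NN) specifically/RB interacts/VBZ "
    "with/IN PROTEIN(IE2/NN) rather/RB"
)

# Pattern 1: any protein, anything in between, then "interact[s] with" a protein.
# The lookahead lets every earlier protein pair up with the same object.
PATTERN_1 = re.compile(
    r"PROTEIN\((\w+)/NN\)(?=.*?\binteracts?/VBZ with/IN PROTEIN\((\w+)/NN\))"
)

# Pattern 2: protein, one adverb (RB), the verb (VBZ), "with", protein.
PATTERN_2 = re.compile(
    r"PROTEIN\((\w+)/NN\) \w+/RB interacts?/VBZ with/IN PROTEIN\((\w+)/NN\)"
)


def extract(pattern, text):
    """Return the distinct (subject, object) pairs matched, in order of appearance."""
    pairs = []
    for subject, obj in pattern.findall(text):
        if (subject, obj) not in pairs:
            pairs.append((subject, obj))
    return pairs


loose = extract(PATTERN_1, TAGGED)
strict = extract(PATTERN_2, TAGGED)
```

The loose pattern returns both (p53, IE2) and (Rb, IE2), while the stricter pattern returns only (Rb, IE2), which illustrates the usual trade-off between coverage and precision in hand-written patterns.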
![Page 8: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/8.jpg)
Parsing-Based Approaches
...Rb specifically interacts with IE2...
Syntactic processing
N ADV V P N
NP PP
VP
VP
Semantic processing
(<INTERACT><THE Rb PROTEIN><THE IE2 PROTEIN>)
Rb interact IE2
![Page 9: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/9.jpg)
Semantic Parser
For each candidate word string Wn, we need to compute the most likely set of embedded concepts:
$\hat{C} = \arg\max_{C} P(C \mid W_n) = \arg\max_{C} P(C)\, P(W_n \mid C)$
where P(C) is the semantic model and P(Wn|C) is the lexical model
![Page 10: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/10.jpg)
We could use a simple finite state tagger …
Words, generated by P(Wn|C): <s> Spc97p interacts with Spc98 and Tub4 in the two-hybrid system </s>
Tags, generated by P(C): SS PROTEIN INTERACT DUMMY PROTEIN DUMMY PROTEIN DUMMY SE
… can be robustly trained using EM, but the model is too weak to represent embeddings in natural language
![Page 11: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/11.jpg)
Perhaps use some form of hierarchical HMM in which each state is a terminal or a nested HMM …
… but when using EM, models rarely converge on good solutions and, in practice, direct maximum-likelihood estimation from “tree-bank” data is needed to train models
P(Wn|C)
P(C)
Spc97p interacts with Spc98 and Tub4 in the two-hybrid system
Parse tree, from root to pre-terminals:
S
INTERACTION
SUBJECT OBJECT OBJECT
PROTEIN INTERACT PREP PROTEIN AND PROTEIN DUMMY
![Page 12: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/12.jpg)
Hidden Vector State Model
<s> Spc97p interacts with Spc98 and Tub4 in the two-hybrid system </s>
Vector state for each word (top of stack first):
• <s>: SS
• Spc97p: PROTEIN, SS
• interacts: INTERACT, PROTEIN, SS
• with: DUMMY, INTERACT, PROTEIN, SS
• Spc98: PROTEIN, INTERACT, PROTEIN, SS
• and: DUMMY, INTERACT, PROTEIN, SS
• Tub4: PROTEIN, INTERACT, PROTEIN, SS
• in the two-hybrid system: DUMMY, SS
• </s>: SE, SS
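Each vector state is the path from a word's pre-terminal concept up to the root of the parse tree. The sketch below derives those paths from a nested-tuple tree for the example sentence. The tree encoding, the `TREE` literal, and the function name `vector_states` are assumptions made for illustration, not part of the original model description.

```python
def vector_states(node, path=()):
    """Yield (word, stack) pairs; the stack runs from the pre-terminal up to the root."""
    label, children = node
    stack = (label,) + path
    for child in children:
        if isinstance(child, str):
            # A word hangs directly under this concept, so this stack is its vector state.
            yield child, stack
        else:
            yield from vector_states(child, stack)


# Hypothetical encoding of the example parse: (concept, [words or sub-trees]).
TREE = ("SS", [
    "<s>",
    ("PROTEIN", [
        "Spc97p",
        ("INTERACT", [
            "interacts",
            ("DUMMY", ["with"]),
            ("PROTEIN", ["Spc98"]),
            ("DUMMY", ["and"]),
            ("PROTEIN", ["Tub4"]),
        ]),
    ]),
    ("DUMMY", ["in the two-hybrid system"]),
    ("SE", ["</s>"]),
])

states = list(vector_states(TREE))
max_depth = max(len(stack) for _, stack in states)
```

For this sentence no stack is deeper than four concepts, which is why a bounded stack depth is enough to represent the nesting.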
![Page 13: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/13.jpg)
The HVS model is an HMM in which the states correspond to the stack of a push-down automaton with a bounded stack size …
… this is a very convenient framework for applying constraints
P(Wn|C): <s> Spc97p interacts with Spc98 and Tub4 in the two-hybrid system </s>
P(C): [SS] [PROTEIN SS] [INTERACT PROTEIN SS] [DUMMY INTERACT PROTEIN SS] [PROTEIN INTERACT PROTEIN SS] [DUMMY INTERACT PROTEIN SS] [PROTEIN INTERACT PROTEIN SS] [DUMMY SS] [SE SS]
![Page 14: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/14.jpg)
HVS model transition constraints:
• finite stack depth – D
• push only one non-terminal semantic concept onto the stack at each step
… model defined by three simple probability tables
$\hat{C} = \arg\max_{C,N} \prod_{t} P(n_t \mid C_{t-1})\, P(C_t[1] \mid C_t[2..D_t])\, P(W_t \mid C_t)$
![Page 15: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/15.jpg)
Parsing with the HVS model
1) Pop 1 element from the previous stack state (n = 1): P(nt|Ct-1)
2) Push 1 pre-terminal semantic concept onto the stack: P(Ct[1]|Ct[2..Dt])
3) Generate the next word: P(Wt|Ct)
Example: … with Spc98 and Tub4 …
[PROTEIN INTERACT PROTEIN SS] → pop 1 → [INTERACT PROTEIN SS] → push DUMMY → [DUMMY INTERACT PROTEIN SS]
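The pop, push, and generate steps can be written as a single transition whose probability is the product of three table lookups. The following is a minimal sketch: the function names, the dictionary layout of the tables, the default maximum depth of 4, and every probability value are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def transition(prev_stack, n_pop, new_concept, max_depth=4):
    """Pop n_pop concepts from the top of prev_stack, then push new_concept."""
    if not 0 <= n_pop <= len(prev_stack):
        raise ValueError("cannot pop more elements than the stack holds")
    new_stack = (new_concept,) + prev_stack[n_pop:]
    if len(new_stack) > max_depth:
        raise ValueError("stack depth exceeds the bound D")
    return new_stack


def step_probability(prev_stack, n_pop, new_concept, word, p_pop, p_push, p_word):
    """P(n_t | C_{t-1}) * P(C_t[1] | C_t[2..D_t]) * P(W_t | C_t) for one step."""
    new_stack = transition(prev_stack, n_pop, new_concept)
    return (p_pop[(n_pop, prev_stack)]
            * p_push[(new_concept, new_stack[1:])]
            * p_word[(word, new_stack)])


# Stacks are tuples with the top of the stack first.
prev = ("PROTEIN", "INTERACT", "PROTEIN", "SS")   # state after "Spc98"
new = transition(prev, 1, "DUMMY")                # state for "and"

# Hypothetical table entries, not estimated from any data.
p_pop = {(1, prev): 0.5}
p_push = {("DUMMY", ("INTERACT", "PROTEIN", "SS")): 0.4}
p_word = {("and", new): 0.25}

prob = step_probability(prev, 1, "DUMMY", "and", p_pop, p_push, p_word)
```

With these toy values the step probability is 0.5 × 0.4 × 0.25 = 0.05. A full parser would search over all pop counts and pushed concepts at every word, for example with a Viterbi-style dynamic program.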
![Page 16: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/16.jpg)
Train using EM and apply constraints
Training text: CUL-1 was found to interact with SKR-1, SKR-2, SKR-3, and SKR-7 in yeast two-hybrid system
Abstract semantic annotation: PROTEIN ( INTERACT ( PROTEIN ) )
Data Constraints
EM Parameter Estimation
HVS Model Parameters
Parse Statistics
Limit forward-backward search to only include states which are consistent with the constraints
![Page 17: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/17.jpg)
Reranking Methodology
• Reranking approaches attempt to improve upon an existing probabilistic parser by reranking the output of the parser.
• Reranking has benefited applications such as named-entity extraction, semantic parsing and semantic labeling.
• Here, reranking is applied to the parses generated by the HVS model for protein-protein interaction extraction.
![Page 18: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/18.jpg)
Architecture
• Training: Annotated Corpus E → HVS model and Reranking Model
• Testing: Test Data → Semantic Parsing (HVS model) → Parse results → Reranking (Reranking Model) → Ranked 1st parse → Extracted protein-protein interactions
• Features: Parsing Information IP, Structure Information IS, Complexity Information IC, ...
![Page 19: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/19.jpg)
Reranking approaches
• Features for Reranking
Suppose sentence Si has its corresponding parse set Ci = {Cij, j = 1, ..., N}
– Parsing Information
– Structure Information
– Complexity Information
![Page 20: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/20.jpg)
Reranking approaches
Score is defined as
• Log-linear regression model
• Neural Network
• Support Vector Machines
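The score formulas themselves are not preserved in this transcript. As an illustration of the general idea only, the sketch below reranks candidate parses with a weighted sum of feature values, which is the usual form of a log-linear scorer. The feature names, feature values, and weights are hypothetical; in practice the weights would be learned from the annotated corpus, and the neural network and SVM variants would replace the scoring function.

```python
def score(features, weights):
    """Weighted sum of feature values for one candidate parse."""
    return sum(weights[name] * value for name, value in features.items())


def rerank(candidates, weights):
    """Return the candidates sorted from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c["features"], weights), reverse=True)


# Hypothetical N-best list for one sentence; "A" is the parser's original top choice.
candidates = [
    {"id": "A", "features": {"log_prob": -10.0, "structure": 0.0, "complexity": 3.0}},
    {"id": "B", "features": {"log_prob": -11.0, "structure": 1.0, "complexity": 2.0}},
]

# Hypothetical weights, one per feature group.
weights = {"log_prob": 1.0, "structure": 2.0, "complexity": -0.5}

ranked = rerank(candidates, weights)
best = ranked[0]["id"]
```

Here candidate A scores -11.5 and candidate B scores -10.0, so the reranker promotes B even though the parser preferred A. That is the effect reranking relies on: evidence beyond the parse probability can overturn the parser's first choice.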
![Page 21: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/21.jpg)
Experiments
• Setup
– Corpus I
• comprises 300 abstracts randomly retrieved from the GENIA corpus
• GENIA is a collection of research abstracts selected from the search results of the MEDLINE database with the keywords (MeSH terms) “human, blood cells and transcription factors”
• split into two parts:
– Part I contains 1500 sentences (training data)
– Part II consists of 1000 sentences (test data)
![Page 22: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/22.jpg)
Experimental Results
Figure 1: F-measure vs number of candidate parses.
![Page 23: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/23.jpg)
Experimental Results (cont’d)

| Experiments | Recall (%) | Precision (%) | F-Score (%) |
| --- | --- | --- | --- |
| Baseline | 55.8 | 55.6 | 55.7 |
| SVM | 59.1 | 60.2 | 59.7 |
| NN | 57.9 | 61.8 | 59.8 |
| LLR | 58.5 | 61.2 | 59.8 |

Table 3: Results based on the interaction category.
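As a quick consistency check, the F-scores can be recomputed from recall and precision with the standard harmonic-mean formula. The sketch assumes the fused numbers in the table are read column by column (recall, then precision, then F-score for SVM, NN and LLR); the dictionary name and the 0.1 tolerance for rounding are choices made here.

```python
def f_score(recall, precision):
    """Balanced F-measure: the harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)


# (recall, precision, reported F-score), all in percent.
RESULTS = {
    "Baseline": (55.8, 55.6, 55.7),
    "SVM": (59.1, 60.2, 59.7),
    "NN": (57.9, 61.8, 59.8),
    "LLR": (58.5, 61.2, 59.8),
}

recomputed = {name: f_score(r, p) for name, (r, p, _) in RESULTS.items()}
consistent = all(
    abs(recomputed[name] - reported) <= 0.1
    for name, (_, _, reported) in RESULTS.items()
)
```

Every recomputed value agrees with the reported F-score to within rounding, which supports the column-wise reading of the table.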
![Page 24: Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature](https://reader035.vdocuments.site/reader035/viewer/2022081520/568144a5550346895db16ea6/html5/thumbnails/24.jpg)
Conclusions
• Three reranking methods were applied to the HVS model for extracting protein-protein interactions from biomedical literature.
• Experimental results show that a 4% relative improvement in F-measure can be obtained by reranking the semantic parse results.
• Incorporating other semantic or syntactic information might give further gains.