Transport Inference Parser: Inferring Transport Reactions from Protein Data for PGDBs

Upload: britton-oliver

Post on 26-Dec-2015


Page 1: Transport Inference Parser: Inferring Transport Reactions from Protein Data for PGDBs

Running the Transport Inference Parser

1. Run Pathway Tools.
2. Make the organism of interest the current organism.
3. [Run operon predictor].
4. Select Tools/Pathologic.
5. From Pathologic, select Refine/Transport Inference Parser.
6. If running TIP for the first time on the organism, optionally provide its aerobicity.
7. Wait and observe progress.
8. When complete, the Probable Transporter Table window appears.
9. You may now review and modify the inferred transporters.

Background

Implemented in consultation with Ian Paulsen

Reference:

Annotation-based inference of transporter function. Thomas J. Lee, Ian Paulsen and Peter Karp. Bioinformatics, vol. 24, pp. 259-67, 2008.

Purpose of TIP

Infer transport reactions from protein data and construct them in BioCyc PGDBs.

Present the results so that each prediction can be accepted, rejected, or modified.

Results of running TIP

Add the following to the PGDB for each inferred transported substrate:
– Transport-Reaction frame of correct subclass
  • Assign compartments – use simple assumptions
– Enzymatic-Reaction frame linking protein to reaction

Construct Protein-Complexes as required.

Evidence codes and provenance data are added to these frames.

Sequence of internal operations

1. Find candidate transporter proteins.
2. Filter out candidates.
3. Identify substrate(s).
4. Assign an energy coupling to transporter.
5. Identify compartment of each substrate.
6. Group subunits of transporter complexes.
7. Construct full compartmental reaction from substrate and coupling.
8. Construct enzymatic reaction linking each reaction with protein.

1. Find candidate transporter proteins

• Input: all protein frames of organism
• Output: internal data structures for each candidate
• Annotation must contain an indicator. Exs: “transport”, “export”, “permease”, “channel”

• Exclude proteins with long annotations (default: 12 words)
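
The candidate test above can be sketched in a few lines of Python. This is only an illustration, not the Pathway Tools code: the function name `is_candidate`, the substring matching, and the reading of "long" as "more than 12 words" are assumptions; the indicator words and the default limit come from the slide.

```python
# Illustrative sketch only; not the Pathway Tools implementation.
INDICATORS = ("transport", "export", "permease", "channel")  # examples from the slide
MAX_WORDS = 12  # default annotation length limit from the slide


def is_candidate(annotation: str, max_words: int = MAX_WORDS) -> bool:
    """Return True if the annotation is short enough and mentions an indicator."""
    text = annotation.lower()
    # Assumption: "long" means strictly more than max_words words.
    if len(text.split()) > max_words:
        return False
    # Assumption: substring match, so "transport" also matches "transporter".
    return any(indicator in text for indicator in INDICATORS)
```

For example, `is_candidate("sodium channel")` is true, while an annotation with no indicator word, or one longer than the limit, is rejected.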

2. Filter candidates

• Exclude if annotation matches any of a list of regular expressions for counterindicator phrases and patterns
  – Ex: “transport associated domain”
• Exclude if annotation contains a counterindicator word
  – Exs: “regulator”, “nuclear-export”
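
A minimal sketch of this filter, assuming a hypothetical helper `passes_filter` and toy lists that hold only the examples from the slide (the real lists are presumably longer):

```python
import re

# Toy lists built from the slide's examples; the real lists are not shown there.
COUNTER_PATTERNS = [re.compile(r"transport[- ]associated domain")]
COUNTER_WORDS = {"regulator", "nuclear-export"}


def passes_filter(annotation: str) -> bool:
    """Return False if the annotation matches a counterindicator pattern or word."""
    text = annotation.lower()
    if any(pattern.search(text) for pattern in COUNTER_PATTERNS):
        return False
    # Assumption: words are runs of letters, digits, and hyphens.
    words = re.findall(r"[a-z0-9\-]+", text)
    return not any(word in COUNTER_WORDS for word in words)
```

Keeping phrase patterns and single words in separate lists mirrors the two bullets above: patterns are searched anywhere in the text, while words must match a whole token.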

3. Identify substrate(s)

Search annotation for names of MetaCyc compounds. Details:

Multiple substrates indicate multiple reactions, a symport/antiport pair, or both. Exs:
– “cytosine/purines/uracil/thiamine/allantoin permease family protein”
– “magnesium and cobalt transport protein cora, putative”
– “sodium:sulfate symporter transmembrane domain protein”
– “probable agcs sodium/alanine/glycine symporter”

Exclude non-substrates that look like compounds via an exception list. Exs: “as” “be” “c” “i”

3. Identify substrate(s) (cont.)

Name canonicalization. Ex: strip plurals.

Affixed substrates. Exs: “-transporting” “-specific”

Look up special ionic forms. Exs: “cuprous” “ferric” “hydrogen”

Resolve multivalent options using aerobicity. Exs: “FE” “CR” “MN”

Two-word substrates, substrate classes (no 3+ word substrates). Ex: “amino acid”
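
The substrate search can be sketched as follows. This is a rough illustration under stated assumptions: `KNOWN_COMPOUNDS` is a toy stand-in for MetaCyc compound names, the ion symbols in `IONIC_FORMS` are an assumed mapping, the one-letter entry "c" is hypothetical, and the aerobicity-based choice among multivalent ions is left out. The exception words and affixes are the ones quoted on the slides.

```python
import re

# Toy stand-ins (assumptions); TIP itself looks names up in MetaCyc.
KNOWN_COMPOUNDS = {
    "cytosine", "purine", "uracil", "thiamine", "allantoin", "magnesium",
    "cobalt", "sodium", "sulfate", "alanine", "glycine", "amino acid",
    "c",  # hypothetical one-letter synonym, to show why exceptions are needed
}
IONIC_FORMS = {"cuprous": "Cu+", "ferric": "Fe3+", "hydrogen": "H+"}  # assumed mapping
EXCEPTIONS = {"as", "be", "c", "i"}        # from the slide
AFFIXES = ("-transporting", "-specific")   # from the slide


def canonicalize(token: str) -> str:
    """Strip a trailing affix, then a plural 's' if that yields a known name."""
    for affix in AFFIXES:
        if token.endswith(affix):
            token = token[: -len(affix)]
    if token.endswith("s") and token[:-1] in KNOWN_COMPOUNDS:
        token = token[:-1]
    return token


def find_substrates(annotation: str) -> list:
    """Return substrate names found in the annotation, in order of appearance."""
    raw = re.split(r"[\s/:,]+", annotation.lower())
    tokens = [canonicalize(t) for t in raw if t]
    found, i = [], 0
    while i < len(tokens):
        # Try a two-word name first; names of three or more words are not tried.
        if i + 1 < len(tokens) and f"{tokens[i]} {tokens[i + 1]}" in KNOWN_COMPOUNDS:
            found.append(f"{tokens[i]} {tokens[i + 1]}")
            i += 2
            continue
        token = tokens[i]
        if token in IONIC_FORMS:
            found.append(IONIC_FORMS[token])
        elif token in KNOWN_COMPOUNDS and token not in EXCEPTIONS:
            found.append(token)
        i += 1
    return found
```

With these toy tables, the slide's example “sodium:sulfate symporter transmembrane domain protein” yields `["sodium", "sulfate"]`, and “amino acid permease” yields the two-word class `["amino acid"]`.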

4. Assign an energy coupling.

Couplings: Channel, Secondary, ATP, PTS, Unknown

Search annotation for prioritized list of indicators. Exs:
"atp-binding" => ATP
"mfs" => SECONDARY
"pts" => PTS
"phosphotransferase" => PTS
"carrier" => SECONDARY
"channel" => CHANNEL

Some substrates imply a coupling. Ex: protoheme => ATP

Absence of indicator => UNKNOWN
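
A prioritized indicator list maps naturally onto an ordered list of pairs that is scanned until the first hit. The sketch below uses only the indicators quoted on the slide, so the real list is certainly longer; the function name `assign_coupling`, the whole-word matching, and checking annotation indicators before substrate-implied couplings are assumptions.

```python
import re

# Ordered by priority; entries are the examples from the slide.
COUPLING_INDICATORS = [
    ("atp-binding", "ATP"),
    ("mfs", "SECONDARY"),
    ("pts", "PTS"),
    ("phosphotransferase", "PTS"),
    ("carrier", "SECONDARY"),
    ("channel", "CHANNEL"),
]
SUBSTRATE_COUPLINGS = {"protoheme": "ATP"}  # substrates that imply a coupling


def assign_coupling(annotation: str, substrates=()) -> str:
    """Return the coupling of the first matching indicator, else UNKNOWN."""
    text = annotation.lower()
    for indicator, coupling in COUPLING_INDICATORS:
        # Whole-word match, so "pts" does not fire inside a word like "accepts".
        if re.search(r"\b" + re.escape(indicator) + r"\b", text):
            return coupling
    # Assumption: substrate-implied couplings are consulted after the indicators.
    for substrate in substrates:
        if substrate in SUBSTRATE_COUPLINGS:
            return SUBSTRATE_COUPLINGS[substrate]
    return "UNKNOWN"
```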

5. Identify compartment of each substrate.

Use keywords to determine compartment of primary substrate (Exs: “export”, “antiporter”)

Otherwise assume primary substrate is transported into cell (periplasm => cytoplasm)

Deferred complex compartment analysis:
• Assume E. coli-like cellular structure
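
The default-inward rule can be sketched as below. The slide names “export” and “antiporter” as keywords but does not say which direction each implies; treating both as "primary substrate leaves the cell" is an assumption made here for illustration, as is the function name `primary_direction`.

```python
# Keywords quoted on the slide; their outward meaning is an assumption.
OUTWARD_KEYWORDS = ("export", "antiporter")


def primary_direction(annotation: str) -> tuple:
    """Return (source, destination) compartments for the primary substrate."""
    text = annotation.lower()
    if any(keyword in text for keyword in OUTWARD_KEYWORDS):
        return ("cytoplasm", "periplasm")
    # Default from the slide: transported into the cell.
    return ("periplasm", "cytoplasm")
```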

6. Group subunits of transporter complexes.

Many transporters are systems of several proteins. These are grouped into complexes.

Grouping criteria; all must be met:
– Predicted coupling is ATP or PEP
– Predicted substrates are identical
– Genes of proteins have a common operon (NOTE requirement on operon availability)

The resulting complex is added to the PGDB as a Protein-Complexes frame.
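
Because all three criteria must hold, grouping amounts to bucketing transporters by a composite key. A minimal sketch, assuming each prediction is a dictionary with hypothetical keys `protein`, `coupling`, `substrates`, and `operon`, that each protein belongs to at most one operon, and that the slide's "PEP" corresponds to the PTS coupling label:

```python
from collections import defaultdict

GROUPABLE_COUPLINGS = ("ATP", "PTS")  # assumption: "PEP" on the slide means PTS


def group_complexes(transporters: list) -> list:
    """Group proteins sharing coupling, substrate set, and operon."""
    groups = defaultdict(list)
    for t in transporters:
        # Proteins without operon data cannot be grouped.
        if t["coupling"] not in GROUPABLE_COUPLINGS or t["operon"] is None:
            continue
        key = (t["coupling"], frozenset(t["substrates"]), t["operon"])
        groups[key].append(t["protein"])
    # Only groups with two or more members form a complex.
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)
```

The `operon is None` check reflects the note above: without operon predictions, no complexes can be formed.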

7. Construct full compartmental reaction from substrate and coupling.

Determine set of transported substrates for this transporter:

• For SECONDARY coupling:
  – Identify auxiliary substrate providing ion gradient (H+, Na+)
  – Remove from transported substrate list
  – Place on side of reaction indicated by symport/antiport clues
• For other couplings:
  – Determined previously in substrate analysis

7. Construct full compartmental reaction from substrate and coupling (cont).

For each transported substrate of this transporter, either import a reaction (from E. coli) or create a new one.

1. Search import KB for reaction with matching substrates: (find-rxn-by-substrates)
   – Transported substrate added with indicated compartment
   – Auxiliary substrates determined by coupling. Exs:
     – CHANNEL reactions have none
     – ATP reactions have ATP/H2O and ADP/phosphate
2. If one reaction is found, import: (import-reactions trxns src-kb dst-kb …)
3. If multiple reactions found, retain all.
4. Else if reaction is not present in PGDB, create new rxn
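
The import-or-create decision can be sketched with a toy knowledge base. Nothing here is the Pathway Tools Lisp API: `match_reactions` and `resolve` are hypothetical Python stand-ins, the KB is modeled as a dictionary from reaction id to (left, right) substrate lists, and reading "retain all" as "import every match" is an assumption.

```python
def match_reactions(import_kb: dict, left: list, right: list) -> list:
    """Return ids of reactions whose two sides match the given substrates."""
    wanted = (frozenset(left), frozenset(right))
    return sorted(
        rxn_id
        for rxn_id, (kb_left, kb_right) in import_kb.items()
        if (frozenset(kb_left), frozenset(kb_right)) == wanted
    )


def resolve(import_kb: dict, left: list, right: list, already_in_pgdb: bool) -> tuple:
    """Decide whether to import, reuse, or create a reaction."""
    matches = match_reactions(import_kb, left, right)
    if matches:
        return ("import", matches)      # one match or several: keep them all
    if already_in_pgdb:
        return ("keep-existing", [])    # nothing to import, nothing to create
    return ("create", [])
```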

7. Construct full compartmental reaction from substrate and coupling (cont).

Create new reaction:
• Create reaction frame, subclass determined by coupling:
  – (create-instance-w-generated-id rxn-class)
• Add transported and auxiliary substrates to appropriate sides of reaction
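
How the sides of a new reaction are filled in can be sketched as string assembly. This is an illustration only: `build_reaction` and its parameters are hypothetical, compartments are abbreviated [p] (periplasm) and [c] (cytoplasm), and PTS reactions are not covered because the slides do not spell out their auxiliary substrates.

```python
def build_reaction(substrate, coupling, src="p", dst="c", ion=None, antiport=False):
    """Return a reaction equation string for one transported substrate."""
    left = [f"{substrate}[{src}]"]
    right = [f"{substrate}[{dst}]"]
    if coupling == "ATP":
        # ATP hydrolysis supplies the auxiliary substrates.
        left = ["ATP", "H2O"] + left
        right = ["ADP", "phosphate"] + right
    elif coupling == "SECONDARY" and ion is not None:
        # Symport: the ion moves with the substrate; antiport: against it.
        ion_src, ion_dst = (dst, src) if antiport else (src, dst)
        left.append(f"{ion}[{ion_src}]")
        right.append(f"{ion}[{ion_dst}]")
    # CHANNEL (and anything else) has no auxiliary substrates.
    return " + ".join(left) + " = " + " + ".join(right)
```

For instance, `build_reaction("Na+", "CHANNEL")` gives `Na+[p] = Na+[c]`, and an outward Ca+2/H+ antiport (`src="c"`, `dst="p"`, `ion="H+"`, `antiport=True`) gives `Ca+2[c] + H+[p] = Ca+2[p] + H+[c]`.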

8. Construct enzymatic reaction linking each reaction with protein.

For each created reaction:
• (add-reactions-to-protein …)
• Added evidence code, history string arguments
• Subordinates new

[(import-reactions) handles import of enzymatic-reactions]

GUI Overview

1. Window is titled: Probable Transporter Table for Organism
2. Table of inferred transporters is organized into columns:
   – Status
   – Gene
   – Substrate
   – Coupling
   – Reaction / Function
3. Each row contains a transport reaction description:
   – Multiple reactions per transport protein are possible
   – Sort by Gene (the default) to keep them together visually
4. Aggregate pane shows counts by status.
5. Mousing over a reaction shows details in bottom pane.

Probable Transporter Table

Notional Probable Transporter Table

Status      Gene   Substrate  Coupling   Reaction / Annotation
Unreviewed  T0059  Ca2+       SECONDARY  Ca+2[c] + H+[p] = Ca+2[p] + H+[c]
                                         calcium/proton antiporter
Rejected    T3669  phosphate  ATP        H2O + ATP + phosphate[p] = ADP + 2 phosphate[c]
                                         phosphate transport atp-binding protein
Accepted    T0080  Na+        CHANNEL    Na+[p] = Na+[c]
                                         sodium channel

Reviewing and Editing

• Left-click on a row
  – Dialog box appears
• May edit:
  – Function (name)
  – Energy coupling
• May invoke Reaction Editor on reaction
• May retract reaction
• May update status

TIP Dialog

Transporter Status

• Unreviewed:

– Initial value of status

• Accepted:

– Preserves edits

– Incorporates transporter into PGDB upon save

• Rejected:

– Discard transporter upon save

Accept and Reject can be undone

Table row after rejection

Dialog after rejection

Filtering and Sorting

• Filtering excludes transporters from display:
  – Filter low- or high-confidence transporters (low-confidence usually means ‘no substrate’)
  – Filter by status
  – Filter by number of reactions per substrate
• Sort transporters by columns like a spreadsheet:
  – Gene
  – Energy Coupling
  – Substrate number/name
  – Status (e.g., Accepted, Rejected)

Group Operations

TIP permits en masse acceptance or rejection of remaining predictions being shown:

Edit / Accept all Unreviewed predictions being shown

Edit / Reject all Unreviewed predictions being shown

Saving Your Work

TIP has made in-memory modifications to the PGDB; nothing is saved until exit from TIP.

Exit / Save saves all predictions & edits.

Exit / Cancel reverts to most recent save.

Must exit to save work!

Multisession Workflow

1. TIP remembers accepted predictions in the KB.

2. TIP remembers rejected transporters in a file under the organism directory.

3. To continue, re-run TIP and resume session.

4. If you don’t resume (i.e., start from scratch):
   – Will not re-predict accepted transporters (they are in the KB)
   – Will re-predict rejected transporters

Batch Mode

• TIP supports batch mode operation as well as interactive

• Run by BRG for all Tier 3 PGDBs (>3000 KBs)
• To support both automated and user-controlled operation:
  – Distinguish high- and low-confidence inferences
  – Automated mode accepts all high-confidence inferences
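
The automated acceptance rule can be sketched as a single pass over the predictions. The dictionary keys, the function name `batch_accept`, and the confidence test are assumptions: the slides only say that low confidence usually means no substrate was found, and they do not say what batch mode does with low-confidence predictions, so this sketch leaves them Unreviewed.

```python
def batch_accept(predictions: list) -> int:
    """Accept every high-confidence Unreviewed prediction; return the count."""
    accepted = 0
    for prediction in predictions:
        # Assumption: a prediction is high-confidence if it has a substrate.
        high_confidence = bool(prediction["substrates"])
        if prediction["status"] == "Unreviewed" and high_confidence:
            prediction["status"] = "Accepted"
            accepted += 1
    return accepted
```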