Constructing Complex Queries in Pathway Tools using Emacs, Lisp, and Perl Randy Gobbel, Ph.D. May 14, 2003 [email protected]

Page 1: Constructing Complex Queries in Pathway Tools using Emacs, Lisp, and Perl

Randy Gobbel, Ph.D.

May 14, 2003

[email protected]

Page 2: Overview

SRI International Bioinformatics

Why would you need to write complex queries?
Emacs
Lisp
perlcyc
The GFP API, and Pathway Tools-specific functions
Examples and exercises

Page 3: When do you need complex queries?

Many common queries are accessible from the command menu
  By name
  By substring
  By class
  Others are specialized by the type of the object being displayed
Other queries of arbitrary complexity can be created by writing a (simple) program
  Example: find all reactions with more than 5 citations

Page 4: Programmatic Access to PGDBs

The Lisp and Perl languages are used for programmatic queries and updates to PGDBs (Pathway/Genome Databases)
The Generic Frame Protocol (GFP) is the API for PGDBs

Page 5: Emacs

"The extensible, self-documenting editor"
(Most of the time) typing a printing character simply inserts it
  Just like most Windows and MacOS programs
Control and Meta keys in combination with other keys run commands
  Again, just like keyboard shortcuts in most programs
Control-H: Help
  T -> tutorial, A -> apropos, W -> "where is <command>"
  K -> "what does this key combination do?"
Many commands are now available from pulldown menus

Page 6: Emacs

Three ways to run Pathway Tools from within Emacs
  Use the Emacs/Lisp interface provided with Allegro Common Lisp (fi)
  Use the free ILisp package (written in Emacs Lisp)
  Run Pathway Tools from a shell within Emacs
Windows users: lowest-common-denominator
  Cut and paste still works
Advantages of using Emacs with Lisp
  Syntax highlighting
  Automatic indentation
  One-keystroke evaluation of Lisp forms in fi and ilisp

Page 7: Lisp

An idea that keeps reinventing itself
  Function, arguments
What is a list?
  Unit of syntax: (a b c)
  Unit of data: (a b c)
  Unit of execution: (get-slot-value 'arca 'citations)
Most languages: function(arg1, arg2, …)
  Fine for writing
Lisp: (function arg1 arg2 arg3 …)
  Much easier to deal with in a computer

Page 8: Lisp Data Types

Numbers
  1  1.325
Strings
  "hello"
Symbols
  E.g.: ARCA (or, arcA)
  Make a literal symbol by quoting it: 'ARCA
  Case-sensitive symbols require vertical bars: '|Genes|
Special symbols: T and NIL
  Used to mean True and False
  NIL is also the empty list: ()

Page 9: Lisp Expressions and Evaluation

(+ 3 4 5)
  '+' is a function
  (+ 3 4 5) is a function call with 3 arguments
Arguments are evaluated:
  Numbers evaluate to themselves
  If any of the args are themselves expressions, they are also evaluated
    (+ 1 (+ 3 4)) -> 8
The values of the args are passed to the function
Some functions allow variable numbers of arguments
  (+) -> 0
  (+ 1) -> 1
  (+ 2 3 1 3 4 5 6) -> 24
  (+ (* 3 4) 6) -> 18

Page 10: Lisp Expressions and Evaluation

The Lisp listener is also called "top level" and "read-eval-print loop"
Uses a three-step process
  Read
    Reader converts elements outside "" and || to uppercase
  Evaluate
  Print
Anything you type in is evaluated
  1 -> 1
  "hello" -> "hello"
  (+ 2 3) -> 5
Quoting prevents evaluation
  '(+ 2 3) -> (+ 2 3)
Setting a symbol to a value creates a variable:
  (setq foo '(a b c)) -> (A B C)
  foo -> (A B C)
  No declarations required!
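The reading and quoting rules above can be checked in any Common Lisp listener. The following is a minimal sketch, assuming a standard reader that upcases unescaped symbols; the function name `reader-and-quote-demo` is made up for illustration and is not part of Pathway Tools.

```lisp
;; Illustrative only: assumes the default (upcasing) Common Lisp reader.
;; READER-AND-QUOTE-DEMO is a hypothetical name, not a Pathway Tools function.
(defun reader-and-quote-demo ()
  "Return a list of small facts about reading, quoting, and evaluation."
  (list (symbol-name 'arcA)     ; unescaped symbols are upcased by the reader
        (symbol-name '|Genes|)  ; vertical bars preserve case
        (eq 'arca 'ARCA)        ; both spellings read as the same symbol
        '(+ 2 3)                ; quoted: stays a three-element list
        (eval '(+ 2 3))         ; evaluated: becomes a number
        (null '())))            ; the empty list is NIL
```

Calling `(reader-and-quote-demo)` returns one value per rule, which makes it easy to see why `'|Genes|` and `'genes` name different symbols.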

Page 11: The Lisp Listener

Useful forms in listener:
  Previous results: *, **, ***
  But: not in programs

(+ 1 2) -> 3
(+ 3 *) -> 6
** -> 3

Page 12: Dealing with the Lisp debugger

Error conditions result in a call to the Lisp debugger:
  :continue continues; a numeric argument selects between possible options
    Lower-numbered options generally take less drastic actions
  :reset unwinds to the top level
    WARNING: may exit the Pathway Tools window!
  :zoom displays the stack

EC(4): (xxx)
*debugger-hook* called.
Error: Attempt to take the value of the unbound variable `X'.
  [condition type: UNBOUND-VARIABLE]

Restart actions (select using :continue):
 0: Try evaluating X again.
 1: Use :X instead.
 2: Set the symbol-value of X and use its value.
 3: Use a value without setting X.
 4: Return to Top Level (an "abort" restart).
 5: Abort entirely from this process.
[1] EC(5): :res

Page 13: Lisp Variables

Global variable values can be set and used during a session
Declarations not needed

(setq x 5) -> 5
x -> 5
(+ 3 x) -> 8
(setq y "atgc") -> "atgc"

Page 14: Equality in LISP

Internally LISP refers to objects via pointers
Fundamental equality operation is EQ
  True if the two arguments point to the same object
  Very efficient
Other comparison operators:
  = for numbers: (= x 4)
  EQUAL for list structures or exact string matching: (equal x "abc")
  STRING-EQUAL for case-insensitive string matching: (string-equal x "AbC")
  EQL for characters: (eql x #\A)
  EQ for list structures or symbols (compares pointers): (eq x 'ABC)
  FEQUAL for frames: (fequal x 'trp)
Simple rule: Use EQUAL for everything except frames
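The differences between these operators are easiest to see side by side. This sketch uses only standard Common Lisp predicates (FEQUAL is specific to Pathway Tools and is left out); `equality-demo` is a hypothetical name chosen for this illustration.

```lisp
;; Standard Common Lisp only; EQUALITY-DEMO is a made-up name for illustration.
(defun equality-demo ()
  "Return the result of each comparison, in order."
  (list (= 4 4.0)                      ; numeric equality ignores the number type
        (eql 4 4.0)                    ; EQL also requires the same type
        (eql #\A #\A)                  ; identical characters
        (eq 'abc 'ABC)                 ; both read as the same symbol
        (equal "abc" "abc")            ; same characters, same case
        (equal "abc" "AbC")            ; EQUAL is case-sensitive for strings
        (string-equal "abc" "AbC")     ; STRING-EQUAL ignores case
        (equal (list 1 2) (list 1 2))  ; same structure
        (eq (list 1 2) (list 1 2))))   ; two freshly built lists are distinct objects
```

The last two entries show why EQ on lists is rarely what a query wants: it asks whether the two arguments are the very same object, not whether they look alike.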

Page 15: Functions for Operating on Lists

length: (length x)
  Returns the number of elements
first: (first x)
  Returns the first element
nth: (nth j x)
  Returns the Jth element of list X (element 0 is the first element)
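A short sketch of these three functions on a literal list; `list-demo` is a hypothetical helper written for this illustration. Note that `nth` returns NIL rather than signalling an error when the index is past the end of the list.

```lisp
;; LIST-DEMO is a made-up helper; LENGTH, FIRST, and NTH are standard Common Lisp.
(defun list-demo (x)
  "Return the length of X, its first element, and elements 0, 2, and 10."
  (list (length x)   ; number of elements
        (first x)    ; first element
        (nth 0 x)    ; same as FIRST: indexing starts at 0
        (nth 2 x)    ; third element
        (nth 10 x))) ; past the end: NIL, not an error
```

For example, `(list-demo '(a b c))` exercises each call on a three-element list.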

Page 16: loop

Loop allows you to iterate
  Through a series of numbers
    for i from 1 to 10
  Through a list
    for rxn in rxns
Conditionals control whether execution continues
  when (> (length (get-slot-values rxn 'citations)) 5)
do lets you do something
  do (setq total (+ i total))
collect lets you gather up values
  collect (get-frame-name rxn)
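The clauses above combine into complete loops as in the following sketch. It runs without Pathway Tools, so the reaction data is a toy stand-in: `*toy-rxns*`, `toy-citations`, `well-cited`, and `sum-to` are all hypothetical names, and `toy-citations` only imitates what `(get-slot-values rxn 'citations)` would return in a real session.

```lisp
;; Toy stand-in data, not real PGDB content: (reaction-name . citation-ids).
(defparameter *toy-rxns*
  '((rxn-a . (1 2 3 4 5 6))
    (rxn-b . (1 2))
    (rxn-c . (1 2 3 4 5 6 7))))

(defun toy-citations (rxn)
  "Hypothetical stand-in for (get-slot-values rxn 'citations)."
  (cdr rxn))

(defun well-cited (rxns)
  "Collect the names of reactions with more than 5 citations."
  (loop for rxn in rxns
        when (> (length (toy-citations rxn)) 5)
          collect (car rxn)))

(defun sum-to (n)
  "Add up the integers from 1 to N using a DO clause."
  (loop with total = 0
        for i from 1 to n
        do (setq total (+ i total))
        finally (return total)))
```

In a live session the same shape applies with `(get-class-all-instances '|Reactions|)` as the list and the real `get-slot-values` as the accessor.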

Page 17: loop

You can combine as many loop clauses as you need:

(loop
  for i from 1 to 10
  for j from 10 downto 1
  do (print (+ i j))
  collect (* i j))
-> (10 18 24 28 30 30 28 24 18 10)

Page 18: Defining Functions

Put function definitions in a file
Reload the file when definitions change
  EC(1): :ld my-queries.lisp

(defun <name> (<arguments>) … code for function …)
  Creates a new operation called <name>

Examples:
  (defun square (x) (* x x))
  (defun message () (print "Hello"))
  (defun test-fn () 1 2 3 4)
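A function returns the value of the last form in its body, which is what the `test-fn` example demonstrates. The sketch below repeats two of the slide's definitions and adds one hypothetical function, `sum-of-squares`, to show a defined function being reused inside a loop.

```lisp
;; SQUARE and TEST-FN are the slide's examples; SUM-OF-SQUARES is added for illustration.
(defun square (x)
  "Return X multiplied by itself."
  (* x x))

(defun test-fn ()
  "Evaluate four forms in order; only the value of the last one is returned."
  1 2 3 4)

(defun sum-of-squares (numbers)
  "Sum the squares of NUMBERS, calling SQUARE on each element."
  (loop for n in numbers
        sum (square n)))
```

After loading a file containing these definitions, `(square 7)`, `(test-fn)`, and `(sum-of-squares '(1 2 3))` can be typed at the listener.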

Page 19: Accessing Lisp from Pathway Tools

Starting Pathway Tools for Lisp work:
  > pathway-tools -lisp
  EC(1): (select-organism :org-id 'XXX)
  Windows: pathway-tools-lisp.exe
Lisp expressions can be typed at any time to the Pathway Tools listener
  Command: (get-slot-value 'trp 'common-name) -> "L-tryptophan"
Invoking the Navigator from Lisp:
  EC(2): (eco)

Page 20: The perlcyc API

Written by Lukas Mueller at TAIR
Downloadable from the TAIR Web site
Installs as a standard CPAN module
From within Pathway Tools, start the server by hand:
  (start-external-access-daemon)
  (start-external-access-daemon :verbose? t) for tracing output
Function names are the same as Lisp, with hyphens replaced by underscores, question marks by _p
  get-class-all-instances -> get_class_all_instances
  coercible-to-frame? -> coercible_to_frame_p
Pathway Tools functions are callable as standard Perl functions
Frame names are symbols which can be passed back to Lisp
Control structures are standard Perl
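The naming rule can be written down as a small string transformation. This is only an illustration of the rule as stated on the slide, not code from perlcyc; `perl-name` is a hypothetical helper, and treating only a trailing question mark as `_p` is an assumption.

```lisp
;; PERL-NAME is a made-up helper that applies the slide's naming rule.
;; Assumption: only a trailing "?" is rewritten to "_p".
(defun perl-name (lisp-name)
  "Map a Lisp function name (string or symbol) to its perlcyc-style spelling."
  (let* ((name (string-downcase lisp-name))
         (len (length name))
         (stem (if (and (plusp len) (char= (char name (1- len)) #\?))
                   (concatenate 'string (subseq name 0 (1- len)) "_p")
                   name)))
    ;; Replace every hyphen with an underscore.
    (substitute #\_ #\- stem)))
```

With this rule, a name that already ends in `-p` on the Lisp side, such as `coercible-to-frame-p`, maps to the same Perl spelling as the `?` form.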

Page 21: javacyc

Uses the same Unix domain socket interface as perlcyc
Function names use Java conventions
  get-slot-values -> getSlotValues
Includes a C library for Unix domain sockets

Page 22: Lisp vs. Perl

Task: find all reactions with fewer than 5 citations

Perl:
use perlcyc;
my $cyc = perlcyc->new("ECOLI");
my @found;
foreach my $r ($cyc->all_rxns()) {
  # Pathway Tools functions are called as methods on the perlcyc object
  my @citations = $cyc->get_slot_values($r, "citations");
  if (scalar(@citations) < 5) {
    push @found, $r;
  }
}

Lisp:
(loop for r in (all-rxns)
      when (< (length (get-slot-values r 'citations)) 5)
      collect r)

Page 23: Pathway Tools User Accessible Functions

Internal Pathway Tools functions that users can call
Includes:
  Generic Frame Protocol (GFP), the Ocelot object database API
  Additional functions specific to Pathway Tools
For more information see http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Page 24: Generic Frame Protocol (GFP)

A library of Lisp functions for accessing Ocelot DBs
GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html
A small number of GFP functions are sufficient for most complex queries

Page 25: Generic Frame Protocol

(get-class-all-instances Class)
  Returns the instances of Class
Key Pathway Tools classes:
  Genetic-Elements
  Genes
  Proteins
  Polypeptides (a subclass of Proteins)
  Protein-Complexes (a subclass of Proteins)
  Pathways
  Reactions
  Compounds-And-Elements
  Enzymatic-Reactions
  Transcription-Units
  Promoters
  DNA-Binding-Sites

Page 26: Generic Frame Protocol

Note: Frame.Slot means a specified slot of a specified frame
  Frame and Slot must be symbols!
(get-slot-value Frame Slot)
  Returns first value of Frame.Slot
(get-slot-values Frame Slot)
  Returns all values of Frame.Slot as a list
(slot-has-value-p Frame Slot)
  Returns T if Frame.Slot has at least one value
(member-slot-value-p Frame Slot Value)
  Returns T if Value is one of the values of Frame.Slot
(print-frame Frame)
  Prints out the contents of Frame
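To make the difference between these accessors concrete without a running PGDB, here is a toy in-memory model. It is not the GFP implementation: every `toy-` name and the `*toy-kb*` data are invented for illustration, and the model ignores inheritance, frame objects, and everything else a real Ocelot KB provides.

```lisp
;; Toy model only. Each frame maps to a property list of slot -> list of values.
(defparameter *toy-kb*
  '((trp . (common-name ("L-tryptophan") synonyms ("W" "Trp")))
    (arg . (common-name ("L-arginine") synonyms ()))))

(defun toy-get-slot-values (frame slot)
  "All values of FRAME.SLOT as a list (NIL when there are none)."
  (getf (cdr (assoc frame *toy-kb*)) slot))

(defun toy-get-slot-value (frame slot)
  "The first value of FRAME.SLOT, or NIL."
  (first (toy-get-slot-values frame slot)))

(defun toy-slot-has-value-p (frame slot)
  "T if FRAME.SLOT has at least one value."
  (not (null (toy-get-slot-values frame slot))))

(defun toy-member-slot-value-p (frame slot value)
  "T if VALUE is one of the values of FRAME.SLOT (compared with EQUAL)."
  (not (null (member value (toy-get-slot-values frame slot) :test #'equal))))
```

The point of the model is the shape of the results: the singular accessor returns one value, the plural accessor returns a list, and the two predicates return T or NIL.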

Page 27: More useful functions

(coercible-to-frame-p Thing)
  Returns T if Thing is the name of a frame, or a frame object
(save-kb)
  Saves the current KB
(replace-answer-list <list of frames>)
  Makes the specified frames browseable via the Pathway Tools GUI

Page 28: Generic Frame Protocol – Update Operations

(put-slot-value Frame Slot Value)
  Replace the current value(s) of Frame.Slot with Value
(put-slot-values Frame Slot Value-List)
  Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values
(add-slot-value Frame Slot Value)
  Add Value to the current value(s) of Frame.Slot, if any
(remove-slot-value Frame Slot Value)
  Remove Value from the current value(s) of Frame.Slot
(replace-slot-value Frame Slot Old-Value New-Value)
  In Frame.Slot, replace Old-Value with New-Value
(remove-local-slot-values Frame Slot)
  Remove all of the values of Frame.Slot
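The update operations can be modelled in a few lines to show how they differ from one another. As before this is a toy sketch, not GFP: the `toy-` functions and `*toy-slots*` table are hypothetical, values are compared with EQUAL as an assumption, and details such as duplicate handling, inheritance, and saving the KB are not modelled.

```lisp
;; Toy model only: a hash table keyed by (frame slot) holding a list of values.
(defparameter *toy-slots* (make-hash-table :test #'equal))

(defun toy-values (frame slot)
  "Current values of FRAME.SLOT in the toy table."
  (gethash (list frame slot) *toy-slots*))

(defun toy-put-slot-values (frame slot values)
  "Replace all values of FRAME.SLOT with a copy of VALUES."
  (setf (gethash (list frame slot) *toy-slots*) (copy-list values)))

(defun toy-put-slot-value (frame slot value)
  "Replace all values of FRAME.SLOT with the single VALUE."
  (toy-put-slot-values frame slot (list value)))

(defun toy-add-slot-value (frame slot value)
  "Append VALUE to the existing values of FRAME.SLOT."
  (toy-put-slot-values frame slot (append (toy-values frame slot) (list value))))

(defun toy-remove-slot-value (frame slot value)
  "Remove VALUE from FRAME.SLOT."
  (toy-put-slot-values frame slot
                       (remove value (toy-values frame slot) :test #'equal)))

(defun toy-replace-slot-value (frame slot old new)
  "Replace OLD with NEW in FRAME.SLOT."
  (toy-put-slot-values frame slot
                       (substitute new old (toy-values frame slot) :test #'equal)))

(defun toy-update-demo ()
  "Run put, add, remove, and replace in sequence and return the final values."
  (toy-put-slot-values 'trp 'synonyms '("W" "Trp"))
  (toy-add-slot-value 'trp 'synonyms "tryptophan")
  (toy-remove-slot-value 'trp 'synonyms "W")
  (toy-replace-slot-value 'trp 'synonyms "Trp" "L-Trp")
  (toy-values 'trp 'synonyms))
```

In a real session the corresponding GFP calls change the KB in memory, and `(save-kb)` is needed to make the changes permanent.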

Page 29: Additional Pathway Tools Functions – Semantic Inference Layer

Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB
http://bioinformatics.ai.sri.com/ptools/ptools-fns.html

Page 30: GKB editor

GUI for browsing the frame hierarchy
  Command: Special -> Taxonomy Viewer
  View -> Browse Class Hierarchy (ctrl-B)
Allows viewing of classes, slots, and instances
You can't write a query unless you know the exact class and slot names
Class names are usually case-sensitive symbols
  |Genes|, |Proteins|, …

Page 31: LISP and GFP References

Common LISP, the Language -- The standard reference
  Paper edition by Guy Steele
  Online version: http://www.lispworks.com/reference/HyperSpec/Front/index.htm
Information on writing Pathway Tools queries:
  http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
  http://www.ai.sri.com/pkarp/loop.html
  http://bioinformatics.ai.sri.com/ptools/debugger.html

Page 32: Pathway Tools information Web site

Top-level page: http://www.biocyc.org/
General Pathway Tools information: http://bioinformatics.ai.sri.com/ptools/
How to submit a bug report: http://bioinformatics.ai.sri.com/ptools/bug.html
Writing queries, introductions to Lisp, etc.: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

Page 33: Examples

(select-organism :org-id 'ecoli) -> ECOLI
(setq genes (get-class-all-instances '|Genes|)) -> (…)
(setq monomers (get-class-all-instances '|Polypeptides|)) -> (…)
(setq genes2 genes) -> (…)

Page 34: Problems

all-substrates

enzymes-of-reaction

genes-of-reaction

genes-of-pathway

monomers-of-protein

genes-of-enzyme

Page 35: Example Session

(setq x 'trp) -> TRP
(get-slot-value x 'common-name) -> "L-tryptophan"
(setq aas (get-class-all-instances '|Amino-Acids|)) -> (…)
(loop for x in aas count x) -> 20

Page 36: Example Session

(loop for x in genes
      for name = (get-slot-value x 'common-name)
      when (and name (search "trp" name))
      collect x)
-> (…)
(setq rxns (get-class-all-instances '|Reactions|)) -> (…)
(loop for x in rxns
      when (member-slot-value-p x 'substrates 'trp)
      collect x)
-> (…)
(replace-answer-list *)
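One detail of the name query is worth trying out on its own: `search` compares characters case-sensitively by default, so "trp" does not match a name spelled "TrpR". The sketch below uses a plain list of strings in place of gene frames; `names-containing` and its `:ignore-case` keyword are hypothetical, while `search`, `char=`, and `char-equal` are standard Common Lisp.

```lisp
;; NAMES-CONTAINING is a made-up helper operating on plain strings, not frames.
(defun names-containing (substring names &key ignore-case)
  "Collect the non-NIL NAMES that contain SUBSTRING."
  (loop for name in names
        when (and name   ; skip entries with no name, as the slide's query does
                  (search substring name
                          :test (if ignore-case #'char-equal #'char=)))
          collect name))
```

Passing `:ignore-case t` switches the character test to `char-equal`, which is one way to catch differently capitalized names in a real query.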

Page 37: Example Session

(setq x '(trp arg)) -> (TRP ARG)
(replace-answer-list x) -> (TRP ARG)
(eco)

Page 38: How to write a good bug report

Use dribble-bug
  (excl:dribble-bug "bug.txt") to start dribbling
  (excl:dribble-bug) to stop
How to get out of the debugger
  :bt - short backtrace of what functions are being called
  :zoom - more detailed trace
  :cont <n> - continue. Lower numbers are less drastic
Be specific, and as detailed as you can stand
  What button/key did you push?
  Which screen/editor were you using at the time?
  What object were you viewing/editing?
Try to find a reproducible test case if at all possible!

Page 39: How to use autopatch

Patches load automatically on startup, or use Special -> Install Patches
  Download and install
  Or simply install
Goes to our Web server, gets patches, and installs them
Restarting is usually not required
  Functions are redefined on the fly
  But: if the patch involved initialization, you might need to restart