
An Introduction to Python and Its Use in Bioinformatics

Dr. Nancy Warter-Perez
April 19, 2005

Overview

- What is Bioinformatics?
- Overview of program/script development (PP Ch3)
- Python Basics (PP Ch1)
- Python Types and Operators
    - Numbers and Arithmetic operators (PP Ch2)
    - Strings (PP Ch4)
    - Lists and Dictionaries (PP Ch5)
    - Input & Output (PP Ch2)
- Programming Workshop #1

What is Bioinformatics?

Fredj Tekaia at the Institut Pasteur offers this definition of bioinformatics: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."

Classical Bioinformatics

According to Damian Counsell from bioinformatics.org: "use computers to store, retrieve, analyze or predict the composition or the structure of biomolecules. As computers become more powerful you could probably add simulate to this list of bioinformatics verbs. 'Biomolecules' include your genetic material (nucleic acids) and the products of your genes: proteins. These are the concerns of 'classical' bioinformatics, dealing primarily with sequence analysis."

"New" Bioinformatics

- comparative genomics: look for differences and similarities between all the genes of multiple species
- functional genomics: identify gene functions and associations
- proteomics: catalogue the activities and characterize interactions between all gene products (in humans)
- structural genomics: crystallize and/or predict the structures of all proteins (in humans)

Program Development

Problem solving:
    1. Problem specification
    2. Algorithm design
    3. Test by hand

Implementation:
    4. Code in target language
    5. Test code / debug

Result: Program/Script

What is Python?

- A portable, interpreted, object-oriented programming language
- Elegant syntax
- Powerful high-level built-in data types
    - Numbers, strings, lists, dictionaries
- Full set of string operations

Why Python?

- Previously used C++
- Scripting languages useful for bioinformatics
    - Perl is "bioinformatics standard"
    - Python is more "robust" for larger software projects

Useful Tutorials

- DNA from the Beginning
    http://www.dnaftb.org/dnaftb/
- Python Tutorial
    http://www.python.org/doc/current/tut/tut.html

Python Development

- Open-Source Software
- The Python interpreter runs on Windows; you need to download it in two parts:
    1. The actual interpreter and core of Python: http://www.python.org/2.3.3/ (get the Python-2.3.3.exe file; there is a newer release, 2.4.1, that you can download if you'd prefer)
    2. An integrated development environment for Python called pythonwin, by Mark Hammond: http://starship.python.net/crew/mhammond/win32/Downloads.html

Python Basics - Comments

- Python comments
    # line comment
- Header comments
    # Description of program
    # Written by:
    # Date created:
    # Last Modified:

Python Basics - Variables

- Python variables are not "declared".
- To assign a variable, just type: identifier = literal
- Identifiers
    - Have the following restrictions:
        - Must start with a letter or underscore (_)
        - Case sensitive
        - Must consist of only letters, numbers or underscores
        - Must not be a reserved word
    - Have the following conventions:
        - All uppercase letters are used for constants
        - Variable names are meaningful, thus often multi-word (but not too long)
            - Convention 1: alignment_sequence (align_seq)
            - Convention 2: AlignmentSequence (AlignSeq)
        - Python-specific conventions (avoid _X, __X__, __X, _)
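The restrictions and conventions above can be tried out with a short sketch. It is written in Python 3 syntax (the slides target Python 2.3), and the helper name is_valid_name and the sample names are made up for illustration.

```python
import keyword

def is_valid_name(name):
    """Return True if name satisfies the identifier restrictions."""
    # str.isidentifier() covers the character rules (letters, digits,
    # underscore, no leading digit); keyword.iskeyword() covers reserved words.
    return name.isidentifier() and not keyword.iskeyword(name)

MAX_GAP_LENGTH = 10      # constant: all uppercase by convention
align_seq = "ACGT"       # Convention 1: words joined by underscores
AlignSeq = "TTGA"        # Convention 2: a different variable (names are case sensitive)

for candidate in ["align_seq", "AlignSeq", "2nd_seq", "align-seq", "class"]:
    print(candidate, is_valid_name(candidate))
```

Only the first two candidates pass: the third starts with a digit, the fourth contains a hyphen, and the last is a reserved word.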

Numbers

- Normal integers: represent whole numbers
    Ex: 3, -7, 123, 76
- Long integers: unlimited size
    Ex: 9999999999999999999999L
- Floating-point: represent numbers with decimal places
    Ex: 1.2, 3.14159, 3.14e-10
- Octal and hexadecimal numbers
    Ex: 0177, 0x9ff, 0xff
- Complex numbers
    Ex: 3+4j, 3.0+4.0j, 3J
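For readers on a current interpreter, here is a small sketch of the same literals in Python 3 syntax, which is an assumption beyond the slides: Python 3 has a single integer type with no trailing L, and octal literals are written with a 0o prefix. The variable names are illustrative.

```python
whole = 123
huge = 9999999999999999999999   # no trailing L in Python 3; integers grow as needed
small = 3.14e-10                # floating-point, scientific notation
octal = 0o177                   # 0o prefix in Python 3
hexa = 0x9ff
z = 3 + 4j                      # complex number

print(huge + 1)
print(octal, hexa)              # both are printed in decimal
print(z.real, z.imag, abs(z))   # real part, imaginary part, magnitude
```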

Python Basics - arithmetic operations

Assume y = 5 and z = 3.

Operator   Meaning              Example      Result
+          add                  x = y + z    x = 8
-          subtract             x = y - z    x = 2
*          multiply             x = y * z    x = 15
/          divide               x = y / z    x = 1
%          modulus/remainder    x = y % z    x = 2

Because y and z are both integers, y / z performs integer division and the result is truncated to 1.
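The division row is the one most likely to surprise on a newer interpreter. The sketch below assumes Python 3, where / always performs true division and // gives the truncating result that the Python 2 of the slides produces for two integers.

```python
y = 5
z = 3

print(y + z)     # 8
print(y - z)     # 2
print(y * z)     # 15
print(y // z)    # 1: floor division
print(y / z)     # true division in Python 3, a float between 1 and 2
print(y % z)     # 2

# divmod() returns the quotient and remainder together.
quotient, remainder = divmod(y, z)
print(quotient, remainder)
```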

Python Basics - arithmetic operations (continued)

Assume y = 5 and z = 3.

Operator   Meaning           Example       Result
<<         shift left        x = y << 1    x = 10
>>         shift right       x = y >> 2    x = 1
**         raise to power    x = y ** z    x = 125
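One way to read the shift operators is as multiplication and floor division by powers of two. This brief sketch in Python 3 syntax checks that reading for the values used above.

```python
y = 5
z = 3

shifted_left = y << 1     # same as y * 2
shifted_right = y >> 2    # same as y // 4
power = y ** z            # same as y * y * y

print(shifted_left, shifted_right, power)
```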

Python Basics - Relational and Logical Operators

Relational operators
    ==        equal
    !=, <>    not equal
    >         greater than
    >=        greater than or equal
    <         less than
    <=        less than or equal

Logical operators
    and       and
    or        or
    not       not

Python Basics - Relational Operators

Assume x = 1, y = 4, z = 14

Expression        Value   Interpretation
x < y + z         1       True
y == 2 * x + 3    0       False
z <= x + y        0       False
z > x             1       True
x != y            1       True
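The table can be reproduced with a few lines. This is a sketch in Python 3 syntax, and the dictionary name results is illustrative. Note that Python 3 no longer accepts the <> spelling of "not equal", so only != is used.

```python
x, y, z = 1, 4, 14

results = {
    "x < y + z": x < y + z,
    "y == 2 * x + 3": y == 2 * x + 3,
    "z <= x + y": z <= x + y,
    "z > x": z > x,
    "x != y": x != y,
}

# int(True) is 1 and int(False) is 0, matching the Value column.
for expression, value in results.items():
    print(expression, int(value), value)
```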

Python Basics - Logical Operators

Assume x = 1, y = 4, z = 14

Expression             Value   Interpretation
x<=1 and y==3          0       False
x<=1 or y==3           1       True
not (x > 1)            1       True
not x > 1              1       True
not (x<=1 or y==3)     0       False

In Python, not binds less tightly than the comparison operators, so not x > 1 is evaluated as not (x > 1).
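Operator precedence around not is worth checking at the interpreter, because Python groups it differently from the ! operator in C or C++. The sketch below (Python 3 syntax, illustrative variable names) evaluates each grouping explicitly.

```python
x, y, z = 1, 4, 14

both = x <= 1 and y == 3              # False: the second test fails
either = x <= 1 or y == 3             # True: the first test succeeds
grouped = not (x > 1)                 # True
ungrouped = not x > 1                 # True: parsed as not (x > 1)
c_style = (not x) > 1                 # False: (not x) is False, and False > 1 is False
neither = not (x <= 1 or y == 3)      # False

print(both, either, grouped, ungrouped, c_style, neither)
```

Explicit parentheses, as in the grouped and c_style lines, make the intended reading clear whichever language the reader comes from.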

Strings

- Enclosed in single or double quotes
    Ex: 'Hello!', "Hello!", "3.5", "a", 'a'
- Sequence of characters:
    mystring = "hello world!"
    mystring[0]  -> "h"
    mystring[1]  -> "e"
    mystring[2]  -> "l"
    mystring[-1] -> "!"    (-1 is last, -2 next to last, etc.)

String operations

mystring = "Hello World!"

Expression              Value            Purpose
len(mystring)           12               Number of characters in mystring
"hello" + "world"       "helloworld"     Concatenate strings
"%s world" % "hello"    "hello world"    Format strings (like sprintf)
"world" == "hello"      0 or False       Test for equality
"world" == "world"      1 or True
"a" < "b"               1 or True        Alphabetical ordering
"b" < "a"               0 or False
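The operations in the table can be run as written. The sketch below uses Python 3 syntax and adds one case that is not on the slide: the ordering compares character codes, so every uppercase letter sorts before every lowercase one.

```python
mystring = "Hello World!"

length = len(mystring)               # counts the space and the "!" too
joined = "hello" + "world"           # no space is inserted
formatted = "%s world" % "hello"
same = "world" == "world"
different = "world" == "hello"
ordered = "a" < "b"
mixed_case = "B" < "a"               # True: "B" has a smaller character code than "a"

print(length, joined, formatted, same, different, ordered, mixed_case)
```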

Strings (2)

- Slicing:
    mystring = "spoon!"
    mystring[2:]  -> "oon!"
    mystring[:3]  -> "spo"    # note last element is never included!
    mystring[1:3] -> "po"
- Many useful built-in functions
    mystring.upper()           -> "SPOON!"
    mystring.replace('o', 'O') -> "spOOn!"
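Slicing and the string methods are what make Python convenient for sequence work. As an illustration in Python 3 syntax, the sketch below applies them to a short DNA string. The sequence and variable names are invented for the example and do not come from the slides.

```python
dna = "atggcctaa"   # hypothetical 9-base sequence

first_codon = dna[:3]    # characters 0, 1, 2
remainder = dna[3:]      # everything from index 3 onward

# Step through the string three characters at a time.
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]

# upper() and replace() return new strings; dna itself is unchanged.
rna = dna.upper().replace("T", "U")

print(first_codon, remainder)
print(codons)
print(rna)
```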

Strings (3)

- "%" operator: a sort of "fill in the blanks" operation:
    mystring = "%s has %d marbles" % ("John", 35)
    mystring -> "John has 35 marbles"
  The string on the left of % contains the "blanks"; the tuple on the right holds the values to put in the blanks.
- Format codes:
    %s        replace with string
    %d, %i    replace with integer
    %f        replace with float
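A short sketch of the % operator follows, in Python 3 syntax, where the operator still works the same way. The precision form %.1f is an addition not shown on the slide, and the variable names are illustrative.

```python
name = "John"
count = 35

sentence = "%s has %d marbles" % (name, count)
default_float = "%f" % 1.8               # %f shows six decimal places by default
one_decimal = "%s: %.1f" % ("A", 1.8)    # .1 limits the output to one decimal place

print(sentence)
print(default_float)
print(one_decimal)
```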

Lists

mylist = ["a", "b", 3.58, "d", 4, 0]

Expression           Value                             Purpose
mylist[0]            "a"                               Indexing
mylist[2]            3.58
mylist[-1]           0                                 Negative indexing (counts from end)
mylist[-2]           4
mylist[1:4]          ["b", 3.58, "d"]                  Slicing (like strings)
"b" in mylist        1 or True                         Membership test
"e" not in mylist    1 or True
mylist.append(8)     ["a", "b", 3.58, "d", 4, 0, 8]    Add to end of list (value shown is mylist afterwards)
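The list operations above, run in order, look like this in Python 3 syntax. One detail worth seeing in practice is that append() changes the list in place and itself returns None. Names other than mylist are illustrative.

```python
mylist = ["a", "b", 3.58, "d", 4, 0]

first = mylist[0]
last = mylist[-1]
middle = mylist[1:4]          # items at index 1, 2, 3; index 4 is excluded
has_b = "b" in mylist
lacks_e = "e" not in mylist

returned = mylist.append(8)   # modifies mylist; the call itself returns None

print(first, last, middle, has_b, lacks_e)
print(returned, mylist)
```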

Tuples

- Tuples: sequences of values
    - Like lists, but cannot be changed after they are created
        mytuple = (1, "a", "bc", 3, 87.2)
        mytuple[2] -> "bc"
        mytuple[1] = "3"    # Error!
    - Used when you want to pass several variables around at once
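Both points can be seen in a few lines of Python 3. The sketch catches the error raised by the assignment and then uses a tuple to hand two values back from a function. The function min_max and its sample data are hypothetical.

```python
mytuple = (1, "a", "bc", 3, 87.2)

assignment_failed = False
try:
    mytuple[1] = "3"          # tuples do not support item assignment
except TypeError:
    assignment_failed = True

def min_max(values):
    """Return the smallest and largest value together, as a tuple."""
    return min(values), max(values)

# The returned tuple is unpacked into two variables at once.
low, high = min_max([1.8, -3.5, 4.5])

print(assignment_failed, mytuple[2], low, high)
```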

Dictionaries

- Dictionaries: map 'keys' to 'values'
    - Like lists, but indices (keys) can be of any hashable type, such as strings or numbers
    - Also, keys are in no particular order
    - E.g.:
        mydict = {'b': 3, 'a': 4, 75: 2.85}
        mydict['b'] -> 3
        mydict[75]  -> 2.85
        mydict['a'] -> 4

Dictionaries

mydict = {"r": 1, "g": 2, "y": 3.5, 8.5: 8, 9: "nine"}

Expression                  Value                                                     Purpose
mydict.keys()               ['y', 8.5, 'r', 'g', 9]                                   List of the keys
mydict.values()             [3.5, 8, 1, 2, 'nine']                                    List of the values
mydict["y"]                 3.5                                                       Value lookup
mydict.has_key("r")         True or 1                                                 Check for keys
mydict.update({"a": 75})    {8.5: 8, 'a': 75, 'r': 1, 'g': 2, 'y': 3.5, 9: 'nine'}    Add pairs to dictionary
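On a Python 3 interpreter two of these calls behave differently from the Python 2 of the slides: keys() and values() return views rather than lists, and has_key() no longer exists, with the in operator taking its place. The sketch below assumes Python 3, and names other than mydict are illustrative.

```python
mydict = {"r": 1, "g": 2, "y": 3.5, 8.5: 8, 9: "nine"}

keys = list(mydict.keys())       # list() turns the view into a real list
values = list(mydict.values())
yellow = mydict["y"]
has_r = "r" in mydict            # Python 3 replacement for mydict.has_key("r")

mydict.update({"a": 75})         # adds the pair in place

for key in mydict:
    print(key, mydict[key])
```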

Dictionaries - other considerations

- Slicing not allowed
- Referencing an invalid key is an error:
    >>> mydict = {8.5: 8, 'a': 75, 'r': 1, 'g': 2, 'y': 3.5, 9: 'nine'}
    >>> mydict["red"]
    Traceback (most recent call last):
      File "<interactive input>", line 1, in ?
    KeyError: 'red'
- Use mydict.get("red") instead; it returns None if the key is not found
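The two lookup styles can be compared side by side. This sketch uses Python 3 syntax and also shows the optional second argument of get(), a default value, which the slide does not mention. Names other than mydict are illustrative.

```python
mydict = {8.5: 8, "a": 75, "r": 1, "g": 2, "y": 3.5, 9: "nine"}

missing = mydict.get("red")        # None: no error is raised
fallback = mydict.get("red", 0)    # the second argument is returned when the key is absent
present = mydict.get("r", 0)       # the default is ignored when the key exists

lookup_failed = False
try:
    mydict["red"]                  # square brackets raise KeyError for a missing key
except KeyError:
    lookup_failed = True

print(missing, fallback, present, lookup_failed)
```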

Input/Output

- Function raw_input() is designed to read a line of input from the user
    - 1 optional argument: string to prompt the user
- If an int or float is desired, simply convert the string:
    int(mystring)   -> convert to int (if possible)
    float(mystring) -> convert to float (if possible)

    >>> mystr = raw_input("Enter a string:")
    Enter a string:Hello World!
    >>> mystr
    'Hello World!'
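The phrase "if possible" matters: int() and float() raise ValueError when the text is not a number. A minimal sketch of a tolerant conversion follows, assuming Python 3, where raw_input() has been renamed input(). The helper name to_number is hypothetical.

```python
def to_number(text):
    """Return text as an int if possible, else as a float, else None."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass               # not valid for this type; try the next one
    return None

print(to_number("42"))
print(to_number("3.5"))
print(to_number("Hello World!"))

# Interactive use (Python 3 spelling of raw_input):
# value = to_number(input("Enter a number: "))
```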

Output

- The print statement
    - Prints each argument, followed by a space
    - After all arguments, prints a newline
    - Put a comma after the last argument to prevent the newline
    - "Add" strings to avoid spaces

Statement                Output    Note
print "a","b","c"        a b c     Newline!
print "a","b","c",       a b c     No newline!
print "a"+"b"+"c"        abc       No spaces!

Output Example

>>> print "hello","world";print "hello","again"

hello world

hello again

>>> print "hello","world",;print "hello","again"

hello world hello again

>>> print "hello %s world" % "cold and cruel"

hello cold and cruel world

>>> print "hello","cold"+ " " + "and","cruel","world"

hello cold and cruel world
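In Python 3, print is a function, so the trailing-comma trick becomes the end argument and spacing is controlled with sep. The sketch below is a Python 3 rendering of the same ideas, not the syntax used on the slides. It captures the output in a string (the names buffer and captured are illustrative) so the result can be inspected.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    print("hello", "world")                       # space between arguments, newline at end
    print("hello", "world", end=" ")              # end=" " replaces the Python 2 trailing comma
    print("hello", "again")
    print("a", "b", "c", sep="")                  # sep="" removes the spaces
    print("hello %s world" % "cold and cruel")    # formatting works as before

captured = buffer.getvalue()
print(captured, end="")
```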

Creating a Python Program

- Enter your program in the editor
    - Notice that the editor has color coding
        - Comments
        - Key words
        - Etc.
    - Also notice that it automatically indents
        - Don't override!! This is how Python tells when block statements end!
        - If it doesn't indent to the proper location, that indicates a bug

Running your Program

- To run your program
    - Under File->Run…
        - Select No Debugging in the drop-down window
    - Fix any errors, then run again

Programming Workshop #1

Write a Python program to compute the hydrophobicity of an amino acid. The program will prompt the user for an amino acid and will display the hydrophobicity.

Amino Acid    Hydrophobicity Value
A              1.8
C              2.5
D             -3.5
E             -3.5
F              2.8
G             -0.4
H             -3.2
I              4.5
K             -3.9
L              3.8
M              1.9
N             -3.5
P             -1.6
Q             -3.5
R             -4.5
S             -0.8
T             -0.7
V              4.2
W             -0.9
Y             -1.3
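One possible starting point for the workshop is sketched below. It is not the instructor's solution. It assumes Python 3 (input() in place of raw_input()), stores the table in a dictionary, and uses illustrative names (HYDROPHOBICITY, hydrophobicity, main). How to treat unknown letters and lowercase input is a design choice made here for the example.

```python
# Hydrophobicity values copied from the workshop table, keyed by one-letter code.
HYDROPHOBICITY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}

def hydrophobicity(amino_acid):
    """Return the table value for a one-letter code, or None if it is unknown."""
    # strip() and upper() let " r " and "r" match the key "R".
    return HYDROPHOBICITY.get(amino_acid.strip().upper())

def main():
    """Prompt for an amino acid and display its hydrophobicity."""
    code = input("Enter an amino acid (one-letter code): ")
    value = hydrophobicity(code)
    if value is None:
        print("Unknown amino acid: %s" % code)
    else:
        print("Hydrophobicity of %s is %.1f" % (code.strip().upper(), value))

# Call main() to run the program interactively.
```

Keeping the lookup in its own function means it can be checked without typing at a prompt, for example hydrophobicity("A") or hydrophobicity("B").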