Web Science: Python Notes
Albert Esterline
Spring 2015
Installation and Program Execution
Usually install Python in
C:\Python27
The executable, python.exe, is there, so add this folder to your PATH environment variable
Invoke the interactive interpreter
by clicking the Python item in the pop-up menu you get by clicking Python2.7 in the All Programs menu accessed from the Start menu
by typing python in the command window
Python program files have extension py
Can write them with your favorite text editor
In Windows, there's no need to start a program with any special directive, e.g., no need for UNIX’s “#!” line
To execute a program from the command window, go to its folder and type its name
Can also type python followed by its name
On the command line, use < to redirect input, > to redirect output
In interactive mode, prompts for next command with the primary prompt, “>>>”
For continuation lines, prompts with the secondary prompt, “...”
Continuation lines are needed when entering a multi-line construct—e.g.,
>>> the_world_is_flat = 1
>>> if the_world_is_flat:
...     print "Be careful not to fall off!"   # Indentation needed
...
Be careful not to fall off!
Quick Intro
Numbers
A newline, not “;”, terminates a statement
Escape (\) the newline to continue a statement on the next line
Arithmetic is as in most languages
= for assignment
>>> width = 20
>>> height = 5*9
>>> width * height
900
A value can be assigned to several variables simultaneously—e.g.,
>>> x = y = z = 0
With mixed-type operands, the integer operand is converted to floating point
In interactive mode, last printed expression is assigned to the
(effectively read-only) variable _
>>> tax = 12.5 / 100
>>> price = 100.50
>>> price * tax
12.5625
>>> price + _
113.0625
>>> round(_, 2)
113.06
Strings
Can be enclosed in single quotes or double quotes:
>>> 'spam eggs'
'spam eggs'
>>> 'doesn\'t'
"doesn't"
>>> "doesn't"
"doesn't"
>>> '"Yes," he said.'
'"Yes," he said.'
>>> "\"Yes,\" he said."
'"Yes," he said.'
>>> '"Isn\'t," she said.'
'"Isn\'t," she said.'
String literals can span multiple lines in several ways
Continuation lines can be used:
>>> hello = "This is a rather long string containing\n\
... several lines of text just as you would do in C.\n\
... Note that whitespace at the beginning of the line is\
... significant."
>>> print hello
This is a rather long string containing
several lines of text just as you would do in C.
Note that whitespace at the beginning of the line is significant.
>>>
Strings can be surrounded by a pair of matching triple-quotes, """ or '''
Ends of lines needn’t be escaped, but they are included in the string
>>> print """\
... Usage: thingy [OPTIONS]
... -h Display this usage message
... -H hostname Hostname to connect to
... """
Usage: thingy [OPTIONS]
-h Display this usage message
-H hostname Hostname to connect to
>>>
The interpreter displays the result of a string operation the way it would be typed as input (in quotes, with special characters escaped)
Concatenation is +, repetition *
>>> ('Help' + '! ') * 3
'Help! Help! Help! '
>>>
String literals next to each other are concatenated
Doesn’t work for arbitrary string expressions
>>> 'Help' '!\n'
'Help!\n'
>>> ('Help' * 3) '!\n'
File "<stdin>", line 1
('Help' * 3) '!\n'
^
SyntaxError: invalid syntax
>>>
+ is overloaded: addition and concatenation
Python doesn’t coerce mixed-type operands of +
>>> 3 + "4"      # Error: TypeError
Use conversion functions
>>> str(3) + "4"
'34'
>>> 3 + int("4")
7
>>>
Strings can be indexed
Indices start at 0 (leftmost character)
No character type
A character is a length-one string
>>> "cat"[1]
'a'
>>>
Substrings can be specified with slice notation
str[low:hi] is the substring of str from index low to index hi-1
>>> "scatter"[1:4]
'cat'
>>>
Default for the low index is 0
For the high index, it’s the length of the sliced string
A slice index > the string’s length is replaced by that length
An out-of-range single-element index gives an error
If the low index is > the high index, the slice is empty
>>> wd = 'scatter'
>>> wd[:4]
'scat'
>>> wd[4:]
'ter'
>>> wd[4:10]
'ter'
>>> wd[2:1]
''
>>>
Can’t change a string by assigning to an indexed position or a slice—it’s immutable
For a negative index, count from the right
>>> wd[-1]
'r'
>>> wd[-2]
'e'
>>> wd[:-2]
'scatt'
>>> wd[-2:]
'er'
>>> wd[:-2] + wd[-2:]
'scatter'
>>>
Invariant: str[:i] + str[i:] is the same as str
An out-of-range negative slice index is truncated
Think of slice indices as pointing between characters
Left edge of the 1st character is numbered 0
Index of the right edge of the last character is the string’s length
 +---+---+---+
 | c | a | t |
 +---+---+---+
 0   1   2   3
-3  -2  -1
Built-in function len() returns the length of its string argument
>>> len(wd)
7
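The slicing rules above can be checked mechanically. This is a small sketch (the function name slice_invariant_holds is hypothetical, chosen for illustration) that tests the invariant s[:i] + s[i:] == s for indices well below -len(s) up to well above len(s), so negative and out-of-range indices are covered too; it runs under Python 2.7 and Python 3.

```python
def slice_invariant_holds(s):
    """Return True if s[:i] + s[i:] == s for a wide range of indices i."""
    n = len(s)
    # Include negative and out-of-range indices; slicing truncates them.
    return all(s[:i] + s[i:] == s for i in range(-n - 3, n + 4))

wd = 'scatter'
print(slice_invariant_holds(wd))   # True
print(wd[-100:3])                  # sca (out-of-range slice index is truncated)
print(wd[-3:])                     # ter
```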
Lists
Python has several compound data types
Most versatile is the list:
comma-separated (possibly heterogeneous) items within […]
>>> a = [5, 10, 'cat', 'dog']
>>>
Lists can be indexed, sliced, concatenated, repeated
>>> a[-2]
'cat'
>>> a[:-2] + [15, 20]
[5, 10, 15, 20]
>>> 3*a[:2] + [60]
[5, 10, 5, 10, 5, 10, 60]
>>>
Unlike strings, lists are mutable
Can assign to individual elements
>>> a[1] = 12
>>> a
[5, 12, 'cat', 'dog']
>>>
Can assign to slices
Replace items
>>> a[1:3] = [15,'bird']
>>> a
[5, 15, 'bird', 'dog']
>>>
Remove items
>>> a[1:3] = []
>>> a
[5, 'dog']
>>>
Insert items
>>> a[1:1] = [20, 'fish']
>>> a
[5, 20, 'fish', 'dog']
>>>
Insert a copy of itself at the beginning
>>> a[:0] = a
>>> a
[5, 20, 'fish', 'dog', 5, 20, 'fish', 'dog']
>>>
Replace all items with an empty list
>>> a[:] = []
>>> a
[]
>>>
len() also applies to a list:
>>> len([1, 2, 3])
3
>>>
Append a new item to the end
>>> a = [1, 2]
>>> a.append(3)
>>> a
[1, 2, 3]
>>>
Lists can be nested
>>> q = [2, 3]
>>> p = [1, q, 4]
>>> len(p)
3
>>> p[1][0] = 5
>>> q
[5, 3]
>>>
First Steps towards Programming
The print statement takes 1 or more comma-separated expressions
It writes their values separated by spaces (no commas); strings are written without quotes
Multiple assignment: a comma-separated list of n > 1 variables on the LHS, n expressions on the RHS
The values of the expressions are simultaneously assigned to the corresponding variables
>>> x, y = 1, 2
>>> print x, y
1 2
>>>
Swap values
>>> x, y = y, x
>>> print x, y
2 1
>>>
Any nonzero integer value is true and 0 is false (as in C/C++)
Type bool has 2 objects, True & False
Comparison operators as in C++/Java:
<, >, ==, <=, >=, !=
Comparisons can be chained (unlike in C++/Java), giving ternary relations, e.g.,
2 <= x <= 4
which means 2 <= x and x <= 4
Logical operators: not, and, or (the last 2 short-circuit)
The header of a control statement ends with a ‘:’
(…) not needed around the condition
Statements in its scope are indented (no braces)
Example: initial part of the Fibonacci series
>>> a, b = 0, 1
>>> while b < 10:
...     print b,
...     a, b = b, a+b
...
1 1 2 3 5 8
>>>
A trailing comma suppresses the newline after the output
Lines in the scope of while must be indented (with tabs or spaces)
In interactive mode, enter a blank line to tell the interpreter the loop has ended
Do this in a file: E:\Old D Drive\c690f08\while.py
a, b = 0, 1
while b < 10:
    print b,
    a, b = b, a+b
Suggestion: Get Notepad++
Under the tab Language, select Python
Execution
E:\Old D Drive\c690f08>while.py
1 1 2 3 5 8
For simple input, use raw_input()
Optional string argument for a prompt
Example program
x = int(raw_input("Enter an integer: "))
y = int(raw_input("Enter another integer: "))
print "The sum is", x + y
Execution
E:\Old D Drive\c690f08>input.py
Enter an integer: 3
Enter another integer: 5
The sum is 8
E:\Old D Drive\c690f08>
Control Flow
if-elif-else
0 or more elif clauses and an optional else, e.g.,
x = int(raw_input("An integer: "))
if x < 0:
    y = -1
    print "Negative,",
elif x == 0:
    y = 0
    print "Zero,",
else:
    y = 1
    print "Positive,",
print "y is", y
Use if … elif … elif … in place of a switch statement
while condition: See above
break, continue, pass
break and continue are as in C++/Java:
break breaks out of the smallest enclosing for or while loop
continue continues with the next iteration of the loop
pass is the do-nothing statement (placeholder)
range() function
range(n) returns a list from 0 to n-1 (increments of 1)
range(m, n), n > m, returns a list from m to n-1 (increments of 1)
range(m, n, inc) returns a list starting at m with increments of inc up to but not including n
If inc > 0, n should be > m; if inc < 0, n should be < m (otherwise the list is empty)
for variable in sequence:
On successive iterations, variable is bound to successive elements in sequence
sequence can be a string or list
>>> for x in "abc":
...     print x,
...
a b c
>>> for x in range(3):
...     print x,
...
0 1 2
>>> for x in range(10, 2, -2):
...     print x,
...
10 8 6 4
>>>
Loop statements may have an else clause
Executed when the loop terminates by exhausting the list (for) or the condition becomes false (while)
But not when the loop is terminated by a break
E.g., check the integers 2-9 for primes
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print n, 'equals', x, '*', n/x
            break
    else:
        # loop fell through without finding a factor
        print n, 'is a prime number'
Execution
E:\Old D Drive\c690f08>primes.py
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
E:\Old D Drive\c690f08>
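The same for-else idea can be packaged as a function that returns the primes instead of printing them. This is a sketch; the name primes_below is hypothetical. The else clause runs only when the inner loop finishes without hitting break.

```python
def primes_below(limit):
    """Return a list of the primes in range(2, limit)."""
    primes = []
    for n in range(2, limit):
        for x in range(2, n):
            if n % x == 0:
                break            # found a factor, so the else clause is skipped
        else:
            primes.append(n)     # inner loop ran to completion: n is prime
    return primes

print(primes_below(10))   # [2, 3, 5, 7]
```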
Conditional Expressions
Python allows expressions of the form
expr1 if cond else expr2
where
expr1 and expr2 are arbitrary expressions and
cond evaluates to True or False (or values that can be interpreted as True or False)
If cond is True, value of the entire expression is the value of expr1
If cond is False, the expression’s value is the value of expr2
E.g.,
>>> x = 5
>>> 2 if x > 5 else 4
4
Typically assign the value of a conditional expression to something (e.g., a variable)
E.g., the following sets x to 0 if it’s below the threshold and to twice its value otherwise
>>> threshold = 10
>>> x = 11
>>> x = 0 if x < threshold else 2 * x
>>> x
22
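Conditional expressions can be nested. As a sketch (the function name sign is hypothetical), this one computes the same -1/0/1 value that the earlier if-elif-else example assigned to y, but as a single expression.

```python
def sign(x):
    """Return -1, 0, or 1 using nested conditional expressions."""
    return -1 if x < 0 else (0 if x == 0 else 1)

print(sign(-7))   # -1
print(sign(0))    # 0
print(sign(3))    # 1
```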
Functions
Function definition
def name(formalParameterList):
    body
formalParameterList is a comma-separated list of identifiers
No type or passing-mechanism info
body is indented
It optionally starts with a documentation string (docstring), a string literal
Some tools use docstrings to produce documentation, so put docstrings at various places in your code
Function call
name(actualParameterList)
actualParameterList is a comma-separated list of expressions
Example program
def fib(n):    # write Fibonacci series up to n
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b
# Now call the function we just defined:
fib(2000)
Function execution introduces a symbol table for local variables
All variable assignments in the function store the value in the local symbol table
But a variable reference looks first in the local symbol table then in the global symbol table then in the table of built-in names
So global variables may be referenced in a function but not assigned to,
unless named in a global statement, e.g.,
global x, y
Actual parameters to a function call are introduced in the local symbol table when the function is called
So arguments are passed using call by value
But the value is always an object reference, so a function can change a mutable argument (e.g., a list) in place, while rebinding the parameter doesn't affect the caller
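A sketch of these scoping and passing rules (all names here are hypothetical): increment() needs a global statement to assign to count, add_item() changes the caller's list through the reference it receives, and rebind() only rebinds its own local name.

```python
count = 0

def increment():
    global count        # needed to assign to the global; reading it alone wouldn't need this
    count += 1

def add_item(lst, item):
    lst.append(item)    # mutates the object the caller passed in

def rebind(lst):
    lst = [99]          # rebinds the local name only; the caller is unaffected
    return lst

increment()
increment()
nums = [1, 2]
add_item(nums, 3)
rebind(nums)
print(count)   # 2
print(nums)    # [1, 2, 3]
```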
A function definition introduces the function name in the current symbol table
The value of the function name has the type user-defined function
>>> def foo(n):
...     print n
...
>>> foo(3)
3
>>> foo
<function foo at 0x00AA5EB0>
This value can be assigned to another name
That name can then be used as a function—function renaming
>>> bar = foo
>>> bar(3)
3
Use a return statement to return control to the caller
If return has an operand, its value is returned as the value of the call
A function whose return has no operand (or that has no return statement at all) returns None
>>> print foo(2)
2
None
>>> def trivial(n):
...     return n
...
>>> print trivial(2)
2
>>> trivial(2)
2
Example program
Rewrite the Fibonacci function so it returns a list
def fib2(n):    # return Fibonacci series up to n
    """Return a list containing the Fibonacci series up to n."""
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result
f100 = fib2(100) # call it
print f100 # write the result
Note: append() is a list method
Function definitions may be nested
def aveSqr(m, n):
    def ave(x, y):
        return (x + y) / 2
    val = ave(m, n)
    return val * val
print aveSqr(3, 5)
Function definitions may be recursive
def fact(n):
    if n == 0:
        return 1
    else:
        return n * fact(n-1)
print fact(4)
Default Argument Values
Call a function with fewer arguments than it is defined with
Formal parameters are assigned default values in the parameter list
If a parameter has a default, all parameters to its right in the list must also have defaults
In a call with only positional arguments, if a value is supplied for a default parameter, values must be supplied for all default parameters to its left
def sum3(x, y=2, z=3):
    return x + y + z
print sum3(1), " ", sum3(1, 1), " ", sum3(1, 1, 1)
Outputs 6, 5, 3
Keyword Arguments
Can call a function with keyword arguments of the form keyword=value
All positional (normal) arguments must occur before any keyword arguments
Positional arguments are associated with formal parameters by position
Keyword arguments can occur in any order
The corresponding formal parameters may or may not have defaults
def classInfo(name, instructor='Dr. Smith', TA='Fred'):
    print name, 'is taught by', instructor, 'helped by', TA
classInfo('cs333')
classInfo(TA="Igor", instructor='Dr.Doom', name='cs555')
Output
cs333 is taught by Dr. Smith helped by Fred
cs555 is taught by Dr.Doom helped by Igor
A dictionary (see below) is what other languages call a hash or associative array
A final formal parameter of form **name receives a dictionary with all keyword arguments except those corresponding to formal parameters
def classInfo1(name, **keywords):
    print "Class:", name
    keys = keywords.keys()    # the order of the keys is arbitrary
    for kw in keys:
        print kw, ':', keywords[kw]
classInfo1('cs666', instructor='Dr. Jones', TA='Al', student='Jim')
Output
Class: cs666
instructor : Dr. Jones
student : Jim
TA : Al
Arbitrary Argument Lists
Specify that a function can be called with an arbitrary number of arguments
The final formal parameter has the form *name
It binds name to a tuple (a kind of sequence, see below) of all argument values after those corresponding to formal parameters
def arbNum(label, *nums):
    print label
    for x in nums: print x,
arbNum('Cars per day', 1, 4, 2, 6)
Output
Cars per day
1 4 2 6
Unpacking Argument Lists
Sometimes the arguments are already in a sequence but must be unpacked for a call that needs them as separate positional arguments
Use the * operator
>>> args = [3, 6]
>>> range(*args)
[3, 4, 5]
A dictionary can be unpacked to give keyword arguments using the ** operator
def foo(x, y=1, z=2):
    return x + y + z
d = {"z": 5, "y": 10}    # avoid the name dict, which would shadow the built-in
print foo(15, **d)
Outputs 30
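The packing forms (*name, **name in a definition) and the unpacking forms (*, ** in a call) are often combined to forward arguments unchanged. A minimal sketch, where call_with is a hypothetical helper name:

```python
def foo(x, y=1, z=2):
    return x + y + z

def call_with(func, *args, **kwargs):
    """Hypothetical helper: gather any arguments, then unpack them into func."""
    return func(*args, **kwargs)

print(call_with(foo, 15, z=5, y=10))   # 30
print(call_with(foo, *[1, 2, 3]))      # 6
print(call_with(foo, 1, **{'z': 0}))   # 2
```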
Lambda Forms
Small anonymous functions
Used wherever a function object is needed
lambda argList: expression
argList is a comma-separated list of arguments
The value of expression is returned
def make_contains(pt):
    return lambda low, hi: low <= pt <= hi
target = make_contains(5)
print target(4, 6), target(6, 8)
Outputs True False
Function Documentation
Just after the first line of a function’s definition, put its docstring
Its first line is a concise description not including the function’s name
Then a blank line
Then one or more paragraphs describing the function’s calling conventions, side effects, etc.
The first non-blank line after the first line determines how much indentation documentation tools strip from the entire docstring
A function’s docstring is the value of its __doc__ attribute
def my_function():
    """Do nothing, but document it.

    No, really, it doesn't do anything.
    """
    pass
print my_function.__doc__
Execution
E:\Old D Drive\c690f08> doc.py
Do nothing, but document it.

    No, really, it doesn't do anything.
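The indentation rule for docstrings is what the standard library function inspect.cleandoc() applies. This sketch shows the raw __doc__ next to the cleaned version; note that Python 3.13 and later already strip the common indentation when compiling, so raw differs by version, but the cleaned text is the same.

```python
import inspect

def my_function():
    """Do nothing, but document it.

    No, really, it doesn't do anything.
    """
    pass

raw = my_function.__doc__       # Python 2.7 (and 3 before 3.13) keeps the source indentation
clean = inspect.cleandoc(raw)   # strips the common indentation and trailing blank lines
print(clean)
```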
Data Structures
More on Lists
List Methods
Let L be a list, x an item (that could be in a list), i an index
The effects of the 1st 4 methods below can alternatively be achieved with slice assignments
All but index() and count() modify the list itself
All but pop(), index(), and count() return None
append(x): Add x to the end of the list
extend(L): Extend the list by appending all the items in list L
insert(i, x): Insert x in front of the item at index i
remove(x): Remove the 1st item equal to x from the list (error if there's no such item)
pop(): Remove and return the last item in the list
pop(i): Remove and return the item at index i
index(x): Return the index of the 1st item with value x (error if none)
count(x): Return the number of times x appears in the list
sort(): Sort the list, in place
reverse(): Reverse the list, in place
>>> L = [2,7,3,9,5]
>>> L.append(6)
>>> L
[2, 7, 3, 9, 5, 6]
>>> L.sort()
>>> L
[2, 3, 5, 6, 7, 9]
Use append(x) and pop() to implement a stack
Use append(x) and pop(0) to implement a queue
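A sketch of both patterns: a stack pushes and pops at the end of the list, while a queue appends at the end and pops from the front.

```python
stack = []
for item in [1, 2, 3]:
    stack.append(item)       # push onto the top (end) of the list
top = stack.pop()            # 3: last in, first out

queue = []
for item in ['a', 'b', 'c']:
    queue.append(item)       # enqueue at the back (end) of the list
front = queue.pop(0)         # 'a': first in, first out

print(top)      # 3
print(front)    # a
print(stack)    # [1, 2]
print(queue)    # ['b', 'c']
```

pop(0) has to shift every remaining item, so for long queues collections.deque, with its popleft() method, is the usual choice.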
Functional Programming Tools
filter(function, sequence) returns a sequence of the items from sequence for which function(item) is true
>>> def in_range(x): return 5 <= x <= 10
...
>>> filter(in_range, [1,14, 6, 9, 4, 7, 12])
[6, 9, 7]
If the function is used only once, no need to give it a name
Use a lambda form
>>> filter(lambda x: 5 <= x <= 10, [1,14, 6, 9, 4, 7, 12])
[6, 9, 7]
map(function, sequence) calls function(item) for each item in sequence
Returns a list of the return values
>>> map(lambda x: x*x, range(6))
[0, 1, 4, 9, 16, 25]
If more than one sequence is passed
function must have as many arguments as there are sequences
the sequences should have the same length (in Python 2, shorter ones are padded with None)
function is called with the corresponding items from each sequence
>>> map(lambda x,y: x-y, [2,4,8,16], [1,2,3,4])
[1, 2, 5, 12]
reduce(function, sequence), where function is binary,
calls function on the 1st 2 items of sequence
then on the result and the next item
etc.
>>> reduce(lambda x,y: x+y, range(1, 11))
55
If sequence has just 1 item, it’s returned
If it’s empty, there’s an error
If a 3rd argument is passed to reduce, it’s the starting value
Then an empty sequence doesn’t give an error; the 3rd argument is returned
>>> def my_sum(seq):
...     return reduce(lambda x,y: x+y, seq, 0)
...
>>> my_sum(range(1, 11))
55
In fact, there’s a built-in function sum() that does this
>>> sum(range(1, 11))
55
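The three tools chain naturally. This sketch filters, squares, and sums a list; it is written to run under both Python 2.7 and Python 3, which is why it imports reduce from functools and wraps filter() and map() in list() (in Python 3 they return iterators rather than lists).

```python
from functools import reduce   # reduce is also built in under Python 2

data = [1, 14, 6, 9, 4, 7, 12]
in_range = list(filter(lambda x: 5 <= x <= 10, data))   # [6, 9, 7]
squares = list(map(lambda x: x * x, in_range))           # [36, 81, 49]
total = reduce(lambda acc, x: acc + x, squares, 0)       # 166
print(total == sum(squares))                              # True
```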
List Comprehensions
A concise way to create lists without using map() or filter()
Simple form
[ expression for variable in sequence ]
Forms a list of values of expression for successive items in sequence
>>> [x*x for x in range(11)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Add an if clause to restrict which items in sequence participate
>>> [x*x for x in range(11) if x % 2 == 0]
[0, 4, 16, 36, 64, 100]
Possibly several for clauses, each possibly with an if clause
expression contains the variable from each for clause
for clauses separated only by whitespace
With 2 for clauses, each value of the leftmost for variable is paired in turn with successive values of the variable in the for clause to its right, as in a nested loop
The generalization to n for clauses is analogous
>>> [x+y for x in [5, 10] for y in range(1, 11) if y % 2 == 0]
[7, 9, 11, 13, 15, 12, 14, 16, 18, 20]
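To make the nested-loop reading concrete, this sketch builds the same list both ways: the leftmost for clause becomes the outer loop and the if clause filters inside the inner loop.

```python
comp = [x + y for x in [5, 10] for y in range(1, 11) if y % 2 == 0]

loops = []
for x in [5, 10]:               # leftmost for clause: outer loop
    for y in range(1, 11):      # next for clause: nested inner loop
        if y % 2 == 0:          # the if clause filters the inner loop's items
            loops.append(x + y)

print(comp == loops)   # True
print(comp)            # [7, 9, 11, 13, 15, 12, 14, 16, 18, 20]
```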
Tuples
A tuple is another sequence data type (besides string and list)
A comma-separated list of values enclosed in (…)
Tuples can be indexed, sliced, concatenated, repeated
>>> t = (1, 2, 3)
>>> t
(1, 2, 3)
>>> t[0]
1
>>> t[:2]
(1, 2)
>>> t + (4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> t * 3
(1, 2, 3, 1, 2, 3, 1, 2, 3)
But tuples are immutable
Tuples can be nested and have lengths
>>> u = ((1, 3), (2, 4))
>>> u[0][1]
3
>>> len(u)
2
Because of tuple packing, can omit the (…) around tuple values on the RHS of an assignment
>>> t = 1, 2, 3
>>> t
(1, 2, 3)
To pack a singleton tuple, follow the value with a comma (else we have a normal assignment)
Singleton tuples are always displayed with a comma after the value
>>> s = 12,
>>> s
(12,)
>>> len(s)
1
Because of sequence unpacking, can assign a tuple (or other sequence) of n elements to n variables (separated by commas)
>>> x, y, z = t
>>> print x, y, z
1 2 3
Lists and strings can also be unpacked
>>> x, y, z = [4, 5, 6]
>>> print x, y, z
4 5 6
>>> x, y, z = "cat"
>>> print x, y, z
c a t
But only tuples can be packed
Multiple assignment is just tuple packing with sequence unpacking
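Packing and unpacking also let a function return several values at once. A small sketch, with min_max as a hypothetical name:

```python
def min_max(seq):
    """Return the smallest and largest items of seq as a tuple."""
    return min(seq), max(seq)     # tuple packing (no parentheses needed)

lo, hi = min_max([4, 1, 7])       # sequence unpacking into two variables
print(lo)   # 1
print(hi)   # 7
```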
Sets
A set is an unordered collection with no duplicates
Form one by applying the constructor set() to a sequence
In Python 2.7, displayed as set() applied to a list (no matter how formed); the element order shown is arbitrary, although small integers often happen to appear sorted
>>> set([2, 3, 1])
set([1, 2, 3])
>>> set("cat")
set(['a', 'c', 't'])
Or enclose the elements within {…}, separated by commas
>>> {2,3,1}
set([1, 2, 3])
Sets, being unordered, don’t support indexing or slicing
Subset relation, <=
>>> {1,3} <= {1,2,3}
True
Sets are equal iff they have the same members (regardless of order)
>>> {1,3,2} == {1,2,3}
True
Use in for membership
>>> 2 in {1,2,3}
True
The negation of in is not in
>>> 4 not in {1,2,3}
True
Sets are mutable
Methods add() and remove()
>>> s = {1,2,3}
>>> s.add(4)
>>> s.remove(2)
>>> s
set([1, 3, 4])
Eliminate duplicates in a list (possibly reordering elements)
>>> list(set([1,2,3,2,1,0]))
[0, 1, 2, 3]
Set operators
union (|)
intersection (&)
difference (-)
symmetric set difference (^, analogous to XOR)
>>> s1, s2 = {1,2}, {2,3}
>>> s1 | s2
set([1, 2, 3])
>>> s1 & s2
set([2])
>>> s1 - s2
set([1])
>>> s1 ^ s2
set([1, 3])
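Converting a list to a set and back may reorder it. When order matters, a set can serve purely as a fast membership record. This is a sketch; unique_in_order is a hypothetical name.

```python
def unique_in_order(items):
    """Return the items without duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:      # set membership test
            seen.add(item)
            result.append(item)
    return result

print(unique_in_order([1, 2, 3, 2, 1, 0]))   # [1, 2, 3, 0]
```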
Dictionaries
“Hashes” or “associative arrays”
Indexed by keys (i.e., “keyed”)
A key must be immutable: strings and numbers are OK, a tuple is OK if its elements are immutable, lists are never OK
A dictionary literal is a comma-separated list of key:value pairs (associations) within {…}
This is how dictionaries are displayed
Extracting a value using a non-existent key is an error
But you can add a pair by assigning to the keyed element (you can’t extend a list this way)
>>> age = {'bob': 21, 'al': 34}
>>> age['ed'] = 28
>>> age
{'ed': 28, 'bob': 21, 'al': 34}
Delete a pair with del applied to a keyed element
>>> del age['al']
>>> age
{'ed': 28, 'bob': 21}
The keys() dictionary method returns a list of all its keys
Sort this list in place with the sort() list method
>>> names = age.keys()
>>> names.sort()
>>> names
['bob', 'ed']
To check whether a key is in the dictionary, use the has_key(key) method or the in keyword
>>> age.has_key('bob')
True
>>> 'al' in age
False
dict() builds a dictionary from a list of pairs stored as tuples
>>> weight = dict([('bob', 175), ('ed', 250)])
>>> weight
{'ed': 250, 'bob': 175}
When the keys are strings, it’s simpler to use keyword arguments
>>> height = dict(bob=6.1, ed=5.5)
>>> height
{'ed': 5.5, 'bob': 6.1}
List comprehension is possible when the pairs form a pattern
>>> dict([(x, x**2) for x in range(1,5)])
{1: 1, 2: 4, 3: 9, 4: 16}
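A typical use of membership tests and assignment to keyed elements is counting. A sketch, with count_words as a hypothetical name; the result is printed sorted because dictionary order is arbitrary in Python 2.

```python
def count_words(text):
    """Map each whitespace-separated word in text to its number of occurrences."""
    counts = {}
    for word in text.split():
        if word in counts:        # key already present: update its value
            counts[word] += 1
        else:                     # new key: add a pair by assignment
            counts[word] = 1
    return counts

counts = count_words("the cat and the dog")
print(sorted(counts.items()))   # [('and', 1), ('cat', 1), ('dog', 1), ('the', 2)]
```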
Looping Techniques
When looping through a dictionary, retrieve each key and its associated value together using the iteritems() method
>>> for p, w in weight.iteritems():
...     print p, w
...
ed 250
bob 175
Looping over a sequence, retrieve the index and corresponding value together using the enumerate() function
>>> for i, p in enumerate(['bob', 'ed']):
...     print i, p
...
0 bob
1 ed
To loop over multiple sequences together, pair corresponding (by position) items with function zip()
>>> players = ['bob', 'ed', 'al']
>>> scores = [70, 50, 100]
>>> for p, s in zip(players, scores):
...     print p, s
...
bob 70
ed 50
al 100
Function sorted(list) returns a sorted version of list without changing list
Recall: method list.sort() sorts list in place
Loop over a list in sorted order:
>>> for n in sorted([5, 1, 4, 2, 3]):
...     print n,
...
1 2 3 4 5
Use function reversed() to reverse a list (and loop over it)
>>> for i in reversed(range(1,10,2)):
...     print i,
...
9 7 5 3 1
There’s a method list.reverse() reversing list in place
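These techniques combine well. This sketch pairs the players with their scores via zip(), ranks the pairs with sorted() (highest score first), and numbers the ranks with enumerate(), whose optional second argument sets the starting count.

```python
players = ['bob', 'ed', 'al']
scores = [70, 50, 100]

# zip() pairs names with scores; sorted() ranks the pairs, highest score first
ranked = sorted(zip(players, scores), key=lambda pair: pair[1], reverse=True)

lines = []
for rank, (name, score) in enumerate(ranked, 1):   # ranks start at 1
    lines.append("%d. %s %d" % (rank, name, score))
print("\n".join(lines))
```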
Relations on Sequences
We’ve seen operators in and not in used with sets and (applied to keys) dictionaries
They are also used with sequences
>>> 2 in (1, 2, 3)
True
>>> 4 not in [1, 2, 3]
True
>>> 'a' in 'cat'
True
Sequence objects can be compared to other objects of the same sequence type
The comparison uses lexicographical ordering: compare the 1st elements; if they tie, compare the 2nd; if still tied, compare the 3rd; etc.
If one sequence is an initial subsequence of the other, the shorter one comes first in the order
>>> (2, 3) > (2, 2)
True
>>> [1, 2, 3] < [1, 3, 3]
True
String comparison uses ASCII ordering for individual characters
Lowercase letters are in alphabetical order in ASCII as are uppercase letters
But all uppercase letters occur before any lowercase
>>> 'cat' < 'dog'
True
>>> 'cat' < 'Dog'
False
The infix operator is checks whether its operands reference the same object
== just checks that their contents are the same, so p is q implies p == q, but not vice versa
is not is the negation of is
>>> p = [1, 2, 3]
>>> q = p
>>> r = [1, 2, 3]
>>> p == r
True
>>> p is r
False
>>> p is not r
True
>>> p == q
True
>>> p is q
True
Here p and q reference the same object
So a change to p is ipso facto a change to q and vice versa
>>> p[1] = 4
>>> q
[1, 4, 3]
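To avoid this kind of sharing, make a copy. A full slice p[:] (or list(p)) builds a new list with the same contents; it is a shallow copy, so any nested objects are still shared.

```python
p = [1, 2, 3]
q = p            # alias: q names the same list object as p
r = p[:]         # full slice: a new list with the same contents (shallow copy)
p[1] = 4
print(q)         # [1, 4, 3]
print(r)         # [1, 2, 3]
print(r is p)    # False
```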
Sorting
Sorting Lists of Dictionaries "the Old Way"
Suppose we have a list of dictionaries such as
[{'key3': 5, 'key2': 1, 'key1': 4},
{'key3': 3, 'key2': 3, 'key1': 2},
{'key3': 2, 'key2': 2, 'key1': 5}]
Suppose this is assigned to variable undecorated
Sorting this list on the value of key2 should give
[{'key3': 5, 'key2': 1, 'key1': 4},
{'key3': 2, 'key2': 2, 'key1': 5},
{'key3': 3, 'key2': 3, 'key1': 2}]
Instead, work with a list not of dictionaries but of 2-element tuples
Each tuple contains the value associated with key2 in a given dictionary followed by that dictionary; this list is said to be decorated
Method sort() for lists sorts the list of tuples on their 1st elements
The 2nd elements (the dictionaries) get carried along in the sort (they are compared only to break ties)
The dictionaries end up in the intended order, but as the 2nd elements of tuples
Use list comprehension to get the decorated list.
decorated = [ (dct["key2"], dct) for dct in undecorated ]
The value of decorated is now
[(1, {'key3': 5, 'key2': 1, 'key1': 4}),
(3, {'key3': 3, 'key2': 3, 'key1': 2}),
(2, {'key3': 2, 'key2': 2, 'key1': 5})]
Next,
decorated.sort()
so decorated becomes
[(1, {'key3': 5, 'key2': 1, 'key1': 4}),
(2, {'key3': 2, 'key2': 2, 'key1': 5}),
(3, {'key3': 3, 'key2': 3, 'key1': 2})]
Now extract the list of dictionaries from the list of tuples while maintaining their order
Again use list comprehension
[ dct for (key, dct) in decorated ]
The result is
[{'key3': 5, 'key2': 1, 'key1': 4},
{'key3': 2, 'key2': 2, 'key1': 5},
{'key3': 3, 'key2': 3, 'key1': 2}]
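The three steps can be wrapped in one function and checked against sorted() with a key. In this sketch (sort_dicts_on is a hypothetical name) each decoration also carries the dictionary's original position: this departs slightly from the slides, so that ties on the sort value are broken by position instead of by comparing dictionaries, which Python 3 does not allow and which also keeps the sort stable.

```python
undecorated = [{'key3': 5, 'key2': 1, 'key1': 4},
               {'key3': 3, 'key2': 3, 'key1': 2},
               {'key3': 2, 'key2': 2, 'key1': 5}]

def sort_dicts_on(dicts, key):
    """Decorate-sort-undecorate on the value stored under key (hypothetical helper)."""
    # Decorate: (value, original position, dictionary). The position breaks ties,
    # so two dictionaries are never compared with each other.
    decorated = [(dct[key], i, dct) for i, dct in enumerate(dicts)]
    decorated.sort()                               # sort on the decorations
    return [dct for (_, _, dct) in decorated]      # undecorate

old_way = sort_dicts_on(undecorated, 'key2')
new_way = sorted(undecorated, key=lambda dct: dct['key2'])
print(old_way == new_way)                 # True
print([dct['key2'] for dct in old_way])   # [1, 2, 3]
```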
Sorting More Generally
Function sorted() takes a list (or other sequence, coercing it to a list)
Returns the sorted version of the list (without modifying the original)
List method sort() applied to a list sorts the list in place
No such method for tuples or strings—immutable
Henceforth, discuss only function sorted()
All that’s said about sorted() also applies to method sort()
Sequences are sorted by lexicographic order—e.g.,
>>> sorted("example")
['a', 'e', 'e', 'l', 'm', 'p', 'x']
Sequences of sequences are sorted in lexicographic order—e.g.,
>>> sorted(["dog", "cat", "bird"])
['bird', 'cat', 'dog']
>>> sorted([(2,1), (2,2), (1,2)])
[(1, 2), (2, 1), (2, 2)]
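Elements that are only partially ordered (e.g., sets under the subset relation) can be passed, but the result is not well defined: the Python documentation says the output of sort() is undefined for lists of sets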
Recall: For sets A and B, A < B is true if A is a proper subset of B, A <= B if A is a subset of B
>>> sorted([{1,2,3}, {2,3}, {2,1}])
[set([2, 3]), set([1, 2]), set([1, 2, 3])]
A reverse parameter with a boolean value (default False) lets us specify a sort in reverse order—e.g.,
>>> sorted([(2,1), (2,2), (1,2)], reverse=True)
[(2, 2), (2, 1), (1, 2)]
Parameter key lets us specify a function with a single argument that returns a key to use for sorting purposes
The function is applied once to each element
Elements are sorted as per the order of their values for this function
>>> sorted(["abc", "ef"], key=len)
['ef', 'abc']
lower is a method of string objects
If we invoke it on the type/class, must pass an instance of str (for the value of self)
>>> str.lower("AbC")
'abc'
Use this as a key for sorting
>>> sorted(["Dog", "cat", "Cattle"], key=str.lower)
['cat', 'Cattle', 'Dog']
A common pattern is to sort complex objects using only their values at a given index—e.g.,
>>> student_tuples = [('john', 'A', 15), ('jane', 'B', 12),
... ('dave', 'B', 10)]
>>> sorted(student_tuples, key=lambda student: student[2])
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
The same technique works for dictionaries
>>> student_recs = [dict(name='john', grade='A', age=15),
... dict(name='jane', grade='B', age=12),
... dict(name='dave', grade='B', age=10)]
>>> sorted(student_recs, key=lambda student: student['age'])
[{'grade': 'B', 'age': 10, 'name': 'dave'},
{'grade': 'B', 'age': 12, 'name': 'jane'},
{'grade': 'A', 'age': 15, 'name': 'john'}]
And for objects with named attributes (see below)
Sorting with operator Module Functions
The key-function patterns shown above are very common
Module operator provides convenience functions to make accessor functions easier and faster
operator.itemgetter(index) returns a callable object that fetches the value at index index of its operand—e.g.,
>>> import operator as op
>>> ls = [1,2,3]
>>> op.itemgetter(1)(ls)
2
Using this to sort the above list of tuples
>>> sorted(student_tuples, key=op.itemgetter(2))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
operator.itemgetter(key) also works with dictionaries
>>> op.itemgetter('b')({'a':1, 'b':2, 'c':3})
2
Use this to sort the above list of dictionaries on key ‘age’
>>> sorted(student_recs, key=op.itemgetter('age'), reverse=True)
[{'grade': 'A', 'age': 15, 'name': 'john'},
{'grade': 'B', 'age': 12, 'name': 'jane'},
{'grade': 'B', 'age': 10, 'name': 'dave'}]
operator.attrgetter(attr) returns a callable object that fetches the value of attribute attr from its operand (an object)
See below, after we've introduced classes
The operator module functions allow multiple levels of sorting
E.g., sort student_tuples with index 1 as the primary key and index 2 as the secondary key
I.e., sort on index 1 and break ties with index 2
>>> sorted(student_tuples, key=op.itemgetter(1,2))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
E.g., sort student_recs with 'grade' as the primary key and 'age' as the secondary key
>>> sorted(student_recs, key=op.itemgetter('grade', 'age'))
[{'grade': 'A', 'age': 15, 'name': 'john'},
{'grade': 'B', 'age': 10, 'name': 'dave'},
{'grade': 'B', 'age': 12, 'name': 'jane'}]
operator.methodcaller(name) returns a callable object that calls the method name on its operand—e.g.,
>>> op.methodcaller('pop')([1,2,3])
3
Parameter name may be followed by further arguments, separated by commas
These are passed to method name whenever the callable object is called
>>> op.methodcaller('count',2)([1,2,3,2,4,2])
3
As a sophisticated example, we sort in reverse order a list of 4-tuples of integers on the number of -1’s in them
>>> marks = [(4,-1,5,3), (5,6,7,8), (5,-1,6,-1)]
>>> sorted(marks, key=op.methodcaller('count', -1), reverse=True)
[(5, -1, 6, -1), (4, -1, 5, 3), (5, 6, 7, 8)]
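Jumping ahead slightly to classes, operator.attrgetter() plays the same role for objects with named attributes that itemgetter() plays for tuples and dictionaries, and it too accepts several names for multi-level sorting. The Student class here is hypothetical, defined only for this sketch.

```python
import operator as op

class Student(object):
    """Hypothetical record class, used only for this illustration."""
    def __init__(self, name, grade, age):
        self.name = name
        self.grade = grade
        self.age = age

    def __repr__(self):
        return "Student(%r, %r, %d)" % (self.name, self.grade, self.age)

students = [Student('john', 'A', 15), Student('jane', 'B', 12),
            Student('dave', 'B', 10)]

by_age = sorted(students, key=op.attrgetter('age'))
by_grade_then_age = sorted(students, key=op.attrgetter('grade', 'age'))
print([s.name for s in by_age])              # ['dave', 'jane', 'john']
print([s.name for s in by_grade_then_age])   # ['john', 'dave', 'jane']
```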
Output Formatting
For output formatting, you can roll your own with string slicing and concatenation
Or use the % (formatting or interpolation) operator with
a format string as the left operand and,
as the right operand, an expression or tuple of expressions whose values are substituted into the format string
It returns the string resulting from substituting the values from the right operand into specified positions in the left operand
This derives from formatted output in C (printf())
General form
format % values
Conversion specifiers in format are character sequences beginning with %
If there’s 1 conversion specifier,
then values is a single expression whose value is converted to a string (as if by str()) and substituted for the conversion specifier to give the resulting string
If there are n > 1 conversion specifiers in format,
then values is a tuple of n expressions
The resulting string is produced by converting the values of the expressions to strings and substituting each for the corresponding conversion specifier in format (correspondence is by position)
E.g., %s specifies a string and %d a decimal integer
>>> "The number is %d" % 5
'The number is 5'
>>> "%s is %d years old" % ('Fred', 21)
'Fred is 21 years old'
Normally, string interpolation is done for well-formatted output
>>> print "%s is %d years old" % ('Fred', 21)
Fred is 21 years old
A conversion specifier contains at least % followed by a conversion type
Some conversion typess String
c Single character (accepts integer or single character string)
d Signed integer decimal
f (or F) Floating point decimal
e (resp., E) Floating point exponential with lowercase (resp., uppercase) ‘e’
g (or G) Floating point decimal or exponential (depending on the magnitude of the number and the specified precision)
>>> "%f %e %g %g" % (2.5, 2.5, 250000, 2500000)
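'2.500000 2.500000e+000 250000 2.5e+006'
(3-digit exponents come from older Windows builds of Python; newer versions, including Python 3, print '2.500000 2.500000e+00 250000 2.5e+06')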
Often want to control width of the field a value is substituted into
Give the min. field width between the % and the conversion type
If the value requires more space, it takes it (thus “min.”)
>>> "%5s %5d %3s" % ('cat', 12, 'elephant')
'  cat    12 elephant'
By default, values are right justified
Without a min. field width specified, a value occupies just as much space as needed
For floating point values, specify a precision just after the minimum field width
A period (‘.’) followed by the number of places to show after the decimal point
The decimal places and decimal point contribute to the field width
>>> "%5.2f %5.1e" % (25.4, 25.6)
'25.40 2.6e+001'
(Newer versions, including Python 3, print '25.40 2.6e+01')
Field width is important when values are output in tabular form
For free-form text, usually let the values occupy their own space (no min. field width specified)
But usually specify precision of floating-point values—default gives many more digits than desirable
To specify just the precision (and let the floating-point value occupy just enough space), use a conversion specifier with a precision but no min. field width
Example
>>> x = 1.0/3
>>> "%.2f and %.2f" % (x, -2*x)
'0.33 and -0.67'
0 or more conversion flags may occur between % and the min. field width
Some conversion flags
0 Conversion is 0-padded for numeric values
- Converted value is left adjusted (overrides "0" if both given)
(a space) A blank is left before a positive number produced by a signed conversion
+ A sign character ("+" or "-") precedes conversion (overrides a "space" flag)
>>> "%-5d %0+5.1f" % (12, 3.5)
'12    +03.5'
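Putting field width, precision, and flags together, a sketch of a small fixed-width table (the data is made up for illustration; runs under Python 2.7 or 3):

```python
rows = [('apples', 3, 0.5), ('kiwis', 12, 0.25)]

table = []
for name, qty, price in rows:
    # %-8s : name, left-justified in 8 columns (the - flag)
    # %4d  : quantity, right-justified in 4 columns (the default)
    # %7.2f: price with 2 decimal places in 7 columns
    table.append("%-8s%4d%7.2f" % (name, qty, price))

zero_padded = "%05.1f" % 3.14159   # 0 flag: pad with zeros to width 5
signed = "%+d %+d" % (5, -5)       # + flag: always show the sign
spaced = "% d" % 5                 # space flag: blank where a + would go
```

Every row has the same length (8 + 4 + 7 columns), which is what keeps the columns aligned.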
The New, Pythonic Way to Format
The format string method was added in Python 2.6
General form
template.format(p0, p1, ..., k0=v0, k1=v1, ...)
The template (or format string) is a string with 1 or more format codes (fields to be replaced) embedded
The fields to be replaced are surrounded by {}
The curly braces and "code" inside are replaced with a formatted value from 1 of the arguments
Anything not contained in {} is literally printed
If a brace character, { or }, has to be printed, it is escaped by doubling it: {{ and }}
The list of arguments for format() starts with 0 or more positional arguments (p0, p1, ...)
followed by 0 or more keyword arguments, name=value
A positional parameter is accessed by placing the index of the parameter after the opening brace
{0} accesses the 1st parameter, {1} the 2nd, …
The index can be followed by a colon and a format specification, much like an old-style conversion specifier without the leading %, e.g., {0:5d}
If the positional parameters are used in the order in which they’re written, the indexes inside the braces can be omitted (Python 2.7 and later)
E.g., '{} {} {}' corresponds to '{0} {1} {2}', and '{:5d}' to '{0:5d}'
The indexes are needed if the arguments are accessed in a different order, e.g., '{2} {1} {0}'
Examples
>>> "Product: {0:6s}, Price per unit: ${1:5.2f}".format('Milk', 5.23)
'Product: Milk  , Price per unit: $ 5.23'
>>> "Price per unit: ${1:5.2f}, Product: {0:6s}".format('Milk', 5.23)
'Price per unit: $ 5.23, Product: Milk  '
>>> "Product: {p:6s}, Price per unit: ${u:5.2f}".format(p='Milk', u=5.23)
'Product: Milk  , Price per unit: $ 5.23'
For justifying, start the format specification (right after the colon) with "<" (left justify, the default for strings) or ">" (right justify, the default for numbers)
Use "^" to have the value centered in the available space
Unless a min. field width is defined, the field width is the same size as the data to fill it, so alignment isn’t an issue
For numbers, a sign option may follow the alignment
"+" specifies including a sign for both positive and negative numbers
"-" specifies a sign only for negative numbers (the default)
" " (space) specifies a space for positive numbers and a minus sign for negative numbers
"=" (an alignment for numbers only) puts the padding after the sign but before the digits; with 0 as the fill character, e.g., {0:0=+6d}, this gives 0 padding
A dictionary can be unpacked (using **) to provide keyword arguments for format()
Example
>>> capital_country = {"US" : "Washington",
... "Germany": "Berlin",
... "France" : "Paris",
... "UK" : "London"}
>>> format_string = ''
>>> for c in capital_country:
...     format_string += c + ": {" + c + "}; "
...
>>> format_string
'Germany: {Germany}; UK: {UK}; US: {US}; France: {France}; '
>>> format_string.format(**capital_country)
'Germany: Berlin; UK: London; US: Washington; France: Paris; '
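A few checks of the alignment, sign, and brace-escaping rules for format() described earlier. Explicit field indexes keep the sketch valid for Python 2.6, which lacks automatic numbering; it also runs under Python 2.7 and 3.

```python
left = '{0:<6}|'.format('ab')       # explicit left justification
right = '{0:>6}|'.format('ab')      # explicit right justification
center = '{0:^6}|'.format('ab')     # centered: 2 blanks on each side

# Defaults differ: strings go left, numbers go right
str_default = '{0:6}|'.format('ab')
num_default = '{0:6}|'.format(42)

signs = '{0:+d} {1:+d} {2: d}'.format(5, -5, 5)   # '+' and ' ' sign options
zero_fill = '{0:0=+6d}'.format(42)               # fill '0', align '=', sign '+', width 6
braces = '{{{0}}}'.format('x')                   # doubled braces print literally
```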
File Methods
To open a file and return a file object (as a handle to it), use
open(filename, mode)
filename is a string with the pathname of the file (relative to the current folder)
mode (a string) is one of
'r' to open the file for reading
'w' for writing (erasing a file with the same name if it exists)
'a' for appending data to the end of the file
'r+' for reading and writing
On Windows systems, 'b' appended to the mode opens the file in binary mode (e.g., for images)
File method read() reads the entire file and returns its contents as a string
An optional integer argument gives the max. number of bytes to read
If EOF has been reached, the empty string is returned
File method close() closes the corresponding file
File data.txt
1
2
3
4
Program file file1.py (in same folder as data.txt)
f = open('data.txt', 'r')
s = f.read()
print s
f.close()
Run
E:\Old D Drive\c690f08>file1.py
1
2
3
4
f.readline() reads a single line from the file, returns it as a string
A newline character (\n) is left at the end of the string
So the returned string doesn’t end with a \n only if it’s the last line in the file and that line doesn’t end with a \n
So a blank line is returned as ′\n′ while EOF returns an empty string
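To see the blank-line vs. EOF distinction in practice, a sketch (Python 2.7 or 3) that writes a throwaway file to a temporary folder (an assumption made so the example is self-contained) and reads it back with readline():

```python
import os
import tempfile

# Create a small file whose 2nd line is blank and whose last line lacks '\n'
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
f = open(path, 'w')
f.write('1\n\n3')
f.close()

lines = []
f = open(path, 'r')
while True:
    line = f.readline()
    if line == '':      # '' means EOF; a blank line comes back as '\n'
        break
    lines.append(line)
f.close()
```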
f.readlines() returns a list of all lines in the file
An optional integer parameter specifies approximately how many bytes to read; enough more are read to complete the last line returned
Use this so an entire large file needn’t be loaded into memory
f = open('data.txt', 'r')
ls = f.readlines()
print ls
f.close()
G:\c690f08>file2.py
['1\n', '2\n', '3\n', '4']
Convert values to integers
The trailing newline presents no problem, since int() ignores surrounding whitespace
…
ls1 = [ int(x) for x in ls ]
print ls1
…
G:\c690f08>file3.py
[1, 2, 3, 4]
A fast, memory-efficient alternate way to read lines is to loop over the file object
The 2 approaches manage line buffering differently
Don’t mix them
f = open('data.txt', 'r')
for val in f:
    print val,    # trailing comma: print adds no newline of its own
f.close()
G:\c690f08>file4.py
1
2
3
4
Separate lines since \n is retained
File Output
f.write(string) writes string to the file
An output value must first be converted to a string
f = open('out.txt', 'w')
age = {'Al': 25, 'Ed': 34, 'Bob': 21, 'Ken': 37}
for n, a in age.iteritems():
    f.write( "%s\t%3d\n" % (n, a) )
f.close()
File out.txt
Ed 34
Bob 21
Al 25
Ken 37 [Blank last line]
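The lines in out.txt come out in the dictionary's internal order, which isn't alphabetical. A sketch (Python 2.7 or 3) that sorts the keys for a predictable file; it writes to a temporary folder rather than the current one, an assumption made for testing:

```python
import os
import tempfile

age = {'Al': 25, 'Ed': 34, 'Bob': 21, 'Ken': 37}
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

f = open(path, 'w')
for name in sorted(age):          # sorted() gives a fixed, alphabetical order
    f.write("%s\t%3d\n" % (name, age[name]))
f.close()

f = open(path, 'r')
written = f.read()
f.close()
```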
Exceptions
Exception: error detected during execution
>>> 2/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
Last line states what happened
Starts with the exception’s type (here the built-in ZeroDivisionError)
For built-in exceptions, this is always the exception’s name; for user-defined ones, it’s a recommended convention
Rest of the line depends on the exception type and cause
Preceding part of message is a stack traceback indicating source lines
Handling Exceptions
Simple try statement
try:
    Try clause
except Exception:
    Except clause
When the try clause is executed,
if no exception occurs, except clause is skipped
if an exception occurs, rest of the try clause is skipped
If the exception’s type matches Exception,
except clause is executed and
execution continues after the try statement
If not, the exception is passed on to outer try statements
If no handler is found, it’s an unhandled exception
Execution stops with a message (see above)
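A sketch (Python 2.7 or 3) of the propagation rule above: an inner try statement whose except clause doesn't match passes the exception on to an outer one. parse() and the log list are illustrative names.

```python
log = []

def parse(text):
    try:
        return int(text)            # may raise ValueError
    except ZeroDivisionError:       # doesn't match ValueError...
        log.append('inner handler')
        return None

try:
    parse('abc')                    # ...so ValueError propagates out of parse()
    log.append('after parse')       # skipped: rest of the try clause isn't run
except ValueError:
    log.append('outer handler')
log.append('after try statement')
```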
while True:
    try:
        x = int(raw_input("Please enter a number: "))
        break
    except ValueError:
        print "Invalid number, try again..."
E:\Old D Drive\c690f08>try1.py
Please enter a number: a
Invalid number, try again...
Please enter a number: 2
Can interrupt the program with Ctrl-C
E:\Old D Drive\c690f08>try1.py
Please enter a number: Ctrl-C
Traceback (most recent call last):
  File "E:\Old D Drive\c690f08\try1.py", line 3, in <module>
    x = int(raw_input("Please enter a number: "))
KeyboardInterrupt
A try statement may have more than 1 except clause
At most 1 handler is executed.
total = 0    # named total to avoid shadowing the built-in sum()
while True:
    try:
        total += int(raw_input("Enter an integer (Ctrl-C to exit): "))
    except ValueError:
        print "Invalid number, try again ..."
    except KeyboardInterrupt:
        break
print "\nThe sum is", total
C:\user\c690f08>try2.py
Enter an integer (Ctrl-C to exit): 4
Enter an integer (Ctrl-C to exit): a
Invalid number, try again ...
Enter an integer (Ctrl-C to exit): 6
Enter an integer (Ctrl-C to exit): Ctrl-C
The sum is 10
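Class BaseException is the root of the hierarchy of exception classes (since Python 2.5)
Almost all built-in exceptions derive from its subclass Exception; a few, such as KeyboardInterrupt and SystemExit, derive directly from BaseException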
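Since Python 2.5 the hierarchy starts at BaseException, and KeyboardInterrupt (caught in the example above) sits outside Exception, so except Exception: doesn't swallow Ctrl-C. A quick check (Python 2.7 or 3):

```python
# KeyboardInterrupt deliberately isn't a subclass of Exception
checks = {
    'ValueError under Exception': issubclass(ValueError, Exception),
    'Exception under BaseException': issubclass(Exception, BaseException),
    'KeyboardInterrupt under Exception': issubclass(KeyboardInterrupt, Exception),
    'KeyboardInterrupt under BaseException': issubclass(KeyboardInterrupt, BaseException),
}
```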
An except clause may name multiple exceptions as a tuple
except (RuntimeError, TypeError, NameError):
    pass
The last except clause may omit the exception name
It serves as a wildcard, so use it cautiously: it can mask real programming errors
It can be used to print an error message and then re-raise the exception, allowing a caller to handle it as well
import sys
i = -1
try:
    f = open('data.txt')
    s = f.readline()
    i = int(s.strip())
except IOError, (errno, strerror):
    print "I/O error(%s): %s" % (errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
Here, sys.exc_info() returns a tuple (type, value, traceback), where
type is the type of the exception (a class object)
value is the exception parameter (a class instance if the exception type is a class object)
traceback is an object encapsulating the call stack at the point where the exception occurred
The try statement has an optional else clause
After all except clauses
Useful for code executed if the try clause doesn’t raise an exception
import sys
for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except IOError:
        print 'cannot open', arg
    else:
        print arg, 'has', len(f.readlines()), 'lines'
        f.close()
Better than adding code to the try clause
Avoids accidentally catching an exception not raised by the code protected by the try statement
Exception Arguments and Raising Exceptions
An exception may have an associated value (its argument)
Presence and type of the argument depend on the exception type
An except clause may name a variable after the exception type, separated from it by a comma
The variable is bound to the exception instance, whose arguments are stored as a tuple in its args attribute
Printing the instance prints its single argument or, if there are several, the tuple of them
>>> try:
...     '1'+2
... except TypeError, detail:
...     print detail
...
cannot concatenate 'str' and 'int' objects
The raise statement lets you force a specified exception
Possibly with arguments
The operand must be an exception class or an exception instance
What arguments we pass to it is up to us
>>> raise NameError
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError
>>> raise NameError('Hi')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: Hi
>>> raise NameError('Hi', 'Bob')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: ('Hi', 'Bob')
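To look at exception arguments directly, a sketch that catches the NameErrors raised above and inspects them. It uses the except ... as name form (available since Python 2.6 and required in Python 3) in place of the comma form:

```python
try:
    raise NameError('Hi', 'Bob')
except NameError as detail:
    multi_args = detail.args       # every argument, as a tuple
    multi_text = str(detail)       # the tuple's text, as in the traceback above

try:
    raise NameError('Hi')
except NameError as detail:
    single_text = str(detail)      # a single argument prints by itself
```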
Recall that, in an except clause, raise re-raises the exception
Exception handlers don't just handle exceptions occurring immediately in the try clause
Also handle exceptions inside functions called (even indirectly) in the try clause
>>> def this_fails():
...     x = 1/0
...
>>> try:
...     this_fails()
... except ZeroDivisionError, detail:
...     print 'Handling run-time error:', detail
...
Handling run-time error: integer division or modulo by zero
Some Common Exception Classes
Exception
All user-defined exceptions should be derived from this class
ArithmeticError
Base class for built-in exceptions raised for arithmetic errors
LookupError
Base class for exceptions raised when a key or index used on a dictionary or sequence is invalid
AttributeError
Raised when an attribute reference or assignment fails
KeyError
Raised when a dictionary key isn’t found
NameError
Raised when a local or global name isn’t found
Associated value is a message that includes the name not found
TypeError
Raised when an operation/function is applied to an object of inappropriate type
Associated value is a string giving details about the type mismatch
ZeroDivisionError
Raised when the second operand of a division or modulo operation is zero
Raise these predefined exceptions to create meaningful error messages
E.g., suppose in function foo() we're trying to find the value of variable name in dictionary people
if name not in people:
    raise LookupError(name + ' not present')
A call to foo() may be nested in a try statement
try:
    foo()
except LookupError, detail:
    print "In foo(), handling the grads, " + str(detail)
detail is actually an instance of LookupError, so it must be converted to a string before it can be concatenated
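A runnable version of the foo() pattern, as a sketch: the contents of people and the name parameter of foo() are assumptions, since the slide doesn't show foo()'s body. Written for Python 2.7 or 3:

```python
# Hypothetical data; the slide doesn't show what people contains
people = {'Ann': 'grad', 'Ben': 'undergrad'}

def foo(name):
    # Raise a predefined exception with a meaningful message
    if name not in people:
        raise LookupError(name + ' not present')
    return people[name]

messages = []
for who in ['Ann', 'Zed']:
    try:
        messages.append(foo(who))
    except LookupError as detail:
        messages.append("In foo(), handling the grads, " + str(detail))
```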
Defining Clean-up Actions
The try statement has an optional finally clause for clean-up actions executed under all circumstances
A finally clause is executed before leaving the try statement, whether or not an exception occurs
When
an exception occurs in the try clause and isn’t handled by an except clause or
it occurs in an except or else clause,
it’s re-raised after the finally clause is executed
The following is file try6.py in folder C:\Python27
def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print "division by zero!"
    else:
        print "result is", result
    finally:
        print "executing finally clause"
>>> from try6 import *
>>> divide(2,1)
result is 2
executing finally clause
>>> divide(2,0)
division by zero!
executing finally clause
>>> divide("2","1")
executing finally clause
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "try6.py", line 3, in divide
result = x / y
TypeError: unsupported operand type(s) for /: 'str' and 'str'
The TypeError raised by dividing 2 strings isn’t handled by the except clause
So it’s re-raised after the finally clause is executed
A finally clause is useful for releasing external resources—e.g., files or network connections— whether or not use of the resource succeeded
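One consequence worth seeing: the finally clause runs even when the try clause leaves via return. A sketch (Python 2.7 or 3); first_item() and events are illustrative names:

```python
events = []

def first_item(values):
    try:
        return values[0]          # may raise IndexError
    finally:
        events.append('cleanup')  # runs on the normal return and on the exception

first = first_item([7, 8])
try:
    first_item([])
except IndexError:
    events.append('caught IndexError')  # re-raised only after the finally clause ran
```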
Predefined Clean-up Actions
Some objects define standard clean-up actions for when the object is no longer needed, whether use of the object succeeded or failed
Consider
for line in open("myfile.txt"):
    print line
This leaves the file open for an indeterminate amount of time after the code finishes executing
The with statement ensures that objects are cleaned up
with open("myfile.txt") as f:
    for line in f:
        print line
After this is executed, file f is always closed even if there’s a problem processing the lines
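A sketch (Python 2.7 or 3) checking the claim above: the file is closed even when processing a line raises an exception. The file is created in a temporary folder, an assumption made so the example is self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'myfile.txt')
with open(path, 'w') as out:
    out.write('a\nb\n')

try:
    with open(path) as f:
        for line in f:
            # Simulate a problem while processing the first line
            raise ValueError('problem while processing ' + line.strip())
except ValueError:
    pass

closed_after_error = f.closed   # the with statement closed f on the way out
```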
Classes
Python Name Spaces and Scopes
A namespace is a mapping from names to objects
Most namespaces currently implemented as Python dictionaries—may change
Examples of namespaces
the set of built-in names
the global names in a module
the local names in a function invocation
No relation between names in different namespaces
An attribute is any name following a dot
In a sense, the set of attributes of an object form a namespace
References to names in modules are attribute references
In modname.funcname, modname is a module object and funcname is an attribute of it
Name spaces are created at different moments, have different lifetimes
The local namespace for a function is created when it’s called
Deleted when the function returns or raises an exception not handled within the function
Recursive invocations each have their own local namespace
A scope is a textual region of a program where a namespace is directly accessible (unqualified names)
At any time during execution, there are at least 3 nested scopes whose namespaces are directly accessible
the innermost scope contains the local names; searched first for a name used in the code
the scopes of any enclosing functions; searched next, starting with the nearest enclosing scope
the middle scope contains the current module's global names; searched next
the outermost scope is the namespace containing built-in names; searched last
If a name is declared global,
then all references and assignments go directly to the middle scope containing the module's global names
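Otherwise, all variables outside the innermost scope are read-only: an assignment to such a name simply creates a new local variable in the innermost scope, leaving the outer variable unchanged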
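A sketch of these scope rules (Python 2.7 or 3); all names are illustrative. Note that a name assigned anywhere in a function is local throughout that function, so reading it before the assignment raises UnboundLocalError:

```python
count = 0

def bump_local():
    count = 99               # assignment creates a new local name
    return count

def bump_global():
    global count             # assignments now go to the module's global name
    count += 1

def read_then_assign():
    total = count + 1        # count is local here (assigned below), so this read fails
    count = total
    return count

def outer():
    greeting = 'hi'
    def inner():
        return greeting + '!'    # found in the enclosing function's scope
    return inner()

local_result = bump_local()
bump_global()
try:
    read_then_assign()
    unbound = False
except UnboundLocalError:
    unbound = True
closure_result = outer()
```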
Class Definition Syntax
Simplest form of class definition:
class ClassName:
    statement-1
    ...
    statement-N
Class definitions must be executed before having any effect
In practice, statements inside a class definition are function definitions
But other statements are allowed
When a class definition is entered, a new namespace is created and used as the local scope
When a class definition is left normally (via the end), a class object is created: a wrapper around the contents of the namespace created by the class definition
Class Objects
Class objects support 2 kinds of operations: attribute references and instantiation
Attribute references use the standard Python syntax for attribute references: obj.name
Consider the following class definition
class OurClass:
    "A simple class example"
    n = 36
    def f(self):
        return "Hello"
OurClass.n and OurClass.f are valid attribute references
These evaluate to an integer and a function object (which Python 2 displays as an unbound method), resp.
4 of the 5 predefined attributes of classes (5th relates to inheritance)
__doc__ is the docstring belonging to the class
__dict__ (a dictionary) contains the class name space
__name__ is the name of the class
__module__ is the name of the module where this class is defined
User-defined class attributes can be assigned to
But __name__ (and __bases__, see below) are read-only
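A function in the standard os module
chdir(path): change the current working directory to path
(Use import os rather than from os import *, which would replace the built-in open() with os.open())
>>> import os
>>> os.chdir(r"C:\user\c690f08")
>>> os.listdir('.')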
['class1.py', 'class1.pyc', 'ClassNotes.doc', 'data.txt', ...]
>>> from class1 import OurClass
>>> OurClass.n
36
>>> OurClass.f
<unbound method OurClass.f>
>>> OurClass.__doc__
'A simple class example'
>>> OurClass.__dict__
{'__module__': 'class1', 'f': <function f at 0x00CFABB0>, '__doc__': 'A simple class example', 'n': 36}
>>> OurClass.__name__
'OurClass'
>>> OurClass.__module__
'class1'
>>> OurClass.n = 48
>>> OurClass.n
48
Class instantiation uses function notation
obj = OurClass()
creates a new instance of the class and
assigns this object to local variable obj
The instantiation operation creates an empty object
Often we want objects with customized initial states
So a class may define a special method __init__()
By convention, 1st argument of a method is called self
def __init__(self):
    self.data = []
Then class instantiation automatically invokes __init__()
If __init__() has arguments after self, then arguments given to the class instantiation operator are passed on to __init__()
Another special function name is __str__
In its definition, include the formal parameter self
Defines the string returned when an instance of the class is passed to str(), hence also how the instance is displayed by print
Instances have 2 predefined attributes
__dict__ is a dictionary containing the instance name space
__class__ is the class of this instance
class Person:
    "A simple record of a person"
    def __init__(self, first, last):
        self.first_name = first
        self.last_name = last
    def __str__(self):
        return self.first_name + " " + self.last_name
>>> from class1 import Person
>>> fred = Person("Fred", "Smith")
>>> fred
<class1.Person instance at 0x00D045D0>
>>> str(fred)
'Fred Smith'
>>> print fred
Fred Smith
>>> fred.__dict__
{'first_name': 'Fred', 'last_name': 'Smith'}
>>> fred.__class__
<class class1.Person at 0x00D063F0>
>>> fred.__class__.__name__
'Person'
Instance Objects
Only operations understood by instance objects are attribute references
2 kinds of attribute names: data attributes and methods
Data attributes correspond to C++ "data members"; they needn't be declared
>>> fred.counter = 1
>>> while fred.counter < 5:
...     fred.counter += 1
...
>>> print fred.counter
5
>>> del fred.counter
>>> fred.counter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: Person instance has no attribute 'counter'
A class attribute that’s a function object defines a corresponding method of its instances
>>> obj = OurClass()
obj.f is a method reference since OurClass.f is a function
But obj.f isn’t the same thing as OurClass.f: it’s a method object, not a function object
>>> OurClass.f
<unbound method OurClass.f>
>>> obj.f
<bound method OurClass.f of <class1.OurClass instance at 0x00AA9C60>>
Consider the following
>>> class Foo:
...     def bar(self):
...         print "Hi"
...
>>> f = Foo()
>>> f.bar()
Hi
>>> Foo.bar(f)
Hi
Referencing Attributes inside and outside Method Definitions
Data attributes may be referenced by methods as well as by clients
Python can’t enforce data hiding—it’s based on convention
Methods access data attributes and call other methods using attributes of the self argument
There’s no shorthand for referencing data attributes or other methods from within a method
So local variables and instance variables can’t be confused
class MyBag:
    def __str__(self):
        return str(self.data)
    def __init__(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
>>> from BagClass import MyBag
>>> bg = MyBag()
>>> bg.add(1)
>>> bg.addtwice(2)
>>> print bg
[1, 2, 2]
Inheritance
Syntax for a derived class definition
class DerivedClassName(BaseClassName):
    statement-1
    ...
    statement-N
Name BaseClassName must be defined in a scope containing the derived class definition
In place of a base class name, other expressions allowed
E.g., for a base class defined in another module:
class DerivedClassName(modname.BaseClassName):
DerivedClassName() creates a new instance
__bases__ is a predefined attribute for classes
Contains a tuple of the classes from which this class inherits
If a requested attribute isn’t found in the class, look in the base class
Do so recursively if the base class itself is derived
A method reference is valid if this yields a function object
A derived class may thus override methods of its base classes
A method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it
A “new-style” class is one that’s derived from the predefined, top-level class object or one that inherits from such a class
New-style classes have advantages (e.g., they integrate seamlessly into Python’s type system)
We’ll henceforth use new-style classes
__init__() is inherited
Don’t try to use super()
An overriding method may want to extend, not replace, the base class method
To call the base-class method directly,
BaseClassName.methodname(self, arguments)
This calls a function attribute of class BaseClassName
class Tax(object):
    def __init__(self, inc, deps):
        self.income = inc
        self.dependents = deps
    def state_tax(self):
        return 0.07 * self.income if self.dependents > 3 else 0.1 * self.income
    def county_tax(self):
        return 0.03 * self.income if self.dependents > 3 else 0.05 * self.income

class PoorTax(Tax):
    def county_tax(self):
        return 0.02 * self.income

class RichTax(Tax):
    def __init__(self, inc, deps, age):
        self.income = inc
        self.dependents = deps
        self.age = age
    def state_tax(self):
        return 1.3 * Tax.state_tax(self) * (1.3 if self.age < 50 else 1.0)
An equivalent way to define __init__() for RichTax
def __init__(self, inc, deps, age):
    Tax.__init__(self, inc, deps)   # let the base class set income and dependents
    self.age = age
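A sketch that exercises the Tax hierarchy with made-up incomes, using the Tax.__init__() form of RichTax.__init__(); it runs under Python 2.7 or 3. The results are floats, so the checks compare them with a tolerance.

```python
class Tax(object):
    def __init__(self, inc, deps):
        self.income = inc
        self.dependents = deps
    def state_tax(self):
        return 0.07 * self.income if self.dependents > 3 else 0.1 * self.income
    def county_tax(self):
        return 0.03 * self.income if self.dependents > 3 else 0.05 * self.income

class PoorTax(Tax):
    def county_tax(self):              # replaces Tax.county_tax
        return 0.02 * self.income

class RichTax(Tax):
    def __init__(self, inc, deps, age):
        Tax.__init__(self, inc, deps)  # extends, rather than replaces, Tax.__init__
        self.age = age
    def state_tax(self):               # extends Tax.state_tax
        return 1.3 * Tax.state_tax(self) * (1.3 if self.age < 50 else 1.0)

# Hypothetical taxpayers: (income, dependents[, age])
payers = [Tax(50000, 2), PoorTax(20000, 4), RichTax(200000, 1, 40)]
totals = [p.state_tax() + p.county_tax() for p in payers]
```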
We can break this up into separate files (in the same folder)
File tax1.py
class Tax(object):
    def __init__(self, inc, deps):
        self.income = inc
        self.dependents = deps
    def state_tax(self):
        return 0.07 * self.income if self.dependents > 3 else 0.1 * self.income
    def county_tax(self):
        return 0.03 * self.income if self.dependents > 3 else 0.05 * self.income
File ptax.py
from tax1 import Tax

class PoorTax(Tax):
    def county_tax(self):
        return 0.02 * self.income
File rtax.py
import tax1

class RichTax(tax1.Tax):
    def __init__(self, inc, deps, age):
        self.income = inc
        self.dependents = deps
        self.age = age
    def state_tax(self):
        return 1.3 * tax1.Tax.state_tax(self) * (1.3 if self.age < 50 else 1.0)
Sorting Sequences of Objects
Recall that a common pattern is to sort complex objects using only their values at a given index
For objects, we use named attributes
>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
...
>>> student_objects = [Student('john', 'A', 15),
...                    Student('jane', 'B', 12), Student('dave', 'B', 10)]
>>> sorted(student_objects, key=lambda student: student.age)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
Recall that the key-function patterns for sorting are very common
Module operator provides convenience functions to make accessor functions easier and faster
Recall that operator.itemgetter(index) returns a callable object that fetches the value at index index of its operand
We used this to sort a list of tuples
And operator.itemgetter(key) also works with dictionaries
We used this to sort a list of dictionaries on key ‘age’
Now, operator.attrgetter(attr) returns a callable object that fetches the value of attribute attr from its operand (an object)—e.g.,
>>> import operator as op
>>> st = Student('al', 'A', 39)
>>> op.attrgetter('age')(st)
39
Use this to sort the above list of objects
>>> sorted(student_objects, key=op.attrgetter('age'))
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
The operator module functions allow multiple levels of sorting
This holds for attrgetter() as well
>>> sorted(student_objects, key=op.attrgetter('grade', 'age'))
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
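attrgetter() with several attributes handles multi-level sorts in one direction. For mixed directions (here grade descending, age ascending), one standard approach relies on Python's sort being stable: sort on the secondary key first, then on the primary key. A sketch (Python 2.7 or 3), with a 4th student added for illustration:

```python
import operator as op

class Student(object):
    def __init__(self, name, grade, age):
        self.name, self.grade, self.age = name, grade, age
    def __repr__(self):
        return repr((self.name, self.grade, self.age))

students = [Student('john', 'A', 15), Student('jane', 'B', 12),
            Student('dave', 'B', 10), Student('ann', 'A', 20)]

# With several attributes, attrgetter returns a tuple, compared lexicographically
key_tuple = op.attrgetter('grade', 'age')(students[0])

# Secondary key first, then primary key; reverse=True keeps the sort stable,
# so students with equal grades stay in age order
by_age = sorted(students, key=op.attrgetter('age'))
grade_desc_age_asc = sorted(by_age, key=op.attrgetter('grade'), reverse=True)
names = [s.name for s in grade_desc_age_asc]
```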
Iterators
Most container objects can be looped over using a for statement
Behind the scenes, the for statement calls iter() on the container object
iter() returns an iterator object
It defines the method next(), which accesses the container’s elements 1 by 1
When the elements are exhausted, next() raises a StopIteration exception, which terminates the for loop
>>> a = [1, 2]
>>> it = iter(a)
>>> it
<listiterator object at 0x00D02630>
>>> it.next()
1
>>> it.next()
2
>>> it.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
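A sketch of what a for loop does behind the scenes (Python 2.7 or 3). It uses the built-in next(it), available since Python 2.6, which calls it.next() in Python 2 and it.__next__() in Python 3; manual_for is an illustrative name.

```python
def manual_for(container):
    """Collect items the way a for loop does: iter(), then next() until StopIteration."""
    items = []
    it = iter(container)
    while True:
        try:
            item = next(it)
        except StopIteration:
            break              # elements exhausted: this is what ends a for loop
        items.append(item)
    return items

collected = manual_for([1, 2])
```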
To add iterator behavior to a class,
define a __iter__() method returning an object with a next() method
If the class defines next(), then __iter__() can return self
class Reverse:
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
>>> from reverse import Reverse
>>> for x in Reverse([1, 2]):
...     print x,
...
2 1
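In Python 3 the iterator method is named __next__() rather than next(). A sketch of Reverse (made a new-style class here) that works in both by aliasing the method; it also shows that, because __iter__() returns self, a Reverse object can be looped over only once.

```python
class Reverse(object):
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def next(self):                    # Python 2 iterator protocol
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
    __next__ = next                    # Python 3 looks for __next__ instead

r = Reverse([1, 2, 3])
first_pass = list(r)
second_pass = list(r)   # already exhausted: __iter__ returns the same object
```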
Generators
Generators are a powerful tool for creating iterators
Written like a normal function
But use the yield statement whenever they return data
Each time next() is called, the generator resumes where it left off
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]
>>> from reverse1 import *
>>> for x in reverse([1, 2]):
...     print x,
...
2 1
__iter__() and next() created automatically
Local variables & execution state automatically saved between calls
When a generator terminates, automatically raises StopIteration
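A sketch (Python 2.7 or 3) of the generator behavior just described, driving reverse() by hand with the built-in next():

```python
def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]

g = reverse('abc')
first = next(g)        # runs up to the first yield
rest = list(g)         # resumes where it left off and runs to the end
try:
    next(g)
    exhausted = False
except StopIteration:  # raised automatically when the function body ends
    exhausted = True
```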
Generator Expressions
Some simple generators can be coded succinctly as expressions
Syntax like list comprehensions, but with parentheses instead of brackets
Meant for situations where the generator is used right away by an enclosing function
>>> (i*i for i in range(10))
<generator object at 0x00CE9AA8>
>>> g = (i*i for i in range(10))
>>> sum(g)
285
>>> xvec, yvec = [10, 20, 30], [7, 5, 3]
>>> sum(x*y for x, y in zip(xvec, yvec))
260
>>> line = 'the cat'
>>> list(line[i] for i in range(len(line)))
['t', 'h', 'e', ' ', 'c', 'a', 't']
Modules
A script is a file of commands to be executed by the interpreter
A module is a file with definitions and statements used in
a script or
an interactive instance of the interpreter
The collection of identifiers accessible in a script or in interactive mode is the main module
Definitions in a module can be imported into
the main module and
other modules
The module’s name is the file name without extension .py
Available in a module as value of global variable __name__
Example module named fibo in file fibo.py
# Fibonacci numbers module

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result
How Python Locates Modules
Modules are searched for in the list of directories in variable sys.path
>>> import sys
>>> sys.path
['',
'C:\\WINDOWS\\system32\\python27.zip',
'C:\\Python27\\DLLs',
'C:\\Python27\\lib',
'C:\\Python27\\lib\\plat-win',
'C:\\Python27\\lib\\lib-tk',
'C:\\Python27',
'C:\\Python27\\lib\\site-packages']
Put fibo.py in one of these folders
C:\Python27 is the interactive interpreter’s “home”
C:\Python27\lib\site-packages is where 3rd-party distributions are placed
Or append the name of the folder (a string with escaped backslashes) to sys.path
>>> sys.path.append('E:\\Old D Drive\\c690f08')
Or put it in some folder and change to that folder—e.g.,
>>> import os
>>> os.chdir(r"E:\old d drive\c690f08")
Then
>>> import fibo
Python comes with a library of standard modules
See http://docs.python.org/modindex.html
Some modules are built into the interpreter
The folder containing the script being run is on the search path
So, if the script had the same name as a standard module we’re trying to import, Python would try to load the script as a module instead
Importing Modules
Importing the module enters its name in the current symbol table
But doesn’t enter names of the functions and classes defined in the module
To invoke such a function, use dot notation: module.function
>>> fibo.fib(50)
1 1 2 3 5 8 13 21 34
For brevity, assign the function to a local name
>>> fib = fibo.fib
>>> fib(50)
1 1 2 3 5 8 13 21 34
Executing Modules as Scripts
When you run a Python module as a script with
C:\Some Folder> fibo.py arguments
the code in the module will be executed, just as if you imported it
But __name__ is set to “__main__”
So make the file usable as a script as well as importable as a module by adding at the end, e.g.,
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
The sys.argv Variable
The script name and optional following arguments are passed to the script in variable sys.argv
List of strings
Length is at least 1
When no script and no arguments are given (in the Python command line),
sys.argv[0] is just an empty string
>>> sys.argv
['']
When -m module is used, sys.argv[0] is set to the full name of the located module.
E.g., file C:\Python27\args.py
def out_args():
    import sys
    print sys.argv

if __name__ == "__main__":
    out_args()
C:\>python -m args 1 2 3
['C:\\Python27\\args.py', '1', '2', '3']
Get the same if we just execute args.py as a script
Let Windows figure out we need the Python interpreter from the .py extension
C:\>args.py 1 2 3
['C:\\Python27\\args.py', '1', '2', '3']
In the last 2 example runs, the interpreter used the paths in the sys.path variable to locate args.py (in C:\Python27)
It can’t use this info if we invoke it directly as the operand of the python command
Looks only in the current folder
C:\>python args.py 1 2 3
python: can't open file 'args.py': [Errno 2] No such file or directory
C:\>cd python27
C:\Python27>python args.py 1 2 3
['args.py', '1', '2', '3']
Executing Modules as Scripts (Resumed)
Recall that we added the following to the end of fibo.py
if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))
Executing this module as a script:
C:\Python25>python fibo.py 50
1 1 2 3 5 8 13 21 34
If the module is imported, the code isn’t run
>>> import fibo
>>>
Used to give a module a convenient user interface
Or for testing: running the module as a script executes a test suite
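A sketch (Python 2.7 or 3) checking both behaviors of the __name__ test. It writes a cut-down fibo.py (fib2() plus a __main__ block that prints the series) into a temporary folder, imports it, then runs it as a script with the current interpreter via subprocess; the temporary folder and the cut-down module are assumptions made for the test.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical temporary folder holding a cut-down fibo.py
folder = tempfile.mkdtemp()
module_source = '''
def fib2(n):
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result

if __name__ == "__main__":
    import sys
    print(" ".join(str(x) for x in fib2(int(sys.argv[1]))))
'''
with open(os.path.join(folder, 'fibo.py'), 'w') as f:
    f.write(module_source)

# Imported: the module's code runs, but __name__ is 'fibo', so the guarded block is skipped
sys.path.insert(0, folder)
import fibo
imported_name = fibo.__name__

# Run as a script: __name__ is '__main__', so the guarded block prints the series
output = subprocess.check_output(
    [sys.executable, os.path.join(folder, 'fibo.py'), '50'])
script_numbers = output.decode('ascii').split()
```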
“Compiled” Python files
A speed-up for starting short programs using many standard modules
If foo.pyc exists in the folder where module foo.py is found,
assumed to contain a version of module foo already “byte-compiled”
Modification time of the version of foo.py used to create foo.pyc is recorded in foo.pyc
.pyc is regenerated if modification times don't match
Whenever foo.py is successfully compiled, an attempt is made to write the compiled version to foo.pyc
Contents of .pyc files are platform independent
A Python module directory can be shared across architectures
When the Python interpreter is invoked with the -O flag,
optimized code is generated and stored in .pyo files
The optimizer currently doesn't help much: it mainly removes assert statements
A program doesn't run any faster when read from a .pyc or .pyo file than when read from a .py file
Only thing that's faster is load time
When a script is run by giving its name on the command line,
the bytecode isn’t written to a .pyc or .pyo file
Can have a file foo.pyc (or foo.pyo) without a file foo.py for the same module
Used to distribute a library of Python code
Moderately hard to reverse engineer
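The sketch below creates a throwaway module foo.py in a temporary folder as a stand-in. It byte-compiles the module with the standard py_compile module, deletes the source, and then imports the module from foo.pyc alone.

```python
import os
import py_compile
import sys
import tempfile

# Write a tiny module, foo.py, into a scratch folder
folder = tempfile.mkdtemp()
source = os.path.join(folder, "foo.py")
with open(source, "w") as f:
    f.write("def greet():\n    return 'hello'\n")

# Byte-compile it to foo.pyc next to the source. Naming cfile explicitly
# matters on Python 3, which otherwise writes into a __pycache__ subfolder.
compiled = os.path.join(folder, "foo.pyc")
py_compile.compile(source, cfile=compiled, doraise=True)

# Delete the source: the module can still be imported from foo.pyc alone
os.remove(source)
sys.path.insert(0, folder)
import foo

print(foo.greet())
```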
Packages
Packages let us structure Python's module namespace using “dotted module names”
E.g., A.B denotes submodule B in package A
Dotted module names let authors of a multi-module package not worry about each other’s module names
Just as modules let authors of different modules not worry about each other’s global variable names
Example
We design a package (collection of modules) to handle sound files and sound data uniformly
There are many sound file formats
Growing collection of modules to convert between file formats
There are also many operations on sound data
Growing collection of modules for these operations
A possible structure for our package, expressed as a hierarchical file system:
sound/                      Top-level package
    __init__.py             Initialize the sound package
    formats/                Subpackage for file format conversions
        __init__.py
        wavread.py
        wavwrite.py
        auread.py
        auwrite.py
        ...
    effects/                Subpackage for sound effects
        __init__.py
        echo.py
        surround.py
        reverse.py
        ...
    filters/                Subpackage for filters
        __init__.py
        equalizer.py
        vocoder.py
        ...
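You can experiment with this layout by building a cut-down copy of the tree in a temporary folder and importing from it. This is a sketch: only a few of the files are created, and the echofilter() body is a placeholder that simply reports its arguments.

```python
import os
import sys
import tempfile

# Build part of the sound/ tree in a scratch folder (placeholder contents)
root = tempfile.mkdtemp()
files = {
    os.path.join("sound", "__init__.py"): "",
    os.path.join("sound", "formats", "__init__.py"): "",
    os.path.join("sound", "formats", "wavread.py"): "",
    os.path.join("sound", "effects", "__init__.py"): "",
    os.path.join("sound", "effects", "echo.py"):
        "def echofilter(input, output, delay=0.7, atten=4):\n"
        "    return 'echo(%s -> %s, delay=%s, atten=%s)' % "
        "(input, output, delay, atten)\n",
}
for relpath, text in files.items():
    path = os.path.join(root, relpath)
    folder = os.path.dirname(path)
    if not os.path.isdir(folder):
        os.makedirs(folder)
    with open(path, "w") as f:
        f.write(text)

# Make the top-level package findable, then import a submodule
sys.path.insert(0, root)
import sound.effects.echo

print(sound.effects.echo.echofilter("in.wav", "out.wav"))
```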
Importing
import statements are executed in 2 steps
1. find a module, and initialize it if necessary
2. define a name or names in the local namespace (of the scope where the import statement occurs)
“Initialize” a built-in or extension module: call an initialization function the module provides
“Initialize” a Python-coded module: execute the module's body
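Here is a short sketch of the two steps using the standard library, with json as an arbitrary example module. importlib.import_module() does the finding and initializing. sys.modules caches the result, so a second import of the same name doesn't run the module body again.

```python
import importlib
import sys

# Step 1: find the module and initialize it (run its body) if necessary
mod = importlib.import_module("json")

# The initialized module is cached in sys.modules; importing the same name
# again finds it there and doesn't execute the module body a second time
print(sys.modules["json"] is mod)

# Step 2 is binding: "import json as js" binds the local name js
# to that same cached module object
import json as js
print(js is mod)
```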
Module searching first looks in the containing package, if the importing module is in one (see below)
If this fails, Python searches the list of folders given in sys.path
sys.path starts with the folder containing the script being run (or the current folder, in interactive mode)
__init__.py files are required to make Python treat folders as containing packages
This prevents a folder with a common name (e.g., string) from unintentionally hiding valid modules that occur later in the module search path
__init__.py could be empty
Can import individual modules from a package
import sound.effects.echo
Then sound.effects.echo.echofilter is referenced with its full name
sound.effects.echo.echofilter(input, output, delay=0.7, atten=4)
An alternative way to import the submodule
from sound.effects import echo
Makes submodule echo available without its package prefix
echo.echofilter(input, output, delay=0.7, atten=4)
Yet another variation: import the desired function or variable directly
from sound.effects.echo import echofilter
Makes function echofilter() of submodule echo directly available
echofilter(input, output, delay=0.7, atten=4)
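Because the sound package is hypothetical, the sketch below uses the standard xml.etree package as a stand-in. It shows that the three import forms reach the same function object.

```python
# xml.etree stands in for the hypothetical sound.effects package
import xml.etree.ElementTree                  # full dotted name needed later
from xml.etree import ElementTree             # submodule without its prefix
from xml.etree.ElementTree import fromstring  # the function itself

# All three spellings reach the same function object
print(xml.etree.ElementTree.fromstring is ElementTree.fromstring is fromstring)
```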
In the form
from package import item
item can be a submodule (or subpackage) of the package or some other name defined in the package (e.g., a function, class, or variable)
The import statement 1st tests whether the item is defined in the package; if not, it assumes it’s a module and attempts to load it; if that fails, an ImportError exception is raised
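You can observe this lookup order with the standard json package as a stand-in for sound. dumps is defined in json/__init__.py, while tool is not defined there and is loaded as the submodule json.tool. An unknown name raises ImportError. The name no_such_item is deliberately made up.

```python
# json stands in for the hypothetical sound package
from json import dumps  # a function defined in json/__init__.py
from json import tool   # not defined there, so loaded as submodule json.tool

print(dumps([1, 2]))
print(tool.__name__)

try:
    from json import no_such_item  # neither a defined name nor a submodule
except ImportError as exc:
    print("ImportError: %s" % exc)
```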
In the form
import item.subitem.subsubitem
each item but the last must be a package
The last item can be a module or a package, but can't be a class, function, or variable defined in the previous item
Note the forms
import mod1 as amod1
from mod import id1 as aid1
“as” binds the imported object to the alias (rather than to its own name) in the local namespace
E.g.,
import sound.effects.echo as echo_mod
from sound.effects import echo as echo_mod
from sound.effects.echo import echofilter as efilter
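Here are the as forms, sketched with the standard xml.etree package in place of the hypothetical sound package. The aliases ET, ET2, and parse_xml are arbitrary.

```python
# xml.etree stands in for the hypothetical sound.effects package
import xml.etree.ElementTree as ET        # binds only ET
from xml.etree import ElementTree as ET2  # binds only ET2
from xml.etree.ElementTree import fromstring as parse_xml

root = parse_xml("<echo delay='0.7'/>")
print(ET is ET2)
print(root.get("delay"))
```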
All import forms allow multiple items to be imported
Comma-separated list
import sound.effects.echo, sound.filters.equalizer
from sound.effects import echo, surround
from sound.effects.echo import echofilter, echoamp
from sound.effects import echo as ech, surround as sur
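The form of import without from binds a name in the local namespace to a module object: import sound.effects.echo binds only the top-level name sound, and the submodule is reached as sound.effects.echo
When a submodule of a package is loaded, Python ensures that the package itself is loaded first
The from form doesn’t bind the package name: from sound.effects import echo binds echo but not sound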
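You can check these binding rules by importing inside functions and inspecting locals(). In the sketch below, the standard xml.etree.ElementTree module stands in for sound.effects.echo, and the function names are hypothetical.

```python
import sys


def bound_by_plain_import():
    # Imports inside a function bind names in its local namespace
    import xml.etree.ElementTree
    return sorted(locals())


def bound_by_from_import():
    from xml.etree import ElementTree
    return sorted(locals())


print(bound_by_plain_import())  # ['xml']: only the top-level package name
print(bound_by_from_import())   # ['ElementTree']: no package name bound
# Loading the submodule also loaded its parent packages first
print("xml" in sys.modules and "xml.etree" in sys.modules)
```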
Importing * from a Package
The forms of import with from may have * in place of the (list of) identifier(s) at the end
Used only in a module scope, not in function or class definitions
When Python executes
from sound.effects import *
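we’d hope it would consult the filesystem, find which submodules are present in the package, and import them all
But this doesn’t work well on Windows (case-insensitive file names) and some other platforms
Instead, the package author provides an explicit index of the package: a list of module names assigned to __all__ in the package’s __init__.py
from sound.effects import * then imports the submodules named in __all__
If __all__ isn’t defined, the submodules aren’t imported; only the names defined in __init__.py (and any submodules already loaded) are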
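Here is a sketch of the __all__ mechanism. It builds a hypothetical package soundfx (standing in for sound.effects) in a temporary folder. Its __init__.py lists only two of its three submodules.

```python
import os
import sys
import tempfile

# Hypothetical package soundfx whose __all__ omits one submodule
root = tempfile.mkdtemp()
pkg = os.path.join(root, "soundfx")
os.mkdir(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write('__all__ = ["echo", "surround"]\n')
for name in ("echo", "surround", "reverse"):
    with open(os.path.join(pkg, name + ".py"), "w") as f:
        f.write("NAME = %r\n" % name)

sys.path.insert(0, root)

# Run "from soundfx import *" in a fresh namespace and list what it bound
namespace = {}
exec("from soundfx import *", namespace)
print(sorted(k for k in namespace if not k.startswith("__")))
```

Only echo and surround are bound. reverse is neither imported nor bound because it isn't listed in __all__.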
Importing * from a module or package is frowned upon
Can cause hard-to-read code
But OK to save typing in interactive sessions
And certain modules are designed to export only names that follow certain patterns
The recommended notation is
from package import specific_submodule
unless the importing module needs to use submodules with the same name from different packages
Intra-package References
A package’s submodules often must refer to each other
E.g., the surround module might use the echo module
import 1st looks in the containing package before looking in the standard module search path
So, the surround module can use just
import echo
or
from echo import echofilter
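This implicit form relies on Python 2 behavior, and Python 3 removed implicit relative imports. The explicit relative form, from .echo import echofilter, works in Python 2.5 and later and in Python 3. The sketch below uses a hypothetical package fxpkg whose surround module imports its sibling echo that way.

```python
import os
import sys
import tempfile

# Hypothetical package fxpkg with two sibling modules
root = tempfile.mkdtemp()
pkg = os.path.join(root, "fxpkg")
os.mkdir(pkg)
files = {
    "__init__.py": "",
    "echo.py": "def echofilter(signal):\n    return signal + ' (echoed)'\n",
    # surround uses its sibling through an explicit relative import
    "surround.py": (
        "from .echo import echofilter\n"
        "def surround(signal):\n"
        "    return echofilter(signal) + ' (surround)'\n"
    ),
}
for name, text in files.items():
    with open(os.path.join(pkg, name), "w") as f:
        f.write(text)

sys.path.insert(0, root)
from fxpkg.surround import surround

print(surround("beep"))
```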