Web Science Python Notes

Albert Esterline

Spring 2015

Installation and Program Execution

Usually install Python in

C:\Python27

The executable, python.exe, is there

So add this folder to your PATH environment variable

Invoke the interactive interpreter

by clicking the Python item in the pop-up menu you get by clicking Python2.7 in the All Programs menu accessed from the Start menu

by typing python in the command window

Python program files have extension py

Can write them with your favorite text editor

In Windows, no need to start a program with any special directives

E.g., no need for UNIX’s “#!”

To execute a program from the command window, go to its folder and type its name

Can also type python followed by its name

On the command line, use < to redirect input, > to redirect output

In interactive mode, prompts for next command with the primary prompt, “>>>”

For continuation lines, prompts with the secondary prompt, “...”

Continuation lines are needed when entering a multi-line construct—e.g.,

>>> the_world_is_flat = 1

>>> if the_world_is_flat:

...     print "Be careful not to fall off!"  # Initial space needed

...

Be careful not to fall off!

Quick Intro

Numbers

Newline, not “;”, terminates a statement

Escape (\) the newline if necessary

Arithmetic is as in most languages

= for assignment

>>> width = 20

>>> height = 5*9

>>> width * height

900

A value can be assigned to several variables simultaneously—e.g.,

>>> x = y = z = 0

Mixed-type operands convert integer operand to floating point

In interactive mode, the last printed expression is assigned to the (effectively read-only) variable _

>>> tax = 12.5 / 100

>>> price = 100.50

>>> price * tax

12.5625

>>> price + _

113.0625

>>> round(_, 2)

113.06

Strings

Can be enclosed in single quotes or double quotes:

>>> 'spam eggs'

'spam eggs'

>>> 'doesn\'t'

"doesn't"

>>> "doesn't"

"doesn't"

>>> '"Yes," he said.'

'"Yes," he said.'

>>> "\"Yes,\" he said."

'"Yes," he said.'

>>> '"Isn\'t," she said.'

'"Isn\'t," she said.'

String literals can span multiple lines in several ways

Continuation lines can be used:

>>> hello = "This is a rather long string containing\n\

... several lines of text just as you would do in C.\n\

... Note that whitespace at the beginning of the line is\

... significant."

>>> print hello

This is a rather long string containing

several lines of text just as you would do in C.

Note that whitespace at the beginning of the line is significant.

>>>

Strings can be surrounded by a pair of matching triple-quotes, """ or '''

End of lines needn’t be escaped but will be included in the string

>>> print """

... Usage: thingy [OPTIONS]

... -h Display this usage message

... -H hostname Hostname to connect to

... """

Usage: thingy [OPTIONS]

-h Display this usage message

-H hostname Hostname to connect to

>>>

Interpreter prints result of string operations just as they are typed for input

Concatenation is +, repetition *

>>> ('Help' + '! ') * 3

'Help! Help! Help! '

>>>

String literals next to each other are concatenated

Doesn’t work for arbitrary string expressions

>>> 'Help' '!\n'

'Help!\n'

>>> ('Help' * 3) '!\n'

File "<stdin>", line 1

('Help' * 3) '!\n'

^

SyntaxError: invalid syntax

>>>

+ is overloaded: addition and concatenation

Python doesn’t coerce mixed-type operands of +

>>> 3 + "4"    # An error (TypeError)

Use conversion functions

>>> str(3) + "4"

'34'

>>> 3 + int("4")

7

>>>

Strings can be indexed

Indices start at 0 (leftmost character)

No character type

A character is a length-one string

>>> "cat"[1]

'a'

>>>

Substrings can be specified with slice notation

str[low:hi] is the substring of str from index low to index hi-1

>>> "scatter"[1:4]

'cat'

>>>

Default for the low index is 0

For the high index, it’s the length of the sliced string

A slice index > the string’s length is replaced by that length

An out-of-range single-element index gives an error

If the low index is > the high index, the slice is empty

>>> wd = 'scatter'

>>> wd[:4]

'scat'

>>> wd[4:]

'ter'

>>> wd[4:10]

'ter'

>>> wd[2:1]

''

>>>

Can’t change a string by assigning to an indexed position or a slice—it’s immutable

For a negative index, count from the right

>>> wd[-1]

'r'

>>> wd[-2]

'e'

>>> wd[:-2]

'scatt'

>>> wd[-2:]

'er'

>>> wd[:-2] + wd[-2:]

'scatter'

>>>

Invariant: str[:i] + str[i:] is the same as str

An out-of-range negative slice index is truncated

Think of slice indices as pointing between characters

Left edge of the 1st character is numbered 0

Index of the right edge of the last character is the string’s length

 +---+---+---+
 | c | a | t |
 +---+---+---+
 0   1   2   3
-3  -2  -1

Built-in function len() returns the length of its string argument

>>> len(wd)

7
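
The slicing rules above can be checked with a short sketch. The helper name split_at is an assumption made for illustration and is not part of the notes. It shows that the invariant s[:i] + s[i:] == s holds even for out-of-range slice indices.

```python
def split_at(s, i):
    """Return the two slices of s on either side of slice index i."""
    return s[:i], s[i:]

wd = 'scatter'
for i in (0, 4, 10, -2, -10):
    left, right = split_at(wd, i)
    # Out-of-range slice indices are truncated, so the halves
    # always rebuild the original string.
    assert left + right == wd
    print(repr(left) + ' + ' + repr(right))
```

A single-argument print(...) is used so the sketch runs the same way under Python 2.7 and Python 3.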

Lists

Several compound data types

Most versatile is the list:

comma-separated (possibly heterogeneous) items within […]

>>> a = [5, 10, 'cat', 'dog']

>>>

Lists can be indexed, sliced, concatenated, repeated

>>> a[-2]

'cat'

>>> a[:-2] + [15, 20]

[5, 10, 15, 20]

>>> 3*a[:2] + [60]

[5, 10, 5, 10, 5, 10, 60]

>>>

Unlike strings, lists are mutable

Can assign to individual elements

>>> a[1] = 12

>>> a

[5, 12, 'cat', 'dog']

>>>

Can assign to slices

Replace items

>>> a[1:3] = [15,'bird']

>>> a

[5, 15, 'bird', 'dog']

>>>

Remove items

>>> a[1:3] = []

>>> a

[5, 'dog']

>>>

Insert items

>>> a[1:1] = [20, 'fish']

>>> a

[5, 20, 'fish', 'dog']

>>>

Insert a copy of itself at the beginning

>>> a[:0] = a

>>> a

[5, 20, 'fish', 'dog', 5, 20, 'fish', 'dog']

>>>

Replace all items with an empty list

>>> a[:] = []

>>> a

[]

>>>

len() also applies to a list:

>>> len([1, 2, 3])

3

>>>

Append a new item to the end

>>> a = [1, 2]

>>> a.append(3)

>>> a

[1, 2, 3]

>>>

Lists can be nested

>>> q = [2, 3]

>>> p = [1, q, 4]

>>> len(p)

3

>>> p[1][0] = 5

>>> q

[5, 3]

>>>

First Steps towards Programming

print statement takes 1 or more comma-separated expressions

Writes their values separated by spaces (but no commas)

Strings written without quotes

Multiple assignment

Comma-separated list of n > 1 variables on LHS, n expressions on RHS

Values of expressions simultaneously assigned to corresponding variables

>>> x, y = 1, 2

>>> print x, y

1 2

>>>

Swap values

>>> x, y = y, x

>>> print x, y

2 1

>>>

Any non-0 integer value is true and 0 is false (as in C/C++)

Type bool has 2 objects, True & False

Comparison operators as in C++/Java:

<, >, ==, <=, >=, !=

Can combine in familiar ways giving ternary relations—e.g.,

2 <= x <= 4

Logical operators: not, and, or (the last 2 are short-circuit operators)

Control statement ends with a ‘:’

(…) not needed around condition

Statements in its scope indented (no brackets)

Example: initial part of Fibonacci series

>>> a, b = 0, 1

>>> while b < 10:

...     print b,

...     a, b = b, a+b

...

1 1 2 3 5 8

>>>

A trailing comma avoids outputting a newline

Lines in the scope of while must be explicitly tabbed or spaced in

Give a blank line telling interpreter we’re at the end of the loop

Do this in a file: E:\Old D Drive\c690f08\while.py

a, b = 0, 1

while b < 10:

    print b,

    a, b = b, a+b

Suggestion: Get Notepad++

Under the tab Language, select Python

Execution

E:\Old D Drive\c690f08>while.py

1 1 2 3 5 8

For simple input, use raw_input()

Optional string argument for a prompt

Example program

x = int(raw_input("Enter an integer: "))

y = int(raw_input("Enter another integer: "))

print "The sum is", x + y

Execution

E:\Old D Drive\c690f08>input.py

Enter an integer: 3

Enter another integer: 5

The sum is 8

E:\Old D Drive\c690f08>

Control Flow

if-elif-else

0 or more elif clauses and an optional else, e.g.,

x = int(raw_input("An integer: "))

if x < 0:

    y = -1

    print "Negative,",

elif x == 0:

    y = 0

    print "Zero,",

else:

    y = 1

    print "Positive,",

print "y is", y

Use if … elif … elif … in place of a switch statement

while condition:

See above

break, continue, pass

As in C++/Java,

break breaks out of the smallest enclosing for or while loop

continue continues with the next iteration of the loop

pass is the do-nothing statement (placeholder)

range() function

range(n) returns a list from 0 to n - 1 (increments of 1)

range(m, n), n > m, returns a list from m to n - 1 (increments of 1)

range(m, n, inc) returns a list starting at m with increments of inc up to but not including n

If inc > 0, then n > m, otherwise n < m

for variable in sequence:

On successive iterations, variable is bound to successive elements in sequence

sequence can be a string or list

>>> for x in "abc":

...     print x,

...

a b c

>>> for x in range(3):

...     print x,

...

0 1 2

>>> for x in range(10, 2, -2):

...     print x,

...

10 8 6 4

>>>

Loop statements may have an else clause

Executed when the loop terminates by exhausting the list (for) or the condition becomes false (while)

But not when the loop is terminated by a break

E.g., check the integers 2-9 for primes

for n in range(2, 10):

    for x in range(2, n):

        if n % x == 0:

            print n, 'equals', x, '*', n/x

            break

    else:

        # loop fell through without finding a factor

        print n, 'is a prime number'

Execution

E:\Old D Drive\c690f08>primes.py

2 is a prime number

3 is a prime number

4 equals 2 * 2

5 is a prime number

6 equals 2 * 3

7 is a prime number

8 equals 2 * 4

9 equals 3 * 3

E:\Old D Drive\c690f08>
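
The loop else clause can also be packaged in a function. The following is a minimal sketch; the name smallest_factor is hypothetical and chosen only for illustration. The else branch runs only when the for loop finishes without hitting break.

```python
def smallest_factor(n):
    """Return the smallest factor of n in 2..n-1, or None if there is none."""
    for x in range(2, n):
        if n % x == 0:
            found = x
            break
    else:
        # Reached only when the loop ran to completion without a break.
        found = None
    return found

for n in range(2, 10):
    f = smallest_factor(n)
    if f is None:
        print(str(n) + ' is a prime number')
    else:
        print(str(n) + ' has smallest factor ' + str(f))
```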

Conditional Expressions

Python allows expressions of the form

expr1 if cond else expr2

where

expr1 and expr2 are arbitrary expressions and

cond evaluates to True or False (or values that can be interpreted as True or False)

If cond is True, value of the entire expression is the value of expr1

If cond is False, the expression’s value is the value of expr2

E.g.,

>>> x = 5

>>> 2 if x > 5 else 4

4

Typically assign the value of a conditional expression to something (e.g., a variable)

E.g., the following sets x to 0 if it’s below the threshold and to twice its value otherwise

>>> threshold = 10

>>> x = 11

>>> x = 0 if x < threshold else 2 * x

>>> x

22
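
Conditional expressions can be nested. As a small sketch (the function name clamp and its parameters are assumptions for illustration), the following limits a value to a range in a single return statement.

```python
def clamp(x, low, hi):
    """Limit x to the range low..hi using nested conditional expressions."""
    return low if x < low else (hi if x > hi else x)

print(clamp(-3, 0, 10))   # 0
print(clamp(5, 0, 10))    # 5
print(clamp(42, 0, 10))   # 10
```

The parentheses are not required but make the nesting easier to read.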

Functions

Function definition

def name(formalParameterList):

    body

formalParameterList is a comma-separated list of identifiers

No type or passing mechanism info

body is indented

Optionally starts with a documentation string (docstring), a string literal

Some tools use docstrings to produce documentation

Put docstrings at various places in your code

Function call

name(actualParameterList)

actualParameterList is a comma-separated list of expressions

Example program

def fib(n): # write Fibonacci series up to n

    """Print a Fibonacci series up to n."""

    a, b = 0, 1

    while b < n:

        print b,

        a, b = b, a+b

# Now call the function we just defined:

fib(2000)

Function execution introduces a symbol table for local variables

All variable assignments in the function store the value in the local symbol table

But a variable reference looks first in the local symbol table then in the global symbol table then in the table of built-in names

So global variables may be referenced in a function but not assigned to, unless named in a global statement, e.g.,

global x, y

Actual parameters to a function call are introduced in the local symbol table when the function is called

So arguments are passed using call by value

But the value passed is always an object reference, not a copy of the object
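
A short sketch of what passing an object reference means in practice. The function names rebind and mutate are hypothetical, chosen only for illustration. Assigning to the parameter does not affect the caller, but changing the object through the parameter does.

```python
def rebind(lst):
    # Assignment binds the local name lst to a new list;
    # the caller's list is untouched.
    lst = [0]

def mutate(lst):
    # A method call changes the object that both names refer to.
    lst.append(0)

nums = [1, 2]
rebind(nums)
print(nums)   # [1, 2]
mutate(nums)
print(nums)   # [1, 2, 0]
```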

A function definition introduces the function name in the current symbol table

The value of the function name is a function object (its type is that of a user-defined function)

>>> def foo(n):

...     print n

...

>>> foo(3)

3

>>> foo

<function foo at 0x00AA5EB0>

This value can be assigned to another name

That name can then be used as a function—function renaming

>>> bar = foo

>>> bar(3)

3

Use a return statement to return control to the caller

If return has an operand, its value is returned as the value of the call

A function with no return operand (or no return statement) returns None

>>> print foo(2)

2

None

>>> def trivial(n):

...     return n

...

>>> print trivial(2)

2

>>> trivial(2)

2

Example program

Rewrite the Fibonacci function so it returns a list

def fib2(n): # return Fibonacci series up to n

    """Return a list containing the Fibonacci series up to n."""

    result = []

    a, b = 0, 1

    while b < n:

        result.append(b)

        a, b = b, a+b

    return result

f100 = fib2(100) # call it

print f100 # write the result

Note: append() is a list method

Function definitions may be nested

def aveSqr(m, n):

    def ave(x, y):

        return (x + y) / 2

    val = ave(m, n)

    return val * val

print aveSqr(3, 5)

Function definitions may be recursive

def fact(n):

    if n == 0:

        return 1

    else:

        return n * fact(n-1)

print fact(4)

Default Argument Values

Call a function with fewer arguments than it is defined with

Formal parameters assigned default values in the parameter list

If a parameter has a default, all parameters to its right in the list must also have defaults

In a positional call, if a value is supplied for a default parameter, values must also be supplied for all default parameters to its left

def sum3(x, y=2, z=3):

    return x + y + z

print sum3(1), " ", sum3(1, 1), " ", sum3(1, 1, 1)

Outputs 6, 5, 3

Keyword Arguments

Can call a function with keyword arguments of the form keyword=value

All positional (normal) arguments must occur before any keyword arguments

Associated by position with formal parameters

Keyword arguments can occur in any order

May or may not have defaults

def classInfo(name, instructor='Dr. Smith', TA='Fred'):

    print name, 'is taught by', instructor, "helped by", TA

classInfo('cs333')

classInfo(TA="Igor", instructor='Dr.Doom', name='cs555')

Output

cs333 is taught by Dr. Smith helped by Fred

cs555 is taught by Dr.Doom helped by Igor

A dictionary (see below) in other languages is called a hash or associative array

A final formal parameter of form **name receives a dictionary with all keyword arguments except those corresponding to formal parameters

def classInfo1(name, **keywords):

    print "Class:", name

    keys = keywords.keys()

    for kw in keys:

        print kw, ':', keywords[kw]

classInfo1('cs666', instructor='Dr. Jones', TA='Al', student='Jim')

Output

Class: cs666

instructor : Dr. Jones

student : Jim

TA : Al
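
The order in which a dictionary yields its keys is not something to rely on. A minimal variant, assuming a hypothetical function name class_info, loops over the sorted keys so the output order is predictable, and returns the lines instead of printing them directly.

```python
def class_info(name, **keywords):
    """Return one line per keyword argument, in sorted key order."""
    lines = ['Class: ' + name]
    for kw in sorted(keywords):      # sorted() gives a predictable order
        lines.append(kw + ': ' + keywords[kw])
    return lines

for line in class_info('cs666', instructor='Dr. Jones', TA='Al', student='Jim'):
    print(line)
```

Because uppercase letters sort before lowercase ones, TA is listed first here.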

Arbitrary Argument Lists

Specify that a function can be called with an arbitrary number of arguments

Final formal parameter of form *name

Binds name to a tuple (a kind of sequence—see below) of all argument values after those corresponding to formal parameters

def arbNum(label, *nums):

    print label

    for x in nums: print x,

arbNum('Cars per day', 1, 4, 2, 6)

Output

Cars per day

1 4 2 6

Unpacking Argument Lists

Arguments may already be in a sequence and need to be unpacked for the call

Use the * operator

>>> args = [3, 6]

>>> range(*args)

[3, 4, 5]

A dictionary can be unpacked to give keyword arguments using the ** operator

def foo(x, y=1, z=2):

    return x + y + z

kwargs = {"z": 5, "y": 10}  # avoid naming a variable dict: that hides the built-in dict()

print foo(15, **kwargs)

Outputs 30
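
Both forms of unpacking can appear in one call. A small sketch, where the function volume and the variable names dims and opts are assumptions made for illustration:

```python
def volume(length, width=1, height=1):
    return length * width * height

dims = [2, 3]                 # positional arguments held in a list
opts = {'height': 4}          # keyword arguments held in a dictionary

print(volume(*dims))          # same as volume(2, 3)
print(volume(*dims, **opts))  # same as volume(2, 3, height=4)
```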

Lambda Forms

Small anonymous functions

Used wherever a function object is needed

lambda argList: expression

argList is a comma separated list of arguments

The value of expression is returned

def make_contains(pt):

    return lambda low, hi: low <= pt <= hi

target = make_contains(5)

print target(4, 6), target(6, 8)

Outputs True False

Function Documentation

Just after the first line of a function’s definition, put its docstring

First line is a concise description not including the function’s name

Then a blank line

Then one or more paragraphs describing the function’s calling conventions, side effects, etc.

The first non-blank line after the first line determines the indentation for the entire documentation

A function’s docstring is the value of its __doc__ attribute

def my_function():

    """Do nothing, but document it.

    No, really, it doesn't do anything.

    """

    pass

print my_function.__doc__

Execution

E:\Old D Drive\c690f08> doc.py

Do nothing, but document it.

No, really, it doesn't do anything.

Data Structures

More on Lists

List Methods

Let L be a list, x an item (that could be in a list), i an index

Results of 1st 4 methods below can be done alternatively with slices

All but index() and count() modify the list itself

All but pop(), index(), and count() return None

append(x ): Add x to the end of the list

extend(L ): Extend the list by appending all the items in list L

insert(i, x ): Insert x in front of the item at index i

remove(x ): Remove 1st item x from the list—error if no such item

pop(): Remove and return the last item in the list

pop(i ): Remove and return the item at index i

index(x ): Return index of 1st item with value x; error if none

count(x ): Return the number of times x appears in the list

sort(): Sort the list, in place

reverse(): Reverse the list, in place

>>> L = [2,7,3,9,5]

>>> L.append(6)

>>> L

[2, 7, 3, 9, 5, 6]

>>> L.sort()

>>> L

[2, 3, 5, 6, 7, 9]

Use append(x) and pop() to implement a stack

Use append(x) and pop(0) to implement a queue
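
A minimal sketch of both uses. The variable names are chosen only for illustration.

```python
stack = []
for item in ['a', 'b', 'c']:
    stack.append(item)        # push onto the top
top = stack.pop()             # pop the most recently added item

queue = []
for item in ['a', 'b', 'c']:
    queue.append(item)        # enqueue at the back
front = queue.pop(0)          # dequeue the oldest item from the front

print(top)      # c
print(front)    # a
```

Note that pop(0) has to shift every remaining item, so for long queues collections.deque from the standard library is the usual alternative.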

Functional Programming Tools

filter(function, sequence) returns a sequence of the items from sequence for which function(item) is true

>>> def in_range(x): return 5 <= x <= 10

...

>>> filter(in_range, [1,14, 6, 9, 4, 7, 12])

[6, 9, 7]

If the function is used only once, no need to give it a name

Use a lambda form

>>> filter(lambda x: 5 <= x <= 10, [1,14, 6, 9, 4, 7, 12])

[6, 9, 7]

map(function, sequence) calls function(item) for each item in sequence

Returns a list of the return values

>>> map(lambda x: x*x, range(6))

[0, 1, 4, 9, 16, 25]

If more than one sequence is passed

function must have as many arguments as there are sequences

the sequences should have the same length (in Python 2, a shorter sequence is padded with None)

function is called with the corresponding items from each sequence

>>> map(lambda x,y: x-y, [2,4,8,16], [1,2,3,4])

[1, 2, 5, 12]

reduce(function, sequence), where function is binary,

calls function on the 1st 2 items of sequence

then on the result and the next item

etc.

>>> reduce(lambda x,y: x+y, range(1, 11))

55

If sequence has just 1 item, it’s returned

If it’s empty, there’s an error

If a 3rd argument is passed to reduce, it’s the starting value

Then an empty sequence doesn’t give an error

The 3rd argument is returned

>>> def my_sum(seq):

...     return reduce(lambda x,y: x+y, seq, 0)

...

>>> my_sum(range(1, 11))

55

In fact, there’s a built-in function sum() that does this

>>> sum(range(1, 11))

55
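
The three tools can be chained. The sketch below is written so that it also runs under Python 3, where filter() and map() return iterators rather than lists and reduce() lives in the functools module. The variable names are assumptions for illustration.

```python
from functools import reduce   # a built-in in Python 2; Python 3 needs this import

nums = [1, 14, 6, 9, 4, 7, 12]

# list(...) is a no-op in Python 2 and materializes the iterator in Python 3.
in_range = list(filter(lambda x: 5 <= x <= 10, nums))
squares = list(map(lambda x: x * x, in_range))
total = reduce(lambda x, y: x + y, squares, 0)

print(in_range)   # [6, 9, 7]
print(squares)    # [36, 81, 49]
print(total)      # 166
```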

List Comprehensions

A concise way to create lists without using map(), filter()

Simple form

[ expression for variable in sequence ]

Forms a list of values of expression for successive items in sequence

>>> [x*x for x in range(11)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Add an if clause to restrict which items in sequence participate

>>> [x*x for x in range(11) if x % 2 == 0]

[0, 4, 16, 36, 64, 100]

Possibly several for clauses, each possibly with an if clause

expression contains the variable from each for clause

for clauses separated only by whitespace

With 2 for clauses: Each value of the leftmost for variable is paired in turn with successive values of the variable in the for clause to its right

Nested loop

Generalization to n for clauses obvious

>>> [x+y for x in [5, 10] for y in range(1, 11) if y % 2 == 0]

[7, 9, 11, 13, 15, 12, 14, 16, 18, 20]
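
To make the nested-loop reading concrete, here is a sketch that builds the same list with explicit loops and compares it with the comprehension. The variable names are chosen for illustration.

```python
result = []
for x in [5, 10]:                 # leftmost for clause is the outer loop
    for y in range(1, 11):        # next for clause is the inner loop
        if y % 2 == 0:            # the if clause filters the inner items
            result.append(x + y)

comprehension = [x + y for x in [5, 10] for y in range(1, 11) if y % 2 == 0]
print(result == comprehension)    # True
```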

Tuples

Tuple is another sequence data type (besides string and list)

Comma-separated list of values enclosed in (…)

Tuples can be indexed, sliced, concatenated, repeated

>>> t = (1, 2, 3)

>>> t

(1, 2, 3)

>>> t[0]

1

>>> t[:2]

(1, 2)

>>> t + (4, 5, 6)

(1, 2, 3, 4, 5, 6)

>>> t * 3

(1, 2, 3, 1, 2, 3, 1, 2, 3)

But tuples are immutable

Tuples can be nested and have lengths

>>> u = ((1, 3), (2, 4))

>>> u[0][1]

3

>>> len(u)

2

Because of tuple packing, can omit the (…) around tuple values on the RHS of an assignment

>>> t = 1, 2, 3

>>> t

(1, 2, 3)

To pack a singleton tuple, follow the value with a comma (else we have a normal assignment)

Singleton tuples are always displayed with a comma after the value

>>> s = 12,

>>> s

(12,)

>>> len(s)

1

Because of sequence unpacking, can assign a tuple (or other sequence) of n elements to n variables (separated by commas)

>>> x, y, z = t

>>> print x, y, z

1 2 3

Lists and strings can also be unpacked

>>> x, y, z = [4, 5, 6]

>>> print x, y, z

4 5 6

>>> x, y, z = "cat"

>>> print x, y, z

c a t

But only tuples can be packed

Multiple assignment is just tuple packing with sequence unpacking
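
Packing and unpacking also give a simple way for a function to return several values. A minimal sketch, assuming a hypothetical function min_max:

```python
def min_max(values):
    """Return the smallest and largest items as a 2-tuple."""
    return min(values), max(values)     # tuple packing

low, high = min_max([4, 9, 2, 7])       # sequence unpacking
print(low)    # 2
print(high)   # 9
```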

Sets

A set is an unordered collection with no duplicates

Form by applying constructor set() to a sequence

Displayed as set() applied to a list (no matter how formed)

Elements are displayed in arbitrary order (small integers usually happen to appear sorted)

>>> set([2, 3, 1])

set([1, 2, 3])

>>> set("cat")

set(['a', 'c', 't'])

Or enclose the elements within {…}, separated by commas

>>> {2,3,1}

set([1, 2, 3])

Sets, being unordered, don’t support indexing or slicing

Subset relation, <=

>>> {1,3} <= {1,2,3}

True

Sets are equal iff they have the same members (regardless of order)

>>> {1,3,2} == {1,2,3}

True

Use in for membership

>>> 2 in {1,2,3}

True

The negation of in is not in

>>> 4 not in {1,2,3}

True

Sets are mutable

Methods add() and remove()

>>> s = {1,2,3}

>>> s.add(4)

>>> s.remove(2)

>>> s

set([1, 3, 4])

Eliminate duplicates in a list (possibly reordering elements)

>>> list(set([1,2,3,2,1,0]))

[0, 1, 2, 3]

Set operators

union (|)

intersection (&)

difference (-)

symmetric set difference (^, analogous to XOR)

>>> s1, s2 = {1,2}, {2,3}

>>> s1 | s2

set([1, 2, 3])

>>> s1 & s2

set([2])

>>> s1 - s2

set([1])

>>> s1 ^ s2

set([1, 3])
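
A sketch of the set operators on made-up data (the names web and ai stand for hypothetical class rosters). Results are passed through sorted() so the printed order does not depend on how a set happens to be displayed.

```python
web = {'bob', 'ed', 'al'}
ai = {'ed', 'al', 'sue'}

print(sorted(web | ai))   # in either class: ['al', 'bob', 'ed', 'sue']
print(sorted(web & ai))   # in both classes: ['al', 'ed']
print(sorted(web - ai))   # only in web: ['bob']
print(sorted(web ^ ai))   # in exactly one class: ['bob', 'sue']
```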

Dictionaries

“Hashes” or “associative arrays”

Indexed by keys (i.e., “keyed”)

Key must be immutable

Strings and numbers OK

A tuple is OK if its elements are immutable

Lists never OK

A dictionary literal is a comma-separated list of key:value pairs (associations) within {…}

This is how dictionaries are displayed

Extracting a value using a non-existent key is an error

But add a pair by assigning to the keyed element

Can’t extend a list in this way

>>> age = {'bob': 21, 'al': 34}

>>> age['ed'] = 28

>>> age

{'ed': 28, 'bob': 21, 'al': 34}

Delete a pair with del applied to a keyed element

>>> del age['al']

>>> age

{'ed': 28, 'bob': 21}

The keys() dictionary method returns a list of all its keys

Sort this list in place with the sort() list method

>>> names = age.keys()

>>> names.sort()

>>> names

['bob', 'ed']

To check whether a key is in the dictionary, use the in keyword or the older has_key(key) method (deprecated in favor of in)

>>> age.has_key('bob')

True

>>> 'al' in age

False

dict() builds a dictionary from a list of pairs stored as tuples

>>> weight = dict([('bob', 175), ('ed', 250)])

>>> weight

{'ed': 250, 'bob': 175}

Use the form with keyword arguments when keys are strings

>>> height = dict(bob=6.1, ed=5.5)

>>> height

{'ed': 5.5, 'bob': 6.1}

List comprehension is possible when the pairs form a pattern

>>> dict([(x, x**2) for x in range(1,5)])

{1: 1, 2: 4, 3: 9, 4: 16}
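
A common use of a dictionary is counting. The sketch below is only an illustration: the function name count_words is hypothetical, and it relies on the string method split(), which breaks a string into words at whitespace.

```python
def count_words(text):
    """Return a dictionary mapping each word in text to its number of occurrences."""
    counts = {}
    for word in text.split():
        if word in counts:
            counts[word] += 1     # key already present: update its value
        else:
            counts[word] = 1      # assigning to a new key adds the pair
    return counts

counts = count_words('the cat saw the dog chase the cat')
for word in sorted(counts):
    print(word + ' ' + str(counts[word]))
```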

Looping Techniques

Looping through dictionaries, retrieve the key and associated value together using the iteritems() method.

>>> for p, w in weight.iteritems():

...     print p, w

...

ed 250

bob 175

Looping over a sequence, retrieve the index and corresponding value together using the enumerate() function

>>> for i, p in enumerate(['bob', 'ed']):

...     print i, p

...

0 bob

1 ed

To loop over multiple sequences together, pair corresponding (by position) items with function zip()

>>> players = ['bob', 'ed', 'al']

>>> scores = [70, 50, 100]

>>> for p, s in zip(players, scores):

...     print p, s

...

bob 70

ed 50

al 100

Function sorted(list) returns a sorted version of list without changing list

Recall: method list.sort() sorts list in place

Loop over a list in sorted order:

>>> for n in sorted([5, 1, 4, 2, 3]):

...     print n,

...

1 2 3 4 5

Use function reversed() to reverse a list (and loop over it)

>>> for i in reversed(range(1,10,2)):

...     print i,

...

9 7 5 3 1

There’s a method list.reverse() reversing list in place

Relations on Sequences

We’ve seen operators in and not in with sets and (applied to keys) dictionaries

Also used with sequences

>>> 2 in (1, 2, 3)

True

>>> 4 not in [1, 2, 3]

True

>>> 'a' in 'cat'

True

Sequence objects compared to other objects of same sequence type

Uses lexicographical ordering:

Compare the 1st elements

If a tie, compare the 2nd

If still a tie, compare the 3rd

Etc.

If one sequence is an initial sub-sequence of the other, shorter is earlier in the order

>>> (2, 3) > (2, 2)

True

>>> [1, 2, 3] < [1, 3, 3]

True

String comparison uses ASCII ordering for individual characters

Lowercase letters are in alphabetical order in ASCII as are uppercase letters

But all uppercase letters occur before any lowercase

>>> 'cat' < 'dog'

True

>>> 'cat' < 'Dog'

False

Infix operator is checks whether its operands reference the same object

== just checks that content is the same

So p is q implies p == q but not vice versa

is not is the negation of is

>>> p = [1, 2, 3]

>>> q = p

>>> r = [1, 2, 3]

>>> p == r

True

>>> p is r

False

>>> p is not r

True

>>> p == q

True

>>> p is q

True

Here p and q reference the same object

So a change to p is ipso facto a change to q and vice versa

>>> p[1] = 4

>>> q

[1, 4, 3]
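
When an independent list is wanted rather than a second name for the same object, a full slice makes a copy. A short sketch:

```python
p = [1, 2, 3]
q = p          # q names the same object as p
r = p[:]       # a full slice builds a new list with the same items

p[1] = 4
print(q)       # [1, 4, 3]
print(r)       # [1, 2, 3]
print(r is p)  # False
```

The copy is shallow: nested lists inside p would still be shared between p and r.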

Sorting

Sorting Lists of Dictionaries "the Old Way"

Suppose we have a list of dictionaries such as

[{'key3': 5, 'key2': 1, 'key1': 4},

{'key3': 3, 'key2': 3, 'key1': 2},

{'key3': 2, 'key2': 2, 'key1': 5}]

Suppose this is assigned to variable undecorated

Sorting this list on the value of key2 should give

[{'key3': 5, 'key2': 1, 'key1': 4},

{'key3': 2, 'key2': 2, 'key1': 5},

{'key3': 3, 'key2': 3, 'key1': 2}]

Work with a list not of dictionaries but of 2-element tuples

A tuple contains the value associated with key2 in a given dictionary followed by that dictionary

This list is said to be decorated

Method sort() for lists sorts the list of tuples on their 1st elements

The 2nd elements (the dictionaries) get carried along in the sort

The dictionaries end up in the intended order but as the 2nd elements in tuples

Use list comprehension to get the decorated list.

decorated = [ (dct["key2"], dct) for dct in undecorated ]

The value of decorated is now

[(1, {'key3': 5, 'key2': 1, 'key1': 4}),

(3, {'key3': 3, 'key2': 3, 'key1': 2}),

(2, {'key3': 2, 'key2': 2, 'key1': 5})]

Next,

decorated.sort()

so decorated becomes

[(1, {'key3': 5, 'key2': 1, 'key1': 4}),

(2, {'key3': 2, 'key2': 2, 'key1': 5}),

(3, {'key3': 3, 'key2': 3, 'key1': 2})]

Now extract the list of dictionaries from the list of tuples while maintaining their order

Again use list comprehension

[ dct for (key, dct) in decorated ]

The result is

[{'key3': 5, 'key2': 1, 'key1': 4},

{'key3': 2, 'key2': 2, 'key1': 5},

{'key3': 3, 'key2': 3, 'key1': 2}]
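
The three steps (decorate, sort, undecorate) can be wrapped in one function. This is a sketch under stated assumptions: the name sort_on_key is hypothetical, and the decorated tuples carry an extra index element, a variation on the steps above, so that two dictionaries with equal key values are never compared with each other.

```python
def sort_on_key(dicts, key):
    """Return the dictionaries in dicts ordered by their value for key."""
    # Decorate: (sort value, original position, dictionary)
    decorated = [(d[key], i, d) for i, d in enumerate(dicts)]
    # Sort: tuples compare on the sort value, then on position for ties
    decorated.sort()
    # Undecorate: keep only the dictionaries, now in sorted order
    return [d for (value, i, d) in decorated]

undecorated = [{'key3': 5, 'key2': 1, 'key1': 4},
               {'key3': 3, 'key2': 3, 'key1': 2},
               {'key3': 2, 'key2': 2, 'key1': 5}]

for d in sort_on_key(undecorated, 'key2'):
    print(d['key2'])
```

The same ordering is obtained with sorted(undecorated, key=lambda d: d['key2']).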

Sorting More Generally

Function sorted() takes a list (or other sequence, coercing it to a list)

Returns the sorted version of the list (without modifying the original)

List method sort() applied to a list sorts the list in place

No such method for tuples or strings—immutable

Henceforth, discuss only function sorted()

All that’s said about sorted() also applies to method sort()

Sequences are sorted by lexicographic order—e.g.,

>>> sorted("example")

['a', 'e', 'e', 'l', 'm', 'p', 'x']

Sequences of sequences are sorted in lexicographic order—e.g.,

>>> sorted(["dog", "cat", "bird"])

['bird', 'cat', 'dog']

>>> sorted([(2,1), (2,2), (1,2)])

[(1, 2), (2, 1), (2, 2)]

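Also accepts elements that are only partially ordered, but the relative position of incomparable elements is then not guaranteed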

Recall: For sets A and B, A < B is true if A is a proper subset of B, A <= B if A is a subset of B

>>> sorted([{1,2,3}, {2,3}, {2,1}])

[set([2, 3]), set([1, 2]), set([1, 2, 3])]

A reverse parameter with a boolean value (default False) lets us specify a sort in reverse order—e.g.,

>>> sorted([(2,1), (2,2), (1,2)], reverse=True)

[(2, 2), (2, 1), (1, 2)]

Parameter key lets us specify a function with a single argument that returns a key to use for sorting purposes

The function is applied once to each element

Elements are sorted as per the order of their values for this function

>>> sorted(["abc", "ef"], key=len)

['ef', 'abc']

lower is a method of string objects

If we invoke it on the type/class, must pass an instance of str (for the value of self)

>>> str.lower("AbC")

'abc'

Use this as a key for sorting

>>> sorted(["Dog", "cat", "Cattle"], key=str.lower)

['cat', 'Cattle', 'Dog']

A common pattern is to sort complex objects using only their values at a given index—e.g.,

>>> student_tuples = [('john', 'A', 15), ('jane', 'B', 12),\

('dave', 'B', 10)]

>>> sorted(student_tuples, key=lambda student: student[2])

[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

The same technique works for dictionaries

>>> student_recs = [dict(name='john', grade='A', age=15),\

dict(name='jane', grade='B', age=12),\

dict(name='dave', grade='B', age=10)]

>>> sorted(student_recs, key=lambda student: student['age'])

[{'grade': 'B', 'age': 10, 'name': 'dave'},

{'grade': 'B', 'age': 12, 'name': 'jane'},

{'grade': 'A', 'age': 15, 'name': 'john'}]

And for objects with named attributes (see below)

Sorting with operator Module Functions

The key-function patterns shown above are very common

Module operator provides convenience functions to make accessor functions easier and faster

operator.itemgetter(index) returns a callable object that fetches the value at index index of its operand—e.g.,

>>> import operator as op

>>> ls = [1,2,3]

>>> op.itemgetter(1)(ls)

2

Using this to sort the above list of tuples

>>> sorted(student_tuples, key=op.itemgetter(2))

[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

operator.itemgetter(key) also works with dictionaries

>>> op.itemgetter('b')({'a':1, 'b':2, 'c':3})

2

Use this to sort the above list of dictionaries on key ‘age’

>>> sorted(student_recs, key=op.itemgetter('age'), reverse=True)

[{'grade': 'A', 'age': 15, 'name': 'john'},

{'grade': 'B', 'age': 12, 'name': 'jane'},

{'grade': 'B', 'age': 10, 'name': 'dave'}]

operator.attrgetter(attr) returns a callable object that fetches the value of attribute attr from its operand (an object)

See below, after we've introduced classes

The operator module functions allow multiple levels of sorting

E.g., sort student_tuples with index 1 as the primary key and index 2 as the secondary key

I.e., sort on index 1 and break ties with index 2

>>> sorted(student_tuples, key=op.itemgetter(1,2))

[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

E.g., sort student_recs with 'grade' as the primary key and 'age' as the secondary key

>>> sorted(student_recs, key=op.itemgetter('grade', 'age'))

[{'grade': 'A', 'age': 15, 'name': 'john'},

{'grade': 'B', 'age': 10, 'name': 'dave'},

{'grade': 'B', 'age': 12, 'name': 'jane'}]

operator.methodcaller(name) returns a callable object that calls the method name on its operand—e.g.,

>>> op.methodcaller('pop')([1,2,3])

3

Parameter name may be followed (after a comma) by a comma-separated run of arguments to method name

The arguments become arguments of the callable object

>>> op.methodcaller('count',2)([1,2,3,2,4,2])

3

As a sophisticated example, we sort in reverse order a list of 4-tuples of integers on the number of -1’s in them

>>> marks = [(4,-1,5,3), (5,6,7,8), (5,-1,6,-1)]

>>> sorted(marks, key=op.methodcaller('count', -1), reverse=True)

[(5, -1, 6, -1), (4, -1, 5, 3), (5, 6, 7, 8)]
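
itemgetter with several indices sorts every level in the same direction. When the levels need different directions, one option is to sort in two passes, secondary key first, relying on the fact that Python's sort is stable (items with equal keys keep their relative order). A sketch using the student tuples from above, with variable names chosen for illustration:

```python
import operator as op

student_tuples = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

# Pass 1: secondary key (age), descending
by_age_desc = sorted(student_tuples, key=op.itemgetter(2), reverse=True)
# Pass 2: primary key (grade), ascending; ties keep the order from pass 1
result = sorted(by_age_desc, key=op.itemgetter(1))

print(result)
```

Within grade 'B', jane (12) now comes before dave (10).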

Output Formatting

For output formatting, you can roll your own with string slicing and concatenation

Or use the % (formatting or interpolation) operator with

a format string as the left operand and,

as the right operand, an expression or tuple of expressions whose values are substituted into the format string

Returns the string resulting from substituting the values from the right operand into specified positions in the left operand

This derives from formatting output in C

General form

format % values

Conversion specifiers in format are character sequences beginning with %

If there’s 1 conversion specifier

Then values is a single expression whose value is converted to a string (as if by str()) and substituted for the conversion specifier to give the resulting string

If there are n > 1 conversion specifiers in format

Then values is a tuple of n expressions

Resulting string is produced by converting the values of the expressions to strings and substituting each for the corresponding conversion specifier in format

Correspondence by position

E.g., %s specifies a string and %d specifies a decimal integer

>>> "The number is %d" % 5

'The number is 5'

>>> "%s is %d years old" % ('Fred', 21)

'Fred is 21 years old'

Normally, string interpolation is done for well-formatted output

>>> print "%s is %d years old" % ('Fred', 21)

Fred is 21 years old

A conversion specifier contains at least % followed by a conversion type

Some conversion types

s String

c Single character (accepts integer or single character string)

d Signed integer decimal

f (or F) Floating point decimal

e (resp., E) Floating point exponential with lowercase (resp., uppercase) 'e'

g (or G) Floating point decimal or exponential (depending on the magnitude of the number and the specified precision)

>>> "%f %e %g %g" % (2.5, 2.5, 250000, 2500000)

'2.500000 2.500000e+000 250000 2.5e+006'

Often want to control width of the field a value is substituted into

Give the min. field width between the % and the conversion type

If the value requires more space, it takes it (thus “min.”)

>>> "%5s %5d %3s" % ('cat', 12, 'elephant')

'  cat    12 elephant'

By default, values are right justified

Without a min. field width specified, value occupies just as much as needed

For floating point values, specify a precision just after the minimum field width

A period (‘.’) followed by the number of places to show after the decimal point

The decimal places and decimal point contribute to the field width

>>> "%5.2f %5.1e" % (25.4, 25.6)

'25.40 2.6e+001'

Field width is important when values are output in tabular form

For free-form text, usually let the values occupy their own space (no min. field width specified)

But usually specify precision of floating-point values—default gives many more digits than desirable

To specify just the precision (and allow the overall floating-point value to occupy just enough space),

use a conversion specifier with no min. field width but with a precision

Example

>>> x = 1.0/3

>>> "%.2f and %.2f" % (x, -2*x)

'0.33 and -0.67'

0 or more conversion flags may occur between % and the min. field width

Some conversion flags

0 Conversion is 0 padded for numeric values

- Converted value is left adjusted (overrides "0" if both given)

(a space) A blank is left before a positive number produced by a signed conversion

+ A sign character ("+" or "-") precedes conversion (overrides a "space" flag)

>>> "%-5d %0+5.1f" % (12, 3.5)

'12    +03.5'
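Field widths, precision, and the - flag together are what make tabular output line up. The following is a small sketch with made-up price data (the items list and its values are assumptions for illustration only).

```python
# Hypothetical (name, quantity, price) rows, used only for illustration.
items = [('Milk', 3, 5.23), ('Bread', 12, 1.5), ('Eggs', 1, 12.0)]

lines = []
for name, qty, price in items:
    # %-8s : left-justified string, min. field width 8
    # %3d  : right-justified integer, min. field width 3
    # %7.2f: float, min. field width 7, 2 places after the decimal point
    lines.append("%-8s %3d %7.2f" % (name, qty, price))

table = "\n".join(lines)
print(table)
```

Every row has the same length (8 + 1 + 3 + 1 + 7 = 20 characters), so the columns align when the rows are printed one under another.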

The New, Pythonic Way to Format

The format string method was added in Python 2.6

General form

template.format(p0, p1, ..., k0=v0, k1=v1, ...)

The template (or format string) is a string with 1 or more format codes (fields to be replaced) embedded

The fields to be replaced are surrounded by {}

The curly braces and "code" inside are replaced with a formatted value from 1 of the arguments

Anything not contained in {} is literally printed

If a brace character, { or }, has to be printed, it is escaped by doubling it: {{ and }}

The list of arguments for format() starts with 0 or more positional arguments (p0, p1, ...)

followed by 0 or more keyword arguments, name=value

A positional parameter is accessed by placing the index of the parameter after the opening brace

{0} accesses the 1st parameter, {1} the 2nd, …

The index can be followed by a colon and a format string of the form used in the old formatting way—e.g., {0:5d}

If the positional parameters are used in the order in which they’re written,

positional argument specifiers inside the braces can be omitted (Python 2.7 and later)

E.g., '{} {} {}' corresponds to '{0} {1} {2}'

They’re needed if the parameters are accessed in a different order, e.g., '{2} {1} {0}'

A format specification can still follow the colon when the index is omitted, e.g., '{:5d}'

Examples

>>> "Product: {0:6s}, Price per unit: ${1:5.2f}".format('Milk', 5.23)

'Product: Milk  , Price per unit: $ 5.23'

>>> "Price per unit: ${1:5.2f}, Product: {0:6s}".format('Milk', 5.23)

'Price per unit: $ 5.23, Product: Milk  '

>>> "Product: {p:6s}, Price per unit: ${u:5.2f}".format(p='Milk', u=5.23)

'Product: Milk  , Price per unit: $ 5.23'

For justifying, precede the formatting with a "<" (left justify, the default for strings) or ">" (right justify, the default for numbers)

Use "^" to have the value centered in the available space

Unless a min field width is defined, the field width is the same size as the data to fill it, so alignment isn’t an issue

"+" specifies including positive and negative signs for numbers

"-" specifies sign only for negative numbers (the default)

" " (space) specifies space for positive numbers, a minus sign for negative numbers

"=" (for numbers) places the padding after the sign (if any) but before the digits; combine it with a fill character of 0 for zero padding, as in {0:0=+8d}

A dictionary can be unpacked (using **) to provide keyword arguments for format()

Example

>>> capital_country = {"US" : "Washington",
...                    "Germany": "Berlin",
...                    "France" : "Paris",
...                    "UK" : "London"}
>>> format_string = ''
>>> for c in capital_country:
...     format_string += c + ": {" + c + "}; "
...

>>> format_string

'Germany: {Germany}; UK: {UK}; US: {US}; France: {France}; '

>>> format_string.format(**capital_country)

'Germany: Berlin; UK: London; US: Washington; France: Paris; '
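The alignment and sign options listed earlier in this section are easiest to compare side by side. A minimal sketch follows; the values and variable names are arbitrary, and the square brackets are only there to make the field boundaries visible.

```python
left   = "[{0:<8}]".format("cat")              # left-justified in 8 columns
right  = "[{0:>8}]".format("cat")              # right-justified
center = "[{0:^8}]".format("cat")              # centered
signed = "[{0:+d}] [{1:+d}]".format(12, -12)   # sign always shown
spaced = "[{0: d}] [{1: d}]".format(12, -12)   # space in place of a plus sign
padded = "[{0:=+8d}]".format(120)              # padding between sign and digits
zeros  = "[{0:0=+8d}]".format(120)             # same, with 0 as the fill character

for s in (left, right, center, signed, spaced, padded, zeros):
    print(s)
```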

File Methods

To open a file and return a file object (as a handle to it), use

open(filename, mode)

filename is a string with the pathname of the file (relative to the current folder)

mode (a string) is one of

'r' to open the file for reading

'w' for writing (erasing a file with the same name if it exists)

'a' for appending data to the end of the file

'r+' for reading and writing

On Windows systems, 'b' appended to the mode opens the file in binary mode (e.g., for images)

File method read() reads the entire file, returns its contents as a string

Optional integer argument indicates the max. number of bytes to read

If EOF has been reached, the empty string is returned

File method close() closes the corresponding file

File data.txt

1

2

3

4

Program file file1.py (in same folder as data.txt)

f = open('data.txt', 'r')

s = f.read()

print s

f.close()

Run

E:\Old D Drive\c690f08>file1.py

1

2

3

4

f.readline() reads a single line from the file, returns it as a string

A newline character (\n) is left at the end of the string

So the returned string doesn’t end with a \n only if it’s the last line in the file and that line doesn’t end with a \n

So a blank line is returned as '\n' while EOF returns an empty string

f.readlines() returns a list of all lines in the file

An optional integer parameter specifies approximately how many bytes to read

Enough more is read to complete the last line returned

Use this so an entire large file needn’t be loaded into memory


f = open('data.txt', 'r')
ls = f.readlines()
print ls
f.close()

G:\c690f08>file2.py

['1\n', '2\n', '3\n', '4']

Convert values to integers

Newline presents no problem

…

ls1 = [ int(x) for x in ls ]

print ls1

…

G:\c690f08>file3.py

[1, 2, 3, 4]

A fast, memory-efficient alternate way to read lines is to loop over the file object

The 2 approaches manage line buffering differently

Don’t mix them


f = open('data.txt', 'r')
for val in f:
    print val,
f.close()

G:\c690f08>file4.py

1

2

3

4

Separate lines since \n is retained
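The difference between a blank line ('\n') and end of file ('') matters when reading with readline() in a loop. The sketch below is self-contained: it first writes a scratch file (its name and contents are made up for the example) and then reads it back line by line.

```python
import os
import tempfile

# Create a small scratch file so the sketch is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'data.txt')
f = open(path, 'w')
f.write('1\n2\n\n4')          # blank third line, no newline after the 4
f.close()

values = []
blank_lines = 0
f = open(path, 'r')
while True:
    line = f.readline()
    if line == '':            # empty string: end of file
        break
    if line == '\n':          # a lone newline: a blank line, not EOF
        blank_lines += 1
        continue
    values.append(int(line))  # int() tolerates the trailing newline
f.close()

print(values)
print(blank_lines)
```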

File Output

f.write(string) writes string to the file

An output value must first be converted to a string

f = open('out.txt', 'w')
age = {'Al': 25, 'Ed': 34, 'Bob': 21, 'Ken': 37}
for n, a in age.iteritems():
    f.write( "%s\t%3d\n" % (n, a) )
f.close()

File out.txt

Ed 34

Bob 21

Al 25

Ken 37

[Blank last line]

Exceptions

Exception: error detected during execution

>>> 2/0

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ZeroDivisionError: integer division or modulo by zero

Last line states what happened

Starts with exception’s type (ZeroDivisionError, built-in)

Always output with built-in exceptions

Recommended for user-defined ones

Rest of the line depends on the exception type and cause

Preceding part of message is a stack traceback indicating source lines

Handling Exceptions

Simple try statement

try:
    Try clause
except Exception:
    Except clause

When the try clause is executed,

if no exception occurs, except clause is skipped

if an exception occurs, rest of the try clause is skipped

If the exception’s type matches Exception,

except clause is executed and

execution continues after the try statement

If not, the exception is passed on to outer try statements

If no handler is found, it’s an unhandled exception

Execution stops with a message (see above)

while True:
    try:
        x = int(raw_input("Please enter a number: "))
        break
    except ValueError:
        print "Invalid number, try again..."

E:\Old D Drive\c690f08>try1.py

Please enter a number: a

Invalid number, try again...

Please enter a number: 2

Can interrupt the program with Ctrl-C

E:\Old D Drive\c690f08>try1.py

Please enter a number: Ctrl-C

Traceback (most recent call last):

File "E:\Old D Drive\c690f08\try1.py", line 3, in <module>

x = int(raw_input("Please enter a number: "))

KeyboardInterrupt

A try statement may have more than 1 except clause

At most 1 handler is executed.

sum = 0
while True:
    try:
        sum += int(raw_input("Enter an integer (Ctrl-C to exit): "))
    except ValueError:
        print "Invalid number, try again ..."
    except KeyboardInterrupt:
        break
print "\nThe sum is ", sum

C:\user\c690f08>try2.py

Enter an integer (Ctrl-C to exit): 4

Enter an integer (Ctrl-C to exit): a

Invalid number, try again ...

Enter an integer (Ctrl-C to exit): 6

Enter an integer (Ctrl-C to exit): Ctrl-C

The sum is 10

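Class BaseException is the root of the hierarchy of exception classes

Class Exception, derived from it, is the base class of all built-in exceptions other than SystemExit, KeyboardInterrupt, and GeneratorExit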

An except clause may name multiple exceptions as a tuple

except (RuntimeError, TypeError, NameError):
    pass

Last except clause may omit the exception name

Serves as a wildcard

Use this cautiously

Can be used to print an error message then re-raise the exception

Allows a caller to handle the exception

import sys
i = -1
try:
    f = open('data.txt')
    s = f.readline()
    i = int(s.strip())
except IOError, (errno, strerror):
    print "I/O error(%s): %s" % (errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise

Here, sys.exc_info() returns a tuple (type, value, traceback) where

type is the type of exception (a class object)

value is the exception parameter (a class instance if the exception type is a class object)

traceback is an object encapsulating the call stack at the point of exception
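The three items returned by sys.exc_info() can be unpacked and inspected inside a handler. The following is a minimal sketch; describe_failure() and bad_conversion() are hypothetical helpers written for this example, and the code uses only syntax common to Python 2.7 and Python 3.

```python
import sys

def describe_failure(func):
    """Call func(); if it raises, return details taken from sys.exc_info()."""
    try:
        func()
    except Exception:
        exc_type, exc_value, exc_tb = sys.exc_info()
        # exc_type is the exception class, exc_value the instance,
        # exc_tb the traceback object for the point of the exception.
        return exc_type.__name__, str(exc_value), exc_tb.tb_lineno
    return None

def bad_conversion():        # hypothetical function that always fails
    int('abc')

info = describe_failure(bad_conversion)
print(info[0])
print(info[1])
```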

The try statement has an optional else clause

After all except clauses

Useful for code executed if the try clause doesn’t raise an exception

import sys
for arg in sys.argv[1:]:
    try:
        f = open(arg, 'r')
    except IOError:
        print 'cannot open', arg
    else:
        print arg, 'has', len(f.readlines()), 'lines'
        f.close()

Better than adding code to the try clause

Avoids accidentally catching an exception not raised by the code protected by the try statement

Exception Arguments and Raising Exceptions

An exception may have an associated value (its argument)

Presence and type of the argument depend on the exception type

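Pass a single argument to an exception

Binds it to the message attribute of the exception instance

Pass several arguments and they’re stored as a tuple (in the args attribute)

In an except clause, the variable that receives the exception instance is separated from the exception name by a comma

>>> try:
...     '1'+2
... except TypeError, detail:
...     print detail
...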

cannot concatenate 'str' and 'int' objects

The raise statement lets you force a specified exception

Possibly with arguments

The operand must be an exception class or an instance of one

What we pass to it is up to us

>>> raise NameError



NameError

>>> raise NameError('Hi')



NameError: Hi

>>> raise NameError('Hi', 'Bob')



NameError: ('Hi', 'Bob')

Recall that, in an except clause, raise re-raises the exception

Exception handlers don't just handle exceptions occurring immediately in the try clause

Also handle exceptions inside functions called (even indirectly) in the try clause

>>> def this_fails():
...     x = 1/0
...
>>> try:
...     this_fails()
... except ZeroDivisionError, detail:
...     print 'Handling run-time error: ', detail
...
Handling run-time error:  integer division or modulo by zero

Some Common Exception Classes

Exception

All user-defined exceptions should be derived from this class

ArithmeticError

Base class for built-in exceptions raised for arithmetic errors

LookupError

Base class for exceptions raised when a key or index used on a dictionary or sequence is invalid

AttributeError

Raised when an attribute reference or assignment fails

KeyError

Raised when a dictionary key isn’t found

NameError

Raised when a local or global name isn’t found

Associated value is a message that includes the name not found

TypeError

Raised when an operation/function is applied to an object of inappropriate type

Associated value is a string giving details about the type mismatch

ZeroDivisionError

Raised when the second operand of a division or modulo operation is zero

Raise these predefined exceptions to create meaningful error messages

E.g., suppose in function foo() we're trying to find the value of variable name in dictionary people

if name not in people:
    raise LookupError(name + ' not present')

A call to foo() may be nested in a try statement

try:
    foo()
except LookupError, detail:
    print "In foo(), handling the grads, " + str(detail)

detail is actually an instance of LookupError and must be converted to a string
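The raise-and-handle pattern above can be put together as a runnable sketch. The people dictionary, the name parameter of foo(), and the safe_lookup() wrapper are assumptions made for illustration; the except ... as form is used because it works in Python 2.6+ as well as Python 3.

```python
people = {'al': 25, 'ed': 34}          # hypothetical data

def foo(name):
    """Return the value stored for name, or raise LookupError."""
    if name not in people:
        raise LookupError(name + ' not present')
    return people[name]

def safe_lookup(name):
    try:
        return foo(name)
    except LookupError as detail:
        # detail is an exception instance; str() gives its message
        return "In foo(): " + str(detail)

print(safe_lookup('al'))
print(safe_lookup('bob'))
```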

Defining Clean-up Actions

The try statement has an optional finally clause for clean-up actions executed under all circumstances

A finally clause is executed before leaving the try statement, whether or not an exception occurs

When

an exception occurs in the try clause and isn’t handled by an except clause or

it occurs in an except or else clause,

it’s re-raised after the finally clause is executed

The following is file try6.py in folder C:\Python27

def divide(x, y):
    try:
        result = x / y
    except ZeroDivisionError:
        print "division by zero!"
    else:
        print "result is", result
    finally:
        print "executing finally clause"

>>> from try6 import *

>>> divide(2,1)

result is 2

executing finally clause

>>> divide(2,0)

division by zero!

executing finally clause

>>> divide("2","1")

executing finally clause



File "try6.py", line 3, in divide

result = x / y

TypeError: unsupported operand type(s) for /: 'str' and 'str'

The TypeError raised by dividing 2 strings isn’t handled by the except clause

So it’s re-raised after the finally clause is executed

A finally clause is useful for releasing external resources—e.g., files or network connections— whether or not use of the resource succeeded
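As an illustration of releasing a resource in a finally clause, the sketch below closes a file whether or not converting its first line succeeds. The helper first_int(), the closed_flags list, and the scratch file names are all made up for this example.

```python
import os
import tempfile

closed_flags = []                 # records f.closed after each call

def first_int(path):
    """Return the first line of the file as an int; always close the file."""
    f = open(path)
    try:
        return int(f.readline())
    finally:
        f.close()                 # runs on success and on failure
        closed_flags.append(f.closed)

# Two scratch files: one with a valid integer, one without.
folder = tempfile.mkdtemp()
good = os.path.join(folder, 'good.txt')
bad = os.path.join(folder, 'bad.txt')
for name, text in ((good, '42\n'), (bad, 'oops\n')):
    out = open(name, 'w')
    out.write(text)
    out.close()

result = first_int(good)
try:
    first_int(bad)
    failed = False
except ValueError:                # re-raised after the finally clause ran
    failed = True

print(result)
print(closed_flags)
```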

Predefined Clean-up Actions

Some objects define standard clean-up actions for when the object is no longer needed, whether use of the object succeeded or failed

Consider

for line in open("myfile.txt"):
    print line

This leaves the file open for an indeterminate amount of time after the code finishes executing

The with statement ensures that objects are cleaned up

with open("myfile.txt") as f:
    for line in f:
        print line

After this is executed, file f is always closed even if there’s a problem processing the lines
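One way to check this claim is to let processing fail partway through and then inspect the file object's closed attribute. A small sketch, using a scratch file whose name and contents are invented for the example:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'myfile.txt')   # scratch file for the example
with open(path, 'w') as out:
    out.write('10\nten\n')

total = 0
try:
    with open(path) as f:
        for line in f:
            total += int(line)     # raises ValueError on the second line
except ValueError:
    pass

# f still names the file object, and the with statement has closed it
print(f.closed)
print(total)
```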

Classes

Python Name Spaces and Scopes

A namespace is a mapping from names to objects

Most namespaces currently implemented as Python dictionaries—may change

Examples of namespaces

the set of built-in names

the global names in a module

the local names in a function invocation

No relation between names in different namespaces

An attribute is any name following a dot

In a sense, the set of attributes of an object form a namespace

References to names in modules are attribute references

In modname.funcname,

modname is a module object

funcname is an attribute of it

Name spaces are created at different moments, have different lifetimes

The local namespace for a function is created when it’s called

Deleted when the function returns or raises an exception not handled within the function

Recursive invocations each have their own local namespace

A scope is a textual region of a program where a namespace is directly accessible (unqualified names)

At any time during execution, the namespaces of at least 3 nested scopes are directly accessible

the innermost scope contains the local names

Searched first for a name used in the code

the namespaces of any enclosing functions

Searched starting with the nearest enclosing scope

the middle scope contains the current module's global names

Searched next

the outermost scope is the namespace containing built-in names

Searched last

If a name is declared global,

then all references and assignments go directly to the middle scope containing the module's global names

Otherwise, all variables outside innermost scope are read-only
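The lookup order and the effect of a global declaration can be seen in a few lines. This is a sketch meant to be run as a module or script at top level; all function and variable names are invented for the example.

```python
counter = 0                  # a global name in this module

def read_only():
    return counter + 1       # reading a global needs no declaration

def shadow():
    counter = 100            # creates a new local name; the global is untouched
    return counter

def bump():
    global counter           # assignments now go to the module's namespace
    counter += 1
    return counter

def outer():
    label = 'enclosing'      # found in the enclosing function's scope
    def inner():
        return label + ' ' + str(len(label))   # len is a built-in name
    return inner()

# Evaluated left to right: the global is 0 until bump() changes it.
results = (read_only(), shadow(), counter, bump(), counter)
print(results)
print(outer())
```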

Class Definition Syntax

Simplest form of class definition:

class ClassName:
    statement-1
    ...
    statement-N

Class definitions must be executed before having any effect

In practice, statements inside a class definition are function definitions

But other statements are allowed

When a class definition is entered, a new namespace is created and used as the local scope

When a class definition is left normally (via the end), a class object is created

It's a wrapper around the contents of the namespace created by the class definition

Class Objects

Class objects support 2 kinds of operations: attribute references and instantiation

Attribute references use the standard Python syntax for attribute references: obj.name

Consider the following class definition

class OurClass:
    "A simple class example"
    n = 36
    def f(self):
        return "Hello"

OurClass.n and OurClass.f are valid attribute references

Return an integer and a function object, resp.

4 of the 5 predefined attributes of classes (5th relates to inheritance)

__doc__ is the docstring belonging to the class

__dict__ (a dictionary) contains the class name space

__name__ is the name of the class

__module__ is the name of the module where this class is defined

User-defined class attributes can be assigned to

But __name__ (and __bases__, see below) are read-only

A function in the standard os module

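chdir(path): Change current working directory to path

Import the module as os rather than with from os import *, which would replace the built-in open() with os.open()

>>> import os

>>> os.chdir(r"C:\user\c690f08")

>>> os.listdir('.')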

['class1.py', 'class1.pyc', 'ClassNotes.doc', 'data.txt', ...]

>>> from class1 import OurClass

>>> OurClass.n

36

>>> OurClass.f

<unbound method OurClass.f>

>>> OurClass.__doc__

'A simple class example'

>>> OurClass.__dict__

{'__module__': 'class1', 'f': <function f at 0x00CFABB0>, '__doc__': 'A simple class example', 'n': 36}

>>> OurClass.__name__

'OurClass'

>>> OurClass.__module__

'class1'

>>> OurClass.n = 48

>>> OurClass.n

48

Class instantiation uses function notation

obj = OurClass()

creates a new instance of the class and

assigns this object to local variable obj

The instantiation operation creates an empty object

Often we want objects with customized initial states

So a class may define a special method __init__()

By convention, 1st argument of a method is called self

def __init__(self):
    self.data = []

Then class instantiation automatically invokes __init__()

If __init__() has arguments after self, then arguments given to the class instantiation operator are

passed on to __init__()

Another special function name is __str__

In its definition, include the formal parameter self

Defines the string returned when an instance of the class is passed to str(), hence also how the instance is displayed by print

Instances have 2 predefined attributes

__dict__ is a dictionary containing the instance name space

__class__ is the class of this instance

class Person:
    "A simple record of a person"
    def __init__(self, first, last):
        self.first_name = first
        self.last_name = last
    def __str__(self):
        return self.first_name + " " + self.last_name

>>> from class1 import Person

>>> fred = Person("Fred", "Smith")

>>> fred

<class1.Person instance at 0x00D045D0>

>>> str(fred)

'Fred Smith'

>>> print fred

Fred Smith

>>> fred.__dict__

{'first_name': 'Fred', 'last_name': 'Smith'}

>>> fred.__class__

<class class1.Person at 0x00D063F0>

>>> fred.__class__.__name__

'Person'

Instance Objects

Only operations understood by instance objects are attribute references

2 kinds of attribute names: data attributes and methods

Data attributes correspond to C++ "data members"

They needn't be declared

>>> fred.counter = 1
>>> while fred.counter < 5:
...     fred.counter += 1
...

>>> print fred.counter

5

>>> del fred.counter

>>> fred.counter



AttributeError: Person instance has no attribute 'counter'

A class attribute that’s a function object defines a corresponding method of its instances

>>> obj = OurClass()

obj.f is a method reference since OurClass.f is a function

But obj.f isn’t the same thing as OurClass.f

It’s a method object, not a function object

>>> OurClass.f

<unbound method OurClass.f>

>>> obj.f

<bound method OurClass.f of <class1.OurClass instance at 0x00AA9C60>>

Consider the following

>>> class Foo:
...     def bar(self):
...         print "Hi"
...

>>> f = Foo()

>>> f.bar()

Hi

>>> Foo.bar(f)

Hi

Referencing Attributes inside and outside Method Definitions

Data attributes may be referenced by methods as well as by clients

Python can’t enforce data hiding—it’s based on convention

Methods access data attributes and call other methods using attributes of the self argument

No shorthand for referencing data attributes or other methods from within a method

Can’t confuse local variables and instance variables

class MyBag:
    def __str__(self):
        return str(self.data)
    def __init__(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)

>>> from BagClass import MyBag

>>> bg = MyBag()

>>> bg.add(1)

>>> bg.addtwice(2)

>>> print bg

[1, 2, 2]

Inheritance

Syntax for a derived class definition

class DerivedClassName(BaseClassName):
    statement-1
    ...
    statement-N

Name BaseClassName must be defined in a scope containing the derived class definition

In place of a base class name, other expressions allowed

E.g., for a base class defined in another module:

class DerivedClassName(modname.BaseClassName):

DerivedClassName() creates a new instance

__bases__ is a predefined attribute for classes

Contains a tuple of the classes from which this class inherits

If a requested attribute isn’t found in the class, look in the base class

Do so recursively if the base class itself is derived

A method reference is valid if this yields a function object

A derived class may thus override methods of its base classes

A method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it
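The last point is easy to miss, so here is a minimal sketch of it. Shape and Square are hypothetical classes invented for the example: describe() is defined only in the base class, yet the area() it calls is the derived class's override when the instance is a Square.

```python
class Shape:
    def area(self):
        return 0

    def describe(self):
        # self.area() is looked up starting from the instance's own class,
        # so a derived class's override is the one that runs.
        return "%s with area %d" % (self.__class__.__name__, self.area())

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):                 # overrides Shape.area
        return self.side * self.side

print(Shape().describe())
print(Square(3).describe())
print(Square.__bases__ == (Shape,))
```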

A “new-style” class is one that’s derived from the predefined, top-level class object or one that inherits from such a class

New-style classes have advantages (e.g., they integrate seamlessly into Python’s type system)

We’ll henceforth use new-style classes

__init__() is inherited

Don’t try to use super()

An overriding method may want to extend, not replace, the base class method

To call the base-class method directly,

BaseClassName.methodname(self, arguments)

This calls a function attribute of class BaseClassName

class Tax(object):
    def __init__(self, inc, deps):
        self.income = inc
        self.dependents = deps
    def state_tax(self):
        return 0.07 * self.income if self.dependents > 3 else 0.1 * self.income
    def county_tax(self):

class PoorTax(Tax):

        return 0.02 * self.income
class RichTax(Tax):
    def __init__(self, inc, deps, age):
        self.income = inc

        self.age = age

        return 1.3 * Tax.state_tax(self) * (1.3 if self.age < 50 else 1.0)

An equivalent way to define __init__() for RichTax


def __init__(self, inc, deps, age):
    Tax.__init__(self, inc, deps)
    self.age = age
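Several lines of the Tax listing are missing from these notes, so it can't be run as it stands. The sketch below fills the holes with assumed code purely to give a runnable version of the hierarchy: the 1% rate in Tax.county_tax(), and the choice of which method PoorTax and RichTax override, are guesses and not the original code.

```python
class Tax(object):
    def __init__(self, inc, deps):
        self.income = inc
        self.dependents = deps

    def state_tax(self):
        return 0.07 * self.income if self.dependents > 3 else 0.1 * self.income

    def county_tax(self):
        return 0.01 * self.income          # ASSUMED rate

class PoorTax(Tax):
    def county_tax(self):                  # ASSUMED: overrides county_tax
        return 0.02 * self.income

class RichTax(Tax):
    def __init__(self, inc, deps, age):
        Tax.__init__(self, inc, deps)      # reuse the base-class initializer
        self.age = age

    def state_tax(self):                   # ASSUMED: extends state_tax
        return 1.3 * Tax.state_tax(self) * (1.3 if self.age < 50 else 1.0)

poor = PoorTax(20000, 4)
rich = RichTax(200000, 1, 60)
print(poor.county_tax())
print(rich.state_tax())
```

PoorTax inherits __init__() and state_tax() unchanged, while RichTax calls the base-class versions explicitly and builds on their results.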

We can break this up into separate files (in the same folder)

File tax1.py

class Tax(object):
    def __init__(self, inc, deps):
        self.income = inc






File ptax.py

from tax1 import Tax
class PoorTax(Tax):

        return 0.02 * self.income

File rtax.py

import tax1
class RichTax(tax1.Tax):

        self.income = inc

        self.age = age

        return 1.3 * tax1.Tax.state_tax(self) * (1.3 if self.age < 50 else 1.0)

Sorting Sequences of Objects

Recall that a common pattern is to sort complex objects using only their values at a given index

For objects, we use named attributes

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
...
>>> student_objects = [Student('john', 'A', 15),
...     Student('jane', 'B', 12), Student('dave', 'B', 10)]
>>> sorted(student_objects, key=lambda student: student.age)
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]


Recall that the key-function patterns for sorting are very common

Module operator provides convenience functions to make accessor functions easier and faster

Recall that operator.itemgetter(index) returns a callable object that fetches the value at index index of its operand

We used this to sort a list of tuples

And operator.itemgetter(key) also works with dictionaries

We used this to sort a list of dictionaries on key 'age'

Now, operator.attrgetter(attr) returns a callable object that fetches the value of attribute attr from its operand (an object)—e.g.,

>>> import operator as op

>>> st = Student('al', 'A', 39)

>>> op.attrgetter('age')(st)

39

Use this to sort the above list of objects

>>> sorted(student_objects, key=op.attrgetter('age'))

[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]


The operator module functions allow multiple levels of sorting

This holds for attrgetter() as well

>>> sorted(student_objects, key=op.attrgetter('grade', 'age'))

[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

Iterators

Most container objects can be looped over using a for statement

Behind the scenes, the for statement calls iter() on the container object

iter() returns an iterator object

Defines the method next()

Accesses container elements 1 by 1

When elements are exhausted

Raises a StopIteration exception

Terminates the for loop

>>> a = [1, 2]

>>> it = iter(a)

>>> it

<listiterator object at 0x00D02630>

>>> it.next()

1

>>> it.next()

2

>>> it.next()



StopIteration

To add iterator behavior to a class,

define a __iter__() method returning an object with a next() method

If the class defines next(), then __iter__() can return self

class Reverse:
    "Iterator for looping over a sequence backwards"
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> from reverse import Reverse
>>> for x in Reverse([1, 2]):
...     print x,
...

2 1

Generators

Generators are a powerful tool for creating iterators

Written like a normal function

But use the yield statement when it returns data

Each time next() called, generator resumes where it left off

def reverse(data):
    for index in range(len(data)-1, -1, -1):
        yield data[index]

>>> from reverse1 import *
>>> for x in reverse([1, 2]):
...     print x,
...

2 1

__iter__() and next() created automatically

Local variables & execution state automatically saved between calls

When a generator terminates, automatically raises StopIteration
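These three points can be observed by stepping a generator by hand. A minimal sketch, with countdown() as a made-up generator; it uses the built-in next() function (available from Python 2.6), which calls the generator's next method and also works in Python 3.

```python
def countdown(n):
    """Yield n, n-1, ..., 1; the local variable n survives between calls."""
    while n > 0:
        yield n
        n -= 1

g = countdown(3)
first = next(g)          # runs the body up to the first yield
second = next(g)         # resumes just after the yield
rest = list(g)           # drains the remaining values

try:
    next(g)              # the generator is exhausted
    exhausted = False
except StopIteration:
    exhausted = True

print(first)
print(second)
print(rest)
```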

Generator Expressions Some simple generators can be coded succinctly as expressions

Syntax like list comprehensions, but parentheses, not brackets

Intended for cases where the generator is used right away by an enclosing function

>>> (i*i for i in range(10))

<generator object at 0x00CE9AA8>

>>> g = (i*i for i in range(10))

>>> sum(g)

285

>>> xvec, yvec = [10, 20, 30], [7, 5, 3]

>>> sum(x*y for x, y in zip(xvec, yvec))

260

>>> line = 'the cat'

>>> list(line[i] for i in range(len(line)))

['t', 'h', 'e', ' ', 'c', 'a', 't']

Modules

A script is a file of commands to be executed by the interpreter

A module is a file with definitions and statements used in

a script or

an interactive instance of the interpreter

The collection of identifiers accessible in a script or in interactive mode is the main module

Definitions in a module can be imported into

the main module and

other modules

The module’s name is the file name without extension .py

Available in a module as value of global variable __name__

Example module named fibo in file fibo.py

# Fibonacci numbers module

def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print b,
        a, b = b, a+b

def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result

How Python Locates Modules

Modules are searched in the list of directories in variable sys.path

>>> import sys

>>> sys.path

['',

'C:\\WINDOWS\\system32\\python27.zip',

'C:\\Python27\\DLLs',

'C:\\Python27\\lib',

'C:\\Python27\\lib\\plat-win',

'C:\\Python27\\lib\\lib-tk',

'C:\\Python27',

'C:\\Python27\\lib\\site-packages']

Put fibo.py in one of these folders

C:\Python27 is the interactive interpreter’s “home”

C:\Python27\lib\site-packages is where 3rd-party distributions are placed

Or append the name of the folder (a string with escaped backslashes) to sys.path

>>> sys.path.append('E:\\Old D Drive\\c690f08')

Or put it in some folder and change to that folder, e.g.,

>>> import os

>>> os.chdir(r"E:\old d drive\c690f08")

Then

>>> import fibo

Python comes with a library of standard modules

See http://docs.python.org/modindex.html

Some modules are built into the interpreter

The folder containing the script being run is on the search path

So, if the script had the same name as a standard module we’re trying to import, Python would try to load the script as a module instead
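The effect of appending a folder to sys.path can be tried without touching the Python installation. The sketch below writes a tiny module into a scratch folder, adds that folder to the search path, and imports the module. The module name demo_fibo is made up for the example, and importlib.import_module('demo_fibo') is used in place of a plain import demo_fibo statement only so that the name appears as a string.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module into a scratch folder.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, 'demo_fibo.py'), 'w') as f:
    f.write('def fib2(n):\n'
            '    result, a, b = [], 0, 1\n'
            '    while b < n:\n'
            '        result.append(b)\n'
            '        a, b = b, a + b\n'
            '    return result\n')

sys.path.append(folder)                            # make the folder searchable
demo_fibo = importlib.import_module('demo_fibo')   # locate it via sys.path

series = demo_fibo.fib2(50)
print(series)
print(demo_fibo.__name__)
```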

Importing Modules

Importing the module enters its name in the current symbol table

But doesn’t enter names of the functions and classes defined in the module

To invoke such a function, use dot notation: module.function

>>> fibo.fib(50)

1 1 2 3 5 8 13 21 34

For brevity, assign the function to a local name

>>> fib = fibo.fib

>>> fib(50)

1 1 2 3 5 8 13 21 34

Executing Modules as Scripts

When you run a Python module as a script with

C:\Some Folder> fibo.py arguments

the code in the module will be executed, just as if you imported it

But __name__ is set to "__main__"

So make the file usable as a script as well as importable as a module by adding at the end, e.g.,

if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))

The sys.argv Variable

The script name and optional following arguments are passed to the script in variable sys.argv

List of strings

Length is at least 1

When no script and no arguments are given (in the Python command line),

sys.argv[0] is just an empty string

>>> sys.argv

['']

When -m module is used, sys.argv[0] is set to the full name of the located module.

E.g., file C:\Python27\args.py

def out_args():
    import sys
    print sys.argv

if __name__ == "__main__":
    out_args()

C:\>python -m args 1 2 3

['C:\\Python27\\args.py', '1', '2', '3']

Get the same if we just execute args.py as a script

Let Windows figure out we need the Python interpreter from the .py extension

C:\>args.py 1 2 3

['C:\\Python27\\args.py', '1', '2', '3']

In the last 2 example runs, the interpreter used the paths in the sys.path variable to locate args.py (in C:\Python27)

It can’t use this info if we invoke it directly as the operand of the python command

Looks only in the current folder

C:\>python args.py 1 2 3

python: can't open file 'args.py': [Errno 2] No such file or directory

C:\>cd python27

C:\Python27>python args.py 1 2 3

['args.py', '1', '2', '3']

Executing Modules as Scripts (Resumed) Recall that we added the following to the end of fibo.py

if __name__ == "__main__":
    import sys
    fib(int(sys.argv[1]))

Executing this module as a script:

C:\Python27>python fibo.py 50

1 1 2 3 5 8 13 21 34

If the module is imported, the code isn’t run

>>> import fibo

>>>

Used to provide a convenient user interface to a module

Or for testing: running the module as a script executes a test suite

“Compiled” Python files A speed-up for starting short programs using many standard modules

If foo.pyc exists in the folder where module foo.py is found, it is assumed to contain an already “byte-compiled” version of module foo

Modification time of the version of foo.py used to create foo.pyc is recorded in foo.pyc

.pyc is regenerated if modification times don't match

When foo.py is compiled, compiled version automatically written to foo.pyc

Contents of .pyc files are platform independent

A Python module directory can be shared across architectures

When the Python interpreter is invoked with the -O flag, optimized code is generated and stored in .pyo files

Optimizer currently doesn't help much

A program doesn't run any faster when read from a .pyc or .pyo file than when read from a .py file

Only thing that's faster is load time

When a script is run by giving its name on the command line, the bytecode isn’t written to a .pyc or .pyo file

Can have a file foo.pyc (or foo.pyo) without a file foo.py for the same module

Used to distribute a library of Python code

Moderately hard to reverse engineer
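Byte-compilation can also be requested explicitly with the standard py_compile module. A minimal sketch, assuming a throwaway foo.py in a temporary folder, with the output path passed explicitly so the .pyc lands next to the source:

```python
import os
import py_compile
import tempfile

folder = tempfile.mkdtemp()
source = os.path.join(folder, "foo.py")
compiled = os.path.join(folder, "foo.pyc")

with open(source, "w") as f:
    f.write("answer = 6 * 7\n")

# Byte-compile foo.py explicitly, writing the result to foo.pyc.
py_compile.compile(source, cfile=compiled)

print(os.path.exists(compiled))   # True
with open(compiled, "rb") as f:
    print(len(f.read()) > 0)      # True: byte code, not source text
```

Note that Python 3 writes its automatically generated .pyc files to a __pycache__ subfolder rather than beside the source, which is why the sketch names the output file itself.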

Packages Packages let us structure Python's module namespace using “dotted module names”

E.g., A.B denotes submodule B in package A

Dotted module names let authors of a multi-module package not worry about each other’s module names

Just as modules let authors of different modules not worry about each other’s global variable names

Examples We design a package (collection of modules) to handle uniformly sound files and sound data

There are many sound file formats

Growing collection of modules to convert between file formats

There are also many operations on sound data

Growing collection of modules for these operations

Possible structure for our package as a hierarchical file system:

sound/                      Top-level package
    __init__.py             Initialize the sound package
    formats/                Subpackage for file format conversions
        __init__.py
        wavread.py
        wavwrite.py
        auread.py
        auwrite.py
        ...
    effects/                Subpackage for sound effects
        __init__.py
        echo.py
        surround.py
        reverse.py
        ...
    filters/                Subpackage for filters
        __init__.py
        equalizer.py
        vocoder.py
        ...

Importing import statements are executed in 2 steps

1. find a module, and initialize it if necessary

2. define a name or names in the local namespace (of the scope where the import statement occurs)

“Initialize” a built-in or extension module: call an initialization function the module provides

“Initialize” a Python-coded module: execute the module's body

Module searching generally involves searching the current folder for a module with the given name

If this fails, search containing package (if there’s one—see below)

If this fails, search a list of locations given in sys.path
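The two steps and the role of sys.path can be sketched with a throwaway module. The module name greeter_demo is hypothetical. Its body prints a message so we can see when initialization happens.

```python
import os
import sys
import tempfile

folder = tempfile.mkdtemp()
with open(os.path.join(folder, "greeter_demo.py"), "w") as f:
    # The module body runs when the module is initialized.
    f.write("print('initializing greeter_demo')\n"
            "greeting = 'hello'\n")

# Step 1 can only find the module once its folder is on sys.path.
sys.path.append(folder)

import greeter_demo          # prints: initializing greeter_demo
import greeter_demo          # already initialized: the body does not run again

# Step 2 bound the name greeter_demo in this namespace.
print(greeter_demo.greeting)             # hello
print('greeter_demo' in sys.modules)     # True
```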

__init__.py files are required to make Python treat the folders as containing packages

This keeps a folder with a common name from unintentionally hiding valid modules later in the module search path

__init__.py could be empty

Can import individual modules from a package

import sound.effects.echo

Then sound.effects.echo.echofilter is referenced with its full name

sound.effects.echo.echofilter(input, output, delay=0.7, atten=4)

An alternative way to import the submodule

from sound.effects import echo

Makes submodule echo available without its package prefix

echo.echofilter(input, output, delay=0.7, atten=4)

Yet another variation: import the desired function or variable directly

from sound.effects.echo import echofilter

Makes function echofilter() of submodule echo directly available

echofilter(input, output, delay=0.7, atten=4)
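To try the three forms, the sketch below builds a cut-down sound package in a temporary folder. The helper write and the body of echofilter are stand-ins invented for illustration. The real function would process sound data.

```python
import os
import sys
import tempfile

def write(path, text=""):
    """Create a file (and its parent folders) with the given text."""
    parent = os.path.dirname(path)
    if not os.path.isdir(parent):
        os.makedirs(parent)
    with open(path, "w") as f:
        f.write(text)

root = tempfile.mkdtemp()
write(os.path.join(root, "sound", "__init__.py"))             # may be empty
write(os.path.join(root, "sound", "effects", "__init__.py"))
write(os.path.join(root, "sound", "effects", "echo.py"),
      # Stand-in body: just reports its arguments.
      "def echofilter(input, output, delay=0.7, atten=4):\n"
      "    return (input, output, delay, atten)\n")
sys.path.insert(0, root)

import sound.effects.echo                       # use the full name
from sound.effects import echo                  # use echo.<name>
from sound.effects.echo import echofilter       # use the bare name

# All three forms reach the same function object.
print(sound.effects.echo.echofilter is echo.echofilter is echofilter)  # True
print(echofilter("in.wav", "out.wav", delay=0.5))
# ('in.wav', 'out.wav', 0.5, 4)
```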

In the form

from package import item

item can be a submodule (or subpackage) of the package or some other name defined in the package (e.g., a function, class, or variable)

The import statement 1st tests if the item is defined in the package

If not, it assumes it’s a module and attempts to load it

If it fails to find it, an ImportError exception is raised

In the form

import item.subitem.subsubitem

each item but the last must be a package

The last can be a module or a package, but can't be a class, function, or variable defined in the previous item
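Both failure cases can be seen with the standard os package. The name no_such_item is deliberately undefined, and the list failures is just a place to collect the error messages.

```python
import os.path                 # fine: path is a module

failures = []

# import item.subitem.subsubitem: the last item can't be a function.
try:
    import os.path.join        # join is a function defined in os.path
except ImportError as error:
    failures.append(str(error))

# from package import item: an item that can't be found raises ImportError.
try:
    from os import no_such_item
except ImportError as error:
    failures.append(str(error))

print(len(failures))           # 2
for message in failures:
    print(message)
```

The exact wording of the messages differs between Python versions, so the sketch only counts them.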

Note the forms

import mod1 as amod1

from mod import id1 as aid1

“as” causes the object to be bound to the alias in the namespace

E.g.,

import sound.effects.echo as echo_mod

from sound.effects import echo as echo_mod

from sound.effects.echo import echofilter as efilter

All import forms allow multiple items to be imported

Comma-separated list

import sound.effects.echo, sound.filters.equalizer

from sound.effects import echo, surround

from sound.effects.echo import echofilter, echoamp

from sound.effects import echo as ech, surround as sur
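The same alias and multiple-item forms, sketched with standard modules so they run without the sound package. The aliases osp, m, root, and PI are arbitrary.

```python
import os.path as osp, math as m            # several modules, each with an alias
from math import sqrt as root, pi as PI     # several names, each with an alias

print(root(16.0))                               # 4.0
print(PI == m.pi)                               # True
print(osp.basename("sound/effects/echo.py"))    # echo.py

# "as" binds only the alias, not the original name.
namespace = {}
exec("from math import sqrt as root", namespace)
only_alias_bound = 'root' in namespace and 'sqrt' not in namespace
print(only_alias_bound)                         # True
```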

The form of import without from binds the module name in the local namespace to the module object

When a submodule of a package is loaded, Python ensures that the package itself is loaded first

The from form doesn’t bind the module name
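The difference in what gets bound can be checked by importing inside a function and listing its local names. The sketch uses the standard xml package and its dom subpackage. The function names are just for illustration.

```python
import sys

def plain_import():
    import xml.dom            # binds the top-level name xml
    return sorted(locals())

def from_import():
    from xml import dom       # binds dom only, not xml
    return sorted(locals())

print(plain_import())         # ['xml']
print(from_import())          # ['dom']

# Either way, the package itself was loaded along with the submodule.
print('xml' in sys.modules and 'xml.dom' in sys.modules)   # True
```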

Importing * from a Package The forms of import with from may have * in place of the (list of) identifier(s) at the end

Used only in a module scope, not in function or class definitions

When Python executes

from sound.effects import *

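we’d expect it to consult the filesystem to find which submodules are present in the package and to import them all

But this doesn’t work well on Windows (where the filesystem doesn't always record the case of a file name accurately) and some other platforms

So the package author provides an explicit index of the package: a list named __all__ defined in the package's __init__.py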
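A minimal sketch of such an index, assuming the standard convention of a list named __all__ in the package's __init__.py. The package name soundfx and its one-line submodules are hypothetical. The star-import is run in a fresh dictionary so we can see exactly what it binds.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
package = os.path.join(root, "soundfx")      # hypothetical package name
os.makedirs(package)

files = {
    "__init__.py": "__all__ = ['echo', 'surround']\n",   # the explicit index
    "echo.py": "name = 'echo'\n",
    "surround.py": "name = 'surround'\n",
    "reverse.py": "name = 'reverse'\n",                  # not listed in __all__
}
for filename, text in files.items():
    with open(os.path.join(package, filename), "w") as f:
        f.write(text)
sys.path.insert(0, root)

# Run the star-import in a fresh namespace so we can see what it binds.
namespace = {}
exec("from soundfx import *", namespace)
imported = sorted(name for name in namespace if not name.startswith("__"))
print(imported)    # ['echo', 'surround']
```

The submodule reverse exists on disk but is not imported, because only the names listed in __all__ are taken.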

Importing * from a module or package is frowned upon

Can cause hard-to-read code

But OK to save typing in interactive sessions

And certain modules are designed to export only names that follow certain patterns

The recommended notation is

from package import specific_submodule

unless the importing module needs to use submodules with the same name from different packages

Intra-package References A package’s submodules often must refer to each other

E.g., the surround module might use the echo module

import 1st looks in the containing package before looking in the standard module search path

So, the surround module can use just

import echo

or

from echo import echofilter
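The bare import echo form relies on Python 2 looking in the containing package first, and Python 3 no longer does this. The sketch below instead uses the explicit relative form from . import echo, available since Python 2.5, which should behave the same in both. The package name sound_rel and the function bodies are hypothetical.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
effects = os.path.join(root, "sound_rel", "effects")   # hypothetical package
os.makedirs(effects)

files = {
    os.path.join(root, "sound_rel", "__init__.py"): "",
    os.path.join(effects, "__init__.py"): "",
    os.path.join(effects, "echo.py"):
        "def echofilter(data):\n"
        "    return data + data\n",
    # surround refers to its sibling module echo.
    os.path.join(effects, "surround.py"):
        "from . import echo\n"
        "def surround(data):\n"
        "    return echo.echofilter(data)\n",
}
for path, text in files.items():
    with open(path, "w") as f:
        f.write(text)
sys.path.insert(0, root)

from sound_rel.effects import surround
print(surround.surround([1, 2]))   # [1, 2, 1, 2]
```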
