This is a document that attempts to teach the Python language. It is not a replacement for the official Python tutorial at http://docs.python.org/tutorial/index.html but adopts a more example-driven approach. This tutorial is peppered with exercises and practical sessions. I recommend that you try out the exercises by yourself even if they seem hard at first. After all, the only way to learn to program is to program.
The tutorial assumes some basic familiarity with programming in general and makes some slight references to C and C++ to illustrate some points.
It assumes that you have the Python programming language installed on your machine. You can obtain it from the official Python site (please select the appropriate format for your platform).
Emphasis is done like so. Language literals are typeset in a monospace format. Screen transcripts and ascii graphics are typeset inside a separate indented box using a monospace font. Comments inside these boxen are italicised and typeset using a slightly lighter colour. Exercises are typeset in dark grey boxen so that they stand out.
After a few initial sections, we will use code snippets instead of screen transcripts so you won't see the interpreter prompts.
Python is a very high level multi-paradigm programming language which emphasises programmer productivity and code readability.
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - B.W. Kernighan
Fire off the interpreter from your command line. The command used is `python`.

```
sanctuary% python2.6
Python 2.6+ (r26:66714, Oct 22 2008, 09:25:02)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

The Python interactive interpreter can be used as a quick calculator. Parentheses can be used to alter default operator precedences.

```
>>> 2+3
5
>>> 0.5*5
2.5
>>> 2+3*5
17
>>> (2+3)*5
25
```
You can also use variables like you do in most other languages. There's no need to declare them before you use them. Variables don't have any sigils (like in Perl) or static type declarations (like in C).

```
>>> x=5
>>> x+3
8
>>> x/2.0
2.5
```
Trying to access the value of a variable before it is assigned one will cause the interpreter to print an error.
```
>>> t=t+1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 't' is not defined
```
So far, we've only seen numbers. Python can also deal with strings in a fashion quite similar to numbers. The results are quite intuitive.

```
>>> c1 = "Coimbatore"
>>> c1 + " Bangalore"
'Coimbatore Bangalore'
>>> c1*3
'CoimbatoreCoimbatoreCoimbatore'
>>> c2 = "Bangalore"
>>> c1 + c2
'CoimbatoreBangalore'
```
You can reassign a variable without worrying about its type.

```
>>> x=2
>>> x+2
4
>>> x="Bangalore"
>>> x+", Karnataka"
'Bangalore, Karnataka'
```
Python's `if` statement is similar to those in most other languages. It allows us to make a branching decision.

```
>>> x = 10
>>> if x<10:
...     print "x is less than 10"
... elif x>10:
...     print "x is greater than 10"
... else:
...     print "x is 10"
...
x is 10
```
Unlike many other contemporary languages, Python is not format free and relies on indentation to group statements together. So a block of statements (eg. the body of a function) which you would demarcate using `{` and `}` in C would be grouped together in Python by indenting them all by the same amount. In the example above, the indentation of the print statements makes them part of the body of the `if` and the `else` parts. The `elif` is a shorthand for `else if` which would be unnecessarily verbose.
It should also be understood that parts of the code which are not visited (because of a conditional) might contain runtime errors which are uncaught.
```
x = 2
if x==2:
    print x
else:
    print y
```

```
x = 2
if x==2:
    print x
else:
    x +
```
Just like a value can be associated with a name (eg. `x = 2`), it's also possible for a piece of logic to get tied to a name. Such an association is called a function.

Functions are similar to the mathematical entities by the same name. They take some inputs and return one or more outputs. They are defined using the `def` keyword.

The following code snippet creates a function that will return the square of its argument.

```
>>> def square(x):
...     rte = x*x
...     return rte
...
>>> square(5)
25
>>> x=square(7)
>>> print x
49
```
As you can see, the body of the function is indented so that it 'belongs' to the function. Python doesn't have braces.
Functions can take the place of literal expressions.
```
>>> print 1+square(7)
50
```
In Python, functions are first class objects and can be assigned and used like other objects. They can also be passed as arguments to other functions.
```
>>> other_name = square
>>> other_name(7)
49
```
Python function definitions are executable statements that may appear wherever it is legal for statements to appear. The functions will not get defined unless these statements are executed. As you can see in the example below, `foo` is not defined because the body of the `if` didn't execute.

```
>>> x = 2
>>> if x == 3:
...     def foo():
...         print "Hello"
...
>>> foo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'foo' is not defined
```
One line anonymous functions can be created using the `lambda` keyword.

```
>>> sq = lambda x:x*x
>>> sq(7)
49
```
Function arguments can either be positional or keyword. The former is similar to the C syntax and what we have been using so far (ie. the first argument goes to the first formal parameter etc.). The other way of doing it is to use keyword arguments. The following example illustrates this.

```
>>> def greet(greeting, person):
...     return "%s, %s"%(greeting,person)
...
>>> greet("Hello", "noufal")                      # Positional
'Hello, noufal'
>>> greet(greeting = "Hello", person = "Noufal")  # Keyword (same order as definition)
'Hello, Noufal'
>>> greet(person = "Noufal", greeting = "Hello")  # Keyword (different order)
'Hello, Noufal'
```
Functions can be written to accept default arguments. Consider the following example of a function called `power` which will raise its first argument to the power of the second, and to 2 if the second argument is unspecified.

```
>>> def power(n,pow=2):
...     return n**pow      # The ** operator is the exponentiation operator
...
>>> power(7)               # Returns 7**2
49
>>> power(7,3)             # Returns 7**3
343
```
Variables created inside a function have local scope. This means that you don't have to worry about the variables that exist outside a function when you're coding it.

```
>>> def average(a,b):      # Create a function that uses a local variable s
...     s = a+b
...     avg = s/2.0
...     return avg
...
>>> s = 10                 # Create a variable called s
>>> average(10,20)         # Call our function. s inside the function will be 30.
15.0
>>> s                      # But we still have it as 10
10
```
Variables are looked up first in the function local symbol table (locals) and if they're not found there, in the global symbol table (globals).
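A minimal sketch of this lookup order (the variable and function names here are purely illustrative):

```python
x = 10  # a global variable

def read_global():
    # x is never assigned inside this function, so the
    # lookup falls through from locals to globals
    return x + 1

def shadow_global():
    x = 99  # assignment creates a function local x; the global is untouched
    return x

print(read_global())    # 11
print(shadow_global())  # 99
print(x)                # still 10
```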
If you do want to refer to a variable that has been initialised outside the function, you need to use the `global` keyword to tell the interpreter so.

```
>>> def average(a,b):
...     global s           # Now we use the global s. Not a function local one
...     s=a+b
...     avg = s/2.0
...     return avg
...
>>> s=10
>>> average(10,20)
15.0
>>> s                      # The value has changed
30
```
It is possible (and recommended) to put a documentation string (or docstring for short) in the header of a function to describe what it does.

```
>>> def square(x):
...     "Returns x raised to the power of 2"
...     return x*x
...
>>> square(7)
49
>>> help(square)
Help on function square in module __main__:

square(x)
    Returns x raised to the power of 2
```
```
def f(x):
    return x + x

def g():
    return f(5)

g()      # ?

def f(x):
    return x * x

g()      # ?
```
```
def foo():
    def bar():
        print "Hello"

foo()
bar()
```
In order to facilitate reuse of code, it is possible to organise our functions into modules. These are nothing more than Python files. A large number of modules can be organised into a package. Here is an example
```
+--------------------------------------------------------------+
|  +--------------------+      +--------------------+          |
|  |                    |      |                    |          |
|  |  module:colour     |      |  module:brush      |          |
|  +--------------------+      +--------------------+          |
|  +--------------------+                                      |
|  |                    |                                      |
|  |  module:effects    |                                      |
|  +--------------------+                    package:graphics  |
+--------------------------------------------------------------+
```

The colour module would be referred to as graphics.colour.
Modules are loaded using the `import` keyword. A module is simply a Python file with a `.py` extension.
Form | Effect |
---|---|
import x | Loads x.py and allows access to its attributes via x |
from x import y | Loads x and puts x.y into the current namespace |
from x import * | Loads x and puts all its attributes into the current namespace |
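The three forms can be tried out with a standard library module such as `math` (used here purely as an illustration):

```python
import math                # access attributes via the module name
print(math.sqrt(16))       # 4.0

from math import sqrt      # puts only sqrt into the current namespace
print(sqrt(25))            # 5.0

from math import *         # puts all of math's attributes here; convenient
print(cos(0))              # 1.0 -- but it can clutter the namespace
```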
```
noufal@sanctuary% cat numeric.py
"""
This module contains numeric routines.

square : returns the square of the numeric argument
"""

def square(x):
    "Returns the square of the given number x"
    return x*x
```

```
>>> import numeric
>>> numeric.square(7)
49
>>> square(7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'square' is not defined
```
A directory of modules can be made importable by putting an `__init__.py` file in the directory. This is a package.
```
.
+-- numeric
|   +-- __init__.py
|   +-- exp.py
```
```
>>> from numeric import exp
>>> exp.square(5)
25
```
Python provides many primitive data types.
From this section on, we will use direct code snippets instead of screen grabs of the interpreter sessions.
```
foo = "Hello"
foo = """This is
a paragraph of text
spread across
4 lines"""
```
Strings support the `+` operator for concatenation, the `*` operator for repetition, the `%` operator for substitution and the `[]` operator for slicing or indexing.
```
city = "coimbatore"
state = "tamil nadu"
city + "," + state      # String concatenation
'coimbatore,tamil nadu'
"-" * 50                # Quick way to create a divider line
'--------------------------------------------------'
"%s, %s"%(city,state)
'coimbatore, tamil nadu'
city[0]
'c'
city[1]
'o'
city[0:5]
'coimb'
city[-2]
'r'
```
```
 0      1      2      3      4      5      6
 +------+------+------+------+------+------+
 |      |      |      |      |      |      |
 |  P   |  Y   |  T   |  H   |  O   |  N   |
 +------+------+------+------+------+------+
-6     -5     -4     -3     -2     -1
```
String literals can take the `u` or `r` modifiers to indicate that they are unicode or raw respectively.
```
print "Tamil\nNadu"      # \n is interpreted as a newline
Tamil
Nadu
print r"Tamil\nNadu"     # \n is parsed as two separate characters.
Tamil\nNadu
print type("snake")      # type is a builtin that returns the type of an object
<type 'str'>
print type(u"snake")
<type 'unicode'>
```
Method | Use | Result |
---|---|---|
capitalize | "python".capitalize() | "Python" |
count | "quux".count('u') | 2 |
endswith | "quux".endswith("x") | True |
startswith | "quux".startswith("s") | False |
find | "trek".find('e') | 2 |
split | "graphics.canvas".split(".") | ["graphics","canvas"] |
strip | " Spaced out ".strip() | "Spaced out" |
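The entries in the table can be reproduced directly at the prompt:

```python
print("python".capitalize())          # Python
print("quux".count('u'))              # 2
print("quux".endswith("x"))           # True
print("quux".startswith("s"))         # False
print("trek".find('e'))               # 2
print("graphics.canvas".split("."))   # ['graphics', 'canvas']
print("  Spaced out  ".strip())       # Spaced out
```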
Lists are constructed and indexed using the `[` and `]` operators.

```
foo = ["Python", "Lisp", 0, 1.5]    # Construction
print foo[1]                        # Print the second element
Lisp
print foo[0:2]                      # Slicing (similar to strings)
['Python', 'Lisp']
print foo[0][0]                     # Print first character of the first element
P
```
Operation | Example | Result |
---|---|---|
+ | [1,2] + ["Hello"] | [1,2,"Hello"] |
* | [1,2]*3 | [1,2,1,2,1,2] |
append (in place, with x=[1,2,3]) | x.append(4) | [1,2,3,4] |
extend (in place, with x=[1,2,3]) | x.extend([4,5]) | [1,2,3,4,5] |
reverse (in place, with x=[1,2,3]) | x.reverse() | [3,2,1] |
sort (in place, with x=[3,2,1]) | x.sort() | [1,2,3] |
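Note that the methods marked 'in place' above mutate the list itself rather than building a new one:

```python
x = [1, 2, 3]
x.append(4)        # mutates x in place
x.extend([5, 6])
print(x)           # [1, 2, 3, 4, 5, 6]

x.reverse()
print(x)           # [6, 5, 4, 3, 2, 1]

x.sort()
print(x)           # [1, 2, 3, 4, 5, 6]

y = [1, 2] + ["Hello"]   # + on the other hand returns a brand new list
print(y)                 # [1, 2, 'Hello']
```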
```
foo = [3,2,1]
bar = foo.sort()
print bar
```
```
x = [1,2,3]
x[0] = x
```
Tuples are constructed using the `(` and `)` operators.

```
x=(1,2,3)
print x
(1, 2, 3)
```

A tuple can be unpacked into individual variables in a single statement.

```
a,b = (3,4)
print a
3
print b
4
```
Dictionaries map keys to values. They are constructed using the `{` and `}` operators and indexed using the `[]` operator.

```
foo = {'city' : 'Coimbatore', 'state' : 'Tamil Nadu', 'country' : 'India'}
foo['country']
'India'
foo.keys()
['city', 'state', 'country']
foo.items()
[('city', 'Coimbatore'), ('state', 'Tamil Nadu'), ('country', 'India')]
foo.values()
['Coimbatore', 'Tamil Nadu', 'India']
```
Python provides two looping constructs: one using `for`, commonly used for definite iteration, and the other using `while`, used for indefinite iteration.

```
for i in [1,2,3,4]:     # i is the loop variable
    print i
1
2
3
4
```

```
x = 0
while x<5:
    x = x + 1
    print x
1
2
3
4
5

while True:             # This is an infinite loop!
    x = x+1
```
The `break` keyword stops the loop and comes out of it. It's useful to terminate a loop prematurely. The `continue` keyword stops the current iteration and goes to the next one. The `pass` keyword is a do-nothing placeholder and is commonly used in loop and function stubs. It's similar to the `;` statement in C.
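A small sketch of all three keywords in action:

```python
# break: stop searching as soon as we find the first multiple of 7
found = None
for i in range(1, 100):
    if i % 7 == 0:
        found = i
        break
print(found)       # 7

# continue: skip the odd numbers and collect only the even ones
evens = []
for i in range(1, 10):
    if i % 2 == 1:
        continue
    evens.append(i)
print(evens)       # [2, 4, 6, 8]

# pass: a placeholder body that does nothing
def not_written_yet():
    pass
```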
```
for i in range(1,10):
    pass
print i
```
We skipped a useful bit of the language regarding functions in our earlier discussion because we didn't have knowledge of the primitives needed. We will cover it here.

Functions usually receive a fixed number of positional (and keyword) arguments. However, we can write functions that receive an arbitrary number of positional arguments using the `*` operator.
```
def sum(*items):
    acc = 0
    for i in items:
        acc += i
    return acc

sum(1,2,3,4,5,6)
21
sum(1,2,3)
6
```
The `items` variable will be a Python tuple that contains all the arguments.
Similarly, we can make a function that receives an arbitrary number of keyword arguments like so
```
def init(**params):
    print params.keys()

init(foo = 1, bar = 2)
['foo', 'bar']
```
As an exercise, write a function called `every` which will return `True` if all its arguments are `True` and `False` if not.
Output is done using the `print` keyword. It will output the string representation of its argument plus a newline to standard output. Input can be read using the `raw_input` function.
```
x = raw_input("Enter your name :")   # Will prompt the user for some input and block
Enter your name :Noufal
print "Hello %s"%x
Hello Noufal
```
Files are opened using the `file` (or `open`) constructor. It receives the name of the file followed by a mode specification string indicating whether the file is to be opened for reading, writing or appending and whether it is binary or textual. The file objects so created have `write`, `writelines`, `read` and `readlines` methods to put and get data.
```
f = open("/tmp/foo.txt","w")
f.write("This is a sample")
f.close()
print open("/tmp/foo.txt","r").read()
This is a sample
```
When errors occur, Python will `raise` an 'exception'. These can be caught and processed accordingly. Exceptions are of different kinds and will be discussed in detail later.

```
foo = {'name' : 'Noufal'}
try:
    print foo['age']
except KeyError:
    print 'No such key "age"'
No such key "age"
```
Python's typing system is dynamic but strong, and its approach to typing is called duck typing (from 'if it looks like a duck and talks like a duck, chances are that it's a duck').

This means that given an object, its semantics are determined by the interfaces it provides rather than any extra type information held by the object.
For example, if we have a function that doubles its argument like so

```
def double(x):
    return x*2
```

it will work perfectly for numbers. If we say something like `double(5)`, we will get back `10`. It will also work fine for strings. If we say `double("bam")`, we will get `bambam`. In a statically typed language like C, we would have to declare the type of `x` and thereby force the `double` function to accept only objects of the declared type. The upshot of this is that we would have to declare two functions, `double_int` and `double_string`.
In Python however, we don't care. If the object in question (the thing that `x` refers to) supports the `*` 'protocol', we just use it and don't really worry about the type. There's no need for a common abstract parent class which defines interfaces or anything of the sort.
As Alex Martelli (one of the senior Python programmers) said in a mailing list posting:

> In other words, don't check whether it IS-a duck: check whether it QUACKS-like-a duck, WALKS-like-a duck, etc, etc, depending on exactly what subset of duck-like behaviour you need to play your language-games with.

If `x` can be multiplied by a number, that's enough.
This simplifies a lot of details but moves responsibility to the
programmer. For example, there is an 'iteration' protocol (which
we'll discuss later) that the for
construct uses. This allows it
to iterate over anything that supports this protocol. A simple
example is shown below
```
for i in "python":     # Iteration over a string. Prints one character per line
    print i

for i in [1,2,3,4]:    # Iteration over a list. Prints one element per line
    print i

for i in (1,2,3,4):    # Iteration over a tuple. Prints one element per line
    print i

for i in {'language':'Python', 'creator':'Guido'}:   # Iteration over a dictionary. Prints one key per line
    print i

for i in 2:            # Integers don't support the iteration protocol.
    print i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
```
As long as the objects we're dealing with support the iteration protocol, the looping construct will work fine. As you can see, it works for many of the primitive types but doesn't for integers.
The upshot of this is that in Python, we don't really check for types ('if this is an integer, do this, else do that'). We just go ahead and use that aspect of the object in question which we're interested in. If it doesn't support that, an exception gets raised and we deal with it appropriately. This provides an extremely high degree of flexibility. We can write 'generic' functions that transform their input without worrying about what the innards of the objects that we process are.
The downside is that since the type system is lazy and dynamic, the only way of catching errors upfront is to write detailed unit tests along with your program. The "if it compiles, it's good to go" philosophy will not work. We will discuss unit testing later.
Every language has its own stylistic trends. This section lists a bunch of them that are quite common. The list is not exhaustive but should give you an idea about how things are. A few small new language constructs are mentioned and we'll discuss them as we cover them.
```
foo = ["Perl", "C", "Ruby", "Python", "Java"]
for idx,lang in enumerate(foo):    # Iterating over a list of (index, element) tuples.
    if lang == "Python":
        print "Found at position %s"%idx
Found at position 3
```
```
names = ["Python", "Perl", "Java"]
leads = ["Guido", "Larry", "Gosling"]
for lang, lead in zip(names, leads):   # The zip function compresses n iterables into n-tuples.
    print "%10s | %10s"%(lang,lead)    # The string format specifiers have provisions for field widths.
    Python |      Guido
      Perl |      Larry
      Java |    Gosling
```
When a module is imported, the name of the module is available in a special variable called `__name__`. If the code is in the main entry point module, the `__name__` variable will contain the string `__main__`.
This allows us to write modules which can behave differently when run and when imported. The idiom is illustrated below
```
# This is the quux.py module
def main():
    # do_something
    pass

if __name__ == "__main__":
    main()
```
If this module is run from the command line, the `main` function will get called (since the value of `__name__` is `__main__`). If on the other hand it's imported, `__name__` will be `quux` and so the call to `main` will not occur.
It's common to include some tests with 'import only' modules that are run if you just run the module from the command line. A convenient way to test your modules.
The docstring for functions allows us to briefly describe the usage and purpose of a function. It's also possible to embed an example usage, cut and pasted from the interpreter prompt, into the docstring. Once this is done, Python has a standard module called `doctest` that allows us to 'test' these functions. A useful way of making sure that the examples in your docstrings are up to date.
```
# This is the foo.py module
def double(x):
    """
    Returns the double of its argument

    >>> double(5)
    10
    >>> double("Hello")
    'HelloHello'
    >>> double(3.5)
    7.0
    >>>
    """
    return x*2

if __name__ == "__main__":
    import doctest
    doctest.testmod()
```
```
noufal@sanctuary% python foo.py
noufal@sanctuary% # Edit the module to change 7.0 in the example to 7.1
noufal@sanctuary% python foo.py
**********************************************************************
File "foo.py", line 7, in __main__.double
Failed example:
    double(3.5)
Expected:
    7.1
Got:
    7.0
**********************************************************************
1 items had failures:
   1 of   3 in __main__.double
***Test Failed*** 1 failures.
```
While it's not possible to write doctests for all your functions, it's a useful habit to write them whenever you can. It's easy too since you just import your module, try your function and cut/paste the interpreter session into the docstring. This way you'll surely have some tests for your functions.
Empty strings, lists, tuples, dictionaries etc. evaluate to `False`. Hence, it's a common practice to use

```
if foo:
    process(foo)
```

rather than say

```
if len(foo) != 0:
    process(foo)
```
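A quick demonstration of which values are considered false:

```python
print(bool(""))     # False
print(bool([]))     # False
print(bool(()))     # False
print(bool({}))     # False
print(bool(0))      # False
print(bool("hi"))   # True
print(bool([0]))    # True; a non-empty list is truthy even if its elements aren't

foo = []
if not foo:
    print("foo is empty")
```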
While the pythonic way to iterate over a list is to use a `for` directly over it, sometimes we need the index of an element as well as the actual element. This can be done as follows

```
for i in range(len(foo)):   # range is a function that returns a list from 0 up to its argument
    print i, foo[i]
```

but it's more pythonic to say

```
for idx, i in enumerate(foo):
    print idx, i
```
It's syntactically valid but in bad form to say

```
if foo :
    print "Hello"
```

rather than

```
if foo:
    print "Hello"
```
It's possible to assign multiple variables from a list in a single neat shot.

```
foo = [1,2,3]
a,b,c = foo
a
1
b
2
c
3
```
An interesting side effect is to say

```
a,b = b,a
```

to swap two variables.
Suppose you're reading out strings from a file to create a single large string. The naive way of doing it would be like so.

```
strng = ''
for i in fp:    # fp is a file
    strng += i
print strng
```

Since strings are immutable, a new one is constructed for each addition operation. Instead, the pythonic way is to say

```
"".join(list(fp))   # Join the elements of the list together into a single string separated by ""
```
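The separator string can be anything; join puts it between the elements:

```python
lines = ["first\n", "second\n", "third\n"]
print("".join(lines))       # the three lines glued into one string in a single pass

print(",".join(["a", "b", "c"]))                  # a,b,c
print(" -> ".join(["start", "middle", "end"]))    # start -> middle -> end
```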
More of these snippets are available at David Goodger's Code like a Pythonista.
Functional programming is a programming paradigm which treats computation as the evaluation of mathematical functions. It emphasises immutable objects and the absence of mutable state.
The paradigm is well supported in Python. While not a panacea, functional programs are often more compact, understandable, faster and elegant than their procedural or object oriented counterparts.
While we won't delve into functional programming in depth, we will discuss some of the primitives Python provides that make this style of programming possible.

We have already encountered `lambda`, the keyword that allows us to create one liner functions.

The `map` function allows us to apply a function to every element of an iterable. Let's use our `double` function to double all elements of a list.
```
nat = [1,2,3,4,5]
map(double, nat)
[2, 4, 6, 8, 10]
```
As you can see, this is superior to the procedural method shown below in terms of concision
```
doubles = []
for i in nat:
    doubles.append(double(i))
print doubles
[2, 4, 6, 8, 10]
```
The `filter` function has a similar call signature as `map` but returns all elements of its input list which satisfy a certain predicate. For example, suppose we wanted a list of all even numbers from a list, we could do it as below.
```
foo = [10,13,20,19,152,1003]
filter(lambda x:x%2 == 0, foo)   # We create a quick 'even number checker' on the fly.
[10, 20, 152]
```
A combination which gives us the squares of all even numbers less than 20 is shown below
```
map(lambda x:x*x, filter(lambda x:x%2 == 0, range(1,20)))  # range(a,b) returns a list of numbers from a to b-1
[4, 16, 36, 64, 100, 144, 196, 256, 324]
```
The `reduce` function allows us to apply a function to the first two elements of a list, then to that result and the next element, and so on until the whole iterable is reduced to a single value. The following example finds the sum of the integers from 1 to 9.

```
reduce(lambda x,y:x+y, range(1,10), 0)
45
```
The first argument is the reduction function, the second argument is the iterable and the third is the initial value. Compare this to the procedural equivalent shown below.

```
acc = 0                  # Analogous to the third argument of reduce
for i in range(1,10):    # Analogous to the second argument of reduce
    acc += i             # Analogous to the first argument of reduce
print acc
45
```
Not only do we have an unnecessary variable `acc`, the whole thing is much larger. Incidentally, Python has a builtin function called `sum` which will compute the sum of its arguments.
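For example:

```python
print(sum([1, 2, 3, 4]))     # 10
print(sum(range(1, 10)))     # 45, the same result as the reduce example above
print(sum([1.5, 2.5], 10))   # 14.0; the optional second argument is the start value
```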
While these methods are fine, they can be a little hard to read if overused. Our example of squaring all even numbers was stretching it a little. Python being a language that emphasises readability has a neater notation for these things called list comprehensions.
The expressions are used to generate one list from another and mimic the mathematical set builder notation used for defining sets (set comprehensions).
A list comprehension to do our function of squaring all even numbers from 1 to 20 is shown below.
```
print [x*x for x in range(1,20) if x%2 == 0]
[4, 16, 36, 64, 100, 144, 196, 256, 324]
```
The general format is as follows (the `if` part at the end is optional). The transformation is the equivalent of `map` and the condition of `filter`.

```
[ transformation for var in iterable if condition ]
```
As you can see, this is much more readable than the `map`/`filter` combination.
These are very commonly used in Python and you should familiarise yourself with them.
Higher order functions are the programming language equivalent of the mathematical composition operation.

This is a typical functional scenario where the system is built from the bottom up (ie. create lots of small utilities and then compose them into larger structures and finally the application) as opposed to the top down method where the whole system is broken down into pieces which are subdivided till the components are small enough to be implemented.
Let us look at some examples.
Suppose we have a bunch of functions as follows which we use to compute 2x² for all even numbers between 1 and 10.
```
def double(x):
    "Doubles its argument"
    return x*2

def square(x):
    "Raises its argument to 2"
    return x*x

def evenp(x):
    "Returns True if x is even. False otherwise"
    return x%2 == 0

print [double(square(x)) for x in range(1,10) if evenp(x)]
[8, 32, 72, 128]
```
Suppose we want to count the number of times these functions have been called and put these numbers into a dictionary.
One naive way is to put a global dictionary into the program and then alter each of these functions to increment a count every time they are called. This is laborious since we have to manually modify every function which we want counted. When it's time for the counting logic to go, we have to manually remove the instrumentation.
Instead, we will define a function like so
```
fncounts = dict(double = 0, square = 0, evenp = 0)   # The dict constructor allows us to create dictionaries
                                                     # in a cleaner way than by using { and }.

def count(fn):
    def instrumented_fn(x):
        fncounts[fn.__name__] += 1   # function.__name__ is a special attribute that contains the
                                     # name of the function. This increments the corresponding
                                     # member of the dictionary.
        return fn(x)
    return instrumented_fn
```
And we instrument our functions like so
```
double = count(double)
square = count(square)
evenp = count(evenp)
```
Now when we're done with our loop, you can see what happens.
```
fncounts
{'evenp': 0, 'square': 0, 'double': 0}
print [double(square(x)) for x in range(1,10) if evenp(x)]
[8, 32, 72, 128]
fncounts
{'evenp': 9, 'square': 4, 'double': 4}
```
Similar to these are function closures which can loosely be defined as first class functions with free variables.
Consider a simple function to calculate fibonacci numbers. We want a function `fib(n)` which will calculate the nth fibonacci number.
```
def fib(n):
    "Returns the nth fibonacci number"
    assert(n>0)    # To make sure that we receive only numbers above 0
    if n == 1:
        return 0
    if n == 2:
        return 1
    return fib(n-2) + fib(n-1)
```
This is simple enough. Since it's a recursive definition, it would be interesting to know what a certain call to this function looks like. Let's write a function tracer.
```
def trace(fn):
    fn.indent = 0
    def traced_function(n):
        print "| "*fn.indent + "+-- %s(%s)"%(fn.__name__,n)
        fn.indent += 1
        ret = fn(n)
        print "| "*fn.indent + "+-- [%s]"%ret
        fn.indent -= 1
        return ret
    return traced_function
```
The function doesn't do much. It just keeps track of the nesting of function calls and draws a text graph of each call, its parameters and finally its return value. This gives us an idea of how deep the tree is.
Let us instrument our fibonacci function and use it to trace `fib(5)`.

```
fib = trace(fib)
fib(5)
+-- fib(5)
| +-- fib(3)
| | +-- fib(1)
| | | +-- [0]
| | +-- fib(2)
| | | +-- [1]
| | +-- [1]
| +-- fib(4)
| | +-- fib(2)
| | | +-- [1]
| | +-- fib(3)
| | | +-- fib(1)
| | | | +-- [0]
| | | +-- fib(2)
| | | | +-- [1]
| | | +-- [1]
| | +-- [2]
| +-- [3]
3
```
This is wasteful. You can see that fib(3) is being called twice (and its whole tree of descendants recomputed as well). We already know the value of fib(3) by the time the second call is made, so why can't we reuse it?
One way of doing this is to write a cached version of `fib`. Something like the version shown below.

```
cache = {}
def fib(n):
    "Returns the nth fibonacci number"
    if n in cache:
        return cache[n]
    assert(n>0)    # To make sure that we receive only numbers above 0
    if n == 1:
        return 0
    if n == 2:
        return 1
    ret = fib(n-2) + fib(n-1)
    cache[n] = ret
    return ret

fib=trace(fib)
fib(5)
+-- fib(5)
| +-- fib(3)
| | +-- fib(1)
| | | +-- [0]
| | +-- fib(2)
| | | +-- [1]
| | +-- [1]
| +-- fib(4)
| | +-- fib(2)
| | | +-- [1]
| | +-- fib(3)
| | | +-- [1]
| | +-- [2]
| +-- [3]
```
This solves our conundrum but we wrote a special purpose cached version of fib and introduced a global variable. It would be nicer if we could have a function that would create a cached version of anything we gave it.
Let's call that function `memoise`. Here it is.

```
def memoise(fn):
    cache = {}
    def memoised_fn(x):
        if x in cache:
            return cache[x]
        else:
            ret = fn(x)
            cache[x] = ret
            return ret
    memoised_fn.__name__ = fn.__name__   # To print the original function name in any trace output
    return memoised_fn
```
This is an example of a closure. The memoise function keeps some state encapsulated inside it (the `cache` variable). It's not visible globally but is shared across all invocations of `memoised_fn` so that we can stay away from recomputing values. The `memoised_fn` has a free variable (ie. `cache`).

It also has the added advantage that the cache persists between calls so that once we compute fib(n), we will get it instantly the next time.
With our original (uncached) functions, here is the call tree.

```
fib = trace(memoise(fib))
fib(5)
+-- fib(5)
| +-- fib(3)
| | +-- fib(1)
| | | +-- [0]
| | +-- fib(2)
| | | +-- [1]
| | +-- [1]
| +-- fib(4)
| | +-- fib(2)
| | | +-- [1]
| | +-- fib(3)
| | | +-- [1]
| | +-- [2]
| +-- [3]
```
You can see how we constructed a generic set of pieces to trace and memoise functions and used them to build up an efficient version of our original function which is unaware of all this being done to it.
This kind of usage is so common in Python that there is a special notation for it. After our 'function modifiers' are defined, we could have defined fib like so.
```
@trace
@memoise
def fib(n):
    "Returns the nth fibonacci number"
    assert(n>0)    # To make sure that we receive only numbers above 0
    if n == 1:
        return 0
    if n == 2:
        return 1
    return fib(n-2) + fib(n-1)
```
The `@` operation is called decoration and `trace` and `memoise` are called decorators. The above is equivalent to first defining fib and then using a `fib = trace(memoise(fib))` statement.

As an exercise, write a decorator which measures how long each call to a function takes; you can use the `datetime` module to measure time.
In Python you can unpack a list or a dictionary and provide them as positional or keyword arguments to a function.

In the following example, we have a function `greet` that expects 3 arguments viz. name, city and gender. It will, based on the inputs, generate an appropriate greeting message. Our inputs are read out from a file and we obtain them as strings of the form "name : city : gender".

Here is a program to create greetings for the people.

```
def greet(name, city, sex):
    """Returns a greeting designed for the person whose details are provided"""
    pronoun = dict(male = "him", female = "her")[sex]
    return "Presenting %s of %s! We welcome %s to our fair city."%(name.capitalize(), city.capitalize(), pronoun)

input = ["vladimir : moscow : male",
         "fathima : cairo : female",
         "john:London: male"]

for i in input:
    print greet(*[x.strip() for x in i.split(":")])
Presenting Vladimir of Moscow! We welcome him to our fair city.
Presenting Fathima of Cairo! We welcome her to our fair city.
Presenting John of London! We welcome him to our fair city.
```
The line `greet(*[x.strip() for x in i.split(":")])` splits the input string on the `:` character and then strips off the extra spaces which might be there. It then uses the `*` prefix to expand the list into positional arguments for the `greet` function.
The situation is similar for dictionaries, which can be unpacked into keyword arguments using the ** prefix. The important thing to understand here is that this mechanism makes it possible to dynamically construct argument lists for a function. When coupled with functions that can accept an arbitrary number of parameters, a lot of interesting possibilities arise.
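To make the two unpacking forms concrete, here is a small sketch (the describe function and its arguments are our own illustration, not part of the tutorial's examples):

```python
# Hypothetical helper to illustrate * and ** unpacking together.
def describe(name, city, gender, greeting="Hello"):
    return "%s, %s of %s (%s)" % (greeting, name, city, gender)

args = ["vladimir", "moscow"]                        # fills positional parameters
kwargs = {"gender": "male", "greeting": "Welcome"}   # fills keyword parameters

print(describe(*args, **kwargs))  # Welcome, vladimir of moscow (male)
```

The argument lists here are ordinary data structures, so they can be built up at runtime before the call is made.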
Most functions return a single value. Sometimes however, a function needs to return a series of values. This is often handled by running the function and creating a list which is then returned. This is okay but very often, we might not need the whole list. We might search for the first element that satisfies a condition and stop looking ahead after that. The computation of the whole list was unnecessary and wasteful.
A naive way to solve this would be to write a completely different version of the function which computes values one by one and returns when a certain condition is matched. This is not very nice since our conditions might change and the function shouldn't have to know about what we're doing with its return value.
Python solves this problem using the concept of generators which are functions that can be suspended in mid-execution and resumed later.
Imagine a log file which contains multiple lines of the following format.
time : user : number of MB transferred
Suppose I want to sum the amount of data transferred.
The first would be accomplished by a function like this
def calc_total(logfile):
    "Returns total data transferred"
    fp = open(logfile)
    total = 0
    for i in fp:
        date, user, amount = [x.strip() for x in i.split(":")]
        total += int(amount)
    fp.close()
    return total
Suppose I want to return the first person who transferred more than 1000 MB.
def get_leecher(logfile):
    "Return first person who transferred more than 1000 MB"
    fp = open(logfile)
    for i in fp:
        date, user, amount = [x.strip() for x in i.split(":")]
        if int(amount) > 1000:  # amount is a string and must be converted
            fp.close()
            return user
These two are special purpose and both of them need the list of
records in the file. Let's try to abstract that out with a
parse_log
function.
def parse_log(logfile):
    "Returns a list of records in the logfile as a (date, user, transfer) tuple"
    fp = open(logfile)
    records = []
    for i in fp:
        date, user, data = [x.strip() for x in i.split(":")]
        records.append([int(date), user, int(data)])
    return records
Our functions would then, instead of parsing the files themselves
say for i in parse_log(logfile):
and use the records directly.
Suppose our file had 1000000 records. The above function is fine for total counting since you need all the records anyway. For the second however, if our first leecher was the 250th person, it would have been a waste to find the rest of the people.
Instead of generating the whole list and returning it, we can alter parse_log to work as a generator instead of a regular function. This is done using the yield keyword instead of return. The function would look like this
def parse_log(logfile):
    "Yields the records in the logfile as (date, user, transfer) tuples"
    fp = open(logfile)
    for i in fp:
        date, user, data = [x.strip() for x in i.split(":")]
        yield [int(date), user, int(data)]
If we try to print the return value of the generator, the interpreter will tell us that it's a generator rather than a list. Generators have a .next method which returns the next value and raises a StopIteration exception when there are no more elements to produce. We shall use these contents in a file called foo.log to illustrate
100 : umar : 2345
120 : ali : 500
150 : zaid : 1024
170 : logan : 543
200 : scott : 213
records = parse_log("foo.log")
print records
<generator object parse_log at 0x14148c0>
sum([x[2] for x in records], 0)  # sum is a builtin function that
                                 # adds all the items in its first
                                 # argument and uses its second
                                 # argument as an initial value
4625
# The generator has now been 'used up' and needs to be reinitialised
# if we want to use it again.
records = parse_log("foo.log")
for date, name, data in records:  # Lines are read out from the file on demand
    if data > 1000:
        print name

Generators are very commonly used in Python and therefore, a shorthand way of creating them exists. A list comprehension bracketed using ( and ) instead of [ and ] creates a generator instead of a proper list. Our generator could therefore be written as follows. It's slightly different from our original version. Can you say how?
([x.strip() for x in y.split(":")] for y in open(logfile))
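To see the difference between the two bracket styles in isolation, compare a list comprehension with the equivalent generator expression (using a plain range rather than a log file; this toy example is our own):

```python
squares_list = [x * x for x in range(5)]  # built eagerly, all at once
squares_gen = (x * x for x in range(5))   # built lazily, on demand

print(squares_list)       # [0, 1, 4, 9, 16]
print(next(squares_gen))  # 0 -- one value produced on demand
print(list(squares_gen))  # [1, 4, 9, 16] -- consuming the rest
```

The generator expression does no work until something asks it for a value, which is exactly the behaviour we wanted from parse_log.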
Exercise: Write a generator cycle that, when given any iterable, will return values from the iterable over and over again forever. eg. If I say cycle([1,2,3]), it should return a generator that will produce values like this: 1,2,3,1,2,3,1,2,3...
Python fully supports object oriented programming but in a way that's a lot simpler than other languages like C++.
As in most cases, the simplest way to start is to take an example. Let us create a class to handle complex numbers.
class Complex(object):
    "Complex number class. Version 1"
    def __init__(self, real=0, imag=0):  # __init__ is the constructor method.
        self.real = float(real)
        self.imag = float(imag)
    def display(self):
        sign = self.imag > 0 and "+" or "-"
        print "%s %s %sj" % (self.real, sign, abs(self.imag))

t = Complex(3,4)
t.display()
3.0 + 4.0j
t = Complex(3,-2)
t.display()
3.0 - 2.0j
Small as it is, this example needs some explanation. Classes
are defined using the class
keyword. Objects are instances of
classes (eg. 3+4j is an object of the complex class). The
general syntax is as follows.
class classname (base class 0,base class 1,...):
We call our class Complex. The object inside the brackets asks Python to inherit this class from the basic object class so that the hierarchy is maintained. Skipping this will make your class an old style class, support for which has been dropped from Python 3.0 onwards. It's a legacy feature which you should not use anymore. Details of the differences between old and new style classes are beyond the scope of this document but those interested can visit http://www.python.org/doc/newstyle/.
Like functions, classes too can have a docstring.
Functions defined inside the class are class methods. In Python, unlike C++, all methods are public. Python eschews the need for a class to hide its innards and adopts a 'we are all consenting adults' outlook. While this may seem strange and dangerous to C++ or Java programmers, the gains this kind of construct gives Python (especially for introspection and documentation) are really great.
A point to note is that in Python, all class methods receive an extra first argument which holds the object through which the method was called. This is analogous to the this pointer in C++. This first argument is conventionally called self. This is not a language rule but an almost uniform convention and you would do well to abide by it. Accessing object level members is done through self. This 'extra' first argument is the only real difference between a class method and a regular function.
Special methods (eg. constructors etc.) have leading and trailing __. We will see other special methods later in this tutorial. __init__ refers to the constructor and it is the method called when an instance of the class is created.
The display method which we created allows us to output a printable version of the class so that we know what's happening.
This is all nice but it's quite useless to have just a class that can print itself. Let's make it do something useful like perform simple arithmetic (addition and subtraction).
class Complex(object):
    "Complex number class. Version 2 (with addition and subtraction)"
    def __init__(self, real=0, imag=0):  # __init__ is the constructor method.
        self.real = float(real)
        self.imag = float(imag)
    def display(self):
        sign = self.imag > 0 and "+" or "-"
        print "%s %s %sj" % (self.real, sign, abs(self.imag))
    def __add__(self, addend):  # Special method implementing addition
        return Complex(self.real + addend.real, self.imag + addend.imag)
    def __sub__(self, subtrahend):  # Special method implementing subtraction
        return Complex(self.real - subtrahend.real, self.imag - subtrahend.imag)

c1 = Complex(5,6)
c2 = Complex(1,2)
s = c1 + c2
d = c1 - c2
s.display()
6.0 + 8.0j
d.display()
4.0 + 4.0j
Now we have addition and subtraction. This is the way Python does operator overloading. In fact, when you do x+y, what's internally happening is a function call:
x.__add__(y)
So you can see that this kind of operation is intuitive. In fact, all the things that can be done to an object are implemented using such special methods.
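A tiny example of our own (a toy Money class, unrelated to Complex) makes the equivalence explicit:

```python
class Money(object):
    "A toy class to show that + is really a call to __add__."
    def __init__(self, amount):
        self.amount = amount
    def __add__(self, other):
        return Money(self.amount + other.amount)

a, b = Money(5), Money(10)
print((a + b).amount)       # 15
print(a.__add__(b).amount)  # 15 -- exactly what a + b does internally
```

Both lines perform the same operation; the + syntax is just a nicer spelling for the special method call.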
Adding the following two methods to our class would give us nice
string (str
) and programmer (repr
) representations.
def __str__(self):
    sign = self.imag > 0 and "+" or "-"
    return "%s %s %sj" % (self.real, sign, abs(self.imag))

def __repr__(self):
    return "Complex (%s, %s)" % (self.real, self.imag)

t = Complex(4,5)
print t  # Uses str(t)
4.0 + 5.0j
t  # Uses repr(t)
Complex (4.0, 5.0)
The string representation is what you get when you try to convert the object into a string using the str function. This is what print does internally. The repr function converts objects into a representation that's useful for programmers to understand what an object contains. Often, it's in a format which can be cut/pasted back into the interpreter.
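The standard library's own types follow the same convention; here the builtin datetime.date class (chosen by us as a convenient example) shows the two forms side by side:

```python
import datetime

d = datetime.date(2009, 6, 3)
print(str(d))   # 2009-06-03 -- the readable form, used by print
print(repr(d))  # datetime.date(2009, 6, 3) -- can be pasted back in
```

Note that the repr output is a valid Python expression that reconstructs an equal object.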
For the complete list of such special methods, please refer to http://www.python.org/doc/2.6/reference/datamodel.html#special-method-names
This concludes our introduction to classes with a note that this class is quite useless in real life since Python has an inbuilt complex primitive type.
Let us consider a class that models a storage device. We'll assume that it has the following attributes: a capacity and a speed. We'll also assume that it has the following methods: read, write and sync.
Now imagine two types of storage devices. A network storage device and a physical disk. They should all implement these basic features. The network device will also have an extra method to connect to the remote device. We can model it as follows.
                +-------------------+
                |                   |
                +-------------------+
                |   StorageDevice   |
                +-------------------+
                     /         \
                    /           \
                   /             \
   +----------------+       +----------------+
   |                |       |                |
   |                |       |                |
   +----------------+       +----------------+
   |  NetworkDevice |       |   LocalDevice  |
   +----------------+       +----------------+
Now we can write classes for these 3 blocks like so
class StorageDevice(object):
    "Provides a base class for StorageDevices."
    def __init__(self, capacity, speed):
        self.capacity = capacity
        self.speed = speed
    def __repr__(self):
        return "%s(capacity = %s, speed = %s)" % (self.__class__.__name__, self.capacity, self.speed)
    def write(self, data):
        # This makes this class an abstract base class. This method can't be called.
        raise NotImplementedError("Can't write to an abstract device")
    def read(self):
        raise NotImplementedError("Can't read from an abstract device")
    def sync(self):
        raise NotImplementedError("Can't sync an abstract device")

class NetworkDevice(StorageDevice):  # Notice the base class
    "Implements a NetworkDevice"
    def __init__(self, capacity, speed):
        super(NetworkDevice, self).__init__(capacity, speed)  # We'll discuss this. It's basically calling the base class constructor.
    def write(self, data):
        print "Wrote '%s' to the network device" % data
    def read(self):
        return "some data"
    def sync(self):
        print "Syncing disks"
        return True  # Boolean True. It's a builtin constant
    def ping(self):
        "Pings network server"
        print "Pinging remote server"
        return True

class LocalDevice(StorageDevice):  # Notice the base class
    "Implements a LocalDevice"
    def __init__(self, capacity, speed):
        super(LocalDevice, self).__init__(capacity, speed)
    def write(self, data):
        print "Wrote '%s' to the local device" % data
    def read(self):
        return "some local data"
    def sync(self):
        print "Syncing local disks"
        return True  # Boolean True. It's a builtin constant

t = StorageDevice(12,12)
t.write(312)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/home/noufal/<ipython console> in <module>()
/home/noufal/<string> in write(self=StorageDevice(capacity = 12, speed = 12), data=312)
NotImplementedError: Can't write to an abstract device
t = NetworkDevice(12,32)
s = LocalDevice(12,34)
for i in [t,s]:
    i.write("Hello")
    i.sync()
Wrote 'Hello' to the network device
Syncing disks
Wrote 'Hello' to the local device
Syncing local disks
print repr(t), repr(s)
NetworkDevice(capacity = 12, speed = 32) LocalDevice(capacity = 12, speed = 34)
Let's go over the things that are new here. First of all, the base
class is no longer object
for the LocalDevice
and
NetworkDevice
classes. We inherit from StorageDevice.
We raise a NotImplementedError
exception in the methods of the
base class so that the class can't be used as is. We tried to use it
and got a traceback.
In the constructor method, we call the super builtin. Its function is to return a proxy that delegates a method call back to the base class. Since it does the delegation at runtime, it's possible to create interesting inheritance patterns (diamond etc.) in Python.
Finally, you can see that in the loop, we don't care whether i is a NetworkDevice or a LocalDevice (or a book or something else that implements write). We use duck typing and just call the method we're interested in.
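To underline that no common base class is needed, here is a sketch of our own with two entirely unrelated classes that both happen to implement write:

```python
class Book(object):
    "Unrelated to StorageDevice, but it also implements write."
    def write(self, data):
        return "Scribbled '%s' in the margin" % data

class Wall(object):
    def write(self, data):
        return "Sprayed '%s' on the wall" % data

# Duck typing: the loop only cares that each object responds to write().
for thing in [Book(), Wall()]:
    print(thing.write("Hello"))
```

If it walks like a writable device and quacks like a writable device, the loop treats it as one.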
It's common practice to praise OOP for encapsulation, ie. how it prevents you from accessing the internal state of an object.
C++ ensures this by erecting a steel wall around the objects
private members only to have the novice programmer expose them
using methods like setUserName
and getUserName
. These are
called getters and setters and Python eschews them. Instead,
we use properties. We should carefully consider what we
expose and not how.
Essentially, a property is a way of altering the semantics of an attribute access. It allows us to replace the access of an attribute (say x) using the . operator with a function call but without changing any of the calling code. x is then called a managed attribute. This is done using the property builtin.
Let's create a simple class that represents a person. It has an age attribute that must be managed (ie. cannot be negative).
We do it like this.
class AgeException(Exception):
    pass

class Person(object):
    def setAge(self, value):
        if value > 0:
            self.__age = value
        else:
            raise AgeException("Age should be greater than 0")
    def getAge(self):
        return self.__age
    age = property(getAge, setAge, doc = "Age of the person")

t = Person()
t.age = 10
print t.age
10
t.age = 0
---------------------------------------------------------------------------
AgeException                              Traceback (most recent call last)
/home/noufal/notes/<ipython console> in <module>()
/home/noufal/notes/<string> in setAge(self=<__main__.Person object at 0x99677cc>, value=0)
AgeException: Age should be greater than 0
print t.age
10
The property builtin accepts 4 arguments in this order: the getter, the setter, the deleter and the docstring for the attribute. If any of these is left out, the corresponding operation is deemed illegal.
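Since Python 2.6, the same managed attribute can also be written using property in its decorator form. Here is an equivalent sketch of a Person class (this variant, including the ValueError and the _age backing attribute, is our own illustration):

```python
class Person(object):
    "Same idea as above, using the decorator form of property."
    def __init__(self):
        self._age = 0  # backing attribute for the managed age

    @property
    def age(self):
        "Age of the person"
        return self._age

    @age.setter
    def age(self, value):
        if value <= 0:
            raise ValueError("Age should be greater than 0")
        self._age = value

t = Person()
t.age = 10
print(t.age)  # 10
```

Calling code still reads and writes t.age with the plain dot operator; only the class knows a function call is happening underneath.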
In python, exceptions are also classes and they have a hierarchy. Behold!
Exception(*)
 |
 +-- SystemExit
 +-- StandardError(*)
      |
      +-- KeyboardInterrupt
      +-- ImportError
      +-- EnvironmentError(*)
      |    |
      |    +-- IOError
      |    +-- OSError(*)
      +-- EOFError
      +-- RuntimeError
      |    |
      |    +-- NotImplementedError(*)
      +-- NameError
      +-- AttributeError
      +-- SyntaxError
      +-- TypeError
      +-- AssertionError
      +-- LookupError(*)
      |    |
      |    +-- IndexError
      |    +-- KeyError
      +-- ArithmeticError(*)
      |    |
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      |    +-- FloatingPointError
      +-- ValueError
      +-- SystemError
      +-- MemoryError

If we have an except statement that catches an exception of type X, we will also catch all exceptions whose base class is X. Therefore, if we catch LookupError, we will also catch KeyError and IndexError.
If we want to create a custom exception, we need to inherit it from one of the standard exceptions.
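Both points fit in a few lines; the lookup helper and the ParseError class below are our own illustrations:

```python
# Catching a base class also catches its subclasses.
def lookup(d, key):
    try:
        return d[key]
    except LookupError:  # catches KeyError and IndexError too
        return "missing"

print(lookup({"a": 1}, "b"))  # missing -- the KeyError was caught

# A custom exception simply inherits from an existing one.
class ParseError(Exception):
    pass

try:
    raise ParseError("bad input")
except Exception as e:  # caught here via the base class
    print(e)            # bad input
```

Because ParseError derives from Exception, any handler for Exception (or for ParseError itself) will catch it.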
In this section, we'll have a quick overview of a couple of standard modules.
The treatment is not meant to be comprehensive but to give you a bird's eye view of the richness of the standard library so that you know what you have before you rush off to reinvent the wheel.
Links to the official documentation follow each subsection.
These modules are used by import
ing them into your program and
using the functions, classes and other attributes they provide.
If you want to know all the attributes of a given object (including
those of a module), you can use the dir
function. Most of the
attributes have documentation so if you want, you can write a quick
documentation grabber like so.
def gen_doc(obj):
    "Rips out all documentation for a module and prints it out neatly"
    for i in dir(obj):
        attrib = getattr(obj, i)
        # We're not interested in the docs of integers and strings
        if not isinstance(attrib, str) and not isinstance(attrib, int):
            print i, "(", type(attrib), "):"
            print "-" * (len(i) + len(str(type(attrib))) + 3), "\n"
            doc = attrib.__doc__
            # If documentation is available, print it.
            if doc:
                # Truncate docs if they're too long
                if len(doc) < 90:
                    print " ", doc
                else:
                    print " ", doc[0:90], " [truncated...]"
            print "=" * 80
This introduces the getattr function which is used to access attributes of an object whose names you know (ie. the name is in a string). In other words,
t=Person()
print t.age
is the same as
attribute = "age"
t = Person()
print getattr(t, attribute)
It also introduces the isinstance function which is used to test if a certain object is of a certain type. It handles inheritance properly, ie. an instance of a subclass is also an instance of the base class.
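A quick sketch of both builtins (the Person class here is a minimal stand-in of our own):

```python
class Person(object):
    def __init__(self):
        self.age = 10

t = Person()
# getattr with a string name; an optional third argument is a default
print(getattr(t, "age"))           # 10
print(getattr(t, "height", None))  # None -- no such attribute, default used

# isinstance respects inheritance: bool is a subclass of int
print(isinstance(True, int))       # True
```

The default-value form of getattr is handy when probing objects whose attributes may or may not exist.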
For each of these modules, you should look at the documentation on the official Python page (http://docs.python.org/modindex.html) as well as the PyMOTW (Python Module Of The Week) page for the module by Doug Hellmann, the index of which is at http://www.doughellmann.com/PyMOTW/contents.html
Let's get started.
The sys
module contains things which affect the interpreter
operation. Here are some of them with descriptions.
Attribute | Description |
---|---|
sys.argv | Argument list (similar to C's argv) |
sys.path | List of paths which the interpreter will look for modules to import |
sys.exit() | Quits the interpreter |
sys.exitfunc | Function to call upon exit (useful for cleanup) |
sys.ps1, sys.ps2 | Interpreter prompts |
sys.stdin, sys.stdout, sys.stderr | Standard input, output and error file descriptors |
sys.version | Python version |
import sys
print sys.path
['', '/usr/lib/python2.5/site-packages/SQLAlchemy-0.5.1-py2.5.egg', '/usr/lib/python2.5/site-packages/rope-0.2pre5-py2.5.egg', '/usr/lib/python2.5/site-packages/RescueTimeUploader-0.0.0-py2.5.egg', '/usr/lib/python2.5', '/usr/lib/python2.5/plat-linux2', '/usr/lib/python2.5/lib-tk', '/usr/lib/python2.5/lib-dynload', '/usr/local/lib/python2.5/site-packages', '/usr/lib/python2.5/site-packages', '/usr/lib/python2.5/site-packages/Numeric', '/usr/lib/python2.5/site-packages/PIL', '/usr/lib/python2.5/site-packages/gst-0.10', '/var/lib/python-support/python2.5', '/usr/lib/python2.5/site-packages/gtk-2.0', '/var/lib/python-support/python2.5/gtk-2.0', '/usr/lib/site-python']
print sys.version
2.6+ (r26:66714, Oct 22 2008, 09:25:02)
[GCC 4.3.2]
sys.stderr.write("I am on stderr\n")
I am on stderr
The os module has functions and methods which provide cross platform access to operating system details. A few of the methods are described below
Attribute | Description |
---|---|
os.chdir() | Change working directory |
os.getlogin() | Get login name of user who owns the current controlling terminal |
os.getpid() | Get current process id |
os.environ | A dictionary containing the environment variables |
os.access() | Used to check if the current process can access a path |
os.chroot() | Issue a chroot system call |
os.stat() | Perform a stat on a path |
os.symlink() | Creates a symlink to a file |
os.execl() | Exec a program replacing the current process |
os.spawnl() | Spawn a program in another process |
os.system() | Execute system command (deprecated by the subprocess module) |
The functions along with the subprocess
module are commonly used
for glue applications.
import os
os.getlogin()
'noufal'
os.getcwd()
'/home/noufal/notes'
os.chroot("/etc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 1] Operation not permitted: '/etc'
os.stat("/etc/passwd")
posix.stat_result(st_mode=33188, st_ino=609499L, st_dev=2058L, st_nlink=1, st_uid=0, st_gid=0, st_size=1921L, st_atime=1241026999, st_mtime=1241026835, st_ctime=1241026835)
os.system("ls /boot")
abi-2.6.27-11-generic  config-2.6.27-7-generic  initrd.img-2.6.27-9-generic  System.map-2.6.27-9-generic  vmlinuz-2.6.27-14-generic
abi-2.6.27-14-generic  config-2.6.27-9-generic  lost+found  vmcoreinfo-2.6.27-11-generic  vmlinuz-2.6.27-7-generic
abi-2.6.27-7-generic  grub  memtest86+.bin  vmcoreinfo-2.6.27-14-generic  vmlinuz-2.6.27-9-generic
abi-2.6.27-9-generic  initrd.img-2.6.27-11-generic  System.map-2.6.27-11-generic  vmcoreinfo-2.6.27-7-generic
config-2.6.27-11-generic  initrd.img-2.6.27-14-generic  System.map-2.6.27-14-generic  vmcoreinfo-2.6.27-9-generic
config-2.6.27-14-generic  initrd.img-2.6.27-7-generic  System.map-2.6.27-7-generic  vmlinuz-2.6.27-11-generic
0
os.environ['LOGNAME']
'noufal'
The operator module exposes Python operators as functions. So, a statement like
x, y = 5, 10
print x < y
True
could be written as
import operator
x, y = 5, 10
operator.lt(x, y)  # operator.lt is the less than operator
True
This is useful to construct conditions on the fly at runtime.
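For instance, a comparison can be selected at runtime by name. The dispatch table below is our own illustration, not part of the operator module itself:

```python
import operator

# Map operator names (chosen by us) to the corresponding functions.
comparisons = {"<": operator.lt, ">": operator.gt, "==": operator.eq}

def check(x, op_name, y):
    "Apply the comparison named op_name to x and y."
    return comparisons[op_name](x, y)

print(check(5, "<", 10))  # True
print(check(5, ">", 10))  # False
```

Because the operators are ordinary functions, they can be stored in data structures and passed around like any other value.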
The re module provides regular expression support which we can use to parse textual data à la Perl.
The module mainly makes available the compile
function which can
be used to create compiled regular expressions.
An example is shown below.
import re

scan_text = """Once upon a midnight dreary, while I pondered, weak and weary,
Over many a quaint and curious volume of forgotten lore--
While I nodded, nearly napping, suddenly there came a tapping,
As of some one gently rapping, rapping at my chamber door.
"""
rexp = re.compile(r".*,\s*(\S+)\s+(\S+)\s*,.*")  # rexp is a regular expression object now.
                                                 # Tries to find two words bracketed by commas.
s = rexp.search(scan_text)  # s is a search result
print s.groups()  # Prints the groups matched by the brackets in the regexp
('nearly', 'napping')
Python's datetime module allows us to handle dates and times in the Python object space without doing string parsing. Here is a simple example where we create a datetime and advance it by 2 weeks.
import datetime

now = datetime.datetime.now()
two_weeks = datetime.timedelta(weeks=2)  # A timedelta is used to calculate temporal distance.
t = now + two_weeks
print t.strftime("%d %B %Y")
17 June 2009  # Your output will vary depending on when you run this program.
The logging module allows us to create a robust system where information about the program is logged during its runtime.
import sys
import logging

logging.basicConfig(stream = sys.stdout,
                    format = "%(levelname)s : %(asctime)s : %(message)s",
                    level = logging.WARNING)
logging.debug("Hello?")  # Doesn't appear since we fixed it to WARNING and above
logging.info("Hello?")   # Doesn't appear since we fixed it to WARNING and above
logging.warning("Hello?")
WARNING : 2009-06-03 01:45:20,344 : Hello?
logging.critical("Hello?")
CRITICAL : 2009-06-03 01:45:23,090 : Hello?
This example oversimplifies the module. In reality, we'd make a logger that has multiple handlers (eg. Debug only to file, warnings and above to a separate file. info and above to screen and file etc.)
Refer to the complete docs for details.
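As a small sketch of that multi-handler idea (the logger name "myapp" and the in-memory io.StringIO stream are our own choices for illustration; real setups would use file handlers), a dedicated handler can be attached to a named logger:

```python
import logging
from io import StringIO

buf = StringIO()                        # stand-in for a log file
log = logging.getLogger("myapp")        # a named logger of our own
log.setLevel(logging.INFO)

handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
log.addHandler(handler)

log.debug("invisible")   # below INFO, filtered out
log.warning("disk full")

print(buf.getvalue())    # WARNING:disk full
```

Several such handlers, each with its own level and formatter, can be attached to the same logger to route messages to different destinations.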
Unittest is a module that allows us to write tests for our programs. It's based on the xUnit framework designed by Kent Beck.
Let us assume we have the program shown below
# average.py
class AverageError(Exception):
    pass

def average(*nos):
    if not nos:
        raise AverageError("Nothing to average")
    if not all([isinstance(x, int) or isinstance(x, float) for x in nos]):
        raise AverageError("Can only average ints or floats")
    return float(sum(nos, 0)) / len(nos)
It returns the average of a list of numbers we give it and has some basic error checking capability.
The tests for the module can be written as below
# average_tests.py
import unittest
import average

class AverageTest(unittest.TestCase):
    def testNullInput(self):
        "Tests if the function raises AverageError on no inputs"
        self.assertRaises(average.AverageError, average.average)

    def testStringInput(self):
        "Tests if the function raises AverageError on bad (String) inputs"
        self.assertRaises(average.AverageError, average.average, "hello", 1, 2, 3)

    def testAverageComputation(self):
        "Tests if the computations are correct"
        avg_from_lib = average.average(2, 3, 4, 5)
        true_average = float(2 + 3 + 4 + 5) / 4
        self.assertEqual(avg_from_lib, true_average)

if __name__ == "__main__":
    unittest.main()
We save these two files into the same directory and run the tests like so.
% python average_tests.py -v
Tests if the computations are correct ... ok
Tests if the function raises AverageError on no inputs ... ok
Tests if the function raises AverageError on bad (String) inputs ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.002s

OK
A test case, by definition, is a class that has been derived from unittest.TestCase. In it, we can define methods whose names start with the letters test. Each one will be executed and then one of the validator methods (methods that check for a condition or the lack of it) should be called. The -v flag makes the runner print out verbose details. As you can see, the function behaved as we expected so all the tests pass. If we corrupt something (eg. let's drop the if not nos condition), we'd get an error and the test would fail like so.
% python average_tests.py -v
Tests if the computations are correct ... ok
Tests if the function raises AverageError on no inputs ... ERROR
Tests if the function raises AverageError on bad (String) inputs ... ok

======================================================================
ERROR: Tests if the function raises AverageError on no inputs
----------------------------------------------------------------------
Traceback (most recent call last):
  File "average_tests.py", line 9, in testNullInput
    self.assertRaises(average.AverageError, average.average)
  File "/usr/lib/python2.5/unittest.py", line 320, in failUnlessRaises
    callableObj(*args, **kwargs)
  File "/tmp/average.py", line 8, in average
    return float(sum(nos,0))/len(nos)
ZeroDivisionError: float division

----------------------------------------------------------------------
Ran 3 tests in 0.014s
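Besides unittest.main(), tests can also be loaded and run programmatically; the tiny self-contained case below (our own, unrelated to average.py) shows the pieces involved:

```python
import unittest

class SmallTest(unittest.TestCase):
    "A tiny test case of our own, run programmatically."
    def test_addition(self):
        self.assertEqual(2 + 2, 4)

# unittest.main() takes over the process; a script can instead
# build a suite by hand and run it with a runner.
suite = unittest.TestLoader().loadTestsFromTestCase(SmallTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

The returned TestResult object records how many tests ran and which failed, which is handy when embedding tests in larger tools.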
Itertools offers us a range of generator constructors. These are useful to construct interesting generators based on existing iterables.
Here are some simple examples. To see more powerful examples, please visit the official documentation pages.
itertools.cycle
allows us to create an infinitely looping
iterator from a finite list.
c = [1,2,3]
import itertools
c_cycle = itertools.cycle(c)
c_cycle.next()
1
c_cycle.next()
2
c_cycle.next()
3
c_cycle.next()
1
c_cycle.next()
2
c_cycle.next()
3
c_cycle.next()
1
itertools.chain
allows us to tie together multiple iterators into
a single one. Here's an example of flattening a list of lists using
it.
c= [[1,2,3],[4,5,6],[7,8,9]]
list(itertools.chain(*c))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Since these are generators, they are evaluated lazily and are usually a good choice when you have to loop over things.
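The laziness means even an infinite iterator like cycle is safe to work with, as long as we only take a finite slice out of it; itertools.islice does exactly that:

```python
import itertools

# islice takes a finite slice out of a (possibly infinite) iterator,
# so we can consume cycle without looping forever.
c = itertools.cycle([1, 2, 3])
first_seven = list(itertools.islice(c, 7))
print(first_seven)  # [1, 2, 3, 1, 2, 3, 1]
```

Without islice, list(c) would never terminate; with it, only seven values are ever produced.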
Please refer to the standard documentation for more useful examples.
The following documents are worth reading
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
Noufal Ibrahim is a Python enthusiast and a small contributor to the language and to a couple of other Python projects. He's enthusiastic about spreading the language in the educational and corporate sectors.
He keeps a personal site at nibrahim.net.in and is usually found hanging out on IRC on the freenode network as Khmar.
Many of the examples have been borrowed from the excellent Python tutorial by Anand Chitipothu.
It has been generated using the otherworldly publishing system (and other things) for Emacs called Org mode.
Date: 2009-06-03 14:47:27 IST
HTML generated by org-mode 6.21b in emacs 23