Python

From Ggl's wiki

Selected pieces from the python.org PEPs.

The Zen of Python

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.
  • Readability counts.
  • Special cases aren't special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one-- and preferably only one --obvious way to do it.
  • Although that way may not be obvious at first unless you're Dutch.
  • Now is better than never.
  • Although never is often better than *right* now.
  • If the implementation is hard to explain, it's a bad idea.
  • If the implementation is easy to explain, it may be a good idea.
  • Namespaces are one honking great idea -- let's do more of those!

Style Guide

Summary of Style Guide for Python Code

Code is read much more often than it is written.

  • Indentation: 4 spaces per indentation level (never mix tabs and spaces, and prefer spaces only).
  • Maximum line length: 79 characters, and 72 characters for long flowing blocks of text (docstrings or comments). Break long lines using implied line continuation inside parentheses, or with a backslash ('\').
  • Separate top-level function and class definitions with two blank lines
  • Method definitions inside a class are separated by a single blank line
  • Source encoding: Latin-1 (ISO-8859-1) for Python 2.x, UTF-8 for Python 3.0 and later
  • Imports should be on separate lines
  • Imports are always put at the top of the file in the following order:
  1. Standard library imports
  2. Related third party imports
  3. Local application/library specific imports
  • Each group of imports is separated by a blank line
  • Avoid extraneous whitespace:
spam(ham[1], {eggs: 2})
if x == 4: print x, y; x, y = y, x
  • No spaces around the '=' sign when used to indicate a keyword argument or a default parameter value:
def complex(real, imag=0.0):
    return magic(r=real, i=imag)
  • Write docstrings for all public modules, functions, classes, and methods. The docstring should appear after the "def" line.

Naming Conventions

  • _single_leading_underscore: weak "internal use" indicator ("from M import *" does not import objects whose name starts with an underscore).
  • single_trailing_underscore_: used by convention to avoid conflicts with Python keywords
  • __double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo)
  • __double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. Never invent such names! Only use them as documented.
  • Class: CapWords convention
  • Exception: class naming convention with the suffix "Error"
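  • Global Variable: same convention as functions (lowercase with words separated by underscores); names of the form __var__ are reserved for documented "magic" module attributes such as __all__.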
  • Function: lowercase with words separated by underscores.
  • Function and method arguments:
    • Always use 'self' for the first argument to instance methods.
    • Always use 'cls' for the first argument to class methods.
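
The underscore conventions above can be seen at work in a short sketch. The class FooBar and its attribute names are made up for illustration, following the example in the list, and the snippet is written for Python 3:

```python
class FooBar:
    def __init__(self):
        self._internal = 1   # single leading underscore: "internal use" by convention only
        self.__boo = 2       # double leading underscore: triggers name mangling
        self.class_ = "x"    # trailing underscore avoids clashing with the keyword 'class'

obj = FooBar()
# The mangled name is what is actually stored on the instance.
attribute_names = sorted(vars(obj))
```

Outside the class body the attribute is only reachable as obj._FooBar__boo, which is why mangling is a collision-avoidance mechanism rather than real privacy.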

Programming recommendations

  • Code should be written in a way that does not disadvantage other implementations of Python. Example: build strings with ''.join() instead of repeated a += b concatenation.
  • Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators.
  • Use class-based exceptions
  • When raising an exception, use "raise ValueError('message')" instead of the older form "raise ValueError, 'message'"
  • When catching exceptions, mention specific exceptions whenever possible instead of using a bare 'except:' clause.
  • For all try/except clauses, limit the 'try' clause to the absolute minimum amount of code necessary. This avoids masking bugs.
  • Use string methods instead of the string module
  • Use .startswith() and .endswith() instead of string slicing to check for prefixes or suffixes.
Yes: if foo.startswith('bar'):
No: if foo[:3] == 'bar':
  • Object type comparisons should always use isinstance() instead of comparing types directly.
Yes: if isinstance(obj, int):
No: if type(obj) is type(1):
  • For sequences (strings, lists, tuples), use the fact that empty sequences are false.
  • Don't compare boolean values to True or False using '=='
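
A few of these recommendations fit in one small sketch. The function names (first_or_default, parse_port, join_words) are hypothetical and only serve the illustration:

```python
def first_or_default(items, default=None):
    # Empty sequences are false, so no len() comparison is needed.
    if not items:
        return default
    return items[0]

def parse_port(text):
    # Keep the try clause minimal and catch the specific exception.
    try:
        port = int(text)
    except ValueError:
        return None
    return port

def join_words(words):
    # str.join builds the result in one pass instead of repeated +=.
    return " ".join(words)

port = parse_port("8080")
missing = parse_port("not-a-number")
# Compare to the None singleton with 'is', never '=='.
port_is_missing = missing is None
```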

Optimization

Excerpt from Python Patterns - An Optimization Anecdote

If you feel the need for speed, go for built-in functions - you can't beat a loop written in C. Check the library manual for a built-in function that does what you want. If there isn't one, here are some guidelines for loop optimization:

  • Rule number one: only optimize when there is a proven speed bottleneck. Only optimize the innermost loop. (This rule is independent of Python, but it doesn't hurt repeating it, since it can save a lot of work. :-)
  • Small is beautiful. Given Python's hefty charges for bytecode instructions and variable look-up, it rarely pays off to add extra tests to save a little bit of work.
  • Use intrinsic operations. An implied loop in map() is faster than an explicit for loop; a while loop with an explicit loop counter is even slower.
  • Avoid calling functions written in Python in your inner loop. This includes lambdas. In-lining the inner loop can save a lot of time.
  • Local variables are faster than globals; if you use a global constant in a loop, copy it to a local variable before the loop. And in Python, function names (global or built-in) are also global constants!
  • Try to use map(), filter() or reduce() to replace an explicit for loop, but only if you can use a built-in function: map with a built-in function beats for loop, but a for loop with in-line code beats map with a lambda function!
  • Check your algorithms for quadratic behavior. But notice that a more complex algorithm only pays off for large N - for small N, the complexity doesn't pay off. In our case, 256 turned out to be small enough that the simpler version was still a tad faster. Your mileage may vary - this is worth investigating.
  • And last but not least: collect data. Python's excellent profile module can quickly show the bottleneck in your code. If you're considering different versions of an algorithm, test it in a tight loop using the time.clock() function.
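
The last guideline can be tried with the standard timeit module, a sketch written for Python 3, where time.clock() no longer exists (it was removed in Python 3.8). The two functions compared here are made-up examples; which one is faster depends on the interpreter and the workload, so the numbers should be measured rather than assumed:

```python
import timeit

def with_loop(values):
    # Explicit for loop with in-line code.
    result = []
    for v in values:
        result.append(str(v))
    return result

def with_map(values):
    # Implied loop: map() with a built-in function.
    return list(map(str, values))

values = list(range(1000))

# Run each candidate in a tight loop and record the total elapsed seconds.
loop_time = timeit.timeit(lambda: with_loop(values), number=200)
map_time = timeit.timeit(lambda: with_map(values), number=200)
```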

Tools

Syntax and Style

PyLint, PyFlakes, PyChecker

Profiling

Notes

SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.

It provides a full suite of well known enterprise-level persistence patterns, designed for efficient and high-performing database access, adapted into a simple and Pythonic domain language.

IPython offers a combination of convenient shell features, special commands and a history mechanism for both input (command history) and output (results caching, similar to Mathematica). It is intended to be a fully compatible replacement for the standard Python interpreter, while offering vastly improved functionality and flexibility.

Simple Examples

Is x a power of 2?

  • What is a power of 2? The most convenient representation is in binary: a power of two has exactly one bit set.
  • How do we check that exactly one bit is set?

(1)

def is_pow2(x):
    def count_setbits(x):
        count = 0
        offset = 0
        while (1 << offset) <= x:
            if (1 << offset) & x:
                count += 1
            offset += 1
        return count
    return count_setbits(x) == 1

This is suboptimal because we do not need to count all the set bits in x; we only need to know whether exactly one bit is set.

(2)

def is_pow2(x):
    count = 0
    offset = 0
    while (1 << offset) <= x:
        if (1 << offset) & x:
            if count == 1:
                return False
            count = 1
        offset += 1
    return count == 1  # zero has no bit set, so it is not a power of 2

We could also write the bit counter as a generator, so that the caller can stop as soon as a second set bit is found:

(3)

def count_setbits(x):
    count = 0
    offset = 0
    while (1 << offset) <= x:
        if (1 << offset) & x:
            count += 1
            yield count
        offset += 1

def is_pow2(x):
    count = 0
    for count in count_setbits(x):
        if count > 1:
            return False
    return count == 1  # zero yields nothing and is not a power of 2

We can improve (2) a bit. Since only one bit is tested at each step, we can shift x itself instead of shifting a mask:

(4)

def is_pow2(x):
    count = 0
    while x > 0:  # a negative x would never reach 0 by right shifts
        if x & 1:
            if count == 1:
                return False
            count = 1
        x = x >> 1
    return count == 1  # zero has no bit set
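
A different and well-known approach, not one of the numbered versions above, avoids the loop entirely: subtracting 1 from x flips its lowest set bit and every bit below it, so x & (x - 1) is zero exactly when x has a single bit set. A minimal sketch, with the hypothetical name is_pow2_bitwise:

```python
def is_pow2_bitwise(x):
    """Return True if x is a positive power of two."""
    # x & (x - 1) clears the lowest set bit; nothing remains only when
    # x had a single bit set. The x > 0 guard rejects zero and negatives.
    return x > 0 and (x & (x - 1)) == 0

powers = [n for n in range(20) if is_pow2_bitwise(n)]
```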

Introspection

Call a function from a module by its name:

func = getattr(__import__(__name__), "check_%s" % typekind.lower())

Since the current module is already imported, __import__(__name__) simply returns it from sys.modules (which is a dictionary), so sys.modules[__name__] can also be used.

Dynamically instantiate a class according to its name:

obj = getattr(__import__(__name__), cls.__name__)()
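
The same getattr-based lookup works for any importable module. The sketch below uses importlib.import_module and two standard library modules so that it is self-contained; the helper names call_by_name and instantiate_by_name are assumptions made for the illustration:

```python
import importlib

def call_by_name(module_name, function_name, *args):
    # Resolve the module, then look the function up by its name.
    module = importlib.import_module(module_name)
    func = getattr(module, function_name)
    return func(*args)

def instantiate_by_name(module_name, class_name, *args):
    # Classes are attributes of their module too; calling one creates an instance.
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(*args)

root = call_by_name("math", "sqrt", 16.0)
counter = instantiate_by_name("collections", "Counter", "abca")
```

getattr raises AttributeError when the name does not exist, which is usually the right failure mode for a dispatch table built from user-supplied names.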


Function Call

The section "Compound statements: Function definitions" of the language reference says:

A function definition is an executable statement. Its execution binds the function name in the current local namespace to a function object (a wrapper around the executable code for the function).

The function definition itself is executed once and binds the name to a function object. The function body is only executed when the function is called.

It also explains that:

Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended.

def f(xs=[]):
    y = xs and xs[0] or 0
    xs.append(y+1)
    return xs

Consecutive calls give:

>>> f()
[1]
>>> f()
[1, 2]
>>> f()
[1, 2, 2]

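The default values are stored on the function object as a tuple, in positional order (func_defaults in Python 2, __defaults__ in Python 3), and the tuple holds the very list that the calls above modified: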

>>> f.func_defaults
([1, 2, 2],)

The parameter xs itself is a local variable of the function:

>>> f.func_code.co_varnames
('xs', 'y')
>>> f.func_code.co_nlocals
2

Constants are stored in a tuple:

>>> f.func_code.co_consts
(None, 0, 1)
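
The usual way to avoid the shared mutable default is to use None as the default and create the list inside the body. A minimal sketch, with the hypothetical name f_safe, written for Python 3:

```python
def f_safe(xs=None):
    # A fresh list is created on every call instead of sharing one default object.
    if xs is None:
        xs = []
    y = xs[0] if xs else 0
    xs.append(y + 1)
    return xs

first = f_safe()
second = f_safe()
```

Each call now starts from an empty list, and the stored default stays None.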

A code block defines a local scope. A code block is a module, a function body, or a class definition. Consequently, a variable assigned inside a for loop in a function is available in the rest of the function:

def f(xs):
    for x in xs:
        y = x
        print x
    print 'y = %s' % y

Which gives:

>>> f([1,2,3,4])
1
2
3
4
y = 4

That is simply because the for loop does not define a block: y lives in the function's scope.

Add a method to an instance

Define a function that takes an instance as first argument:

def function(obj, x):
    return obj.x - x

Add the function as a method to an instance:

instance.__dict__['dummy_method'] = types.MethodType(function, instance)

Here function takes an instance as its first argument. If it did not, we would have to supply an extra first parameter to receive the instance (self) when the method is called from the instance:

def square(x):
    return x*x

instance.__dict__['dummy_method'] = types.MethodType(lambda obj, x: square(x), instance)

In some cases, we might define a named wrapper function instead of the lambda.
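
Put together, the technique can be sketched as follows. The class Point and the function offset are made up for the illustration, and plain attribute assignment is used in place of writing to instance.__dict__ directly, which has the same effect for ordinary instances:

```python
import types

class Point(object):
    def __init__(self, x):
        self.x = x

def offset(obj, x):
    # obj plays the role of self once the function is bound.
    return obj.x - x

def square(x):
    return x * x

p = Point(10)
# Bind offset to this one instance only; other Point objects are unaffected.
p.offset = types.MethodType(offset, p)
# square ignores the instance, so a lambda absorbs the bound first argument.
p.square = types.MethodType(lambda obj, x: square(x), p)

difference = p.offset(3)   # computes 10 - 3
squared = p.square(4)      # computes 4 * 4
```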

Functional Programming

The built-in type set provides useful overloaded operators:

>>> l1 = [1, 2, 4, 6, None]
>>> l2 = [2, 4, 3, 7]
>>> set(l1) & set(l2)
set([2, 4])
>>> set(l1) | set(l2)
set([None, 1, 2, 3, 4, 6, 7])
>>> set(l1) ^ set(l2)
set([None, 1, 3, 6, 7])

Notice that I define two lists and convert them to sets using the built-in set() constructor. Now I'd like to remove this step, so I define a function to_set():

>>> to_set = lambda x,y,f: f(set(x),set(y))

It applies f to x and y, both converted to sets. Let's bind it to set intersection by binding the & operator to the f parameter. Python has no dedicated syntax for partial application, but the functools module provides partial, which binds values to parameters. Operators in Python are not functions; the operator module provides function equivalents for &, | and ^ (respectively and_, or_ and xor).

>>> from functools import partial
>>> import operator
>>> inter = partial(to_set, f=operator.and_)
>>> inter(l1, l2)
set([2, 4])

Now you can intersect two lists without explicitly converting them to sets. Do the same with the remaining operators:

>>> union = partial(to_set, f=operator.or_)
>>> union(l1, l2)
set([None, 1, 2, 3, 4, 6, 7])
>>> xor = partial(to_set, f=operator.xor)
>>> xor(l1, l2)
set([None, 1, 3, 6, 7])

However, each function returns a set, which is not consistent with taking two lists as arguments. Wrapping the call in a lambda lets us return a list instead:

>>> inter = lambda x, y: list(to_set(x, y, f=operator.and_))
>>> inter(l1, l2)
[2, 4]
>>> union = lambda x, y: list(to_set(x, y, f=operator.or_))
>>> xor = lambda x, y: list(to_set(x, y, f=operator.xor))

And finally, let's compute all three at once, as a list [intersection, union, xor]:

>>> [f(l1,l2) for f in (inter, union, xor)]
[[2, 4], [None, 1, 2, 3, 4, 6, 7], [None, 1, 3, 6, 7]]

I used operators to introduce the operator module, but I could have used the set methods intersection, union and symmetric_difference directly.

One way to do this is to look up intersection as an instance method:

>>> to_set = lambda x,y,m: getattr(set(x), m)(set(y))
>>> inter = lambda x, y: list(to_set(x, y, m='intersection'))
>>> inter(l1,l2)
[2, 4]

Another way is to call intersection through the class, passing both sets explicitly:

>>> to_set = lambda x,y,f: f(set(x),set(y))
>>> inter = lambda x, y: list(to_set(x, y, f=set.intersection))
>>> inter(l1,l2)
[2, 4]

set.difference returns the elements of the first set that are not in the second:

>>> diff = lambda x, y: list(to_set(x, y, f=set.difference))
>>> diff(l1,l2)
[None, 1, 6]

Now we can compute three lists (only in l1, only in l2, common to both), which correspond to (removed, added, same) when l1 is the old version and l2 the new one:

>>> [f(x,y) for ((x,y),f) in [((l1,l2), diff), ((l2,l1), diff), ((l1,l2), inter)]]
[[None, 1, 6], [3, 7], [2, 4]]

For an immutable set, use frozenset.

All these operations can also be implemented with list comprehensions, which apply to any iterable.
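
As a sketch of that last remark, here are the same operations written as list comprehensions over the two lists used above. Unlike the set-based versions they preserve the order of the input lists, at the cost of a linear membership test for every element:

```python
l1 = [1, 2, 4, 6, None]
l2 = [2, 4, 3, 7]

common = [x for x in l1 if x in l2]            # intersection
only_in_l1 = [x for x in l1 if x not in l2]    # difference l1 - l2
only_in_l2 = [x for x in l2 if x not in l1]    # difference l2 - l1
union = l1 + only_in_l2                        # l1 plus what l2 adds
symmetric = only_in_l1 + only_in_l2            # in exactly one of the lists
```

For long lists the set-based functions are the better choice, since membership tests on a set do not scan the whole container.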

Recursive merging of two dicts

Dicts are often used to represent tree-like structures. Imagine you want to merge {'a': {'b': 1, 'c': 2}} with {'a': {'d': 3}}. You cannot use dict.update() because the top-level key (node) is the same in both dicts:

>>> d1 = {'a': {'b': 1, 'c': 2}}
>>> d2 = {'a': {'d': 3}}
>>> d1.update(d2)
>>> d1
{'a': {'d': 3}}

The subtree under 'a' in d1 is overwritten by the one from d2, whereas only the branches that differ should be updated. A recursive merge handles this:

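def _union(d1, d2):
    # Start from a shallow copy of d1 so that its keys survive even when d2 is empty.
    d = dict(d1)
    for (k2, v2) in d2.items():
        # Recurse only when both sides hold a dict; otherwise d2's value wins.
        if k2 in d1 and isinstance(d1[k2], dict) and isinstance(v2, dict):
            d[k2] = _union(d1[k2], v2)
        else:
            d[k2] = v2
    return d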

Describe objects in YAML

YAML is a human-readable format to serialize data. As an example we write the following YAML file in test0.yaml:

!!python/object:test.A
    a: 1

To load this file we write a simple python script:

import yaml
import sys
from pprint import pprint

class A(object):
    def __init__(self, a):
        self.a = a

if __name__ == '__main__':
    obj = yaml.load(file(sys.argv[1]).read())
    pprint(obj)
    print obj.a

Running it gives:

$ python test.py test0.yaml
<test.A object at 0xfd5750>
1

Now we want to instantiate another object with the value 2:

- !!python/object:test.A
    a: 1
- !!python/object:test.A
    a: 2

We need to adapt test.py because the YAML file now represents a list of objects:

import yaml
import sys
from pprint import pprint

class A(object):
    def __init__(self, a):
        self.a = a

if __name__ == '__main__':
    objs = yaml.load(file(sys.argv[1]).read())
    pprint(objs)
    for i in objs:
        print i.a

Running it gives:

$ python test.py test0.yaml
[<test.A object at 0x10d2b10>, <test.A object at 0x10d2cd0>]
1
2


Continuations

Continuations make it possible to save the execution context, return a value, and resume execution later. They are commonly used to implement coroutines. Python provides a limited form of continuations through generators. A generator provides the following interface:

  • next()
  • send()
  • the yield expression

yield is used inside the generator function to return a value and suspend execution. The caller calls generator.next() to resume the generator where it stopped. send() also resumes the generator and passes it a value, which inside the function becomes the result of the yield expression:

value = (yield)

Below is a simple example of a continuation. It is called coroutine, although at this point it is only a simple continuation. Later, if several such coroutines call each other, they will perform cooperative multitasking. This is a way to implement lightweight threads. See also Python Monocle and Python Greenlet. The former uses native Python generators, while the latter is built on top of a C dynamic library to provide similar features. The latter goes beyond some limitations of Python generators, as in the example test_generator.py, where a generator appears in a nested function.


>>> def coroutine(length):
...     for x in xrange(length-1):
...         y = yield x
...         print y
... 
>>> c = coroutine(10)
>>> c.next()
0
>>> c.next()
None
1
>>> c.next()
None
2
>>> c.send(1)
1
3
>>> c.send(42)
42
4

In each pair of output lines, the first comes from 'print y' and the second is the value returned by the call to next() or send(), that is, the argument of 'yield x'.

When yield is reached, the generator function coroutine returns yield's argument (here x) to the caller and freezes the current state of the function. Inside the function, the yield expression evaluates to the argument of send(), or to None when the generator is resumed with next(). This provides a way to save the state of the function, return a value, and later pass arguments back into the function.

>>> def coroutine2(length):
...     y = 0
...     for x in xrange(length-1):
...         y = (yield x + y) or 0
...         print y
... 
>>> c2 = coroutine2(10)
>>> c2.next()
0
>>> c2.next()
0
1
>>> c2.next()
0
2
>>> c2.send(42)
42
45
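
For readers on Python 3, where generator.next() and xrange no longer exist, the same session can be sketched with the built-in next() and range(). The log list is an assumption added here so that the values received by the generator can be inspected instead of printed:

```python
def coroutine2(length, log):
    y = 0
    for x in range(length - 1):
        # The yield expression returns x + y to the caller, then evaluates
        # to whatever the caller sends (None when resumed with next()).
        y = (yield x + y) or 0
        log.append(y)

log = []
c2 = coroutine2(10, log)
results = [
    next(c2),      # runs up to the first yield
    next(c2),      # resumes with None, so y falls back to 0
    next(c2),
    c2.send(42),   # resumes with y = 42, next yield is 3 + 42
]
```

The first call must be next(c2) (or c2.send(None)), because a generator that has not started yet has no pending yield expression to receive a value.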