Intermediate - Classes
last updated 2 years ago by gerard
COMMENT: This is part of a suggested restructuring of the Tutorial. Use the sidebar links to return to the current tutorial.
Class Definition Syntax
The simplest form of class definition looks like this:
class ClassName(baseclass):
    <statement-1>
    .
    .
    .
    <statement-N>
Class definitions, like function definitions (def statements), must be executed before they have any effect. (You could conceivably place a class definition in a branch of an if statement, or inside a function.)
In practice, the statements inside a class definition will usually be function definitions, but other statements are allowed, and sometimes useful -- we'll come back to this later. The function definitions inside a class normally have a peculiar form of argument list, dictated by the calling conventions for methods -- again, this is explained later.
When a class definition is entered, a new namespace is created, and used as the local scope -- thus, all assignments to local variables go into this new namespace. In particular, function definitions bind the name of the new function here.
When a class definition is left normally (via the end), a class object is created. This is basically a wrapper around the contents of the namespace created by the class definition; we'll learn more about class objects in the next section. The original local scope (the one in effect just before the class definition was entered) is reinstated, and the class object is bound here to the class name given in the class definition header (ClassName in the example).
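The description above can be made concrete with a small sketch. The names Demo, executed and limit are invented for this illustration, and the code is written so that it should run unchanged on Python 2.6+ and Python 3:

```python
executed = []

class Demo(object):
    # Any statement is allowed in a class body; it runs once, when the
    # class statement itself is executed.
    executed.append("class body ran")

    # Assignments and function definitions bind names in the class namespace.
    limit = 3

    def method(self):
        return self.limit

# The class object wraps that namespace; vars() exposes it as a mapping.
# Names starting with "__" are bookkeeping entries added by Python itself.
names = sorted(n for n in vars(Demo) if not n.startswith("__"))
```

After this runs, executed holds one entry even though no instance was ever created, and names lists only limit and method, the two names bound inside the class body.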
Note also that classes can inherit from other classes by placing the names of the intended superclasses in parentheses. If Python is your first object-oriented language, don't worry yet about what "inherit" and "superclass" mean; we'll get into all that later.
Class Objects
Class objects support two kinds of operations: attribute references and instantiation.
Attribute references use the standard syntax used for all attribute references in Python: obj.name. Valid attribute names are all the names that were in the class's namespace when the class object was created. So, if the class definition looked like this:
class MyClass(object):
    "A simple example class"
    i = 12345
    def f(self):
        return 'hello world'
then MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object, respectively. Class attributes can also be assigned to, so you can change the value of MyClass.i by assignment. __doc__ is also a valid attribute, returning the docstring belonging to the class: "A simple example class".
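A short sketch of these attribute operations, reusing the MyClass definition from the text (the variable names original and doc are only for illustration):

```python
class MyClass(object):
    "A simple example class"
    i = 12345

    def f(self):
        return 'hello world'

original = MyClass.i      # attribute reference on the class object
MyClass.i = 54321         # class attributes can be rebound by assignment
doc = MyClass.__doc__     # the docstring is an ordinary attribute too
```

Note that the assignment changes the class itself, so every instance that has not set its own i will see the new value.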
Notice that in the class definition we inherit from object. In Python, all your classes should inherit from object, or from some other class that inherits from object (directly or indirectly). However, you don't actually have to. We could have written class MyClass:, and everything would still work. We'll get into the reasons for this in "New-Style Classes", below. For now though, try to always inherit from object.
Class instantiation uses function notation. Just pretend that the class object is a parameterless function that returns a new instance of the class. For example (assuming the above class):
x = MyClass()
creates a new instance of the class and assigns this object to the local variable x.
The instantiation operation (calling a class object) creates an empty object. Many classes like to create instances customized to a specific initial state. Therefore a class may define a special method named __init__(), like this:
    def __init__(self):
        self.data = []
When a class defines an __init__() method, class instantiation automatically invokes __init__() for the newly-created class instance. So in this example, a new, initialized instance can be obtained by:
x = MyClass()
Of course, the __init__() method may have arguments for greater flexibility. In that case, arguments given to the class instantiation operator are passed on to __init__(). For example,
>>> class Complex(object):
...     def __init__(self, realpart, imagpart):
...         self.r = realpart
...         self.i = imagpart
...
>>> x = Complex(3.0, -4.5)
>>> x.r, x.i
(3.0, -4.5)
Instance Objects
Now what can we do with instance objects? The only operations understood by instance objects are attribute references. There are two kinds of valid attribute names, data attributes and methods.
Data attributes correspond to "instance variables" in Smalltalk, and to "data members" in C++. Data attributes need not be declared; like local variables, they spring into existence when they are first assigned to. For example, if x is the instance of MyClass created above, the following piece of code will print the value 16, without leaving a trace:
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print x.counter
del x.counter
The other kind of instance attribute reference is a method. A method is a function that "belongs to" an object. (In Python, the term method is not unique to class instances: other object types can have methods as well. For example, list objects have methods called append, insert, remove, sort, and so on. However, in the following discussion, we'll use the term method exclusively to mean methods of class instance objects, unless explicitly stated otherwise.)
Valid method names of an instance object depend on its class. By definition, all attributes of a class that are function objects define corresponding methods of its instances. So in our example, x.f is a valid method reference, since MyClass.f is a function, but x.i is not, since MyClass.i is not. But x.f is not the same thing as MyClass.f -- it is a method object, not a function object.
Method Objects
Usually, a method is called right after it is bound:
x.f()
In the MyClass example, this will return the string 'hello world'. However, it is not necessary to call a method right away: x.f is a method object, and can be stored away and called at a later time. For example:
xf = x.f
while True:
    print xf()
will continue to print "hello world" until the end of time.
What exactly happens when a method is called? You may have noticed that x.f() was called without an argument above, even though the function definition for f specified an argument. What happened to the argument? Surely Python raises an exception when a function that requires an argument is called without any -- even if the argument isn't actually used...
Actually, you may have guessed the answer: the special thing about methods is that the object is passed as the first argument of the function. In our example, the call x.f() is exactly equivalent to MyClass.f(x). In general, calling a method with a list of n arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method's object before the first argument.
If you still don't understand how methods work, a look at the implementation can perhaps clarify matters. When an instance attribute is referenced that isn't a data attribute, its class is searched. If the name denotes a valid class attribute that is a function object, a method object is created by packing (pointers to) the instance object and the function object just found together in an abstract object: this is the method object. When the method object is called with an argument list, it is unpacked again, a new argument list is constructed from the instance object and the original argument list, and the function object is called with this new argument list.
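The packing and unpacking described above can be observed directly. This is only a sketch: it reuses MyClass from the text, and it relies on the attribute spellings __self__ and __func__, which are available on method objects from Python 2.6 onward (older releases only offer im_self and im_func):

```python
class MyClass(object):
    def f(self):
        return 'hello world'

x = MyClass()
xf = x.f                        # a method object: instance and function packed together

# Calling the method is equivalent to calling the function with x inserted
# as the first argument.
same_result = (xf() == MyClass.f(x))

packed_instance = xf.__self__   # the instance the method is bound to
packed_function = xf.__func__   # the plain function found in the class
```

Calling packed_function(packed_instance) by hand performs the same call that xf() makes.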
Random Remarks
Data attributes override method attributes with the same name; to avoid accidental name conflicts, which may cause hard-to-find bugs in large programs, it is wise to use some kind of convention that minimizes the chance of conflicts. Possible conventions include capitalizing method names, prefixing data attribute names with a small unique string (perhaps just an underscore), or using verbs for methods and nouns for data attributes.
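The kind of accidental conflict meant here can be sketched as follows (Counter and count are made-up names for the illustration):

```python
class Counter(object):
    def count(self):
        return 42

c = Counter()
via_method = c.count()   # the name is looked up in the class: a method

c.count = 7              # a data attribute with the same name...
via_data = c.count       # ...now hides the method on this instance

del c.count              # removing the data attribute uncovers the method again
restored = c.count()
```

While the data attribute exists, c.count() would fail, because c.count is the integer 7 and integers are not callable.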
Data attributes may be referenced by methods as well as by ordinary users ("clients") of an object. In other words, classes are not usable to implement pure abstract data types. In fact, nothing in Python makes it possible to enforce data hiding -- it is all based upon convention. (On the other hand, the Python implementation, written in C, can completely hide implementation details and control access to an object if necessary; this can be used by extensions to Python written in C.)
Clients should use data attributes with care -- clients may mess up invariants maintained by the methods by stamping on their data attributes. Note that clients may add data attributes of their own to an instance object without affecting the validity of the methods, as long as name conflicts are avoided -- again, a naming convention can save a lot of headaches here.
There is no shorthand for referencing data attributes (or other methods!) from within methods. I find that this actually increases the readability of methods: there is no chance of confusing local variables and instance variables when glancing through a method.
Often, the first argument of a method is called self. This is nothing more than a convention: the name self has absolutely no special meaning to Python. (Note, however, that by not following the convention your code may be less readable to other Python programmers, and it is also conceivable that a class browser program might be written that relies upon such a convention.)
Any function object that is a class attribute defines a method for instances of that class. It is not necessary that the function definition is textually enclosed in the class definition: assigning a function object to a local variable in the class is also ok. For example:
# Function defined outside the class
def f1(self, x, y):
    return min(x, x+y)

class C(object):
    f = f1
    def g(self):
        return 'hello world'
    h = g
Now f, g and h are all attributes of class C that refer to function objects, and consequently they are all methods of instances of C -- h being exactly equivalent to g. Note that this practice usually only serves to confuse the reader of a program.
Methods may call other methods by using method attributes of the self argument:
class Bag(object):
    def __init__(self):
        self.data = []
    def add(self, x):
        self.data.append(x)
    def addtwice(self, x):
        self.add(x)
        self.add(x)
Methods may reference global names in the same way as ordinary functions. The global scope associated with a method is the module containing the class definition. (The class itself is never used as a global scope!) While one rarely encounters a good reason for using global data in a method, there are many legitimate uses of the global scope: for one thing, functions and modules imported into the global scope can be used by methods, as well as functions and classes defined in it. Usually, the class containing the method is itself defined in this global scope, and in the next section we'll find some good reasons why a method would want to reference its own class!
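A minimal sketch of the scoping rule, using the invented names scale and Widget: an unqualified name inside a method skips the class namespace entirely, so a class attribute has to be reached through the class (or through self).

```python
scale = 10                      # a name in the enclosing (module) scope

class Widget(object):
    scale = 2                   # a class attribute that happens to share the name

    def outer_scaled(self, n):
        # The bare name skips the class namespace, so this uses the outer 10.
        return n * scale

    def class_scaled(self, n):
        # The class attribute must be reached explicitly; the method looks up
        # its own class by name in the enclosing scope.
        return n * Widget.scale

w = Widget()
outer_result = w.outer_scaled(3)
class_result = w.class_scaled(3)
```

Here outer_result is computed with the outer scale and class_result with Widget.scale.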
Inheritance
Of course, a language feature would not be worthy of the name "class" without supporting inheritance. The syntax for a derived class definition looks like this:
class DerivedClassName(BaseClassName):
    <statement-1>
    .
    .
    .
    <statement-N>
The name BaseClassName must be defined in a scope containing the derived class definition. In place of a base class name, other arbitrary expressions are also allowed. This can be useful, for example, when the base class is defined in another module:
class DerivedClassName(modname.BaseClassName):
Execution of a derived class definition proceeds the same as for a base class. When the class object is constructed, the base class is remembered. This is used for resolving attribute references: if a requested attribute is not found in the class, the search proceeds to look in the base class. This rule is applied recursively if the base class itself is derived from some other class.
There's nothing special about instantiation of derived classes: DerivedClassName() creates a new instance of the class. Method references are resolved as follows: the corresponding class attribute is searched, descending down the chain of base classes if necessary, and the method reference is valid if this yields a function object.
Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively virtual.)
An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same name. There is a simple way to call the base class method directly: just call "BaseClassName.methodname(self, arguments)". This is occasionally useful to clients as well. (Note that this only works if the base class is defined or imported directly in the global scope.)
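Both points, overriding and extending, can be sketched with two small classes (Greeter and LoudGreeter are hypothetical names chosen for this example):

```python
class Greeter(object):
    def name(self):
        return "base"

    def greet(self):
        # Calls whatever name() the actual instance provides, so a derived
        # class can change the result without touching greet().
        return "hello from " + self.name()

class LoudGreeter(Greeter):
    def name(self):
        # Overrides the base class method.
        return "derived"

    def greet(self):
        # Extends rather than replaces: call the base class method directly,
        # passing self explicitly, then post-process its result.
        return Greeter.greet(self).upper() + "!"

plain = Greeter().greet()
loud = LoudGreeter().greet()
```

The base class greet() ends up calling LoudGreeter.name() when it runs on a LoudGreeter instance, which is the "effectively virtual" behaviour mentioned above.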
New-Style Classes
As we stated before, you should always use object, or another class derived from object, as your baseclass. At the same time though, your classes technically don't have to inherit from anything at all.
The reason for this is that Python actually has two separate type systems, often called "old-style classes" and "new-style classes".
"Old-style classes" (classes that at no point inherit from object) are a relic from the days of old, i.e. versions of Python earlier than 2.2. In those days, the situation around types and classes was sometimes murky, and there were a variety of restrictions in place because of this.
In Python 2.2, so-called "new-style classes" were introduced. These classes, which all have object as their ultimate base class, are the basis for all the types in Python. Every built-in type (int, strings, lists, dictionaries, etc.) inherits from object. This brought better consistency to the type system, and also enabled a bunch of fun new features (such as the property built-in).
As an example, in the interpreter, try this:
>>> isinstance(5, object)
True
See that? Even the number 5 is an instance of a class that inherits from object.
Going forward, all your classes should be "new-style". You may not see any immediate differences, but as you get deeper into Python, you'll start to see examples of things that can only be done with new-style classes. All old-style classes will continue to work, for backwards compatibility, but they are not recommended for new Python code.
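As one illustration of a feature aimed at new-style classes, here is a minimal sketch of property. The Temperature class and its attribute names are invented for the example:

```python
class Temperature(object):
    def __init__(self, celsius):
        self._celsius = celsius

    def _get_fahrenheit(self):
        return self._celsius * 9.0 / 5.0 + 32

    # Reading t.fahrenheit calls _get_fahrenheit(t). No setter is given,
    # so on a new-style class the attribute is read-only.
    fahrenheit = property(_get_fahrenheit)

t = Temperature(100)
reading = t.fahrenheit

try:
    t.fahrenheit = 0
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```

Clients read t.fahrenheit as if it were a plain data attribute, while the class keeps control over how the value is computed.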
Multiple Inheritance
Python supports a limited form of multiple inheritance as well. A class definition with multiple base classes looks like this:
class DerivedClassName(Base1, Base2, Base3):
    <statement-1>
    .
    .
    .
    <statement-N>
The only rule necessary to explain the semantics is the resolution rule used for class attribute references. For old-style classes this is depth-first, left-to-right (new-style classes refine this order when base classes share a common ancestor). Thus, if an attribute is not found in DerivedClassName, it is searched in Base1, then (recursively) in the base classes of Base1, and only if it is not found there, it is searched in Base2, and so on.
(To some people breadth first -- searching Base2 and Base3 before the base classes of Base1 -- looks more natural. However, this would require you to know whether a particular attribute of Base1 is actually defined in Base1 or in one of its base classes before you can figure out the consequences of a name conflict with an attribute of Base2. The depth-first rule makes no differences between direct and inherited attributes of Base1.)
It is clear that indiscriminate use of multiple inheritance is a maintenance nightmare, given the reliance in Python on conventions to avoid accidental name conflicts. A well-known problem with multiple inheritance is a class derived from two classes that happen to have a common base class. While it is easy enough to figure out what happens in this case (the instance will have a single copy of "instance variables" or data attributes used by the common base class), it is not clear that these semantics are in any way useful.
Another benefit that new-style classes brought was a more consistent method resolution order (MRO) for classes using multiple inheritance. For more on this, see Guido's new-style class essay.
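The order used by a new-style class can be inspected through its __mro__ attribute. The sketch below uses four invented classes arranged in the "common base class" shape discussed above:

```python
class Base(object):
    def who(self):
        return "Base"

class Left(Base):
    pass

class Right(Base):
    def who(self):
        return "Right"

class Bottom(Left, Right):
    pass

# The linearized search order for attribute lookups on Bottom.
order = [cls.__name__ for cls in Bottom.__mro__]
found = Bottom().who()
```

The common base class appears only once and after both Left and Right, so the override in Right is found before the version in Base. A strictly depth-first search would have reached Base through Left first.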
Private Variables
There is limited support for class-private identifiers. Any identifier of the form __spam (at least two leading underscores, at most one trailing underscore) is textually replaced with _classname__spam, where classname is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, so it can be used to define class-private instance and class variables, methods, and variables stored in globals, and even to store instance variables private to this class on instances of other classes. Truncation may occur when the mangled name would be longer than 255 characters. Outside classes, or when the class name consists of only underscores, no mangling occurs.
Name mangling is intended to give classes an easy way to define "private" instance variables and methods, without having to worry about instance variables defined by derived classes, or mucking with instance variables by code outside the class. Note that the mangling rules are designed mostly to avoid accidents; it still is possible for a determined soul to access or modify a variable that is considered private. This can even be useful in special circumstances, such as in the debugger, and that's one reason why this loophole is not closed. (Buglet: derivation of a class with the same name as the base class makes use of private variables of the base class possible.)
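A small sketch of the mangling at work, with the made-up classes Account and Savings:

```python
class Account(object):
    def __init__(self):
        self.__balance = 100      # actually stored as _Account__balance

    def balance(self):
        return self.__balance     # mangled the same way, so it matches

class Savings(Account):
    def __init__(self):
        Account.__init__(self)
        self.__balance = 5        # stored as _Savings__balance: no clash

s = Savings()
base_value = s.balance()
stored_names = sorted(vars(s))
```

The derived class did not disturb the base class's variable, and the mangled names remain reachable from outside (s._Account__balance), which is the loophole the text describes.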
Notice that code passed to exec, eval() or execfile() does not consider the classname of the invoking class to be the current class; this is similar to the effect of the global statement, the effect of which is likewise restricted to code that is byte-compiled together. The same restriction applies to getattr(), setattr() and delattr(), as well as when referencing __dict__ directly.
Odds and Ends
Sometimes it is useful to have a data type similar to the Pascal "record" or C "struct", bundling together a few named data items. An empty class definition will do nicely:
class Employee(object):
    pass

john = Employee() # Create an empty employee record

# Fill the fields of the record
john.name = 'John Doe'
john.dept = 'computer lab'
john.salary = 1000
A piece of Python code that expects a particular abstract data type can often be passed a class that emulates the methods of that data type instead. For instance, if you have a function that formats some data from a file object, you can define a class with methods read() and readline() that get the data from a string buffer instead, and pass it as an argument.
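The file-object case could look roughly like this. StringSource, first_line and count_words are hypothetical names, and the class implements only the two methods the text mentions, not the full file interface:

```python
class StringSource(object):
    """Stand-in that offers read() and readline() over an in-memory string."""

    def __init__(self, text):
        # Keep the line endings, as a real file object would.
        self._lines = text.splitlines(True)

    def readline(self):
        # Return the next line, or an empty string once the data is used up.
        return self._lines.pop(0) if self._lines else ''

    def read(self):
        # Return everything that has not been consumed yet.
        rest, self._lines = ''.join(self._lines), []
        return rest

def first_line(source):
    # Relies only on readline(); any object providing it will do.
    return source.readline().rstrip('\n')

def count_words(source):
    # Relies only on read().
    return len(source.read().split())

header = first_line(StringSource("one two\nthree\n"))
words = count_words(StringSource("one two\nthree\n"))
```

Neither function checks the type of its argument, so an open file could be passed in place of a StringSource without changing them.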
Instance method objects have attributes, too: m.im_self is the instance object with the method m, and m.im_func is the function object corresponding to the method.
Tired Of Typing object
If you're tired of always inheriting from object (i.e. you don't want to type an extra 8 characters for every class you define), there is a simple trick for making all your classes new-style, without explicitly typing object.
At the beginning of any module where you want new-style classes, simply do:
__metaclass__ = type
After that, you can simply define your classes like:
class Foo:
    def bar(self):
        pass
and Python will treat the class as if it had inherited from object. The reasons for this are deep and beyond the scope of this document. Because it is not obvious to a reader what the assignment does, this trick is not recommended, but it is one way around explicitly inheriting from object.
Exceptions Are Classes Too
User-defined exceptions are identified by classes as well. Using this mechanism it is possible to create extensible hierarchies of exceptions.
You can use classes with the raise statement in several ways, including:
raise Class, instance
raise Class(argument)
raise instance
In the first form, instance is normally an instance of Class or of a class derived from it. If it is something else, Python creates an exception object by calling the Class constructor with the given value as argument, and raises the resulting object.
The third form is a shorthand for:
raise instance.__class__, instance
A class in an except clause matches an exception if it is the same class, or a base class thereof (but not the other way around -- an except clause listing a derived class will not match the base class). For example, the following code will print B, C, D in that order:
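# Exception classes must derive from Exception (or be old-style classes);
# a new-style class that only inherits from object cannot be raised.
class B(Exception):
    pass
class C(B):
    pass
class D(C):
    pass

for c in [B, C, D]:
    try:
        raise c()
    except D:
        print "D"
    except C:
        print "C"
    except B:
        print "B"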
Note that if the except clauses were reversed (with "except B" first), it would have printed B, B, B -- Python checks the except clauses in order, and picks the first one that matches.
When an error message is printed for an unhandled exception, the exception's class name is printed, then a colon and a space, and finally the exception instance (converted to a string using the built-in function str()).
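Putting the pieces of this section together, a small exception hierarchy might be sketched as follows. AppError, ConfigError and classify are invented names, and the "except ... as" spelling assumes Python 2.6 or later:

```python
class AppError(Exception):
    """Hypothetical base class for one application's errors."""

class ConfigError(AppError):
    """A more specific error derived from AppError."""

def classify(exc_class):
    try:
        raise exc_class("bad value")
    except ConfigError as err:
        # Derived class listed first, so it is not swallowed by AppError.
        return "config: " + str(err)
    except AppError as err:
        # Catches AppError and any subclass not handled above.
        return "app: " + str(err)

specific = classify(ConfigError)
general = classify(AppError)
```

str(err) gives the same text that would follow the class name and colon in the message for an unhandled exception.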
