Managing Larger Programs
At the beginning of this book, we came up with four basic programming patterns which we use to construct programs:
- Sequential code
- Conditional code (if statements)
- Repetitive code (loops)
- Store and reuse (functions)
In later chapters, we explored simple variables as well as collection data structures like lists, tuples, and dictionaries.
As we build programs, we design data structures and write code to manipulate those data structures. There are many ways to write programs, and by now you have probably written some programs that are "not so elegant" and other programs that are "more elegant". Even though your programs may be small, you are starting to see how there is a bit of "art" and "aesthetic" to writing code.
As programs get to be millions of lines long, it becomes increasingly important to write code that is easy to understand. If you are working on a million-line program, you can never keep the entire program in your mind at the same time. So we need ways to break the program into multiple smaller pieces, so that we have less to look at when we solve a problem, fix a bug, or add a new feature.
In a way, object oriented programming is a way to arrange your code so that you can zoom into 500 lines of the code, and understand it while ignoring the other 999,500 lines of code for the moment.
Like many aspects of programming, it is necessary to learn the concepts of object oriented programming before you can use them effectively. So approach this chapter as a way to learn some terms and concepts and work through a few simple examples to lay a foundation for future learning. Throughout the rest of the book we will be using objects in many of the programs, but we won't be building our own new objects in those programs.
The key outcome of this chapter is a basic understanding of how objects are constructed, how they function, and, most importantly, how we make use of the capabilities of objects that are provided to us by Python and Python libraries.
It turns out we have been using objects all along in this book. Python provides us with many built-in objects. Here is some simple code where the first few lines should feel very simple and natural to you.
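A seven-line program consistent with the line-by-line description that follows might look like the sketch below. The two strings appended to the list are illustrative assumptions; any values would do.

```python
stuff = list()                      # line 1: construct a list object
stuff.append('python')              # line 2: call the append() method
stuff.append('chuck')               # line 3: call append() again
stuff.sort()                        # line 4: call the sort() method
print(stuff[0])                     # line 5: item at position 0
print(stuff.__getitem__(0))         # line 6: the same lookup as a method call
print(list.__getitem__(stuff, 0))   # line 7: the same lookup through the list class
```

The last three lines print the same value, because all three retrieve the item at position 0 of the sorted list.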
But instead of focusing on what these lines accomplish, let's look at what is really happening from the point of view of object-oriented programming. Don't worry if the following paragraphs don't make any sense the first time you read them because we have not yet defined all these terms.
The first line is constructing an object of type list, the second and third lines are calling the append() method, the fourth line is calling the sort() method, and the fifth line is retrieving the item at position 0.
The sixth line is calling the __getitem__() method in the stuff list with a parameter of zero.
The seventh line is an even more verbose way of retrieving the 0th item in the list.
In this code, we are calling the __getitem__ method in the list class and passing in the list (stuff) and the position of the item we want retrieved from the list as parameters.
The last three lines of the program are completely equivalent, but it is more convenient to simply use the square bracket syntax to look up an item at a particular position in a list.
We can take a look into the capabilities of an object by looking at the output of the dir() function:
>>> stuff = list()
>>> dir(stuff)
['__add__', '__class__', '__contains__', '__delattr__',
 '__delitem__', '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__', '__getattribute__', '__getitem__',
 '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
 '__iter__', '__le__', '__len__', '__lt__', '__mul__',
 '__ne__', '__new__', '__reduce__', '__reduce_ex__',
 '__repr__', '__reversed__', '__rmul__', '__setattr__',
 '__setitem__', '__sizeof__', '__str__', '__subclasshook__',
 'append', 'clear', 'copy', 'count', 'extend', 'index',
 'insert', 'pop', 'remove', 'reverse', 'sort']
>>>
The precise definition of dir() is that it lists the methods and attributes of a Python object.
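As a small sketch of how dir() can be used in practice, the names it returns can be filtered to leave only the everyday methods. The variable name public and the choice to filter on a leading underscore are assumptions made for this illustration.

```python
stuff = list()

# dir() returns a list of strings, so ordinary list tools work on it.
# Names that start with an underscore are the special methods that
# Python calls on our behalf (for example, __getitem__ for stuff[0]).
public = [name for name in dir(stuff) if not name.startswith('_')]

print(public)
```

What remains are the familiar list methods such as append() and sort().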
The rest of this chapter will define all of the above terms, so make sure to come back after you finish the chapter and re-read the above paragraphs to check your understanding.
Starting with Programs
A program in its most basic form takes some input, does some processing, and produces some output. Our elevator conversion program demonstrates a very short but complete program showing all three of these steps.
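The three steps can be sketched as follows. This is only an illustration: it assumes that US floor numbering starts at 1 while non-US numbering starts at 0, and the function name convert_floor and the prompt text are hypothetical. A fixed string stands in for keyboard input so the sketch runs without interaction.

```python
def convert_floor(us_floor_text):
    """Processing: turn a US floor number (as text) into a non-US floor number."""
    return int(us_floor_text) - 1

# Input: a fixed value stands in for
#   usf = input('Enter the US Floor Number: ')
usf = '3'

# Processing
wf = convert_floor(usf)

# Output
print('Non-US Floor Number is', wf)
```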
If we think a bit more about this program, there is the "outside world" and the program. The input and output aspects are where the program interacts with the outside world. Within the program we have code and data to accomplish the task the program is designed to solve.
When we are "in" the program, we have some defined interactions with the "outside" world, but those interactions are well defined and generally not something we focus on. While we are coding we worry only about the details "inside the program".
One way to think about object oriented programming is that we are separating our program into multiple "zones". Each "zone" contains some code and data (like a program) and has well defined interactions with the outside world and the other zones within the program.
If we look back at the link extraction application where we used the BeautifulSoup library, we can see a program that is constructed by connecting different objects together to accomplish a task:
We read the URL into a string, and then pass that into urllib to retrieve the data from the web. The urllib library uses the socket library to make the actual network connection to retrieve the data. We take the string that we get back from urllib and hand it to BeautifulSoup for parsing. BeautifulSoup makes use of another object called html.parser and returns an object. We ask the returned object for all of its anchor tags, which gives us a list of tag objects, then loop through the tags and call the get() method on each tag to print out its 'href' attribute.
We can draw a picture of this program and how the objects work together.
The key here is not to fully understand how this program works but to see how we build a network of interacting objects and orchestrate the movement of information between the objects to create a program. It is also important to note that when you looked at that program several chapters back, you could fully understand what was going on in the program without even realizing that the program was "orchestrating the movement of data between objects". Back then it was just lines of code that got the job done.
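The hand-offs between objects described above can be sketched in a form that runs offline. This is a stand-in, not the original program: it uses the standard library's xml.etree.ElementTree in place of BeautifulSoup, and a small hard-coded, well-formed page in place of data fetched with urllib. The page content and URLs are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page content; a real program would retrieve this from the web.
page = """<html><body>
<a href="http://www.example.com/page1.htm">First</a>
<p>Some text</p>
<a href="http://www.example.com/page2.htm">Second</a>
</body></html>"""

# Hand the string to the parser, which returns an object for the document.
root = ET.fromstring(page)

# Ask that object for its anchor tags; each tag is itself an object.
tags = list(root.iter('a'))

# Call get() on each tag object to read its 'href' attribute.
links = [tag.get('href') for tag in tags]
for link in links:
    print(link)
```

Each step passes the result of one object to the next, which is the "orchestration" the paragraph above describes.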
Subdividing a Problem - Encapsulation
One of the advantages of the object oriented approach is that it can hide complexity. For example, while we need to know how to use the urllib and BeautifulSoup code, we do not need to know how those libraries work internally. This allows us to focus on the part of the problem we need to solve and ignore the other parts of the program.
This ability to focus on a part of a program that we care about and ignore the rest of the program is also helpful to the developers of the objects. For example, the programmers developing BeautifulSoup do not need to know or care about how we retrieve our HTML page, what parts we want to read, or what we plan to do with the data we extract from the web page.
Another word we use to capture this idea that we ignore the internal detail of objects we use is "encapsulation". This means that we can know how to use an object without knowing how it internally accomplishes what we need done.
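As a small illustration of encapsulation, the sketch below uses the Counter object from Python's collections module. Counter is not discussed in this chapter; it is chosen here only as an example of an object we can use without knowing how it stores or tallies its data. The sample sentence is made up.

```python
from collections import Counter

words = 'the quick brown fox jumps over the lazy dog the end'.split()

# We know how to use the object: give it items, then ask for the most common.
# How it keeps track of the counts internally is not our concern.
counts = Counter(words)
top_word, top_count = counts.most_common(1)[0]

print(top_word, top_count)
```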
Our First Python Object
At its simplest, an object is some code plus data structures that are smaller than a whole program. Defining a function allows us to store a bit of code and give it a name and then later invoke that code using the name of the function.
An object can contain a number of functions (which we call "methods") as well as data that is used by those functions. We call data items that are part of the object "attributes".
We use the class keyword to define the data and code that will make up each of the objects. The class keyword is followed by the name of the class and begins an indented block of code where we include the attributes (data) and methods (code).
Each method looks like a function, starting with the def keyword and consisting of an indented block of code. This example has one attribute (x) and one method (party). The methods have a special first parameter that we name by convention self.
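A class matching this description might look like the sketch below. The names PartyAnimal, x, and party come from the text; the body of party() and the variable name an are assumptions made for illustration.

```python
class PartyAnimal:
    x = 0  # attribute: data that belongs to each object

    def party(self):  # method: self refers to the object being used
        self.x = self.x + 1
        print("So far", self.x)

an = PartyAnimal()  # construct an object from the class
an.party()          # Python passes the object an in as self
an.party()
```

Each call to party() adds one to the x attribute of the object it was called on.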
Much like the def keyword does not cause function code to be executed, the class keyword does not create an object. Instead, the class keyword defines a template indicating what data and code will be contained in each object of type PartyAnimal. The class is like a cookie cutter and the objects created using the class are the cookies. You don't put frosting on the cookie cutter; you put frosting on the cookies, and you can put different frosting on each cookie.
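The cookie cutter idea can be sketched by making two objects from one class and changing each separately. The class body and the variable names first and second are assumptions chosen for illustration.

```python
class PartyAnimal:
    x = 0

    def party(self):
        self.x = self.x + 1

# Defining the class created no objects; these two lines do.
first = PartyAnimal()
second = PartyAnimal()

first.party()
first.party()
second.party()

# Each object carries its own value of x (its own "frosting").
print(first.x, second.x)
```

The two objects share the same code but hold different data, and the class itself is unchanged by either of them.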