Getting Started with WSGI
What is WSGI, and why should you care?
WSGI, or the Web Server Gateway Interface, is a Python Enhancement Proposal (PEP 333) authored by Phillip J. Eby in an attempt to address the lack of standardization among Python web frameworks. Think of it as the servlet spec for the Python world, only simpler. Although the WSGI specification is primarily aimed at framework developers, you can also develop application components to it. One of the aims of PEP 333 is ease of implementation, which consequently carries through to the development of those components. For example, the “Hello world” example given in the WSGI specification could hardly be any simpler to implement:
def simple_app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return ['Hello world!\n']
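To make the calling convention concrete, the following sketch plays the part of the server: it builds a minimal environ, supplies its own start_response, and collects the body the application returns. It is written for Python 3, where the response body must be bytes (the listings in this article target Python 2). The call_app and fake_start_response names, and the use of wsgiref.util.setup_testing_defaults to fill in the environ, are choices made for illustration only.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!\n']  # bytes, because this sketch targets Python 3


def call_app(app):
    """Invoke a WSGI callable the way a server would (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, PATH_INFO, etc.
    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        # A real server would begin sending the response here.
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, fake_start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_app(simple_app)
print(status, headers, body)
```

Everything the application knows about the request arrives in environ, and everything it says about the response goes through start_response and the returned iterable.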
With the recent release of Python 2.5, the reference implementation of WSGI is available as a standard module (wsgiref), meaning that there is a direct path from developing application components to test and production environments. It’s easy to imagine rapid development of components using wsgiref, then using something more scalable for testing and production. That said, WSGI doesn’t provide many features you might expect or want for webapp development; for sessions, cookies, and the like, you’d also need a suitable web framework (of which there are many), or perhaps middleware or utilities to provide those services.
As to why you should care: if you’re a low-level person who prefers to bolt on utilities and modules to keep your development effort as free of constraint as possible, WSGI will hold considerable attraction. Even if you prefer the benefits offered by higher-level frameworks, chances are they’ll be built on top of WSGI, and it’s always useful to know what happens behind the scenes.
Installing wsgiref
For those (like myself) running Python 2.4.x, the good news is that the wsgiref module will still function. wsgiref is available from the Python Subversion repository; you can check it out from the command line via:
svn co http://svn.python.org/projects/python/trunk/Lib/wsgiref
Copy the wsgiref directory into the site-packages directory of your Python distribution (in my case, /usr/lib/python2.4/site-packages/), then check that the module is importable: typing import wsgiref in the Python console should report no errors.
Testing the “Hello world” application shown earlier requires a few extra lines of code (see test1.py):
if __name__ == '__main__':
    from wsgiref import simple_server
    httpd = simple_server.make_server('', 8080, simple_app)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
This uses simple_server (which is built on the standard BaseHTTPServer module) to provide basic web server facilities, and passes the simple_app function itself as an argument to the make_server function. Run this program (python test1.py) and direct your browser to http://localhost:8080 to see it in action.
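If you would rather check the server from a script than from a browser, something along these lines should work. It is a Python 3 sketch, not part of test1.py: it binds to an ephemeral port on 127.0.0.1, serves exactly one request on a background thread, and fetches it with http.client. The fetch_once helper and the QuietHandler class (which only silences request logging) are hypothetical names introduced here.

```python
import threading
from http.client import HTTPConnection
from wsgiref.simple_server import make_server, WSGIRequestHandler


def simple_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello world!\n']


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress the per-request log line on stderr


def fetch_once(app, path='/'):
    """Serve a single request from app and return (status, body)."""
    httpd = make_server('127.0.0.1', 0, app, handler_class=QuietHandler)
    port = httpd.server_port  # port 0 asked the OS for any free port
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    conn = HTTPConnection('127.0.0.1', port, timeout=5)
    try:
        conn.request('GET', path)
        response = conn.getresponse()
        result = (response.status, response.read())
    finally:
        conn.close()
        worker.join()
        httpd.server_close()
    return result


status, body = fetch_once(simple_app)
print(status, body)
```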
And Using an Object…
You don’t have to stick to simple functions for your applications–WSGI supports object instantiation for handling requests. To do so, create a class that implements the __init__ and __iter__ methods. For example, I’ve abstracted out some basic utilities in the following class. The __iter__ method checks for a do_ method matching the type of HTTP request (GET, PUT, etc.) and either calls that method to process the request, or sends an HTTP 405 in response. In addition, I’ve added a parse_fields method to parse the application/x-www-form-urlencoded parameters in the body of a request using the standard cgi module. Note that, for both object instantiation and simple function calls, the arguments (environ and start_response) are positional–the order is important, not the name.
import cgi

class BaseWSGI:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        method = 'do_%s' % self.environ['REQUEST_METHOD']
        if not hasattr(self, method):
            status = '405 Method Not Allowed'
            response_headers = [('Content-type', 'text/plain')]
            self.start(status, response_headers)
            yield 'Method Not Allowed'
        else:
            m = getattr(self, method)
            yield m()

    def parse_fields(self):
        s = self.environ['wsgi.input'].read(int(self.environ['CONTENT_LENGTH']))
        return cgi.parse_qs(s)
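The dispatch pattern can be exercised without a server by handing the class a hand-built environ. The sketch below is a Python 3 adaptation rather than the listing above: it swaps cgi.parse_qs for urllib.parse.parse_qs and yields bytes. The Echo subclass and the run helper are hypothetical, added only to show both branches of __iter__.

```python
from io import BytesIO
from urllib.parse import parse_qs


class BaseWSGI:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        # Look for do_GET, do_POST, ... matching the request method.
        handler = getattr(self, 'do_%s' % self.environ['REQUEST_METHOD'], None)
        if handler is None:
            self.start('405 Method Not Allowed', [('Content-type', 'text/plain')])
            yield b'Method Not Allowed'
        else:
            yield handler()

    def parse_fields(self):
        length = int(self.environ.get('CONTENT_LENGTH') or 0)
        raw = self.environ['wsgi.input'].read(length)
        return parse_qs(raw.decode('utf-8'))


class Echo(BaseWSGI):
    """Hypothetical subclass that only understands POST."""

    def do_POST(self):
        fields = self.parse_fields()
        self.start('200 OK', [('Content-type', 'text/plain')])
        return repr(sorted(fields.items())).encode('utf-8')


def run(method, body=b''):
    statuses = []
    environ = {
        'REQUEST_METHOD': method,
        'CONTENT_LENGTH': str(len(body)),
        'wsgi.input': BytesIO(body),
    }
    app_iter = Echo(environ, lambda status, headers: statuses.append(status))
    output = b''.join(app_iter)  # iterating is what triggers the handler
    return statuses[0], output


post_result = run('POST', b'myparam=42')
delete_result = run('DELETE')
print(post_result)
print(delete_result)
```

Note that nothing happens until the instance is iterated; the server constructs the object and then loops over it, which is when start_response is finally called.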
I can then subclass BaseWSGI to create a simple number-guessing application (test2.py):
import random

# The number to guess is shared by every request
number = random.randint(1, 100)

class Test(BaseWSGI):
    def __init__(self, environ, start_response):
        BaseWSGI.__init__(self, environ, start_response)
        self.message = ''

    def do_GET(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/html')]
        self.start(status, response_headers)
        return '''
<html>
<body>
<form method="POST">
<p>%s</p>
<p><input type="text" name="myparam" value="" /></p>
<p><input type="submit" /></p>
</form>
</body>
</html>
''' % self.message

    def do_POST(self):
        global number
        fields = self.parse_fields()
        if 'myparam' not in fields:
            self.message = "You didn't guess"
            return self.do_GET()
        try:
            guess = int(fields['myparam'][0])
        except ValueError:
            # Non-numeric input would otherwise raise and abort the request
            self.message = 'Please enter a whole number'
            return self.do_GET()
        if guess == number:
            self.message = 'You guessed correctly'
            number = random.randint(1, 100)
        elif guess < number:
            self.message = 'Try again, the number is higher than your guess'
        else:
            self.message = 'Try again, the number is lower than your guess'
        return self.do_GET()
You may be thinking that all of this is somewhat like reinventing the wheel–which is true, to a point. However, the low-level nature of WSGI is designed to make implementing frameworks a straightforward process–and more standardized. If you don’t want to reinvent the wheel from an application perspective, look to a higher-level web framework, but do read on for some alternatives.
Flying in the Face of Tradition
To extend these simple examples into something a little more realistic, I’ll implement an extremely basic blogging application along RESTful lines: using HTTP GET to retrieve a single entry or a list of entries, PUT to add or update an entry, and DELETE to remove one.
The first step is to extend the BaseWSGI class slightly to handle GET requests in one of two ways: GET / should return a list of all entries, while GET [name] should return a named entry. To provide this, I’ve added code to the __iter__ method so that when the path requested is /, the text ALL gets appended to the method (meaning a subclass now needs to implement both do_GET and do_GETALL):
request_method = self.environ['REQUEST_METHOD']
method = 'do_%s' % request_method
if request_method == 'GET' and self.environ['PATH_INFO'] == '/':
    method = method + 'ALL'
At this point, I’ve decided to store the weblog entries as plain-text files, with nothing in the way of metadata for ordering or filtering. Obviously, in a real application you’d want to be able to search for entries based on particular criteria–perhaps by exposing more meaningful or useful resource URLs (for example, something like /2006/08/my-entry-name)–but for the purposes of this basic application, file-system storage will suffice. Thus, data access for a blog entry is as simple as:
import os

class Entry:
    def __init__(self, path, filename, load=True):
        self.filename = os.path.join(path, filename.replace('+', '-')) + '.txt'
        self.title = filename.replace('-', ' ')
        # Default to empty text so callers can test entry.text for a missing file
        self.text = ''
        if load and os.path.exists(self.filename):
            f = open(self.filename)
            self.text = f.read()
            f.close()

    def save(self):
        f = open(self.filename, 'w')
        f.write(self.text)
        f.close()
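A quick round trip shows how the class is meant to be used: an entry that has no file yet has empty text, and saving it creates the file for the next load. This is a Python 3 sketch under a few assumptions: it uses with-blocks and an explicit UTF-8 encoding, sets text to an empty string when the file is missing, and writes into a temporary directory instead of a real blog directory.

```python
import os
import tempfile


class Entry:
    def __init__(self, path, filename, load=True):
        self.filename = os.path.join(path, filename.replace('+', '-')) + '.txt'
        self.title = filename.replace('-', ' ')
        self.text = ''  # stays empty when the file does not exist yet
        if load and os.path.exists(self.filename):
            with open(self.filename, encoding='utf-8') as f:
                self.text = f.read()

    def save(self):
        with open(self.filename, 'w', encoding='utf-8') as f:
            f.write(self.text)


with tempfile.TemporaryDirectory() as blogdir:
    entry = Entry(blogdir, 'my-first-post')
    was_missing = (entry.text == '')  # nothing on disk yet
    entry.text = 'Hello from the file system'
    entry.save()
    reloaded = Entry(blogdir, 'my-first-post')

print(was_missing, reloaded.title, reloaded.text)
```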
Presenting entries needs some kind of templating. Python has an abundance of choices, such as Cheetah, Kid, and Myghty, not to mention numerous others bundled with the various frameworks. To keep things simple, I’m using a homegrown templating engine that simply injects dynamic content based on the IDs in an XML document. (Given the constraint that all IDs must be unique, this is probably the simplest approach to templating XML, at least from a usage perspective.) Thus, the do_GET method of my application becomes:
def do_GET(self):
    pathinfo = self.environ['PATH_INFO'][1:]
    entry = Entry(blogdir, pathinfo)
    if entry.text:
        (ext, content_type) = self.get_type()
        response_headers = [('Content-type', content_type)]
        if self.status_override:
            status = self.status_override
        else:
            status = '200 OK'
        self.start(status, response_headers)
        tmp = self.engine.load('blog-single.' + ext)
        tmp['entry:title'] = entry.title
        tmp['entry:text'] = entry.text
        tmp['entry:link'] = template3.Element(None,
            href='http://localhost:8080/%s?type=%s' % (entry.title.replace(' ', '-'), ext))
        return str(tmp)
    else:
        self.start('404 Not Found', [('Content-type', 'text/html')])
        return '%s not found' % pathinfo
Using the PATH_INFO variable provided in the WSGI environ, I load an entry, then check to see if the text exists; if not, the blog file was not present, so I return a standard 404 Not Found. If the entry loaded successfully, the get_type() method returns the extension to use for the template (and the content type) based on a type parameter passed in the URL. I create the response headers (just content type, for the moment), and start the response process by calling self.start. At this point I’ve also checked for the presence of status_override, which is a field used when another method calls do_GET (see the do_PUT method later). Finally, I set the content in the template using the IDs: entry:title, entry:text and entry:link. (I’ll return to the do_GETALL method shortly.)
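The get_type() method is not listed in this article. One plausible shape for it, written as a standalone Python 3 function, reads the type parameter from QUERY_STRING and falls back to a default. The TYPES table below is an assumption: the three extensions come from the templates mentioned later in the article, but the content types paired with them are guesses.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from template extension to content type.
TYPES = {
    'xhtml': 'text/html',
    'xml': 'text/xml',
    'atom': 'application/atom+xml',
}


def get_type(environ, default='xhtml'):
    """Return (extension, content_type) for the requested ?type= value."""
    query = parse_qs(environ.get('QUERY_STRING', ''))
    ext = query.get('type', [default])[0]
    if ext not in TYPES:
        ext = default  # unknown types fall back rather than failing
    return ext, TYPES[ext]


atom = get_type({'QUERY_STRING': 'type=atom'})
fallback = get_type({'QUERY_STRING': 'type=pdf'})
missing = get_type({})
print(atom, fallback, missing)
```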
The most important method from the WSGI perspective is start. It takes a response code and message, as well as the response headers as a list of tuples. I assigned it from the start_response positional parameter in BaseWSGI.
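The homegrown templating engine used in do_GET is not listed in this article either. As a rough idea of what ID-based injection can look like, here is a minimal Python 3 sketch built on the standard xml.etree.ElementTree module. The Template class, its dictionary-style assignment, and the use of a plain dict for attributes (standing in for template3.Element) are all assumptions made for illustration; the real engine may work quite differently.

```python
import xml.etree.ElementTree as ET

TEMPLATE = '''<div>
  <h1 id="entry:title"></h1>
  <p id="entry:text"></p>
  <a id="entry:link">permalink</a>
</div>'''


class Template:
    """Hypothetical engine: assign to an element by its id attribute."""

    def __init__(self, source):
        self.root = ET.fromstring(source)
        # Index every element that carries an id (ids are assumed unique).
        self.by_id = {}
        for element in self.root.iter():
            element_id = element.get('id')
            if element_id is not None:
                self.by_id[element_id] = element

    def __setitem__(self, element_id, value):
        element = self.by_id[element_id]
        if isinstance(value, dict):
            # A dict sets attributes on the element.
            for name, attr_value in value.items():
                element.set(name, attr_value)
        else:
            # Anything else replaces the element's text.
            element.text = value

    def __str__(self):
        return ET.tostring(self.root, encoding='unicode')


tmp = Template(TEMPLATE)
tmp['entry:title'] = 'my first post'
tmp['entry:text'] = 'Hello & welcome'
tmp['entry:link'] = {'href': 'http://localhost:8080/my-first-post?type=xhtml'}
rendered = str(tmp)
print(rendered)
```

Because the content goes through the XML serializer, characters such as & are escaped on output, which is one practical argument for this style of templating over string substitution.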
Adding an Entry
Creating a blog entry calls the do_PUT method, which performs several steps:
- Check that the path info is not empty and that the content length is greater than 0.
- Create an Entry object, using the pathinfo.
- If the Entry does not contain text, then this is a new blog post, so set the status override variable to “201 Created”.
- Load the content from the request using the wsgi.input environ variable.
- Finally, save the entry, then call the do_GET method to return something meaningful to the caller.
def do_PUT(self):
    pathinfo = self.environ['PATH_INFO'][1:]
    if pathinfo == '':
        self.start('400 Bad Request', [('Content-type', 'text/html')])
        return 'Missing path name'
    elif self.environ.get('CONTENT_LENGTH', '') in ('', '0'):
        # A missing, empty, or zero CONTENT_LENGTH means there is no body to store
        self.start('411 Length Required', [('Content-type', 'text/html')])
        return 'Missing content'
    entry = Entry(blogdir, pathinfo)
    if not entry.text:
        # No existing text, so this request creates a new entry
        self.status_override = '201 Created'
    entry.text = self.environ['wsgi.input'].read(int(self.environ['CONTENT_LENGTH']))
    entry.save()
    return self.do_GET()
For a DELETE, I just do the basics: check to see if the entry exists, delete it, and return a 204 No Content (a 204 response carries no body):
def do_DELETE(self):
    pathinfo = self.environ['PATH_INFO'][1:]
    blogfile = os.path.join(blogdir, pathinfo.replace('+', '-')) + '.txt'
    if os.path.exists(blogfile):
        os.remove(blogfile)
        # 204 means success with nothing to return, so the body is empty
        self.start('204 No Content', [])
        return ''
    else:
        self.start('404 Not Found', [('Content-type', 'text/html')])
        return '%s not found' % pathinfo
The do_GETALL method, which is the only one of the subclass methods that doesn’t actually correspond to an HTTP verb, is also the only one that differs from the validation+response cycle established by the other methods. do_GETALL will always return 200 OK, and will read in all .txt files in the specified blog directory, reusing the blog-single template (used in the do_GET method). The main differences between this method and do_GET revolve around templating (and are not particularly relevant to WSGI).
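Leaving the templating aside, the data-gathering half of do_GETALL amounts to reading every .txt file in the blog directory. A minimal Python 3 sketch of that step follows; list_entries is a hypothetical helper, and sorting by filename is an assumption, since the article says the entries carry no metadata for ordering.

```python
import glob
import os
import tempfile


def list_entries(blogdir):
    """Return (title, text) pairs for every .txt file in blogdir."""
    entries = []
    for path in sorted(glob.glob(os.path.join(blogdir, '*.txt'))):
        name = os.path.splitext(os.path.basename(path))[0]
        with open(path, encoding='utf-8') as f:
            entries.append((name.replace('-', ' '), f.read()))
    return entries


with tempfile.TemporaryDirectory() as blogdir:
    for name, text in [('second-post', 'B'), ('first-post', 'A')]:
        with open(os.path.join(blogdir, name + '.txt'), 'w', encoding='utf-8') as f:
            f.write(text)
    # A file with another extension should be ignored.
    with open(os.path.join(blogdir, 'notes.md'), 'w', encoding='utf-8') as f:
        f.write('not an entry')
    entries = list_entries(blogdir)

print(entries)
```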
Testing
If I were creating a typical GET/POST web application, testing would be straightforward: use a browser. Because I’ve used REST semantics, I need to use another tool–in this case, Curl–to test all my application’s features. The first step is to start up the blog using python blog.py, and then:
- curl -v -X PUT http://localhost:8080/test1 -d @- will add an entry with the title “test1” (-d @- takes input from STDIN; hit Ctrl+D to stop).
- The same thing again: curl -v -X PUT http://localhost:8080/test1 -d @- will update that entry. (Notice that the 201 return code should change to a 200).
- curl -v http://localhost:8080/ will return a list of all entries.
- curl -v -X DELETE http://localhost:8080/test1 will delete the entry previously created.
I’ve included three template types: .xhtml for HTML viewing, .xml for simple XML output, and .atom to produce an Atom feed. Test these different templates by calling:
- curl -v http://localhost:8080/?type=xml
- curl -v http://localhost:8080/?type=xhtml
- curl -v http://localhost:8080/?type=atom
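The same PUT, GET, and DELETE sequence can also be driven from Python by calling a WSGI application directly with a hand-built environ, which is convenient for automated checks. The sketch below is Python 3 and does not use blog.py: blog_app is a deliberately tiny in-memory stand-in (a dict instead of files, no templates), and request is a hypothetical helper. It returns the standard 204 No Content reason phrase for deletes.

```python
from io import BytesIO
from wsgiref.util import setup_testing_defaults

ENTRIES = {}  # in-memory stand-in for the blog directory


def blog_app(environ, start_response):
    name = environ['PATH_INFO'].lstrip('/')
    method = environ['REQUEST_METHOD']
    plain = [('Content-type', 'text/plain')]
    if method == 'PUT' and name:
        length = int(environ.get('CONTENT_LENGTH') or 0)
        status = '200 OK' if name in ENTRIES else '201 Created'
        ENTRIES[name] = environ['wsgi.input'].read(length)
        start_response(status, plain)
        return [ENTRIES[name]]
    if method == 'GET' and name in ENTRIES:
        start_response('200 OK', plain)
        return [ENTRIES[name]]
    if method == 'DELETE' and name in ENTRIES:
        del ENTRIES[name]
        start_response('204 No Content', [])
        return []
    start_response('404 Not Found', plain)
    return [b'not found']


def request(app, method, path, body=b''):
    """Call a WSGI app in-process and return (status, body)."""
    environ = {
        'REQUEST_METHOD': method,
        'PATH_INFO': path,
        'CONTENT_LENGTH': str(len(body)),
        'wsgi.input': BytesIO(body),
    }
    setup_testing_defaults(environ)  # only fills in keys that are missing
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append(status)

    output = b''.join(app(environ, start_response))
    return captured[0], output


results = [
    request(blog_app, 'PUT', '/test1', b'first draft'),
    request(blog_app, 'PUT', '/test1', b'second draft'),
    request(blog_app, 'GET', '/test1'),
    request(blog_app, 'DELETE', '/test1'),
    request(blog_app, 'GET', '/test1'),
]
statuses = [status for status, _ in results]
print(statuses)
```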
Middleware and Utilities
So far I’ve only demonstrated how to set up a basic, stateless application by extending the foundations provided by WSGI. If you’re thinking about larger-scale web application development, the recommended approach is undoubtedly to choose a suitable framework. This is not to say that developing such a webapp is impossible using basic WSGI, but you’ll need to add (by hand) a lot of the technology that you get for free with a framework–either by writing your own, or plugging in third-party middleware.
The WSGI perspective on middleware is an important part of the specification. Adding middleware involves wrapping layers of utility code around a base app to provide additional functionality; the PEP calls this a middleware stack. For example, to provide authentication facilities, you might wrap your application with BasicAuthenticationMiddleware; to compress responses, you might wrap it with another middleware component called CompressionMiddleware; and so on.
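A middleware component is simply a WSGI application that takes another WSGI application as its input. The Python 3 sketch below shows the wrapping mechanics with a deliberately trivial component: HeaderMiddleware, a hypothetical class that appends one response header by intercepting start_response. Stacking two of them shows how the layers compose.

```python
class HeaderMiddleware:
    """Hypothetical middleware: wrap an app and append one response header."""

    def __init__(self, app, name, value):
        self.app = app
        self.name = name
        self.value = value

    def __call__(self, environ, start_response):
        def wrapped_start(status, headers, exc_info=None):
            # Add our header, then hand control to the next layer out.
            headers = list(headers) + [(self.name, self.value)]
            return start_response(status, headers, exc_info)

        return self.app(environ, wrapped_start)


def simple_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello world!\n']


# Build a two-layer stack around the base application.
stack = HeaderMiddleware(HeaderMiddleware(simple_app, 'X-Inner', '1'), 'X-Outer', '2')

captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = headers


body = b''.join(stack({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}, fake_start_response))
print(captured['headers'], body)
```

The base application is unaware of either wrapper; each layer sees only the callable directly inside it and the start_response directly outside it.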
The Python Paste project provides WSGI middleware and various other useful utilities. As an example of how powerful the concept of middleware is, consider the use of Paste’s SessionMiddleware (see test3.py for more details):
from paste.session import SessionMiddleware

class myapp2:
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        session = self.environ['paste.session.factory']()
        if 'count' in session:
            count = session['count']
        else:
            count = 1
        session['count'] = count + 1
        self.start('200 OK', [('Content-type', 'text/plain')])
        yield 'You have been here %d times!\n' % count

app2 = SessionMiddleware(myapp2)
In this example, SessionMiddleware wraps myapp2. When a request comes in, SessionMiddleware adds the session factory to the environ with the key paste.session.factory, and when invoked in the first line of the __iter__ method, the session is returned as a simple dict. A stack of middleware components added to a basic WSGI application means you can have the benefits provided by many of the frameworks, without necessarily having to constrain yourself to a framework.
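To illustrate the mechanism only (this is not how Paste implements it), here is a toy session middleware in Python 3. It keeps sessions in an in-process dict, identifies the visitor with a cookie, and exposes a factory in the environ. The TinySessionMiddleware class, the tiny.session.factory key, and the sid cookie name are all invented for this sketch.

```python
import uuid
from http.cookies import SimpleCookie


class TinySessionMiddleware:
    """Toy in-memory sessions keyed by a cookie (illustration only)."""

    def __init__(self, app, cookie_name='sid'):
        self.app = app
        self.cookie_name = cookie_name
        self.store = {}  # session id -> dict; lost when the process exits

    def __call__(self, environ, start_response):
        cookie = SimpleCookie(environ.get('HTTP_COOKIE', ''))
        morsel = cookie.get(self.cookie_name)
        sid = morsel.value if morsel is not None else None
        if sid not in self.store:
            sid = uuid.uuid4().hex  # unknown or missing id: start a new session
            self.store[sid] = {}
        environ['tiny.session.factory'] = lambda: self.store[sid]

        def wrapped_start(status, headers, exc_info=None):
            set_cookie = ('Set-Cookie', '%s=%s' % (self.cookie_name, sid))
            return start_response(status, list(headers) + [set_cookie], exc_info)

        return self.app(environ, wrapped_start)


def counter_app(environ, start_response):
    session = environ['tiny.session.factory']()
    count = session.get('count', 1)
    session['count'] = count + 1
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [('You have been here %d times!\n' % count).encode('utf-8')]


def visit(app, cookie=''):
    """Make one request and return (body, cookie to send next time)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['headers'] = headers

    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/', 'HTTP_COOKIE': cookie}
    body = b''.join(app(environ, start_response))
    return body, dict(captured['headers'])['Set-Cookie']


app = TinySessionMiddleware(counter_app)
first_body, cookie = visit(app)
second_body, _ = visit(app, cookie)
print(first_body, second_body)
```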
WSGI and mod_python
I’ve shown how to run web applications within the simple environment provided by wsgiref, but what about launching something on a live site? The WSGI wiki lists multiple servers that support WSGI, including (but not limited to) CherryPy, python-fastcgi, and Paste; chances are, if you’re using a framework, your production choices will be very easy. I’ve decided to use one of the simpler approaches: mod_python coupled with a modified version of wsgi_handler.py. Nicolas Borko wrote this script based on his reading of the PEP. It allows you to publish WSGI applications under Apache easily.
Consult the mod_python documentation for help installing mod_python in your environment, but certainly in the case of K/Ubuntu, the process is straightforward:
$ apt-get install libapache2-mod-python
$ cd /etc/apache2/mods-enabled
$ ln -s ../mods-available/mod_python.load
The handler needs to be accessible by mod_python before you go any further. You have two choices: either append the location of wsgi_handler.py to the Python path, or copy the file into site-packages (again, mine is /usr/lib/python2.4/site-packages/). For the moment, I’ve opted to copy. Once wsgi_handler is in place, create a configuration file (mod_python.conf) in the directory /etc/apache2/mods-enabled (or the location of module configuration files for your Apache setup) and insert at least some basic configuration:
<Directory /var/www/test>
    PythonHandler wsgi_handler
    PythonOption WSGI.Application test1::simple_app
    AddHandler python-program .py
    PythonPath "sys.path + [ '/var/python' ]"
</Directory>
This configuration directs any requests to the test directory of my webroot (/var/www) with a .py extension to the wsgi_handler. The WSGI application I want to run is once again in the script test1.py (with the function simple_app). I’ve placed this file in /var/python (and the configuration adds this directory to the Python path). Restart Apache httpd and, with any luck, you’ll be able to browse to http://localhost/test/test.py.
My slightly modified version of wsgi_handler provides the ability to specify just a script in the mod_python configuration, rather than a script and function. This allows a more powerful configuration:
<Location "/test/foo">
    PythonHandler wsgi_handler
    PythonOption WSGI.Application test1
    SetHandler python-program
    PythonPath "sys.path + [ '/var/python' ]"
</Location>
Rather than a directory, this setting configures a location relative to the web server root. I’ve used SetHandler, which does not require the file extension. Also, the PythonOption now includes only a reference to the test1.py script. If you add these directives to your mod_python configuration file, you can use the URL http://localhost/test/foo/simple_app, which means you’ll now be able to add more than one WSGI application to the script. Whether this is a good idea in production code is debatable, but it’s certainly useful for development.
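The two forms of the WSGI.Application option suggest a simple lookup rule: import the module, then take the callable either from the part after :: or from the last segment of the request path. The Python 3 sketch below illustrates that idea only; it is not the code from wsgi_handler.py, and resolve_application is a hypothetical name. It uses wsgiref.simple_server and its demo_app function as the target so that it runs without test1.py being present.

```python
import importlib


def resolve_application(spec, path_info=''):
    """Find a WSGI callable from 'module::callable' or 'module' plus a path."""
    module_name, separator, attribute = spec.partition('::')
    module = importlib.import_module(module_name)
    if not separator:
        # No explicit callable: assume the last URL segment names it.
        attribute = path_info.strip('/').split('/')[-1]
    return getattr(module, attribute)


explicit = resolve_application('wsgiref.simple_server::demo_app')
from_path = resolve_application('wsgiref.simple_server', '/demo_app')
print(explicit.__name__, explicit is from_path)
```

Looking attributes up from the URL means any module-level name becomes reachable by request, which is one reason to treat the script-only form as a development convenience.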
Conclusion
WSGI exposes one of the simplest APIs I’ve seen in a while, and I believe that very simplicity underlies its power. As a framework or utility developer, the middleware concept is an attractive approach to layering features without having to bolt everything in at the lowest level, while an application developer keen on “keeping it simple, stupid” can work with an extremely basic interface. With a growing number of the higher-level frameworks supporting WSGI, and with the addition of the wsgiref module to Python 2.5, you can easily roll WSGI into your own projects–you may even be using it already without knowing it. Hopefully this article has pointed you in a few directions for further reading and experimentation of your own.
Resources
- A couple of the better-known frameworks are (in no particular order):
- Pylons–for more info on WSGI support, check out the Pylons WSGI documentation.
- Django is WSGI-compliant (which means that Django will run on a WSGI server, not necessarily that you can write applications to the spec).
- CherryPy has a built-in CherryPyWSGI server.
- There’s mention of WSGI for TurboGears on LightTPD (TurboGears WSGI/LightTPD deployment).
- Nicolas Borko’s original wsgi_handler code is available from the Python Servlet Engine pages.