Python Bloggers
Quick note to self, hopefully useful to others too:

If you compile Python 2.6 (or 2.5) from source, and you want to enable sqlite3 support (which is included in the stdlib for 2.5 and above), then you need to pass a special USE flag to the configuration command line, like this:

./configure USE="sqlite"

(note "sqlite" and not "sqlite3")
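A quick way to confirm that the freshly built interpreter really has sqlite3 support is to import the module and run a trivial query against an in-memory database. This is only a minimal sanity-check sketch; the function name is made up for illustration.

```python
import sqlite3


def sqlite3_works():
    """Return True if the sqlite3 module can open an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute("SELECT 1 + 1").fetchone()
    finally:
        conn.close()
    return row == (2,)
```

If the module was not built, the import on the first line fails with an ImportError before the function is ever called.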
Just made a quick micro release of 2.0.1 for realStorage. There was a ReferenceError under browsers lacking Gears. It didn't manifest itself as an error in my unit tests as it was in the last executed line of the file.
realStorage 2.0 is now officially released. For those that don't know, realStorage is a compatibility library that handles browser incompatibilities for the W3C Web Storage API as of August. To find out what's new in this release, and why I am specifically supporting the August version of the spec and not the newest one, read on.



The biggest deal in realStorage 2 is support for Chrome 4 dev and IE 8. I don't know when Chrome's dev channel flipped the bit to let web pages access localStorage, but I have it under OS X so I was able to verify support.

As for IE 8, supporting that browser required some API tweaking. Turns out the browser has no support for accessors on non-DOM objects. That meant I had to make the length property optional and add a getLength() function.

The other major addition is Gears support. I needed to benchmark some of my PhD work on Chrome before it had localStorage support, so I had to add support to use Gears to get any results. This back-end is provided mostly for transition purposes as it looks like Gears might be on its way out in favor of the Web Storage API.

Otherwise I took this new major release as a chance to rename some things and continue to try to optimize the code as much as possible.

One thing to keep in mind, though, is that realStorage currently only supports the Web Storage spec as of August. The reason I am using an old version is that after the August draft the API switched to accepting structured clones instead of only strings. This makes implementation exceedingly difficult as it would mean coming up with my own serialization format for things such as Regexp and ImageData objects. I don't want to deal with that, so realStorage sticks to the old spec.

I emailed the WHATWG about this and voiced my concerns. Some supported me in wanting to roll back to the strings-only version. Others wanted to drop the spec entirely, but since all current browsers support the API that was deemed unreasonable. Still others wanted to tweak it to make it asynchronous, to avoid the lockup edge cases that exist. In the end the thread kind of died out, and it seems the Web Database spec will become the more focused one, especially since databases can be accessed in web workers and the spec standardizes which SQL dialect to support (SQLite 3.6.19 specifically).

At this point I consider realStorage done until the next round of major browser updates comes around. If they support the new version of the spec I will update realStorage to 3.0 to accept structured clones rather than only strings. Otherwise there is nothing critical left to add to this code.
Today marks the 5th anniversary of my blog. It's been a fun and rewarding experience, and I hope to never run out of interesting topics to post about ;-)

As a sort of retrospective, I was curious to see which of my blog posts have been getting the most traffic. Here's the top 10 over the last 9 months, according to Google Analytics:

1. Performance vs. load vs. stress testing (as an aside, I think this has been wildly popular because I inadvertently hit on a lot of keywords in the title)
2. Experiences deploying a large-scale infrastructure in Amazon EC2
3. Ajax testing with Selenium using waitForCondition
4. Useful tools for writing Selenium tests
5. Load balancing in EC2 with HAProxy
6. Python unit testing part 1: the unittest module
7. HTTP performance testing with httperf, autobench and openload
8. Running a Python script as a Windows service
9. Apache virtual hosting with Tomcat and mod_jk
10. Configuring Apache 2 and Tomcat 5.5 with mod_jk

It's interesting that 2 of the top 5 posts are Selenium-related. I think Selenium documentation is not where it needs to be generally speaking, hence people find my old posts on this topic. Adam, you really need to write a Selenium RC book!
I've been using Munin for its resource graphing capabilities. I especially like the fact that you can group servers together and watch a common metric (let's say system load) across all servers in a group -- something that is hard to achieve with other similar tools such as Cacti and Ganglia.

I did have the need to monitor multiple MySQL instances running on the same server. I am using mysql-sandbox to launch and manage these instances. I haven't found any pointers on how to use Munin to monitor several MySQL instances, so I rolled my own solution.

First of all, here is my scenario:
  • server running Ubuntu 9.04 64-bit
  • N+1 MySQL instances installed as sandboxes rooted in /mysql/m0, /mysql/m1,..., /mysql/mN
  • munin-node package version 1.2.6-8ubuntu3 (installed via 'apt-get install munin-node')
Step 1

Locate mysql_* plugins already installed by the munin-node package in /usr/share/munin/plugins. I have 5 such plugins: mysql_bytes, mysql_isam_space_, mysql_queries, mysql_slowqueries and mysql_threads. I don't use ISAM, so I am ignoring mysql_isam_space_.


Step 2

Make a copy of each plugin for each MySQL instance you want to monitor. I know this contradicts the DRY principle, but I just wanted something quick that worked. The alternative is to modify the plugins and add extra parameters so they refer to specific MySQL instances.

For example, I made N + 1 copies of mysql_bytes and called them mysql_m0_bytes, mysql_m1_bytes,..., mysql_mN_bytes. In each copy, I modified the line "echo 'graph_title MySQL throughput'" to say "echo 'graph_title MySQL throughput for mN'". I did the same for mysql_threads, mysql_queries and mysql_slowqueries. So at the end of this step I have 4 x (N+1) new plugins in /usr/share/munin/plugins.


As I said, the alternative is to modify for example mysql_bytes and add new parameters, e.g. a parameter for the title of the graph. However, I don't know exactly how the plugin is called from within Munin, and I don't want to fiddle with the number and order of parameters it's called with -- which is why I chose the easy way out.

Step 3

Create symlinks in /etc/munin/plugins to the newly created plugins. Example:

ln -s /usr/share/munin/plugins/mysql_m0_bytes /etc/munin/plugins/mysql_m0_bytes

(and similar for all the other plugins).
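Steps 2 and 3 are repetitive enough to script. What follows is only a rough sketch, assuming the plugins are plain text files whose title line looks like the one quoted above; the function name and its parameters are made up for illustration, and it needs to run as a user that can write to both directories.

```python
import os
import re

PLUGINS = ["mysql_bytes", "mysql_queries", "mysql_slowqueries", "mysql_threads"]


def clone_plugins(instances, share_dir="/usr/share/munin/plugins",
                  etc_dir="/etc/munin/plugins", plugins=PLUGINS):
    """Copy each plugin once per instance and symlink the copy into etc_dir."""
    created = []
    for plugin in plugins:
        with open(os.path.join(share_dir, plugin)) as f:
            source = f.read()
        suffix = plugin[len("mysql_"):]            # e.g. "bytes"
        for inst in instances:                     # e.g. "m0"
            name = "mysql_%s_%s" % (inst, suffix)  # e.g. "mysql_m0_bytes"
            # Append " for mN" to the graph title so the graphs can be told apart.
            text = re.sub(r"(graph_title [^'\"\\\n]*)", r"\1 for " + inst, source)
            target = os.path.join(share_dir, name)
            with open(target, "w") as f:
                f.write(text)
            os.chmod(target, 0o755)                # plugins must be executable
            link = os.path.join(etc_dir, name)
            if not os.path.lexists(link):
                os.symlink(target, link)
            created.append(name)
    return created
```

Calling clone_plugins(["m0", "m1"]) would create mysql_m0_bytes, mysql_m1_bytes and so on, along with their symlinks.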


Step 4

Specify the path to mysqladmin for the newly defined plugins. You do this by editing the plugin configuration file /etc/munin/plugin-conf.d/munin-node.

Here's what I have in this file related to MySQL:

[mysql_m0*]
user m0
env.mysqladmin /mysql/m0/my sqladmin

[mysql_m1*]
user m1
env.mysqladmin /mysql/m1/my sqladmin

[mysql_m2*]
user m2
env.mysqladmin /mysql/m2/my sqladmin

[mysql_m3*]
user m3
env.mysqladmin /mysql/m3/my sqladmin

What the above lines say is that for each class of plugins starting with mysql_mN, I want to use the mysqladmin utility for that particular MySQL instance. The way mysql-sandbox works, mysqladmin is actually available per instance as "/mysql/mN/my sqladmin".
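With more than a handful of sandboxes, the stanzas can be generated rather than typed. This is a small sketch that assumes, as in the listing above, that each sandbox runs under a user named after the instance; the function name and the sandbox_root parameter are made up for illustration.

```python
def plugin_conf(instances, sandbox_root="/mysql"):
    """Build munin-node plugin configuration stanzas, one per sandbox."""
    stanzas = []
    for inst in instances:
        stanzas.append(
            "[mysql_%(inst)s*]\n"
            "user %(inst)s\n"
            "env.mysqladmin %(root)s/%(inst)s/my sqladmin\n"
            % {"inst": inst, "root": sandbox_root}
        )
    # Separate the stanzas with a blank line, as in the hand-written file.
    return "\n".join(stanzas)
```

The returned text can be pasted into /etc/munin/plugin-conf.d/munin-node.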

Note that the naming convention is important. The syntax of the munin-node plugin configuration file says that the plugin name "May include one wildcard ('*') at the start or end of the plugin-name, but not both, and not in the middle." Trust me, I didn't read this fine print initially, and I named my new plugins something like mysql_bytes_mN, then tried to configure the plugins as mysql_*mN. Pulling hair time ensued.
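The wildcard rule is easy to check before restarting anything. The helper below is a hypothetical one that simply encodes the sentence quoted above; it is not part of Munin.

```python
def valid_plugin_pattern(name):
    """Check the wildcard rule: at most one '*', and only at the start or end."""
    stars = name.count("*")
    if stars == 0:
        return True
    return stars == 1 and (name.startswith("*") or name.endswith("*"))
```

With this check, mysql_m0* passes while mysql_*mN (my original mistake) is rejected.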

Step 5

Restart munin-node via 'service munin-node restart'. At this point you're supposed to see the new graphs under the Mysql link corresponding to the munin node where you set all this up. You should see N+1 graphs for each type of plugin (mysql_bytes, mysql_threads, mysql_queries and mysql_slowqueries). The graphs can be easily differentiated by their titles, e.g. 'MySQL throughput for m0' or 'MySQL queries for m1', etc.

One other quick tip: if you want to easily group nodes together, come up with some domain name which doesn't need to correspond to a real DNS domain name. For example, I called my MySQL servers something like mysqlN.myproject.mydomain.com in /etc/munin/munin.conf on the Munin server side. This allows me to see the myproject.mydomain.com group at a glance, with all the metrics for the nodes in that group shown side by side.

Here's how I defined each node in munin.conf:

[mysqlN.myproject.mydomain.com]
address 192.168.0.N
use_node_name yes

(where N is 1, 2, etc)
I just read a post by Matthew Flanagan on Behaviour Driven Infrastructure or BDI, a concept that apparently originates with Martin Englund's post on this topic. The idea is that you describe what you need your system to do in natural language, using for example a tool such as Cucumber. What's more, you can then use the cucumber-nagios plugin to express the desired behaviour of the new system as a series of Nagios checks. The checks will initially fail (just like in a TDD or BDD development cycle), but you will make them pass by deploying the appropriate packages and applications to the system.

I also expressed the need for automated testing of production deployments in one of my blog posts. However, BDI goes one step further, by describing a test plan for production deployments in natural language. Pretty cool, and again I can only wish that the Python testing tools kept up with Ruby-based tools such as Cucumber and friends....
Matt Yonkovit has started a series of posts on Tokyo Tyrant at Percona's MySQL Performance Blog. Great in-depth analysis of the reliability and performance of TT.

Part 1: Tokyo Tyrant -- is it durable?
Part 2: Tokyo Tyrant -- the performance wall
Part 3: Tokyo Tyrant -- write bottleneck

(parts 4 and 5, about replication and scaling, are hopefully coming soon)
Via Ben Bangert, this gem of a page showing the continuous integration status for the Chromium project at Google. It's cool to see that they're using buildbot. But just like Ben says -- I wish they open sourced the look and feel of that buildbot status page ;-)
Scenario: you mount a volume exported from a NetApp on several Linux clients via NFS

Problem: you see constant high CPU usage on the NetApp, and some of the Linux clients become sluggish, primarily in terms of I/O

Troubleshooting steps:

1) If iostat is not already on the clients, install the sysstat utilities.

2) On each client mounting from the filer, or on a representative sample of the clients, run iostat with -n so that it shows NFS-related statistics. The following command will run iostat every 5 seconds and show NFS stats in a nicely tabulated output:

# iostat -nh 5

3) Notice which client exhibits the most NFS operations per second, and correlate it with the NFS volume on that client which shows the most NFS reads and/or writes per second.

4) At this point you have found the most likely culprit in terms of sending NFS traffic to the filer (there could be several client machines in this position, for example if they are part of a cluster).

5) If not already installed, download and install lsof.

6) Run lsof on the client(s) discovered in step 4, and grep for the directory representing the mount point of the NFS volume with the most reads and/or writes. For example:
 
# lsof | grep /var/log

This will show you, among other things, which processes are accessing which files under that directory. Usually something will jump out at you in terms of things that are going on outside of the ordinary. In my case, it was logrotate kicking off from a daily cron and compressing a huge log file -- since the log file was on a volume NFS-mounted from the filer, this caused the filer to do extra work, hence its increased CPU usage.
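When the lsof output is long, it can help to count open files per process instead of eyeballing the list. This is a minimal sketch that assumes the default lsof column layout (command first, PID second); the function name is made up for illustration.

```python
from collections import defaultdict


def summarize_lsof(lsof_output, mount_point):
    """Count open files under mount_point per (command, pid)."""
    counts = defaultdict(int)
    for line in lsof_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "COMMAND":
            continue  # skip blank lines and the header row
        if mount_point not in line:
            continue  # same filter as "grep /var/log"
        counts[(fields[0], fields[1])] += 1
    # Busiest processes first
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

Feed it the captured output of lsof together with the mount point, for example summarize_lsof(text, "/var/log").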

That's about it. Of course these steps can be refined/modified/added to -- but even in this simple form, they can help you pinpoint NFS issues fairly quickly.
I've been looking into various configuration management/automated deployment tools lately. At OpenX we used slack, but I wanted something with a bit more functionality than that (although I'm not badmouthing slack by any means -- it can definitely be bent to your will to do pretty much whatever you need in terms of automating your deployments).

From what I see, there are 2 types of configuration management tools:
  1. The first type I call 'pull', which means that the servers pull their configurations and their marching orders in terms of applying those configurations from a centralized location -- both slack and Puppet are in this category. I think this is great for initial configuration of a server. As I described in another post, you can have a server bootstrap itself by installing Puppet (or slack) and then 'call home' to the central Puppet master (or slack repository) and get all the information it needs to configure itself
  2. The second type I call 'push', which means that you send configurations and commands to a list of servers from a centralized location -- Fabric is in this category. I think this is a more appropriate mode for application-specific deployments, where you might want to deploy first to a subset of servers, then push it to all servers.
So, as a rule of thumb, I think it makes sense to use a tool like Puppet for the initial configuration of the OS and of the packages required by your application (things like MySQL, Apache, Tomcat, Tornado, Nginx, or whatever your application relies on). When it comes time to deploy your application, I think a tool like Fabric is more appropriate, since it gives you more immediate and finer-grained control over what you want to do.

I also like the categorization of these tools done by the people at ControlTier. Check out their blog post on Achieving Fully Automated Provisioning (which also links to a white paper PDF) for a nice diagram of hierarchy of deployment tools:
  • at the bottom you have tools that install or launch the initial OS on physical servers (via Kickstart/Jumpstart/Cobbler) or on virtual machines/cloud instances (via various vendor tools, or by rolling your own)
  • in the middle you have what they call 'system configuration' tools, such as Puppet/Chef/SmartFrog/cfengine/bcfg2
  • at the top you have what they call 'application service deployment' tools, such as Fabric/Capistrano/Func -- and of course their own ControlTier tool
In a comment on one of my posts,  Damon Edwards from ControlTier calls Fabric a "command dispatching tool", as opposed to Puppet, which he calls a "configuration management tool". I think this relates to the 2 types of tools I described above, where you 'push' or 'dispatch' commands with Fabric, and you 'pull' configurations and actions with Puppet.

Before I go on, let me just say that in my evaluation of different deployment tools, I quickly eliminated the ones that use XML as their configuration language. In my experience, many tools that aim to be language-neutral end up using XML as their configuration language, and then they try to bend XML into a 'real' programming language, thus ending up reinventing the wheel badly. I'd rather use a language I like (Python in my case) as the glue around the various tools in my toolchain. Your mileage may vary of course.

OK, enough theory, let's see some practical examples of Puppet and Fabric in action. While Fabric is very easy to install and has a minimal learning curve, I can't say the same about Puppet. It takes a while to get your brain wrapped around it, and there isn't a lot of great documentation online, so for this reason I warmly recommend that you go buy the book.

Puppet examples

The way I organize things in Puppet is by creating a module for each major package I need to configure. On my puppetmaster server, under /etc/puppet/modules, I have directories such as apache2, mysqlserver, nginx, scribe, tomcat, tornado. Under each such directory I have 2 directories, one called files and one called manifests. I keep files and directories that I need downloaded to the puppet clients under files, and I create manifests (series of actions to be taken on the puppet clients) under manifests. I usually have a single manifest file called init.pp.

Here's an example of the init.pp manifest file for my tornado module:

class tornado {
    $tornado = "tornado-0.2"
    $url = "http://mydomain.com/download"

    $tornado_root_dir = "/opt/tornado"
    $tornado_log_dir = "/opt/tornado/logs"
    $tornado_src_dir = "/opt/tornado/$tornado"

    Exec {
        logoutput => on_failure,
        path => ["/bin", "/sbin", "/usr/bin", "/usr/sbin", "/usr/local/bin", "/usr/local/sbin"]
    }

    file {
        "$tornado_root_dir":
            ensure => directory,
            recurse => true,
            source => "puppet:///tornado/bin";
    }

    file {
        "$tornado_log_dir":
            ensure => directory,
    }

    package {
        ["curl", "libcurl3", "libcurl3-gnutls", "python-setuptools", "python-pycurl", "python-simplejson", "python-memcache", "python-mysqldb", "python-imaging"]:
            ensure => installed;
    }

    define install_pkg ($pkgname, $extra_easy_install_args = "", $module_to_test_import) {
        exec {
            "InstallPkg_$pkgname":
                command => "easy_install-2.6 $extra_easy_install_args $pkgname",
                unless => "python2.6 -c 'import $module_to_test_import'",
                require => Package["python-setuptools"];
        }
    }

    install_pkg {
        "virtualenv":
            pkgname => "virtualenv",
            module_to_test_import => "virtualenv";

        "boto":
            pkgname => "boto",
            module_to_test_import => "boto";

        "grizzled":
            pkgname => "grizzled",
            module_to_test_import => "grizzled.os";
    }

    $oracle_root_dir = "/opt/oracle"

    case $architecture {
        i386, i686: {
            $oracle_instant_client_pkg = "instantclient_11_2-linux-i386"
            $oracle_instant_client_dir = "instantclient_11_2"
        }
        x86_64: {
            $oracle_instant_client_pkg = "instantclient_11_1-linux-x86_64"
            $oracle_instant_client_dir = "instantclient_11_1"
        }
    }

    package {
        ["libaio-dev", "gcc"]:
            ensure => installed;
    }

    file {
        "$oracle_root_dir":
            ensure => directory;
    }

    exec {
        "InstallOracleInstantclient":
            command => "(cd $oracle_root_dir; wget $url/$oracle_instant_client_pkg.tar.gz; tar xvfz $oracle_instant_client_pkg.tar.gz; rm $oracle_instant_client_pkg.tar.gz; cd $oracle_instant_client_dir; ln -s libclntsh.so.11.1 libclntsh.so); echo $oracle_root_dir/$oracle_instant_client_dir > /etc/ld.so.conf.d/oracleinstantclient.conf; ldconfig",
            creates => "$oracle_root_dir/$oracle_instant_client_dir",
            require => File[$oracle_root_dir];
    }

    $cx_oracle = "cx_Oracle-5.0.2"
    exec {
        "InstallCxOracle":
            command => "(cd $oracle_root_dir; wget $url/$cx_oracle.tar.gz; tar xvfz $cx_oracle.tar.gz; rm $cx_oracle.tar.gz; cd $oracle_root_dir/$cx_oracle; export ORACLE_HOME=$oracle_root_dir/$oracle_instant_client_dir; python2.6 setup.py install)",
            unless => "python2.6 -c 'import cx_Oracle'",
            require => [Package["libaio-dev"], Package["gcc"], Exec["InstallOracleInstantclient"]];
    }

    exec {
        "InstallTornado":
            command => "(cd $tornado_root_dir; wget $url/$tornado.tar.gz; tar xvfz $tornado.tar.gz; rm $tornado.tar.gz; cd $tornado; python2.6 setup.py install)",
            creates => $tornado_src_dir,
            unless => "python2.6 -c 'import tornado.web'",
            require => [File[$tornado_root_dir], Package["python-pycurl"], Package["python-simplejson"], Package["python-memcache"], Package["python-mysqldb"]];
    }
}

I'll go through this file from the top down. At the very top I declare some variables that are referenced throughout the file. In particular, $url points to the location where I keep large files that I need every puppet client to download. I could have kept the files inside the tornado module's files directory, and they would have been served by the puppetmaster process, but I preferred to use Apache for better performance and scalability. Note that I do this only for relatively large files such as tar.gz archives.

The Exec stanza (note upper case E) defines certain parameters that will be common to all 'exec' actions that follow. In my case, I specify that I only want to log failures, and I also specify the path for the binaries called in the various 'exec' actions -- this is so I don't have to specify that path each and every time I call 'exec' (alternatively, you can specify the full path to each binary that you call).

The next 2 stanzas define files and directories that I want created on the puppet client nodes. Both 'exec' and 'file' are what are called 'types' in Puppet lingo. I first specify that I want the directory /opt/tornado created on each node, and by setting 'recurse=>true' I'm saying that the contents of that directory should be taken from a source which in my case is "puppet:///tornado/bin". This translates to a directory called bin which I created under /etc/puppet/modules/tornado/files. The contents of that directory will be copied over via the puppet internal communication protocol to the destination /opt/tornado by each Puppet client node.

The 'package' type that follows specifies the list of packages I want installed on the client nodes. Note that I don't need to specify how I want those packages installed, only what I want installed. Puppet's language is mostly declarative -- you tell Puppet what you want done, and it does it for you, using OS-specific commands that can vary from one client node to another. It so happens in my case that I know my client nodes all run Ubuntu, so I did specify Ubuntu/Debian-specific package names.

Next in my manifest file is a function definition. You can have these definitions inline, or in a separate manifest file. In my case, I declare a function called 'install_pkg' which takes 3 arguments: the package name, any extra arguments to be passed to the installer, and a module name to test the installation with. The function runs the easy_install command via the 'exec' type, but only if the specified module wasn't already installed on the system.

A parenthesis: the Puppet docs don't recommend the overuse of the 'exec' type, because it strays away from the declarative nature of the Puppet language. With exec, you specifically tell the remote node how to run a specific command, not merely what to do. I find myself using exec very heavily though. It means that I don't grok Puppet fully yet, but it also means that Puppet doesn't have enough native types yet that can hide OS-specific commands.

One important thing to keep in mind is that for every exec action that you write, you need to specify a condition which becomes true after the successful completion of the action. Otherwise exec will be called each and every time the manifest is inspected by the puppet nodes. Examples of such conditions:
  • 'creates' -- specifies a file or directory that gets created by the exec action; if the file or directory is already there, exec won't be called
  • 'unless' -- specifies a condition that, if true, results in exec not being called. In my case, this condition is the import of a given Python module, but it can be any shell command that returns 0
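To make the semantics of the two guards concrete, here is a rough Python imitation of them. It only illustrates the behaviour described in the list above and is not how Puppet itself is implemented; the function name is made up.

```python
import os
import subprocess


def guarded_exec(command, creates=None, unless=None):
    """Run command unless one of the guards says the work is already done.

    Returns True if the command was run, False if it was skipped.
    """
    if creates is not None and os.path.exists(creates):
        return False  # 'creates': the file or directory is already there
    if unless is not None and subprocess.call(unless, shell=True) == 0:
        return False  # 'unless': the check command returned 0
    subprocess.check_call(command, shell=True)
    return True
```

Running the same call twice shows the point of the guards: the first call does the work, and the second is skipped because the condition has become true.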
Another thing to note in the exec action is the 'require' parameter. You'll find yourself using 'require' over and over again. It is a critical component of Puppet manifests, and it is so important because it allows you to order the actions in the manifest. Without it, actions would be executed in random order, which is most likely something you don't want. In my function definition, I require the existence of the package python-setuptools, and I do it because I need the easy_install command to be present on the remote node.
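One way to picture what 'require' buys you is as a dependency graph that gets sorted so that every resource comes after the things it requires. The sketch below does this in plain Python for the Oracle-related resources of the manifest; it is an illustration of the idea only, not Puppet's actual ordering algorithm, and order_resources is a made-up name.

```python
def order_resources(requires):
    """Return resource names so that every resource follows what it requires.

    requires maps a resource name to the list of names it depends on.
    """
    ordered, visiting, done = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at %s" % name)
        visiting.add(name)
        for dep in requires.get(name, []):
            visit(dep)  # dependencies are placed first
        visiting.discard(name)
        done.add(name)
        ordered.append(name)

    for name in sorted(requires):
        visit(name)
    return ordered


requires = {
    'Exec["InstallCxOracle"]': ['Package["libaio-dev"]', 'Package["gcc"]',
                                'Exec["InstallOracleInstantclient"]'],
    'Exec["InstallOracleInstantclient"]': ['File["/opt/oracle"]'],
}
order = order_resources(requires)
```

In the resulting order, the /opt/oracle directory comes before the Instantclient install, which in turn comes before cx_Oracle.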

After defining the function 'install_pkg', I call it 3 times, with various parameters, thus installing 3 Python packages -- virtualenv, boto and grizzled. Note that the syntax for calling a function is funky; it's one of the many things I don't necessarily like about Puppet, but it's an evil you learn to deal with.

Next up in my manifest file is a case statement based on the $architecture variable. Puppet makes several such variables available to your manifests, based on facts gathered from the remote nodes via Facter (which comes with Puppet).

Moving along, we have a package definition, a file definition -- both should be familiar by now -- followed by 3 exec actions:
  • InstallOracleInstantclient performs the download and unpacking of this package, followed by some ldconfig incantations to actually make it work
  • InstallCxOracle downloads and installs the cx_Oracle Python package (not a trivial feat at all in and of itself); note that for this action, the require parameter contains Package["libaio-dev"], Package["gcc"], Exec["InstallOracleInstantclient"] -- so we're saying that these 2 packages, and the Instantclient Oracle libraries need to be installed before attempting to even install cx_Oracle
  • InstallTornado -- pretty self-explanatory, with the observation that the require parameter again points to a directory and several packages that need to be on the remote node before the installation of Tornado is attempted
Whew. Nobody said Puppet is easy. But let me tell you, when you get everything working smoothly (after much pulling of hair), it's a great feeling to let a node 'phone home' to the puppetmaster server and configure itself unattended in a matter of minutes. It's worth the effort and the pain.

One more thing here: once you have a module with manifests and files defined properly, you need to define the set of nodes that this module will apply to. The way I do it is to have the following files on the puppet master, in /etc/puppet/manifests:

1) A file called modules.pp which imports the modules I have defined, for example:
import "common" 
import "tornado"
('common' can be a module where you specify actions that are common across all types of nodes)

2) A file called nodetemplates.pp which contains definitions for 'node templates', i.e. classes of nodes that have the same composition in terms of modules they import and actions they perform. For example:
node basenode {
    include common
}

node default inherits basenode {
}

node webserver inherits basenode {
    include scribe
    include apache2
    $required_apache2_modules = ["rewrite", "proxy", "proxy_http", "proxy_balancer", "deflate", "headers", "expires"]
    apache2::module {
        $required_apache2_modules:
        ensure => 'present',
    }
    include tomcat
    include tornado
}

Here I defined 3 types of nodes: basenode (which includes the 'common' module), default (which applies to any machine not associated with a specific node definition) and webserver (which includes modules such as apache2, tomcat, tornado, and also requires that certain apache modules be enabled).

3) A file called nodes.pp which maps actual machine names of the Puppet clients to node template definitions. For example:
node "web1.mydomain.com" inherits webserver {}
4) A file called site.pp which ties together all these other files. It contains:
import "modules"
import "nodetemplates"
import "nodes" 

Much more documentation on node definition and node inheritance can be found on the Puppet wiki, especially in the Language Tutorial.

Fabric examples

In comparison with Puppet, Fabric is a breeze. I wanted to live on the cutting edge, so I installed the latest version (alpha, pre-1.0) from github via:

git clone git://github.com/bitprophet/fabric.git

I also easy_install'ed paramiko, which at this time brings down paramiko-1.7.6 (the Fabric documentation warns against using 1.7.5, but I assume 1.7.6 is OK).

Then I proceeded to create a so-called 'fabfile', which is a Python module containing fabric-specific functions. Here is a fragment of a file I called fab_nginx.py:

from __future__ import with_statement
import os
from fabric.api import *
from fabric.contrib.files import comment, sed

# Globals

env.user = 'myuser'
env.password = 'mypass'
env.nginx_conf_dir = '/usr/local/nginx/conf'
env.nginx_conf_file = '%(nginx_conf_dir)s/nginx.conf' % env

# Environments


def prod():
    """Nginx production environment."""
    env.hosts = ['nginx1', 'nginx2']

def test():
    """Nginx test environment."""
    env.hosts = ['nginx3']

# Tasks

def disable_server_in_lb(hostname):
    # env.hosts must have been set by one of the environment functions
    require('hosts', provided_by=[prod, test])
    comment(env.nginx_conf_file, "server %s" % hostname, use_sudo=True)
    restart_nginx()

def enable_server_in_lb(hostname):
    require('hosts', provided_by=[prod, test])
    sed(env.nginx_conf_file, "#server %s" % hostname, "server %s" % hostname, use_sudo=True)
    restart_nginx()

def restart_nginx():
    require('hosts', provided_by=[prod, test])
    sudo('/etc/init.d/nginx restart')
    is_nginx_running()

def is_nginx_running(warn_only=False):
    # With warn_only=True a failing command only produces a warning
    with settings(warn_only=warn_only):
        output = run('ps -def|grep nginx|grep -v grep')
        if warn_only:
            print 'output:', output
            print 'failed:', output.failed
            print 'return_code:', output.return_code

Note that in its 0.9 and later versions, Fabric uses the 'env' environment dictionary for configuration purposes (it used to be called 'config' pre-0.9).

My file starts by defining or assigning global env configuration variables, for example env.user and env.password (which are special pre-defined variables that I assign to, and which are used by Fabric when connecting to remote hosts via the ssh functionality provided by paramiko). I also define my own variables, for example env.nginx_conf_dir and env.nginx_conf_file. This makes it easy to pass the env dictionary as a whole when I need to format a string. Here's an example from another fab file:

cmd = 'mv -f %(crt_egg)s %(backup_dir)s' % env
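The formatting trick itself is plain Python and works with any mapping. Here is a minimal stand-alone sketch, using an ordinary dict as a stand-in for Fabric's env (an assumption made only so that the snippet runs without Fabric installed):

```python
# A plain dict standing in for Fabric's env (for illustration only).
env = {"nginx_conf_dir": "/usr/local/nginx/conf"}

# Each %(name)s placeholder is looked up by key in the mapping on the right.
env["nginx_conf_file"] = "%(nginx_conf_dir)s/nginx.conf" % env
```

Keys that the format string does not mention are simply ignored, which is why passing the whole dictionary is convenient.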

I then have 2 function definitions in my fab file: one called prod, which sets env.hosts to a list of production nginx servers, and one called test, which does the same but sets env.hosts to test nginx servers.

Next I have the actions or tasks that I want performed on the remote hosts. Note the require function (similar in a way to the parameter used in Puppet manifests), which says that the function will only be executed if the given variable in the env dictionary has been assigned to (in my case, the variable is hosts, and I require that the value need to have been provided by either the prod or the test function). This is a useful mechanism to ensure that certain things have been defined before attempting to run commands on the remote servers.

The first task is called disable_server_in_lb. It takes a host name as a parameter, which is the server that I want disabled in the nginx configuration file. I use the handy 'comment' function available in fabric.contrib.files to comment out the lines that contain 'server HOSTNAME' in the nginx configuration. The comment function can be invoked with sudo rights on the remote host by passing use_sudo=True.

The task also calls another function defined in my fab file, restart_nginx. This task simply calls '/etc/init.d/nginx restart' on the remote host, then verifies that nginx is running by calling is_nginx_running.

By default, when running a command on the remote host, if the command returns a non-zero code, it is considered to have failed by Fabric, and execution stops. In most cases, this is exactly what you want. In case you just want to run a command to get the output, and you don't care if it fails, you can set warn_only=True before running the command. I show an example of this in the is_nginx_running function.

The other main task in my fabfile is enable_server_in_lb. Here I use another handy function offered by Fabric -- the sed function. I substitute '#server HOSTNAME' with 'server HOSTNAME' in the nginx configuration file, then I restart nginx.
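To see the net effect of the two tasks on the configuration text, here is a local stand-in written with the re module. It is not Fabric's implementation (Fabric edits the file on the remote host), and the two function names are made up; it only shows the before and after of the text.

```python
import re


def comment_out(conf_text, hostname):
    """Put '#' in front of every uncommented 'server HOSTNAME'."""
    pattern = r"(?<!#)(server %s)" % re.escape(hostname)
    return re.sub(pattern, r"#\1", conf_text)


def uncomment(conf_text, hostname):
    """Turn '#server HOSTNAME' back into 'server HOSTNAME'."""
    return conf_text.replace("#server %s" % hostname, "server %s" % hostname)
```

Applying comment_out and then uncomment for the same host gives back the original text, which is what makes the disable/enable pair safe to use around a deployment.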
So now that we have the fabfile, how do we actually perform the tasks we defined? Let's assume we have a server called 'web1.mydomain.com' that we want disabled in nginx. We want to test our task first in a test environment, so we would call:
fab -f fab_nginx.py test disable_server_in_lb:web1.mydomain.com
(note the syntax for passing parameters to a function/task)

By specifying test on the command line before specifying the task, I ensure that Fabric first calls the function named 'test' in the fabfile, which sets the hosts to the test nginx servers.

Once I'm satisfied that this works well in the test environment, I call:

fab -f fab_nginx.py prod disable_server_in_lb:web1.mydomain.com

For a real deployment procedure, let's say for deploying tornado-based servers that are behind one or more nginx load balancers, I would do something like this:

fab -f fab_nginx.py prod disable_server_in_lb:web1.mydomain.com
fab -f fab_tornado.py prod deploy
fab -f fab_nginx.py prod enable_server_in_lb:web1.mydomain.com

This will deploy my new application code to web1.mydomain.com. Of course I can script this and call the above sequence for all my production servers. I assume here that I have another fabfile called fab_tornado.py and a task defined in it which does the actual deployment of the application code (most likely by downloading and easy_install'ing an egg).
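
Scripting that sequence could look roughly like the sketch below. It only repeats the three fab commands shown above for each server; the function names and the second host name are hypothetical, and how the deploy task targets a single server is not shown in this post, so that part is left exactly as in the commands above.

```python
import subprocess


def rolling_deploy_commands(servers):
    """Build the disable / deploy / enable command triple for each server."""
    commands = []
    for server in servers:
        commands.append(["fab", "-f", "fab_nginx.py", "prod",
                         "disable_server_in_lb:" + server])
        commands.append(["fab", "-f", "fab_tornado.py", "prod", "deploy"])
        commands.append(["fab", "-f", "fab_nginx.py", "prod",
                         "enable_server_in_lb:" + server])
    return commands


def run_rolling_deploy(servers):
    """Run the commands in order, stopping at the first one that fails."""
    for command in rolling_deploy_commands(servers):
        subprocess.check_call(command)


# Print the plan without executing anything.
for command in rolling_deploy_commands(["web1.mydomain.com", "web2.mydomain.com"]):
    print(" ".join(command))
```

Using check_call means a failed disable or deploy aborts the run before the next server is touched.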

That's it for today. It's been more like a whirlwind through two types of automated deployment tools -- Puppet/pull and Fabric/push. I didn't do justice to either of these tools in terms of their full capabilities, but I hope this will still be useful for some people as a starting point into their own explorations.
[edit: nixed a paragraph by request]
[edit: link to accepted talks and thank Doug]

I want to say three things in this post. One is that the PyCon program committee has finished making their decisions on which 95 talks out of the 179 submissions we received to accept. I should also admit that my talk (#9 in the list) was accepted, although I actually missed the part of the PC meetings where it was discussed, so I didn't officially find out until everyone else did.

Second, I wanted to publicly thank Jesse Noller and the other members of the pycon-pc for their hard work (with special honorable mention to Doug Napoleone for pycon-tech, the software that keeps PyCon running). We had a record number of submissions with an average of only 10 people in IRC to go over them during the culling process. It was grueling, but people stuck in there and helped make it happen (and I admit I did not participate as much as I would have liked due to a conference paper submission I had).

Third, I want to address the negativity that has been popping up about the decisions we had to make (some of which led to hate mail sent personally to Jesse, which is completely uncalled for). There seem to be two themes that have popped up as to why people are upset over their rejection.

One is that their talk received all positive reviews but was still rejected. Sorry, but that's called time restriction. We honestly had more talks with all positive reviews than we had slots. And just because your talk didn't receive any negative reviews does not mean that it fired anyone up enough to want to stand up for it. We use a Champion voting system where a talk only gets considered if a reviewer is willing to take a stand saying they will fight for the talk to be included. If a talk has a lot of people standing up for it then it will get in without question, but it takes A LOT of people for that to happen (like four or more, and there were not that many talks like that). Otherwise we have to discuss the talk. And if the people fighting for the talk can't convince the PC (or don't make the IRC meeting, which is just life since we can't infinitely postpone to suit everyone's schedule) then the talk is let go. So if your talk received all positive reviews and you are wondering why it was rejected in the end, it means it unfortunately didn't attract a champion who was either present or able to argue for it in a way that won over the other PC members in IRC.

The second theme has been how the talks are not anonymous to reviewers. This is a conscious decision that we have made from years of experience where it matters whether we know upfront whether someone is a good speaker. We tried anonymous reviewing one year and it turned out badly. PyCon does not have a proceedings like academic computer science conferences where if someone can't present you still have the work in paper form. At PyCon if you are a good speaker that counts for a lot as speaking is how you share your knowledge. And if people think you are not a great speaker that usually means someone won't champion your talk, not that you will get a -0 or -1 (if that does happen it is usually a sign of a bad review, but once again we were short on people and some bad reviews slipped through).
I've always regarded working full time on Jython at Sun as a miraculous sabbatical that might come to an end at any moment. Sadly that time has come. I've worked with many amazing people and had a great time: Sun provided me with one of the greatest professional experiences I have had, for which I will always be grateful.

I'm very pleased with how far Jython has come during my tenure at Sun. Jython is now a modern version of Python, and has the momentum to continue its growth. A far larger group of developers than ever before contribute regularly, making Jython a very healthy project. Jython runs many more of the key frameworks and applications that are popular in the Python world. In the future we will be making Jython better, faster, and more complete. I started working on Jython long before I joined Sun, and I certainly plan to be a part of Jython's future.

I am looking for a new opportunity, and am open to many possibilities. I have been working in software for more than twelve years, often in a lead role. I am well regarded in the Open Source world, where I have participated in and helped build communities. I have been the Jython project lead for almost five years. I am a committer on the Python project, and a member of the Python Software Foundation. I have done a wide variety of work in recent history: development and leadership work in Java and Python, from the abstract level of parsing Python source and compiling to Java bytecodes, to the more concrete work of web development. I am able to do work in any of these areas, on either a full time or consulting/contracting basis. My contact information can be found on my Google profile and my LinkedIn profile is a good summary of my credentials.

Regular readers will know that the Foundation periodically honors those who have made significant contributions to its mission. Often these people aren't even members of the Foundation, but this doesn't exclude them. At its recent meeting the PSF Board voted Community Awards to two people, one of whom isn't currently a member.

Noufal Ibrahim: Noufal was nominated for heading up the organizing team for the recent (and very successful) first PyCon India conference held on September 26 and 27 in Bangalore, attracting 450 delegates. Although Noufal was "first among equals" this award also recognizes the work of everyone who helped to make the inaugural conference so successful.

Barry Warsaw: Many people are unaware of the huge volume of mail that is processed by software written in Python every hour of every day. This is because they don't know about the Mailman project, which was Barry's brainchild. Barry, a founder member of the Foundation, also acted as release manager for several recent Python releases.

The Foundation is grateful to Noufal and Barry for their efforts, each of which helps to promote Python's popularity and increase the Python community as a whole.

Jean-Paul Calderone continues his excellent "Twisted Web In 60 Seconds" tutorial series.  If you haven't checked it out yet, you should!
I've been pretty busy for the last couple of weeks, so I've just had an opportunity to catch up with blog posts that have been piling up.  In particular I noticed this one: The “WiFi At Conferences” Problem, by Joel Spolsky.

Joel has a lot of what look like good recommendations.  However, I can provide a much-abridged list.

Some years, WiFi access at PyCon US has been provided by the venue, or by a contractor whose name I mercifully do not know.  Those years, it has not worked.  Some years, it has been provided, or at least managed, by tummy.com.  Those years, it has worked.  They are probably much more critical of their own efforts than I am, as you can see in this thorough write-up that they did of PyCon's 2008 WiFi situation.

My two-step plan for you if you want working WiFi access at your conference is:
  1. e-mail somebody at tummy.com, telling them that you want a working wireless network, and
  2. give them whatever they ask for.
If you do these things, then when people open their laptops at your conference, their networks will work.
The PyCon organizers are trying something new for PyCon 2010: posters. Basically there is a plenary session where people have posters about something and they stand there talking to whomever comes by (if you have been in academia you know what these can be like). In my time doing posters at academic conferences I have found that at the larger conferences a poster can be worth your time. You end up with enough people coming by talking to you that you get valuable feedback and contacts. Plus a poster is a lot less stressful to do compared to a full talk and allows for more detail than a lightning talk.

So if you have something you want to say that can be put on a poster and want some feedback on it, or simply want to share it with the PyCon attendees, please consider making a poster.

http://www.youtube.com/watch?v=sdGfh8a98jw

"Optimum" is for machines, not organisms.

I had this grand plan for a blog post comparing the various JavaScript GUI libraries, which in the end ended up being this blog post stating how I like jQuery UI. If you don't care about reading a "love letter" to jQuery UI, then you can stop reading now.

Since I was planning on refreshing the old online UI for Oplop I figured this was as good a chance as any to see what the various libraries offered. I knew that I wanted a wizard-like interface for Oplop's password hash generation workflow where users couldn't skip steps (see the FAQ if you want to know how Oplop works). This led me to think that something like an iPhone or accordion interface would work best as long as I could easily control when the next step in the workflow was taken.

Since I already loved jQuery, I decided to start with jQuery UI. To get my wizard-like interface I decided to use their accordion. It was nice that the only two requirements placed upon my HTML to work with the accordion were that the tag representing the header must contain an anchor tag and that the body tag must follow immediately after the header tag. Otherwise the header tag could be anything I wanted, as could the tag containing the body of a section. This allowed me to use an h2 tag with an embedded anchor tag for the header, a div tag for the body, and contain both as a unit in section tags w/o any ill effects.

The biggest worry I had going into this was how to control the workflow. I needed to make sure people didn't skip steps. Plus I wanted a fixed order to the steps for unfounded, paranoid security reasons (didn't want someone leaving their password entered longer than necessary). Plus it limits me having to worry about the state of the web page at all times. Since the accordion widget from jQuery UI is designed to rely on clicks I was not sure if I was going to reasonably be able to turn off the ability to click on a section.

Turns out I worried for nothing. The accordion widget has an 'event' option that lets you set what kind of event triggers the section switch. I am sure this was mostly done to let people choose onclick or onmouseover, but I chose the empty string and that disabled the ability to perform a GUI interaction to switch.

At this point I had a working prototype of the UI. But since it took no time at all to get working I just kept working and managed to finish up the basic work in a night. In the end jQuery UI simply worked. I didn't have to learn any funky new way of creating widgets or nasty work-arounds to get my abnormal requirements met. Nor did I have to code my JavaScript or HTML in a way I didn't want to other than tossing in some anchor tags (which I could even programmatically insert if I felt like it). I simply was able to get my prototype up and fully functioning.

At that point I started to look at what other JavaScript GUI libraries to evaluate. I considered GWT but they had no clear way to disable clicking on their stack panel. I then thought about Dojo and its dijit.layout.StackContainer since I had to learn it for a research paper I am writing, but their lack of up-to-date documentation on their 1.3 API is a real turn-off to me; it seems Dojo does most of their documentation through SitePen's blog or DojoCampus.

And then it struck me: this was beginning to feel like a chore. I realized I was researching these other GUI libraries purely for this blog post, not because I want to. jQuery UI provided exactly what I wanted in a simple manner that tied into how I preferred to code JavaScript. And so I stopped looking for other solutions.

I am happy with how Oplop Online turned out. I got the workflow I want in a non-ugly fashion that didn't take me days to get working. Now I just need to write the unit tests, but that will have to wait until next month.

So, life has been eventful lately. There was DjangoCon, which was awesome even though I came away deeply unhappy with how my talk turned out; due to a lot of hectic things going on, it fell far below the standard I usually like to enforce for myself. I’ve got a couple things cooking for PyCon, though, which will hopefully make up for it. Things are starting to ramp up for the Django 1.2 ...


Thierry Carrez, who works in the Ubuntu Server team, has a great series of blog posts on how to run your own Ubuntu Enterprise Cloud. I haven't had a chance to  try this yet, but it's high on my TODO list. Thierry uses the Ubuntu Enterprise Cloud product (which has been part of Ubuntu server starting with 9.04) together with Eucalyptus. Here are the links to Thierry's posts:
I went to the Hadoop World conference last week and one thing I took away was how Facebook and other companies handle the problem of scalable logging within their infrastructure. The solution found by Facebook was to write their own logging server software called Scribe (more details on the FB blog).

Scribe is mentioned in one of the best presentations I attended at the conference -- 'Hadoop and Hive Development at Facebook' by Dhruba Borthakur and Zheng Shao. If you look at page 4, you'll see the enormity of the situation they're facing: 4 TB of compressed data (mostly logs) handled every day, and 135 TB of compressed data scanned every day. All this goes through Scribe, so that gives me a warm fuzzy feeling that it's indeed scalable and robust. For more details on Scribe, see the wiki page of the project. It's my intention here to detail the steps needed for compiling and installing it, since I found that to be a non-trivial process to say the least. I'm glad Facebook open-sourced Scribe, but its packaging could have been a bit more straightforward. Anyway, here's what I did to get it to run. I followed roughly the same steps on Ubuntu and on Gentoo.

1) Install pre-requisite packages

On Ubuntu, I had to install the following packages via apt-get: g++, make, build-essential, flex, bison, libtool, mono-gmcs, libevent-dev.

2) Install the boost libraries

Very important: scribe needs boost 1.36 or newer, so make sure you don't have older boost libraries already installed. If you install libboost-* in Ubuntu, it tries to bring down 1.34 or 1.35, which will NOT work with scribe. If you have libboost-* already installed, you need to uninstall them. Now. Trust me, I spent several hours pulling my hair on this one.

- download the latest boost source code from SourceForge (I got boost 1.40 from here)

- untar it, then cd into the boost directory and run:

$ ./bootstrap.sh
$ ./bjam
$ sudo ./bjam install

3) Install thrift and fb303

- get thrift source code with git, compile and install:

$ git clone git://git.thrift-rpc.org/thrift.git
$ cd thrift
$ ./bootstrap.sh
$ ./configure
$ make
$ sudo make install

- compile and install the Facebook fb303 library:

$ cd contrib/fb303
$ ./bootstrap.sh
$ make
$ sudo make install

- install the Python modules for thrift and fb303:

$ cd TOP_THRIFT_DIRECTORY
$ cd lib/py
$ sudo python setup.py install
$ cd TOP_THRIFT_DIRECTORY
$ cd contrib/fb303/py
$ sudo python setup.py install

To check that the python modules have been installed properly, run:

$ python -c 'import thrift' ; python -c 'import fb303'

4) Install Scribe

- download latest source code from SourceForge (I got it from here)

- untar, then run:

$ cd scribe
$ ./bootstrap.sh
$ make
$ sudo make install
$ sudo ldconfig  # necessary so that the boost shared libraries are loaded

- install Python modules for scribe:

$ cd lib/py
$ sudo python setup.py install

- to test that scribed (the scribe server process) was installed correctly, just run 'scribed' at a command line; you shouldn't get any errors
- to test that the scribe Python module was installed correctly, run
$ python -c 'import scribe'
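
The three import checks from steps 3 and 4 can also be run in one go. This is just a convenience sketch; the helper name check_modules is made up, and the module names are the ones used in the commands above.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_modules(["thrift", "fb303", "scribe"]).items():
        print(name, "OK" if ok else "MISSING")
```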

5) Initial Scribe configuration

- create configuration directory -- in my case I created /etc/scribe
- copy one of the example config files from TOP_SCRIBE_DIRECTORY/examples/example*conf to /etc/scribe/scribe.conf -- a good one to start with is example1.conf
- edit /etc/scribe/scribe.conf and change file_path (which points to /tmp) to a location more suitable for your system
- you may also want to replace max_size, which dictates how big the local files can be before they're rotated (by default it's 1 MB, which is too small -- I set it to 100 MB)
- run scribed either with nohup or in a screen session (it doesn't seem to have a daemon mode):

$ scribed -c /etc/scribe/scribe.conf

6) Test run

To test Scribe, you can install it on a remote machine, configure scribed on that machine to use a configuration file similar to examples/example2client.conf, then change remote_host in the config file to point to the central scribe server configured in step 5.

Once scribed is configured and running on the remote machine, you can test it with a nice utility written by Silas Sewell, called scribe_pipe. For example, you can pipe an Apache log file from the remote machine to the central scribe server by running:

cat apache_access_log | ./scribe_pipe apache.access

On the scribe server, you should see at this point a directory called apache.access under the main file_path directory, and files called apache.access_00000, apache.access_00001 etc (in chunks of max_size bytes).

I'll post separately about actually using Scribe in production. I hope this post will at least get you started on using Scribe and save you some headaches during its installation process.
Great post from Brandon Burton, my ex-colleague at RIS/Reliam, on why automation is the foundation of cloud computing. Brandon discusses automation at various levels, starting with virtualization and networking, then moving up the layers and covering OS, configuration management and application deployment. Highly recommended.
[edit: mention how the fix in Python 2.6.3 may have not belonged in 2.6.3 since it is a micro release]

Python 2.6.3 has a couple of bugs still lingering that warrant a brown bag 2.6.4 release (should be out before the end of the month). One of the "bugs", and the entire reason I am doing this blog post, involves distutils and setuptools. Turns out that for Python 2.6.3 a change was made to distutils that broke setuptools for building extension modules. While the change that broke setuptools is being viewed as improper for a micro release (code shouldn't break in a micro release unless it was really bad semantics being fixed), it did bring to my attention that a lot of people do not know about Distribute.

A problem is that setuptools is no longer maintained. Luckily there is already a solution to this predicament that the wider Python community might not be fully aware of.  Tarek Ziadé forked setuptools and created Distribute, with its first release two months ago, explicitly to provide a library compatible with setuptools that is being actively maintained (and thus has bug fixes). Distribute is a drop-in replacement for setuptools, complete with being able to import it under the setuptools name so that everything will continue to work as if you had setuptools itself installed sans some bugs.

Because Distribute fixes bugs and is a backwards-compatible drop-in replacement I HIGHLY encourage people to change their installs of setuptools to Distribute. Everything will continue to work as Distribute installs itself under the setuptools name to maintain backwards-compatibility. Gentoo has even switched. Plus Distribute 0.6.3 supports Python 3.

So please, if you use setuptools then upgrade to Distribute.
I like Terry Jones; I think FluidDB has a lot of potential.  But, sometimes when he's talking about it, he gets a little carried away and forgets that the rest of us don't live in his future yet.  In his latest missive on the official FluidDB blog, "Digital Hobgoblins", he describes some of the problems that FluidDB sets out to solve.

The problem is, I already have solutions for all of these problems, and I don't quite understand why they don't (or shouldn't) work for me.  (Since he organizes the post in terms of problems that existing systems have, I'm going to take the liberty of re-labeling these in terms of the problems that he seems to be describing rather than the lead text he used.  Please post a comment if you think my labeling is wrong.)

In existing systems, Terry says:

"Things must be named, and have one name."  Specifically, Terry calls out file systems.  Except... file systems have lots of ways of introducing multiple names for the same thing.  Symbolic links.  Hard links, if you really want to allow for ambiguity.  If you want to track that ambiguity, Windows "shortcuts" and MacOS "aliases" can do that.  Overlay mounts, loopback mounts and chroot execution allow for semi-arbitrary renaming.  Lots of other systems support this, too.  Database systems have a specific provision for multiple names: the many-to-one relation.  Any programming language with pass-by-reference data structures allows for some level of multiple-naming.  In fact, there's a whole discipline for allowing things to have lots of different names: indexing.  Anywhere you have a full-text index or an object where multiple attributes are indexed in some kind of database, you've got objects with more than one name.

"You have to be consistent and unambiguous."  As I mentioned on the first point, there are lots of ways to be slightly ambiguous at a human level.  You can refer to the same thing by different names, or, with mutable binding, you can refer to different things with the same name.  In some circumstances, you must be precise, but that's because fundamentally, algorithmic thinking requires a certain level of precision, not because of any specific problem with computers.  In fact, there is a word for inconsistency and ambiguity in programming languages: polymorphism.  Any time you invoke an interface rather than a concrete implementation (which is to say any time you do anything in a dynamic language like Python) you are being ambiguous and potentially inconsistent in your program's behavior.

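A tiny illustration of that kind of ambiguity, with class and function names invented for the example: the call thing.speak() does not say which implementation will run, and the same name can be rebound to a different object between calls.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def greet(thing):
    # 'speak' is looked up at call time; greet() neither knows nor cares
    # which concrete class it was handed.
    return thing.speak()


thing = Duck()
print(greet(thing))
thing = Robot()      # same name, now bound to a different object
print(greet(thing))
```
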
"You only get one way to organize stuff."  This is a pretty weak point, though, given that Terry himself immediately turns around and notices that tagging and other multiply-indexed database systems are becoming popular.  So he gives us two examples of exceptions, but no examples of the rule.  I'm not sure what I could add to that.

"Programmers are obsessed with "meaning"."  On this one, I'm going to agree, except I don't think it's a problem.  In the computational world, we are obsessed with the meaning of data, because if you get the meaning of the inputs wrong, then the meaning of the outputs is wrong too.  For example: if you have a number that represents the total liabilities that your company has accumulated, it's pretty important that you don't ever treat that as your total profit.  At a deeper level, if you have a sequence of bits that represents a floating-point number, it's important to know about its intended meaning, and not treat it as a string of characters, unless what you really want is a string.  "@H=N" is not as useful a concept as "3.1287417411804199" if you are trying to add it to something.  For what it's worth, I have my own, similar take on how we should treat computational objects that have multiple meanings: Imaginary. Even systems like Imaginary and FluidDB depend on a very rigid definition of some simpler concepts, like numbers consistently being numbers and words consistently being words.  In my view, even if we treat the book itself as multifaceted, it's important to know what the data representing the "readable object" part of a book is really "about", and make sure it stays distinct from the data representing the "paperweight" part of the book.  To be fair, FluidDB appears to do this itself — and this terminology is my least-favorite part of FluidDB — by having single-purpose, permission-controlled "objects" just like every other system, but calling them "tags", and re-using the word "objects" to refer instead to what others might call a "UUID" or "central index".  In Imaginary, the system is similar; although the centrality of the FluidDB "object" (in Imaginary's case, the "Thing") is less stark; using FluidDB's terminology, in Imaginary, a "tag" can have a "tag" of its own; in fact, there's nothing but tags ("Items") anywhere.

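The floating-point example can be checked with the struct module. The sketch below assumes the four bytes are read as a big-endian IEEE 754 single-precision float, which is the interpretation under which they come out as roughly the number quoted above.

```python
import struct

raw = b"@H=N"  # the same four bytes, read two ways

as_text = raw.decode("ascii")            # interpreted as characters
as_float = struct.unpack(">f", raw)[0]   # interpreted as a big-endian 32-bit float

print(as_text)
print(as_float)

# Packing the number back into four bytes recovers the original data,
# so nothing but the intended meaning distinguishes the two readings.
assert struct.pack(">f", as_float) == raw
```
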
"Metadata is separated from the data it describes."  This may be true in some systems, but the web is probably the system with the most data in it anywhere, and in that system, metadata is always available as part of the request and the response.  You can put in any headers you want in the response, and there are lots of pieces of metadata (like content-type) which are almost always found along with the data.  In my opinion, the problem is more that we don't have enough of the previous problem.  Web developers haven't been obsessed enough with meaning: there aren't enough useful conventions around the HTTP request/response metadata, and so it's hard to bundle more metadata in with your response and have it faithfully propagated elsewhere.  We don't know what arbitrary headers might mean, because we don't have any way of expressing a schema for them.

Terry says he's going to write more about these problems, and the solutions that FluidDB provides for them.  I'm looking forward to it.  As part of that, I'd really like to see a clear description of how these problems affect me, or someone I know, either as a programmer or as a user.  What do I, or should I, really want to do with some application right now that these five problems are preventing me from doing?

The reason I felt compelled to write about this is that history — and particularly the history of websites like freshmeat and sourceforge — is littered with the corpses of projects which promised to fundamentally change the way we represent data.  A common problem with these projects is that they have expansive denunciations of current techniques to represent data, or manage persistence, and claim to provide an advance so significant that they will displace all current applications.  What most of the people working on these projects don't realize is that the current techniques for representing data have a history, and there are good reasons for their limitations.  Granted, not all of those reasons are currently relevant, and many are examples of path dependence, but it's still important to understand the reasons in order to escape the problems.

In FluidDB's case, I think that the problem isn't so much that Terry doesn't have the historical perspective, but that he assumes that we all do.  And that we can all make the cognitive leap to see why FluidDB is necessary.  But if I can't do it, I have to assume there are at least a few other programmers who aren't getting the message either.

When I wrote PEP 352: Required Superclass for Exceptions one of the hopes I had was it would lead to some sanity when it came to attaching information to exceptions. As it stands now everything is stored under BaseException.args which is not exactly structured, and so I wanted to clean that situation up. Unfortunately this got derailed because of backwards-compatibility issues (and for those of you being bitten by the BaseException.message deprecation, Python 2.6.1 has a fix to make it saner).

But today a tweet from Jacob Kaplan-Moss reminded me why exceptions need to be cleaned up. If you look at the constructor for BaseException it is essentially:
def __init__(self, *args):
  self.args = args

Not exactly fancy, but this loose generality has a price in that most people simply toss all potentially useful information into the exception constructor without providing a way to get any information out of it in a reasonable way. For instance, if arguments 2 and 3 contain useful information, why do I have to know which index provides what info instead of using a descriptive attribute name? If I can't use dir() on an exception to figure out what useful metadata is on an exception then there is a problem. There's a reason named tuples came into existence; indexes are not self-documenting.

There is also a nastier side-effect for exceptions given multiple arguments in regards to how BaseException.__str__() acts. If args has exactly a single value then the string for the exception is str(args[0]). But if args has multiple values then the string of the exception is str(args). That can lead to people not tacking on any information, to make sure their exceptions have a nice, clean string representation. That's what leads to people like JKM having to parse data out of an exception's string. It's just ludicrous for anyone to have to parse an exception to get information that was obviously available when the message was created!
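
A quick demonstration of both behaviours with a plain Exception; the message and values are arbitrary:

```python
single = Exception("index out of range")
multiple = Exception("index out of range", 42, 10)

print(str(single))       # index out of range
print(str(multiple))     # ('index out of range', 42, 10)

# The extra data is still there, but only by position.
print(multiple.args[1])  # 42
```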

I see two solutions to this predicament, and both involve changing the constructor of BaseException. Let's look at each in turn, using IndexError as an example of how things might improve if we attach to the exception which index was out of which range. One option is to simply change BaseException to only accept a single argument and have that be bound to message and be what the string representation is:
class BaseException:

    def __init__(self, message=''):
        self.message = message

    def __str__(self):
        return str(self.message)

That would motivate me to change IndexError to:
class IndexError(Exception):

    def __init__(self, message="index out of range", *, index=None, range=None):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message.format(index=index, range=range))
        self.index = index
        self.range = range

By having BaseException accept only a single argument, people who typically just toss in metadata about why the exception is being raised are forced to actually construct a message. The hope is that if someone is being forced to construct a message for an exception on their own they will simply tack the information on to the instance.

But what about those exception authors who are fine creating a message but still don't bother to tack the metadata on to the exception? That brings up the other possible approach to BaseException where it does string interpolation for you based on keyword arguments you pass in:
class BaseException:

    def __init__(self, message, **kwargs):
        self.message = message.format(**kwargs)
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)

With IndexError we can now have a couple of options. One is to simply subclass and hope that most people will simply do IndexError("index {index} out of range", index=42) in all instances. The other option is to take the approach shown above but cut out some code that is no longer needed:
class IndexError(Exception):

    def __init__(self, message="index is out of range", *, index=None, range=None, **kwargs):
        if index is not None:
            message = "index {index} out of range"
            if range is not None:
                message += " {range}, exclusive"
        super().__init__(message, index=index, range=range, **kwargs)

This approach has the nice effect of encouraging people to let BaseException construct the message string for them, with the nice side effect of also storing the data on the exception under descriptive attribute names. It also makes arbitrary metadata through kwargs easier than in the previous approach, where you would have to explicitly accept kwargs and then update the instance.
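A quick sketch of the keyword-argument design in use. FormattingError is a hypothetical name chosen to avoid shadowing the builtin, and it subclasses Exception only so the example can run on its own.

```python
class FormattingError(Exception):
    """Stand-in for the BaseException that interpolates keyword arguments."""

    def __init__(self, message, **kwargs):
        super().__init__(message)
        # Format once, up front, then store every keyword as an attribute.
        self.message = message.format(**kwargs)
        self.__dict__.update(kwargs)

    def __str__(self):
        return str(self.message)


exc = FormattingError("index {index} out of range", index=42)
formatted = str(exc)  # "index 42 out of range"

# Keywords that the message never mentions are still kept as metadata,
# because str.format() ignores unused keyword arguments.
extra = FormattingError("disk full", path="/tmp/data")
```

One wrinkle worth noting: a keyword literally named message cannot be passed as metadata here, since it collides with the positional parameter and raises a TypeError.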

Regardless of the approach, the real trick is how you would transition over to it. The first step would be to introduce a pending deprecation for BaseException taking more than a single argument. The next would be to activate the new keyword-argument semantics, where an exception is raised if more than a single positional argument is given. After a certain amount of time

But the real trick for a transition is message. Do you only set it for exceptions that have transitioned? If you do set it for exceptions that are still passing in multiple positional arguments, do you set the attribute to the first argument or to the string representation of all of them? Or do you simply skip having message and just let people call str() on exceptions to get what the message would be, like so?
class BaseException:

    def __init__(self, message, **kwargs):
        self.__message = message
        self.__dict__.update(kwargs)

    def __str__(self):
        return self.__message.format(**self.__dict__)

I honestly don't know what the best solution would be. Some people would probably complain about not being able to introspect on the message attribute, but exposing a string format seems somewhat icky to me. Either way some solution would be available.
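One practical difference with the last variant is that the message is built each time str() is called, so it always reflects the current attribute values. A minimal sketch, using a hypothetical LazyMessageError class and a single-underscore _template attribute in place of the name-mangled __message:

```python
class LazyMessageError(Exception):
    """Stand-in for the BaseException that formats only inside __str__."""

    def __init__(self, message, **kwargs):
        super().__init__(message)
        self._template = message
        self.__dict__.update(kwargs)

    def __str__(self):
        # Interpolate at call time from whatever is on the instance now.
        return self._template.format(**self.__dict__)


exc = LazyMessageError("index {index} out of range", index=42)
before = str(exc)  # "index 42 out of range"

exc.index = 7
after = str(exc)   # "index 7 out of range"
```

Whether that late binding is a feature or a trap probably depends on how often handlers mutate exceptions before re-raising them.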

Who knows, maybe some day I will try to push this into Python and finish what I had originally intended to do with PEP 352.
On behalf of the Jython development team, I'm pleased to announce that Jython 2.5.1 final is available for download. See the installation instructions.

Jython 2.5.1 fixes a number of bugs, including some major errors when using coroutines and when using relative imports, as well as a potential data loss bug when writing to files in append mode. Please see the NEWS file for detailed release notes.

Please report any bugs that you find. Thanks!
October 1, the deadline for submitting PyCon talk proposals, is now just under a week away. If you're planning to present, you should submit your outline now! See the proposal instructions for guidance.
Any report of the death of the Pybots project is an exaggeration. But not by much. First, some history.

Some history

The idea behind the Pybots project is to allow people to run automated tests for their Python projects, while using Python binaries built from the very latest source code from the Python subversion repository.

The idea originated from Glyph, of Twisted fame. He sent out a message to the python-dev mailing list in which he said:
"I would like to propose, although I certainly don't have time to implement, a program by which Python-using projects could contribute buildslaves which would run their projects' tests with the latest Python trunk. This would provide two useful incentives: Python code would gain a reputation as generally well-tested (since there is a direct incentive to write tests for your project: get notified when core python changes might break it), and the core developers would have instant feedback when a "small" change breaks more code than it was expected to."

This was back in July 2006. I volunteered to maintain a buildbot master (running on a server belonging to the PSF) and also to rally a community of people interested in running these kinds of tests. The hard part was (and still is) to find people willing to donate client machines to act as build slaves for a particular project, and even more so people willing to keep up with the status of their build slaves. The danger here, as in any continuous integration system, is that once the status turns to red and doesn't go back to green, people start to ignore the failed steps. Even if those steps exhibit new and interesting failures, it's too late at this point (this is related to the broken windows theory).

The project started fairly strong, gained some momentum, but then slowly ran out of steam. It was a combination of me not having the time to do the rallying, and of people not being interested in participating in the project anymore. At the height of its momentum, in early 2007, the Pybots farm consisted of 11 buildslaves running automated tests for more than 20 Python projects, including Twisted, Django, SQLAlchemy, MySQLdb, Bazaar, nose, twill, Storm, Trac, CherryPy, Genshi, Roundup. Pretty much a who's who of the Python project world.

Early success stories

Here are some examples of bugs discovered by the buildslaves in the Python farm:
  • new keywords 'as' and 'with' in Python 2.6 causing problems for projects that had variables with those names
  • Python install step failing even though all unit tests were passing (this underscores the importance of functional testing)
  • platform-specific issues -- for example Bazaar issues on Windows due to TCP client behavior, Twisted issues on Red Hat 9 due to multicast behavior, Python core issues on OS X due to string formatting errors
(for a more thorough overview of the Pybots project, including lessons learned, see also my PyCon07 presentation)
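The keyword breakage in the first bullet is easy to reproduce on any interpreter where 'with' and 'as' are reserved words. The sketch below uses only the standard keyword module and the builtin compile(); the helper name can_assign_to is made up for illustration.

```python
import keyword


def can_assign_to(name):
    """Return True if `name = 1` compiles as a statement on this interpreter."""
    try:
        compile(f"{name} = 1", "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Code that used these as variable names stops compiling once they are keywords.
reserved = [name for name in ("as", "with") if keyword.iskeyword(name)]
still_fine = can_assign_to("width")
broken = can_assign_to("with")
```

A buildslave running a project's test suite against trunk surfaces this kind of failure at import time, well before a release ships.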

Recent signs of life and more success stories

In the last month or so there has been a flurry of activity related to the Pybots farm. It all started with an upgrade of the buildbot version on the machine hosting the Pybots buildmaster. This broke the master's configuration file, so the Pybots status page went completely dark.

As a result, Steve Holden posted a plea for help, which was answered by a few people who showed interest in adding build slaves to the project. In parallel, Jean-Paul Calderone jumped in to help on the buildmaster side and he managed to fix the buildbot upgrade issue (thanks, JP!). David Stanek also expressed interest in taking a more active role on the buildmaster side.

Jean-Paul also sent more success stories to the Pybots mailing list. Here they are, verbatim, with his permission:

"The skip story:

The Twisted pybots slave started skipping every Twisted test one day. I noticed and filed http://twistedmatrix.com/trac/ticket/3703 (which goes into a bit of detail about why this happened). This happened to come up during the PyCon language summit, so there was some real-time discussion about it, resulting in a Python bug being filed, http://bugs.python.org/issue5571. Then, as that ticket shows, Benjamin Peterson was nice enough to fix the incompatibility.

The array/buffer story:

The Twisted pybots slave started to fail some Twisted tests one day. ;) The tests in question were actually calling into some PyCrypto code, so this failure wasn't in Twisted directly. PyCrypto loads some bytes into an array.array and then tries to hash them (for some part of its random pool API). I filed http://bugs.python.org/issue6071 on which someone explained that hashlib switched over to the new buffer API, lost support for hashing anything that only provides the old buffer API, and that array.array still only supports the old buffer API. This one hasn't been fixed yet, but it sounds like Gregory Smith plans to fix it before 2.7 is released.

There are other success stories too, incompatible changes that are more like bugs on the Twisted side than on the Python side (assuming one is generous and believes that incompatible changes in Python can actually be Twisted bugs ;). Things like typos that didn't result in syntax errors in an older version of Python but became syntax errors in newer versions (in particular, a variable was defined as 0x+80000000 instead of 0x80000000 - the former actually being valid syntax in 2.5 but became illegal in 2.6)."


My hope is that stories like these will convince more people of the usefulness of running tests for their projects against 'live' changes in the Python trunk (or other Python branches). I am not aware of any other testing project that accomplishes this for other programming languages.

In particular, if there is enough interest, we can also configure the Pybots master to trigger test runs for your project of choice using Py3k binaries! Think how cool you'll appear to your grandchildren!

How you can help

If you want to be involved in the Pybots project, please subscribe to the Pybots mailing list and show your interest by sending a message to the list. Here are some resources to get you started:

Thanks to jamwt for the shout-out on the announcement of Diesel.

Since the reaction to my reaction to tornado was so good (or at least so ... energetic), I figure I should comment on Diesel as well.  Spoiler alert: my reaction is ... largely similar, but since jamwt has been kind of nice to Twisted in the past, and didn't actually say anything mean this time, I'm somewhat reluctant to have that reaction.  Nevertheless, I swore a solemn oath to tell it like it is, keep it real, and so forth.  So I must.

Once again, I'm happy that event-driven programming is getting some love.  This time, I'm pleased that nobody is saying anything especially snarky or FUD-ish about Twisted.  I do feel like it's a little weird not to mention Twisted, or include some comparisons to Nevow or Orbited, both of which provide different, comprehensive approaches to COMET with Twisted.

(Worth noting: Orbited also originally started out using its own event-driven I/O layer, but switched to Twisted later, because Twisted is "crazy delicious".)

Diesel has many more interesting ideas at the level of async I/O than Tornado did.  I think the generator-based approach for implementing protocols is interesting and deserves some more exploration.  I'm not sold on it for every use-case, and I think the implementation might have some flaws, but it definitely has some advantages.

I'd give jamwt a hard time for not reporting issues and communicating with Twisted more before re-writing the core, but for three issues:
  1. jamwt's been around in the Twisted community for a while.  He's written a bunch of fairly deep Twisted code and he clearly knows what the framework is capable of.
  2. I've spoken with him on a number of occasions, and for all I know I might have discussed this with him.  I don't remember it, but it would be pretty embarrassing to write a big rant about how nobody talks to us only to have him paste some chat log where he explained why he was writing Diesel six months ago, and I said "oh, okay" ;-).
  3. Nobody is calling Twisted names or making vague, unsubstantiated accusations.  You're not obligated to examine Twisted, nor Nevow, nor Orbited, I just feel that you owe us some explanation if you publicly say that you tried it and found it wanting.  The tone on the Diesel announcement, in its one brief mention of Twisted, is "we tried it, but we kinda wanted to do our own thing".  So, good for them, they did their own thing, I hope they had fun.
Now, personally, I'd like to leave it at that, but there is a certain inevitable comparison that I think is going to take place.  Diesel has a nicer web page than Twisted.  They have entwittered ... twitified ... uh ... tweetened ... the project, and we haven't; we just have an old-fashioned "blog".  Diesel is smaller than Twisted, so it's easier to explain, and so the people approaching it will have a better idea of its scope.  This might give the immediate impression that it is a simpler, better, more "modern" replacement for Twisted's I/O layer, and this is not the case.  So I still feel it's important that I set the record straight.

Before I launch into my critique, I should say that I don't want to harsh on Diesel too bad. It's a neat little hack and you should go play with it.  And I feel bad pointing out problems with it, since as I mentioned above, nobody's dumping on Twisted.  So, Diesel fans, please take this in the spirit of a frank code-review, not a complaint about your behavior.

The interesting generator-munging bits could be easily adapted to run on top of Twisted's loop, which, arguably, they should have been in the first place; and the toy "hub" that they've written might be good enough for some simple applications where reliability under load is not a serious concern.  In fact, inlineCallbacks might provide a good deal of what is needed to support Diesel's programming style.  Alternately, Diesel might provide some hints as to how things like inlineCallbacks could be made more efficient.

That said, Diesel's I/O loop sucks.

It's disappointing to see the same mistakes getting made over and over again.  First and foremost: no tests.  Come on, Python community!  You can do better!  Write your damn tests first!

The #1 benefit that a brand-new I/O loop project could have over Twisted is that Twisted was written in the bad old days before everybody knew that TDD was the right way to write programs, so we don't have 100% test coverage.  But, we strive to get closer every day, while every new project decides that they don't need no stinking quality control.

Predictably, as it has no tests, Diesel's I/O layer is full of dead code, inaccurate documentation, and unhandled errors.  Consider this gem, which I found about 30 seconds into reading the code: KqueueEventHub is documented to be "an epoll-based event hub", and its initializer defines an inner function which is never used.  I'm not going to belabor the point by enumerating all the typo bugs I found, but you may find the output of 'pyflakes diesel' interesting.

Instead of Tornado's inaccurate handling of EINTR, Diesel has no handling of EINTR, as far as I can tell.  It also doesn't handle EPERM, ENOBUFS, EMFILE, or even EAGAIN on accept().  To be fair, it has a catch-all exception handler all the way at the top of the stack, so none of these will cause instant crashes, but they will cause surprising behavior in odd situations (and possibly infinite traceback-spewing loops).

More surprisingly - I had to re-read the code about five times to make sure - it doesn't appear that sockets are ever set to be non-blocking, and EAGAIN is not handled from accept(), recv(), or send().  And yes, this can happen even if your multiplexor says your socket is ready for reading and/or writing.  The conditions are somewhat obscure, but nevertheless they do happen.  So, occasionally, Diesel will hiccup and block until some slow network client manages to send or receive some traffic.  In other words: Diesel is not really async.  It just fakes it convincingly, most of the time.
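For readers who haven't been bitten by this, here is a minimal sketch of what "non-blocking plus EAGAIN handling" looks like on the read side. It is not Diesel's or Twisted's code; it uses only the standard socket and select modules on a modern Python 3, where EAGAIN/EWOULDBLOCK surface as BlockingIOError, and the helper name recv_or_none is made up for illustration.

```python
import select
import socket


def recv_or_none(sock, nbytes=4096):
    """Read from a non-blocking socket.

    Returns the bytes read (b"" means the peer closed the connection), or
    None if the kernel said the read would block or the call was
    interrupted, in which case the caller goes back to the multiplexor.
    """
    try:
        return sock.recv(nbytes)
    except (BlockingIOError, InterruptedError):
        return None


# A connected pair of sockets stands in for a client and a server.
left, right = socket.socketpair()
left.setblocking(False)
right.setblocking(False)

# Nothing has been sent yet; a blocking socket would hang right here.
nothing_yet = recv_or_none(left)

right.send(b"ping")
# Wait (with a timeout) until the multiplexor reports the socket readable.
select.select([left], [], [], 5.0)
received = recv_or_none(left)

left.close()
right.close()
```

The important part is the None branch: even after the multiplexor says "readable", the loop has to be prepared for the read to come back empty-handed and simply try again later.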

Once again, there's no way to asynchronously spawn a process, and no way to asynchronously connect a TCP client.  Sure, this looks like an asynchronous connect call, but it's misleading: it blocks on resolving the hostname, and it potentially blocks on the initial SYN/SYN+ACK/ACK exchange.  There's no asynchronous SSL support.  And no, that is not trivial.  Not to mention handling all the crazy errors that spew out of the Windows TCP stack.  And since the loop is implemented to be incompatible with Twisted, it's not obviously trivial to compatibly plug it in and get those features.
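As a rough illustration of the connect half of that complaint, this is the usual shape of a TCP connect that doesn't block on the handshake, sketched with the standard socket, select, errno and os modules. The helper names start_connect and finish_connect are hypothetical, and the example deliberately connects to a numeric loopback address, so it sidesteps hostname resolution, which needs separate treatment to avoid blocking.

```python
import errno
import os
import select
import socket


def start_connect(address):
    """Begin a TCP connect without waiting for the handshake to finish."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    err = sock.connect_ex(address)
    # 0 means already connected; EINPROGRESS/EWOULDBLOCK mean "still going".
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, os.strerror(err))
    return sock


def finish_connect(sock):
    """Call once the socket is writable; raise if the connect failed."""
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err:
        raise OSError(err, os.strerror(err))


# A throwaway listener on the loopback interface to connect to.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = start_connect(listener.getsockname())
# The event loop would register the socket for writability; here we just wait.
_, writable, _ = select.select([], [client], [], 5.0)
connected = bool(writable)
if connected:
    finish_connect(client)
    peer = client.getpeername()

client.close()
listener.close()
```

Even this leaves out everything that makes the real thing hard: resolving names without blocking, timeouts, SSL handshakes, and the platform-specific error codes mentioned above.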

Again, I don't want to dump on Diesel here; for what it is, i.e. an experiment in how to idiomatically structure asynchronous applications, it's all right.  For that matter Twisted has its fair share of bugs too, which would be pretty easy to lay out in a similar post; you wouldn't even need to do the research yourself, just go look at our bug tracker.

But both Diesel and Tornado make the mistake of attempting to replace the years of trial-and-error, years of testing discipline, and years of portability and feature work that Twisted has accumulated with a few oversimplified, untested hacks.

What they could have done is contributed any extensions that they needed to Twisted's loop, or modifications to Twisted's packaging that would allow them to get a smaller sliver of Twisted's core to bootstrap, if that's what they needed.

My goal in pointing out all these flaws is not to illustrate any particular point about Diesel, but to reinforce the point I implicitly made in my Tornado post, which is that if you try to write a new mainloop (especially without tests) you will screw it up.  You will most likely screw it up in ways which will only surface later, under mysterious circumstances, when your servers are under load and you are under the gun for a deadline.

Or if I happen to get wind of it and write a blog post about it, of course.  Then you get to cheat a little.

It's not an indictment of Diesel that it screwed this up; everyone screws it up.  I would probably screw it up, if I didn't have Twisted sitting in front of me as a direct reference.  POSIX by itself is unreasonably subtle and difficult, but POSIX, plus the subtle variations in different platforms which implement it, plus the Windows APIs which are almost-but-not-quite-exactly-nothing-like the POSIX APIs, presents an inhuman challenge.

Hopefully Diesel will grow some tests.  Hopefully it will fix, or better yet shed, its somewhat unfortunate I/O hub.  I am hopeful that someone will follow Dustin's excellent lead (perhaps Dustin himself!) and port Diesel's API and generator system over to Twisted's I/O architecture and eliminate all these silly bugs.  Of course, if someone did that, you could use Dustin's tornado port with Diesel.

With the silly bugs from the I/O loop out of the way, the Diesel team can write tests for the more interesting pieces, and fix the bugs which aren't entirely silly :-).

My ex-colleague from OpenX, Jeff Roberts, has another great blog post on 'A Scalable DNS Scheme for Amazon's EC2 Cloud'. If you need to deploy an internal DNS infrastructure in EC2, you have to read this post. It's based on battle-tested experience.