Test-Driven Development With Python

Earlier this year, I was approached by the editor of Software Developer's Journal to write a Python-related article. I was quite flattered by the opportunity, but, being extremely busy at the time with work and family life, I was hesitant to agree. However, after much discussion with my wife and other important people in my life, I decided to go for it.

I had a lot of freedom to choose a topic to write about in the article, along with a relatively short timeline. I think I had two weeks to write the article after finally agreeing to do so, and I was supposed to write some 7-10 pages about my chosen topic.

Having recently been converted to the wonders of test-driven development (TDD), I decided that should be my topic. Several of my friends were also interested in getting into TDD, and they were looking for a good, simple way to get their feet wet. I figured the article would be as good a time as any to write up something to help my friends along.

I set out with a pretty grand plan for the article, but as the article progressed, it became obvious that my plan was a bit too grandios for a regular magazine article. I scaled back my plans a bit and continued working on the article. I had to scale back again, and I think one more time before I finally had something that was simple enough to not write a book about.

Well, that didn't exactly turn out as planned either. I ended up writing nearly 40 pages of LibreOffice single-spaced, 12pt Times New Roman worth of TDD stuff. Granted, a fair portion of the article's length is comprised of code snippets and command output.

Anyway, I have permission to repost the article here, and I wanted to do so because I feel that the magazine formatting kinda butchered the formatting I had in mind for my article (and understandably so). To help keep the formatting more pristine, I've turned it into a PDF for anyone who's interested in reading it.

So, without much further ado, here's the article! Feel free to download or print the PDF as well.

Firefox 6.0 Is My Default Browser Again

I've just made Firefox my default browser again (after using Chrome/Chromium as my default for quite some time). FF6 renders things almost as fast as Chrome, even on our ridiculously complicated pages at work. This was my primary reason for not using FF for such a long time.

However, there are other things that I've decided are worth going back to FF at least for the time being. If any of you know of ways to accomplish this stuff with Chrome, please enlighten me!

  • Tab groups. I often have dozens and dozens of tabs open. Some I like to have open for my "night life." Some I like to have open for research. Some I like to have open for work. Tab groups let me organize these, and hide the ones I'm currently not interested in.
  • Vimperator. This was one thing I missed dearly when I decided to ditch FF for Chrome back in the day. Vimium just doesn't work as well. It's close, but still not as robust. For example, I can still use Vimperator shortcuts when viewing a raw file. Not so with Vimium.
  • Permanent exceptions for bad certificates. Normally this wouldn't be a big deal. You visit a site that has a bad certificate, your browser warns you that bad things could be happening and advises you to be careful. That's great! However, for work we all use several VMs for development (I often have 7 VMs running all the time just for my day-to-day routine). They all share the same certificate, which is bad. Chrome doesn't let me tell it to stop bothering me about the bad certificate. Firefox does.
  • Memory usage. Strangely enough, memory usage isn't something that most people say is awesome in FF these days. But I've personally discovered that it is better than Chrome on "poorly-endowed" systems. Both Chrome and FF are consume gobs of RAM on my regular 8GB+ RAM machines, but I have plenty to spare so it doesn't bother me much. I recently inherited my wife's old MacBook, which only has 1GB of RAM. Chrome is absolutely painful to use--it's slow and brings the rest of the system to a crawl. Firefox is noticeably more responsive on that machine.
  • Selenium. I don't do much UI testing these days, but I do still use Selenium to automate some mundane tasks for work. Haven't checked to see if this is available for Chrome for a while...

So far, the only regret I really have with my decision to move back to Firefox (again, for the time being) is the amount of chrome. Chrome does a good job of staying out of the way and letting me see the web page. Firefox just has just a bit more going on at the top and bottom of the window. Definitely not a deal-breaker on my 1920x1080 laptop though ;)

PIR Motion Sensor + LCD Screen + Arduino Uno

For one reason or another, I've recently had the urge to follow in my father's footsteps and start tinkering with electronics. He's basically a wizard. He has fixed a lot of appliances that others tossed out the door without the slightest bit of investigation. I think he did this with the monitor I had hooked up to my very first computer. He just popped open the case, found the problem, soldered some solution in there, and that monitor worked for probably a decade afterwards. Amazing stuff.

Anyway, I have an itch to create a laser trip wire (don't ask). I took a basic electronics class my freshman year in college, so I am already familiar with resistors, capacitors, ICs, breadboards, soldering, etc. It doesn't seem like a very daunting task to create that trip wire, but I'm starting fresh with electronics (it's been a good 8 or 9 years since I last soldered or anything like that).

One of my co-workers mentioned the Arduino as something to get me started on the trip wire, so I started doing some research here and there. The more I read, the more excited I got. I wanted soo badly to buy all of the junk that I'd need to start tinkering with an Arduino, but I didn't think my wife would appreciate that--especially around Christmas time.

Several of my awesome relatives sent me very generous gift cards for Christmas this year, which finally gave me the opportunity to buy a load of stuff for the Arduino without feeling bad. So I ordered loads of stuff. Most of it is here, some of it is still on its way. My Arduino Uno arrived around noon today, and I haven't been able to stop tinkering! It's really a lot of fun, and stupid easy even if you're not a programmer!

I started with the basic "oooh, blinking LEDs!" sort of projects (or "sketches" in Arduino parlance). Then I moved on to tweak those to work with multiple LEDs. There were a few different scenarios I ran though because two of my LEDs weren't lighting up very well. I guess I either hooked them up wrong or I didn't seat them properly... whatever.

The next project was when I hooked up a PIR (passive infrared) motion sensor ($9.99 at RadioShack) and installed a demo sketch that I found on the Arduino wiki. I wasn't quite sure how to wire everything for the PIR sensor, so I took a look at this Make Magazine video to see one way of setting up the circuit. That one simply lights up an LED and makes some noise. I just made mine light up and LED since I don't have a buzzer (yet).

Next I moved on to the LCD. The package came with a 20x4 green-on-black backlit LCD display, a 2.2k Ohm resistor, and a set of pins to connect the LCD to my breadboard or whatever. I actually soldered the pins onto the LCD display board (wow, was that even more difficult than I remember!) so I would have less of a hassle getting all of the contacts working. Getting the LCD to work was pretty easy after that.

Then I took those two projects a bit further. I'm certain others have done this years ago, but I didn't use a sketch that was completely written for me this time so I felt special enough to share :) I added the PIR circuit from before less the LED, merged pieces from both demo programs, and came up with a motion sensor that would flash "INTRUDER ALERT!!" on the LCD screen a few times when triggered. Based on my testing, the sensor works extremely well (once calibrated!!), and it will detect motion well across the open area of my apartment (maybe a bit further than 20 feet)!

I'm quite happy with my work, especially for not having "dealt with" electronics for such a long time. When I tried to share my success with some of my friends, they wanted a video to prove I'm awesome (I guess?). So here it is. Trust me, I know the video is horrible--I speak too quietly (baby sleeping in next room), I stumble over my words (that's just me), and I neglected to even offer you some music to rock out to as I blabber about this stuff.

Here's the program I wrote/modified:

 /*
  * //////////////////////////////////////////////////
  * //making sense of the Parallax PIR sensor's output
  * //////////////////////////////////////////////////
  *
  * Switches a LED according to the state of the sensors output pin. Determines
  * the beginning and end of continuous motion sequences.
  *
  * @author: Kristian Gohlke / krigoo (_) gmail (_) com / http://krx.at
  * @author: Josh VanderLinden / codekoala (.) gmail (@) com / http://www.codekoala.com
  * @date:   3 Jan 2011
  *
  * kr1 (cleft) 2006
  * Released under a creative commons "Attribution-NonCommercial-ShareAlike 2.0" license
  * http://creativecommons.org/licenses/by-nc-sa/2.0/de/
  *
  *
  * The Parallax PIR Sensor is an easy to use digital infrared motion sensor module.
  * (http://www.parallax.com/detail.asp?product_id=555-28027)
  *
  * The sensor's output pin goes to HIGH if motion is present. However, even if
  * motion is present it goes to LOW from time to time, which might give the
  * impression no motion is present. This program deals with this issue by
  * ignoring LOW-phases shorter than a given time, assuming continuous motion is
  * present during these phases.
  *
  */

 #include <LiquidCrystal.h>

 int calibrationTime = 10;   // seconds to calibrate PIR
 long unsigned int pause = 5000; // timeout before we "all" motion has ceased
 long unsigned int lowIn; // the time when the sensor outputs a low impulse

 boolean lockLow = true;
 boolean takeLowTime;

 int flashCnt = 4;  // number of times the LCD will flash when there's motion
 int flashDelay = 500; // number of ms to wait while flashing LCD
 int pirPin = 7;    // the digital pin connected to the PIR sensor's output
 int lcdPin = 13;   // pin connected to LCD
 LiquidCrystal lcd(12, 11, 10, 5, 4, 3, 2);

 void setup() {
     // Calibrates the PIR

     Serial.begin(9600);
     pinMode(pirPin, INPUT);
     pinMode(lcdPin, OUTPUT);

     digitalWrite(pirPin, LOW);

     clearLcd();

     // give the sensor some time to calibrate
     lcd.setCursor(0,0);
     lcd.print("Calibrating...");

     for(int i = 0; i < calibrationTime; i++){
         lcd.print(".");
         delay(1000);
     }

     lcd.print("done");
     delay(50);
 }

 void clearLcd() {
     // Clears the LCD, turns off the backlight
     lcd.begin(20, 4);
     lcd.clear();
     digitalWrite(lcdPin, LOW);
 }

 void alertLcd() {
     // Turns on the LCD backlight and notifies user of motion
     lcd.setCursor(2, 1);
     digitalWrite(lcdPin, HIGH);
     lcd.print("INTRUDER ALERT!!");
     lcd.setCursor(2, 2);
     lcd.print("================");
 }

 void loop() {
     // Main execution loop

     if(digitalRead(pirPin) == HIGH) {
         // flash an alert a few times
         for (int c = 0; c < flashCnt; c++) {
             alertLcd();
             delay(flashDelay);
             lcd.clear();
             delay(flashDelay);
         }

         if(lockLow) {
             // makes sure we wait for a transition to LOW before any further
             // output is made:
             lockLow = false;
             delay(50);
         }
         takeLowTime = true;
     }

     if (digitalRead(pirPin) == LOW) {
         clearLcd();

         if (takeLowTime) {
             // save the time of the transition from high to LOW
             // make sure this is only done at the start of a LOW phase
             lowIn = millis();
             takeLowTime = false;
         }

         // if the sensor is low for more than the given pause,
         // we assume that no more motion is going to happen
         if (!lockLow && millis() - lowIn > pause) {
             // makes sure this block of code is only executed again after
             // a new motion sequence has been detected
             lockLow = true;
             delay(50);
         }
     }
 }

Quick And Easy Execution Speed Testing

There have been many times when I've been programming, encounter a problem that probably involves a loop of some sort, and I think of two or more possible ways to achieve the same end result. At this point, I usually think about which one will probably be the fastest solution (execution-wise) while still being readable/maintainable. A lot of the time, the essentials of the problem can be tested in a few short lines of code.

A while back, I was perusing some Stack Overflow questions for work, and I stumbled upon what I consider one of the many hidden jewels in Python: the timeit module. Given a bit of code, this little guy will handle executing it in several loops and giving you the best time out of three trials (you can ask it to do more than 3 runs if you want). Once it completes its test, it will offer some very clean and useful output.

For example, today I encountered a piece of code that was making a comma-separated list of an arbitrary number of "%s". The code I saw essentially looked like this:

",".join(["%s"] * 50000)

Even though this code required no optimization, I thought, "Hey, that's neat... I wonder if a list comprehension could possibly be any faster." Here's an example of the contender:

",".join(["%s" for i in xrange(50000)])

I had no idea which would be faster, so timeit to the rescue!! Open up a terminal, type a couple one-line Python commands, and enjoy the results!

$ python -mtimeit 'l = ",".join(["%s"] * 50000)'
1000 loops, best of 3: 1.15 msec per loop
$ python -mtimeit 'l = ",".join(["%s" for i in xrange(50000)])'
100 loops, best of 3: 3.23 msec per loop

Hah, the list comprehension is certainly slower.

Now, for other more in-depth tests of performance, you might consider using the cProfile module. As far as I can tell, simple one-liners can't be tested directly from the command line using cProfile--they apparently need to be in a script. You can use something like:

python -mcProfile script.py

...in such situations. Or you can wrap function calls using cProfile.run():

import cProfile

def function_a():
    # something you want to profile

def function_b():
    # an alternative version of function_a to profile

if __name__ == '__main__':
    cProfile.run('function_a()')
    cProfile.run('function_b()')

I've used this technique for tests that I'd like to have "hard evidence" for in the future. The output of such a cProfile test looks something like this:

3 function calls in 6.860 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    6.860    6.860 <string>:1(<module>)
     1    6.860    6.860    6.860    6.860 test_enumerate.py:5(test_enumerate)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This is useful when your code is calling other functions or methods and you want to find where your bottlenecks are. Hooray for Python!

What profiling techniques do you use?

Selenium Unit Test Reuse

Yesterday, one of the QA guys at work approached me with a question that turned out to be much more interesting to me than I think he had planned. He's been doing some unit testing using Selenium, exporting his test cases to Python. His question was this: how can I run the same unit tests using multiple browsers and multiple target servers?

I'm pretty sure he expected a simple 3-step answer or something like that. Instead, he got my crazy wide-eyed "ohhh... that's something I want to experiment with!" look. I started rambling on about inheritance, dynamic class creation, and nested for loops. His eyes started to look a little worried. He didn't really appreciate the nerdy lingo that much. I told him to pull up a chair and get comfortable.

Since I already had some other work I needed to pay attention to, I didn't want to spend too much time trying to figure out a good way to solve his problem. After about 20 minutes of devilish chuckles and frantic rustling through Python documentation, I came up with the following code:

from types import ClassType
from selenium import selenium
import unittest

IPS = ['192.168.0.1', '192.168.0.2']
BROWSERS = ['safari', 'chrome']

class SomeUnitTest(object):

    def test_something(self):
        sel = self.selenium
        # test code

def main(base):
    suites = []
    results = unittest.TestResult()

    for iidx, ip in enumerate(IPS):
        for bidx, browser in enumerate(BROWSERS):
            def setUp(self):
                self.verificationErrors = []
                self.selenium = selenium("localhost", 4444, "*%s" % self.browser, "http://%s/" % self.ip)
                self.selenium.start()

            def tearDown(self):
                self.selenium.stop()
                self.assertEqual([], self.verificationErrors)

            ut = ClassType('UT_%i_%i' % (iidx, bidx), (unittest.TestCase, base), {'ip': ip, 'browser': browser})
            ut.setUp = setUp
            ut.tearDown = tearDown

            suites.append(unittest.TestLoader().loadTestsFromTestCase(ut))

    unittest.TestSuite(suites)(results)
    for obj, error in results.errors:
        print 'In: ', obj
        print error

if __name__ == "__main__":
    main(SomeUnitTest)

I know, I know... it's got some dirty rotten tricks in it, and there are probably more efficient ways of doing what I've done. If the code offends you, look up at my previous disclaimer: I had other things I needed to be working on, so I didn't spend much time refining this. One thing I'm almost certain could be done better is not monkey patching the dynamic classes with the setUp and tearDown methods. Also, the output at the end of the test execution could definitely use some love. Oh well. Perhaps another day I'll get around to that.

Basically, you just set the servers you need to test and the browsers you want Selenium to run the tests in. Those are at the top of the script: IPS and BROWSERS. Then a new unittest.TestCase class is created for each combination of IP/server+browser. Finally, each of the test cases is thrown into a TestSuite, and the suite is processed. If there were any errors during the tests, they'll be printed out. We weren't really concerned with printing out other information, but you can certainly make other meaningful feedback appear.

Anyway, I thought that someone out there might very well benefit from my little experiment on my co-worker's question. Feel free to comment on your personal adventures with some variation of the code if you find it useful!

Review: Django 1.0 Web Site Development

Introduction

Several months ago, a UK-based book publisher, Packt Publishing contacted me to ask if I would be willing to review one of their books about Django. I gladly jumped at the opportunity, and I received a copy of the book a couple of weeks later in the mail. This happened at the beginning of September 2009. It just so happened that I was in the process of being hired on by ScienceLogic right when all of this took place. The subsequent weeks were filled to the brim with visitors, packing, moving, finding an apartment, and commuting to my new job. It was pretty stressful.

Things are finally settling down, so I've taken the time to actually review the book I was asked to review. I should mention right off the bat that this is indeed a solicited review, but I am in no way influenced to write a good or bad review. Packt Publishing simply wants me to offer an honest review of the book, and that is what I indend to do. While reviewing the book, I decided to follow along and write the code the book introduced. I made sure that I was using the official Django 1.0 release instead of using trunk like I tend to do for my own projects.

The title of the book is Django 1.0 Web Site Development, written by Ayman Hourieh, and it's only 250 pages long. Ayman described the audience of the book as such:

This book is for web developers who want to learn how to build a complete site with Web 2.0 features, using the power of a proven and popular development system--Django--but do not necessarily want to learn how a complete framework functions in order to do this. Basic knowledge of Python development is required for this book, but no knowledge of Django is expected.

Ayman introduced Django piece by piece using the end goal of a social bookmarking site, a la del.icio.us and reddit. In the first chapter of the book, Ayman discussed the history of Django and why Python and Django are a good platform upon which to build Web applications. The second chapter offers a brief guide to installing Python and Django, and getting your first project setup. Not much to comment on here.

Digging In

Chapter three is where the reader was introduced to the basic structure of a Django project, and the initial data models were described. Chapter four discussed user registration and management. We made it possible for users to create accounts, log into them, and log out again. As part of those additions, the django.forms framework was introduced.

In chapter five, we made it possible for bookmarks to be tagged. Along with that, we built a tag cloud, restricted access to certain pages, and added a little protection against malicious data input. Next up was the section where things actually started getting interesting for me: enhancing the interface with fancy effects and AJAX. The fancy effects include live searching for bookmarks, being able to edit a bookmark in place (without loading a new page), and auto-completing tags when you submit a bookmark.

This chapter really reminded me just how simple it is to add new, useful features to existing code using Django and Python. I was thoroughly impressed at how easy it was to add the AJAX functionality mentioned above. Auto-completing the tags as you type, while jQuery and friends did most of the work, was very easy to implement. It made me happy.

Chapter seven introduced some code that allowed users to share their bookmarks with others. Along with this, the ability to vote on shared bookmarks was added. Another feature that was added in this chapter was the ability for users to comment on various bookmarks.

The ridiculously amazing Django Administration utility was first introduced in chapter eight. It kinda surprised me that it took 150 pages before this feature was brought to the user's attention. In my opinion, this is one of the most useful selling points when one is considering a Web framework for a project. When I first encountered Django, the admin interface was one of maybe three deciding factors in our company's decision to become a full-on Django shop.

Bring on the Web 2.0

Anyway, in chapter nine, we added a handful of useful "Web 2.0" features. RSS feeds were introduced. We learned about pagination to enhance usability and performance. We also improved the search engine in our project. At this stage, the magical Q objects were mentioned. The power behind the Q objects was discussed very well, in my opinion.

In chapter 10, we were taught how we can create relationships between members on the site. We made it possible for users to become "friends" so they can see the latest bookmarks posted by their friends. We also added an option for users to be able to invite some of their other friends to join the site via email, complete with activation links. Finally, we improved the user interface by providing a little bit of feedback to the user at various points using the messages framework that is part of the django.contrib.auth package in Django 1.0.

More advanced topics, such as internationalization and caching, were discussed in chapter 11. Django's special unit testing features were also introduced in chapter 11. This section actually kinda frustrated me. Caching was discussed immediately before unit testing. In the caching section, we learned how to enable site-wide caching. This actually broke the unit tests. They failed because the caching system was "read only" while running the tests. Anyway, it's probably more or less a moot point.

Chapter 11 also briefly introduced things to pay attention to when you deploy your Django projects into a production environment. This portion was mildly disappointing, but I don't know what else would have made it better. There are so many functional ways to deploy Django projects that you could write books just to describe the minutia involved in deployment.

The twelfth and final chapter discussed some of the other things that Django has to offer, such as enhanced functionality in templates using custom template tags and filters and model managers. Generic views were mentioned, and some of the other useful things in django.contrib were brought up. Ayman also offered a few ideas of additional functionality that the reader can implement on their own, using the things they learned throughout the book.

Afterthoughts

Overall, I felt that this book did a great job of introducing the power that lies in using Django as your framework of choice. I thought Ayman managed to break things up into logical sections, and that the iterations used to enhance existing functionality (from earlier chapters) were superbly executed. I think that this book, while it does assume some prior Python knowledge, would be a fine choice for those who are curious to dig into Django quickly and easily.

Some of the beefs I have with this book deal mostly with the editing. There were a lot of strange things that I found while reading through the book. However, the biggest sticking point for me has to do with "pluggable" applications. Earlier I mentioned that the built-in Django admin was one of only a few deciding factors in my company's choice to become a Django shop. Django was designed to allow its applications to be very "pluggable."

You may be asking, "What do I mean by 'pluggable'?" Well, say you decide to build a website that includes a blog, so you build a Django project and create an application specific to blogging. Then, at some later time, you need to build another site that also has blog functionality. Do you want to rewrite all of the blogging code for the second site? Or do you want to use the same code that you used in the first site (without copying it)? If you're anything like me and thousands of other developers out there, you would probably rather leverage the work you had already done. Django allows you to do this if you build your Django applications properly.

This book, however, makes no such effort to teach the reader how to turn all of their hard work on the social bookmarking features into something they could reuse over and over with minimal effort in the future. Application-specific templates are placed directly into the global templates directory. Application-specific URLconfs are placed in the root urls.py file. I would have liked to see at least some effort to make the bookmarking application have the potential to be reused.

Finally, the most obvious gripe is that the book is outdated. That's understandable, though! Anything in print media will likely be outdated the second it is printed if the book has anything to do with computers. However, with the understanding that this book was written specifically for Django 1.0 and not Django 1.1 or 1.2 alpha, it does an excellent job at hitting the mark.

Automatic Config Replication With Mercurial

I've done a lot of neat things since I started my new job earlier this month. I'm really excited about the things I've learned and experimented with, and I would like to share some of the concepts with my visitors.

At work we use a lot of virtual machines in our individual development environments. Most of these virtual machines use very similar configuration settings, but the settings are not a standard part of the installation. That is because we build our virtual machines using the same installation tools that our customers would use. The configuration I'm talking about is just stuff specific to our development environment.

Creating and configuring these virtual machines is one of the first things my mentor showed me how to do my first day on the job. He commented on how quickly I would probably start learning all of the configuration tasks because we tend to setup our development VMs several times a month. That was all fine and dandy, and I did get a pretty good feel for what needed to go into a development VM that first day.

However, after doing it so many times, I realized how much time I was using just trying to get the VM set up just right. It wasn't hard to configure--it was just time-consuming. It wasn't long before I started thinking of ways to optimize the process.

One of the ideas I came up with, which seems to be serving my purposes perfectly, is that of using Mercurial to quickly and easily get the exact same configuration from one box to another. It also has the added benefit of keeping a history of the changes I make to my configuration as time goes on.

I won't go into exact detail on how I have things setup at work, but I would like to try to describe a similar scenario that should illustrate my goal just as well.

Getting Started

One of the first things I would encourage you to do is follow along. It will make the concept sink in much faster, and you will probably see other applications very quickly. Please note, however, that if you're following along exactly, it could be a very time-consuming process. I will be using 3 virtual machines as I write this, but you could just as easily use 5, 10, or 100,000. Likewise, you could eliminate the virtual machines altogether if you're in an environment with several physical computers.

One virtual machine will act as the "master" server, or the one that will be configured first. The other virtual machines will act as "slave" servers, which will simply receive configuration updates that happen on the master server. We will also modify this behavior to be a bit more interesting toward the end of the article.

Virtual Machines Galore!

First off, I will create some basic virtual machines using the net install version of Debian 5.0.3. I really only need to create 1 VM and then clone it a couple of times. I am willing to furnish my virtual machines to those who are interested in using them. I will install some additional software in the VM to make sure the demo works smoothly. Among the packages that I will install are:

  • Python
  • Mercurial
  • OpenSSH server

Initialize a Repository

Once I have all of that set up in my virtual machines, I will initialize a Mercurial repository on the master server to maintain the configuration files that I am interested in. Let's just use the /etc directory for the time being. There's a pretty good chance that most of our system-wide configuration will all be contained somewhere beneath /etc.

cd /etc
hg init

Now let's have a gander at the files that we can have Mercurial manage for us:

hg st

Wow! That is quite a set of files, isn't it? Thankfully, they should mostly be plain text files. Mercurial is very efficient at managing text files. Let's now add all of the files in /etc to our repository, so they can be tracked and easily pushed out to other systems.

hg add

That command will happily add everything that hg st printed. Obviously, we can get a little more picky about what we do and do not add to our repository, but that's not the goal of this article. Now, this step merely tells Mercurial that it needs to pay attention to changes in these files. The files have not yet been committed to the repo. Let's do that, so we have a backup of our configuration files in their pristine state:

hg ci -m "Initial import"

The -m "Initial import" is just a comment, to describe what happened to warrant a commit to the repository. It is for your use and the use of anyone who has access to your repo.

Clone The Configuration

Now let's try to push the configuration we just committed on the master server to one of the slave servers. Since my virtual machines are all essentially in the same state, there should be no conflicts, right? Try running the following command on the master server:

hg push ssh://root@slave1//etc
root@slave1's password:
remote: abort: There is no Mercurial repository here (.hg not found)!
abort: no suitable response from remote hg!

Blast! We can't simply push the configuration files out to another computer. For that to work, we'd first have to have the repository itself exist on the slave server. Let's try this another way. One the slave server, run this command:

hg clone ssh://root@master//etc /etc
root@master's password:
abort: destination '/etc/' is not empty

Doh! Mercurial won't let us clone the repository from the master server! That's because Mercurial wants to clone to a new directory, with nothing already in it. One way to get around this hairball of a show-stopper is to just copy the repo using conventional UNIX utilities. Execute this command on one of your slave servers:

scp -r root@master:/etc/.hg /etc/

The .hg directory contains all of the repository information, and it's really all we need to snag in order to clone the repository. This might not be the most elegant solution in the world, but it will suffice for the time being. Once the scp command completes, we should have a full copy of the configuration file repository. Run this command to verify:

hg st

If your setup is anything like mine, you'll probably have a few files that are listed as being modified. Chances are that these files will vary from host to host anyway, and they are probably not worth keeping in a version control system. That would just be begging for conflicts.

I wrote an extension for Mercurial that should make this part of my tutorial a little less hacky. On your other slave server, run the following commands:

hg clone http://bitbucket.org/codekoala/hgext /root/hgext
echo "[extensions]" >> /root/.hgrc
echo "neclone = /root/hgext/neclone.py" >> /root/.hgrc

This extension gives you a new Mercurial command called neclone (N. E. Clone, or "not empty clone"). As we saw earlier, Mercurial doesn't let us clone a repository into a directory that is not empty. This extension allows us to do that. It works almost identically to the regular clone command... takes the same options and everything.

Still on your second slave server, run these additional commands:

hg neclone ssh://root@master//etc /etc
cd /etc
hg up -C

The last step is optional, and soon to be included as part of the extension. It will update your working copy to the latest revision in the repository. Beware that it overwrites any uncommitted changes you may have made to files that are tracked by Mercurial.

So now both slave servers should have a clone of the configuration repository from the master server.

Being Picky

Let's start to be a little picky about the files we are tracking in our repository. Some of the files appears as being modified on my slave server after copying the .hg directory from the master server are:

  • adjtime
  • alternatives/pager
  • alternatives/pager.1.gz
  • mailcap
  • network/run/ifstate
  • udev/rules.d/70-persistent-net.rules

I think it's safe to remove these from the repository, to avoid conflicts with other systems. To tell Mercurial to stop tracking files it is tracking, without actually deleting the file from the filesystem, you can use the following command:

hg forget adjtime
hg forget mailcap

And so on. Go ahead and do that for each of the files that appeared to be modified on your slave server immediately after copying the .hg directory. I'm going to add /etc/hostname to the list of files to forget too.

After doing that, each of those files should appear as being marked for removal when you run hg st. Don't worry, this is normal. The files will not be deleted from the filesystem, but they will be deleted from the repository. Go ahead and commit those changes to the repository on your slave server.

hg ci -Am "Removed some files from version control"

Now let's push those changes out to the master server:

hg push
abort: repository default-push not found!

Since we copied the .hg directory directly using scp, our slave won't know where the changes need to go when we run the push command with no explicit destination repository. To fix that, let's create a file in /etc/.hg/ called hgrc on the slave server. In that file, put the following text:

[paths]
default = ssh://root@master//etc

The hg push command should now push directly to the master server. Yay! The problem we face now is that every other slave server in the group is out of date. How can we fix that? We'll use Mercurial hooks.

Automating Config Replication

Mercurial offers some very useful hooks that we can use to automatically push configuration changes out to each of our slave servers. We will use the commit and changegroup hooks to do the magic. Let's create a script that will live on the master server to take care of pushing our changes out to each slave server. Create a new file in /etc/ on the master server called propagate.sh:

#!/bin/bash
hg up
for node in 'slave1' 'slave2'
do
    ssh root@$node "cd /etc; hg pull -u"
done

Let's also make sure this script is executable:

chmod +x /etc/propagate.sh

This script assumes that your /etc/hosts file or your nameserver are configured appropriately to allow slave1 and slave2 to be resolved to IP addresses. The reason we're SSH'ing into each slave server and using hg pull instead of simply using hg push ssh://root@$node//etc is because you can't force an update on a remote server using push. You can, however, request an update when you're using pull.

Obviously, this script is not the most sophisticated of scripts. It might work well for my demonstration, with only a few servers, but once you get beyond that it would be a nightmare to maintain the list of servers the script has to connect to. You can use whatever means you'd like to keep track of the servers you want to replicate your configuration to. I don't want to bother with all of the crap I'd get for suggesting one thing over another, so it's now your call.

Now it's time to configure the Mercurial hook to execute that script when the master server sees a changeset get into its repository. Open up /etc/.hg/hgrc on the master server, or create it if it doesn't exist. Make sure it has at least the following in it:

[hooks]
commit.propagate = /etc/propagate.sh
changegroup.propagate = /etc/propagate.sh

Let's try it out! Run these commands on your master server:

echo "" >> /etc/hosts
hg ci -m "Added a blank line to the hosts file"
root@slave1's password:
remote: Permission denied, please try again.
remote: Permission denied, please try again.
remote: Permission denied (publickey,password).
abort: no suitable response from remote hg!
Connection closed by slave2
warning: commit.propagate hook exited with status 255

Blast! The script failed because it wanted us to type in a password, but it was not in interactive mode. Let's fix that with a little preshared key magic. I won't go into the details about how this works, but the following commands on your master server should get us rolling:

ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
scp -r ~/.ssh root@slave1:~
scp -r ~/.ssh root@slave2:~

Warning

Keep in mind this is not secure and should probably not be how your production machines are configured, especially with the root user.

For simplicity's sake, just accept all of the details and don't set a passphrase. These commands enable us to SSH into our slave servers without using a password. If you get an error such as:

remote: Host key verification failed.
abort: no suitable response from remote hg!

...it just means you need to manually log into your master server from the slave machine that threw that error. When doing so, you will have to answer "yes" to a question about the authenticity of the host you're logging into.

Testing It Out

It is now time to see if we can make a configuration change on one slave server and have it show up on the other slave server. Let's update the hosts file a little bit. Let's add the following line on the second slave server:

10.0.0.5        nonexistanthost

Now let's commit the change and push it off to the master server:

hg ci -m "Added a dumb line to the hosts file"
hg push

My system actually told me that that it had copied the change out to another host. I know because I saw these lines:

remote: pulling from ssh://root@master//etc
remote: searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

Now when I look at the first slave server, I should see that new line in my /etc/hosts file. Also, the log on each server should have the same entry that I just made about adding "a dumb line to the hosts file."

Seem Like A Lot of Work?

A lot of what we just did probably seemed like more work that it is worth, right? Well, being a nerd typically comes with a few qualities. One quality which I have observed many a time in my most geeky of friends is that they will spend hours and hours up front on a program or script just so they can save 2 minutes in the future. They work hard to be lazy.

There is a lot of boilerplate configuration that takes place in this particular scenario. I realize that. What I haven't shared with you, though, is how I automated the boilerplate configuration as well as the propagation of configuration. I'm tired of putting this article off, so I will have to leave those details for another article. Sorry!

Why?! There's a Better Way (tm)

There is always a better way. Always. Go ahead and use whatever you feel is the most efficient method for keeping configuration files in sync across several computers. This is just one more option to add to your toolkit. Don't worry, I won't be offended if you don't like it or don't use it. It works perfect for me and it's free, and I just wanted to share!

Django 1.1 Has Arrived!

Just a quick note to spread the news as far as possible: Django 1.1 has been released!

This looks to be an excellent release, as usual. Some new features:

Hooray! Excellent work everyone! For more info, see the Release Notes.

Adding Captcha To Django's Built-in Comments

I recently had a good friend of mine ask for some help in adding a captcha field to his comments form. I gave him some pointers, but before he could put them into action he had to leave for a Thanksgiving roadtrip home. I didn't give much mind to the idea of putting captchas on my own site since it's not all that popular amongst spammers yet. When I woke up this morning, however, I found myself with a few spare minutes to see if my pointers were correct.

Some of the ideas I shared with my friend turned out to not work very well. As I tinkered about trying to get things to work on my own site, I think I came up with a relatively efficient way of doing things.

Installing The Captcha

The captcha field I use is quite simple and effective. I originally got it from http://django.agami.at/media/captcha/, but the project seems to be unmaintained now. Along the road to Django 1.0, some changes were made to the way form fields work, and there is a minor change required in the base code for this captcha field if you want it to work. Alternatively, you can use a copy of the field that I'm currently using.

All you need to do is extract the captcha directory somewhere on your PYTHONPATH. The author recommends putting it in django.contrib, but I usually just place it straight on the PYTHONPATH so all I need to do is from captcha import CaptchaField instead of from django.contrib.captcha import CaptchaField. Minor details...

Adding The Captcha

The first thing you'll want to do after installing the captcha field is add the field itself to your comments form. Instead of subclassing the built-in django.contrib.comments.forms.CommentForm form, I simply decorated the constructor of the form as such:

1
2
3
4
5
6
7
8
9
from django.contrib.comments.forms import CommentForm
from captcha import CaptchaField

def add_captcha(func):
    def wrapped(self, *args, **kwargs):
        func(self, *args, **kwargs)
        self.fields['security_code'] = CaptchaField()
    return wrapped
CommentForm.__init__ = add_captcha(CommentForm.__init__)

This adds a field called security_code to the CommentForm, and it works the same way as if you had done something like this:

1
2
3
4
5
6
7
from django import forms
from captcha import CaptchaField

class MyCommentForm(forms.Form):
    name = forms.CharField()
    ...
    security_code = CaptchaField()

You can put the decorating snippet from above anywhere you'd like so long as the module you put it in is loaded at some point in your project. I usually put this sort of magic in my main urls.py file so it's harder to forget about when I debug things.

Fixing the Form

The first problem with this little trick seems to be that the CaptchaField is rendered as unsafe HTML in the default form.html template in the built-in comments application. That just means that, instead of seeing the captcha, you will see the HTML necessary to render the CaptchaField directly on the page, like this:

<input type="hidden" name="security_code" value="captcha.caZ1SqQ" />
<img src="/static/captchas/caZ1SqQ/0656f09d3974850397dd4c4974f23a35.gif"
 alt="" /><br /><input type="text" name="security_code"
 id="id_security_code" />

To fix that, you can apply the safe filter to the field and make the template look something like this:

{% load comments i18n %}
<form action="{% comment_form_target %}" method="post">
<table>
{% for field in form %}
    {% if field.is_hidden %}
    {{ field }}
    {% else %}
    <tr{% ifequal field.name "honeypot" %} style="display:none;"{% endifequal %}>
        <th>{{ field.label_tag }}:</th>
        <td {% if field.errors %} class="error"{% endif %}>
            {{ field|safe }} &nbsp;
            {% if field.errors %}{{ field.errors|join:"; " }}{% endif %}
        </td>
    </tr>
    {% endif %}
{% endfor %}
</table>
<p class="submit">
    <input type="submit" name="post" class="submit-post"
     value="{% trans "Post" %}" />
    <input type="submit" name="preview" class="submit-preview"
     value="{% trans "Preview" %}" />

</form>

Notice the {{ field|safe }} in there. Also note that I prefer the table layout for the comment form over the default mode. If you change your form template as I have done, you should put the updated copy in your own project's template directory. It belongs in templates/comments/form.html, assuming that your templates directory is called templates. You'll probably also want to check out the preview.html template for the django.contrib.comments application. I changed mine to look like this:

{% extends "comments/base.html" %}
{% load i18n %}

{% block title %}{% trans "Preview your comment" %}{% endblock %}

{% block content %}
    {% load comments %}
    {% if form.errors %}
    <h1>{% blocktrans count form.errors|length as counter %}Please
     correct the error below{% plural %}Please correct the errors below
     {% endblocktrans %}</h1>
    {% else %}
    <h1>{% trans "Preview your comment" %}</h1>
        <blockquote>{{ comment|linebreaks }}</blockquote>

        {% trans "and" %} <input type="submit" name="submit"
         class="submit-post" value="{% trans "Post your comment" %}"
         id="submit" /> {% trans "or make changes" %}:

    {% endif %}

    {% include 'comments/form.html' %}
{% endblock %}

See how I just use the include tag to pull in the comments/form.html template I mentioned above? Saves a lot of typing and potential for problems... If you update the preview.html template, you should save your copy in templates/comments/preview.html, assuming your templates directory is called templates.

Testing It Out

At this point, you should be able to try out your newly installed captcha-fied comments. If it doesn't work, please comment on this article and perhaps we can figure out the problem!

Project Release: django-axes 0.1-pre

Ever curious to see information about failed login attempts on your site? This morning I threw together a small, simple Django application that allows you to do just that. I'm calling this project django-axes, which is pronounced as "jango access".

I've only tested it on my own site, so it must still undergo some more in-depth testing before I can call it a stable application. From my testing, however, it works exactly as I expect it to.

For more information, check out the project homepage on Google Code.