Auto-Generating Documentation Using Mercurial, ReST, and Sphinx

I often find myself taking notes about various aspects of my job that I feel I would forget as soon as I moved onto another project. I've gotten into the habit of taking my notes using reStructured Text, which shouldn't come as any surprise to any of my regular visitors. On several occasions, I had some of the other guys in the company ask me for some clarification on some things I had taken notes on. Lucky for me, I had taken some nice notes!

However, these individuals probably wouldn't appreciate reading ReST markup as much as I do, so I decided to do something nice for them. I setup Sphinx to prettify my documentation. I then wrote a small Web server using Python, so people within the company network could access the latest version of my notes without much hassle.

Just like I take notes to remind myself of stuff at work, I want to do that again for this automated ReST->HTML magic--I want to be able to do this in the future! I figured I would make my notes even more public this time, so you all can enjoy similar bliss.

Platform Dependence

I am writing this article with UNIX-like operating systems in mind. Please forgive me if you're a Windows user and some of this is not consistent with what you're seeing. Perhaps one day I'll try to set this sort of thing up on Windows.

Installing Sphinx

The first step that we want to take is installing Sphinx. This is the project that Python itself uses to generate its online documentation. It's pretty dang awesome. Feel free to skip this section if you have already installed Sphinx.

Depending on your environment of choice, you may or may not have a package manager that offers python-sphinx or something along those lines. I personally prefer to install it using pip or easy_install:

$ sudo pip install sphinx

Running that command will likely respond with a bunch of output about downloading Sphinx and various dependencies. When I ran it in my sandbox VM, I saw it install the following packages:

  • pygments
  • jinja2
  • docutils
  • sphinx

It should be a pretty speedy installation.

Installing Mercurial

We'll be using Mercurial to keep track of changes to our ReST documentation. Mercurial is a distributed version control system that is built using Python. It's wonderful! Just like with Sphinx, if you have already installed Mercurial, feel free to skip to the next section.

I personally prefer to install Mercurial using pip or easy_install--it's usually more up-to-date than what you would have in your package repositories. To do that, simply run a command such as the following:

$ sudo pip install mercurial

This will go out and download and install the latest stable Mercurial. You may need python-dev or something like that for your platform in order for that command to work. However, if you're on Windows, I highly recommend TortoiseHg. The installer for TortoiseHg will install a graphical Mercurial client along with the command line tools.

Create A Repository

Now let's create a brand new Mercurial repository to house our notes/documentation. Open a terminal/console/command prompt to the location of your choice on your computer and execute the following commands:

$ hg init mydox
$ cd mydox

Configure Sphinx

The next step is to configure Sphinx for our project. Sphinx makes this very simple:

$ sphinx-quickstart

This is a wizard that will walk you through the configuration process for your project. It's pretty safe to accept the defaults, in my opinion. Here's the output of my wizard:

$ sphinx-quickstart
Welcome to the Sphinx quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Enter the root path for documentation.
> Root path for the documentation [.]:

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/N) [n]: y

Inside the root directory, two more directories will be created; "_templates"
for custom HTML templates and "_static" for custom stylesheets and other static
files. You can enter another prefix (such as ".") to replace the underscore.
> Name prefix for templates and static dir [_]:

The project name will occur in several places in the built documentation.
> Project name: My Dox
> Author name(s): Josh VanderLinden

Sphinx has the notion of a "version" and a "release" for the
software. Each version can have multiple releases. For example, for
Python the version is something like 2.5 or 3.0, while the release is
something like 2.5.1 or 3.0a1.  If you don't need this dual structure,
just set both to the same value.
> Project version: 0.0.1
> Project release [0.0.1]:

The file name suffix for source files. Commonly, this is either ".txt"
or ".rst".  Only files with this suffix are considered documents.
> Source file suffix [.rst]:

One document is special in that it is considered the top node of the
"contents tree", that is, it is the root of the hierarchical structure
of the documents. Normally, this is "index", but if your "index"
document is a custom template, you can also set this to another filename.
> Name of your master document (without suffix) [index]:

Please indicate if you want to use one of the following Sphinx extensions:
> autodoc: automatically insert docstrings from modules (y/N) [n]:
> doctest: automatically test code snippets in doctest blocks (y/N) [n]:
> intersphinx: link between Sphinx documentation of different projects (y/N) [n]:
> todo: write "todo" entries that can be shown or hidden on build (y/N) [n]:
> coverage: checks for documentation coverage (y/N) [n]:
> pngmath: include math, rendered as PNG images (y/N) [n]:
> jsmath: include math, rendered in the browser by JSMath (y/N) [n]:
> ifconfig: conditional inclusion of content based on config values (y/N) [n]:

A Makefile and a Windows command file can be generated for you so that you
only have to run e.g. `make html' instead of invoking sphinx-build
directly.
> Create Makefile? (Y/n) [y]:
> Create Windows command file? (Y/n) [y]: n

Finished: An initial directory structure has been created.

You should now populate your master file ./source/index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.

If you followed the same steps I did (I separated the source and build directories), you should see three new files in your mydox repository:

  • build/
  • Makefile
  • source/

We'll do our work in the source directory.

Get Some ReST

Now is the time when we start writing some ReST that we want to turn into HTML using Sphinx. Open some file, like first_doc.rst and put some ReST in it. If nothing comes to mind, or you're not familiar with ReST syntax, try the following:

=========================
This Is My First Document
=========================

Yes, this is my first document.  It's lame.  Deal with it.

Save the file (keep in mind that it should be within the source directory if you used the same settings I did). Now it's time to add it to the list of files that Mercurial will pay attention to. While we're at it, let's add the other files that were created by the Sphinx configuration wizard:

$ hg add
adding ../Makefile
adding conf.py
adding first_doc.rst
adding index.rst
$ hg st
A Makefile
A source/conf.py
A source/first_doc.py
A source/index.rst

Don't worry that we don't see all of the directories in the output of hg st--Mercurial tracks files, not directories.

Automate HTML-ization

Here comes the magic in automating the conversion from ReST to HTML: Mercurial hooks. We will use the precommit hook to fire off a command that tells Sphinx to translate our ReST markup into HTML.

Edit your mydox/.hg/hgrc file. If the file does not yet exist, go ahead and create it. Add the following content to it:

[hooks]
precommit.sphinxify = ~/bin/sphinxify_docs.sh

I've opted to call a Bash script instead of using an inline Python call. Now let's create the Bash script, ~/bin/sphinxify_docs.sh:

#!/bin/bash
cd $HOME/mydox
sphinx-build source/ docs/

Notice that I used the $HOME environment variable. This means that I created the mydox directory at /home/myusername/mydox. Adjust that line according to your setup. You'll probably also want to make that script executable:

$ chmod +x ~/bin/sphinxify_docs.sh

Three, Two, One...

You should now be at a stage where you can safely commit changes to your repository and have Sphinx build your HTML documentation. Execute the following command somewhere under your mydox repository:

$ hg ci -m "Initial commit"

If your setup is anything like mine, you should see some output similar to this:

$ hg ci -m "Initial commit"
Making output directory...
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... not found
building [html]: targets for 2 source files that are out of date
updating environment: 2 added, 0 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... /home/jvanderlinden/mydox/source/first_doc.rst:: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded, 1 warning.
$ hg st
? docs/.buildinfo
? docs/.doctrees/environment.pickle
? docs/.doctrees/first_doc.doctree
? docs/.doctrees/index.doctree
? docs/_sources/first_doc.txt
? docs/_sources/index.txt
? docs/_static/basic.css
? docs/_static/default.css
? docs/_static/doctools.js
? docs/_static/file.png
? docs/_static/jquery.js
? docs/_static/minus.png
? docs/_static/plus.png
? docs/_static/pygments.css
? docs/_static/searchtools.js
? docs/first_doc.html
? docs/genindex.html
? docs/index.html
? docs/objects.inv
? docs/search.html
? docs/searchindex.js

If you see something like that, you're in good shape. Go ahead and take a look at your new mydox/docs/index.html file in the Web browser of your choosing.

Not very exciting, is it? Notice how your first_doc.rst doesn't appear anywhere on that page? That's because we didn't tell Sphinx to put it there. Let's do that now.

Customizing Things

Edit the mydox/source/index.rst file that was created during Sphinx configuration. In the section that starts with .. toctree::, let's tell Sphinx to include everything we ReST-ify:

.. toctree::
   :maxdepth: 2
   :glob:

   *

That should do it. Now, I don't know about you, but I don't really want to include the output HTML, images, CSS, JS, or anything in my documentation repository. It would just take up more space each time we change an .rst file. Let's tell Mercurial to not pay attention to the output HTML--it'll just be static and always up-to-date on our filesystem.

Create a new file called mydox/.hgignore. In this file, put the following content:

syntax: glob
docs/

Save the file, and you should now see something like the following when running hg st:

$ hg st
M source/index.rst
? .hgignore

Let's include the .hgignore file in the list of files that Mercurial will track:

$ hg add .hgignore
$ hg st
M source/index.rst
A .hgignore

Finally, let's commit one more time:

$ hg ci -m "Updating the index to include our .rst files"
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... done
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded.

Tada!! The first_doc.rst should now appear on the index page.

Serving Your Documentation

Who seriously wants to have HTML files that are hard to get to? How can we make it easier to access those HTML files? Perhaps we can create a simple static file Web server? That might sound difficult, but it's really not--not when you have access to Python!

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

def main():
    try:
        server = HTTPServer(('', 80), SimpleHTTPRequestHandler)
        server.serve_forever()
    except KeyboardInterrupt:
        server.socket.close()

if __name__ == '__main__':
    main()

I created this simple script and put it in my ~/bin/ directory, also making it executable. Once that's done, you can navigate to your mydox/docs/ directory and run the script. Since I called the script webserver.py, I just do this:

$ cd ~/mydox/docs
$ sudo webserver.py

This makes it possible for you to visit http://localhost/ on your own computer, or to use your computer's IP in place of localhost to access your documentation from a different computer on your network. Pretty slick, if you ask me.

I suppose there's more I could add, but that's all I have time for tonight. Enjoy!

Review: Django 1.0 Web Site Development

Introduction

Several months ago, a UK-based book publisher, Packt Publishing contacted me to ask if I would be willing to review one of their books about Django. I gladly jumped at the opportunity, and I received a copy of the book a couple of weeks later in the mail. This happened at the beginning of September 2009. It just so happened that I was in the process of being hired on by ScienceLogic right when all of this took place. The subsequent weeks were filled to the brim with visitors, packing, moving, finding an apartment, and commuting to my new job. It was pretty stressful.

Things are finally settling down, so I've taken the time to actually review the book I was asked to review. I should mention right off the bat that this is indeed a solicited review, but I am in no way influenced to write a good or bad review. Packt Publishing simply wants me to offer an honest review of the book, and that is what I indend to do. While reviewing the book, I decided to follow along and write the code the book introduced. I made sure that I was using the official Django 1.0 release instead of using trunk like I tend to do for my own projects.

The title of the book is Django 1.0 Web Site Development, written by Ayman Hourieh, and it's only 250 pages long. Ayman described the audience of the book as such:

This book is for web developers who want to learn how to build a complete site with Web 2.0 features, using the power of a proven and popular development system--Django--but do not necessarily want to learn how a complete framework functions in order to do this. Basic knowledge of Python development is required for this book, but no knowledge of Django is expected.

Ayman introduced Django piece by piece using the end goal of a social bookmarking site, a la del.icio.us and reddit. In the first chapter of the book, Ayman discussed the history of Django and why Python and Django are a good platform upon which to build Web applications. The second chapter offers a brief guide to installing Python and Django, and getting your first project setup. Not much to comment on here.

Digging In

Chapter three is where the reader was introduced to the basic structure of a Django project, and the initial data models were described. Chapter four discussed user registration and management. We made it possible for users to create accounts, log into them, and log out again. As part of those additions, the django.forms framework was introduced.

In chapter five, we made it possible for bookmarks to be tagged. Along with that, we built a tag cloud, restricted access to certain pages, and added a little protection against malicious data input. Next up was the section where things actually started getting interesting for me: enhancing the interface with fancy effects and AJAX. The fancy effects include live searching for bookmarks, being able to edit a bookmark in place (without loading a new page), and auto-completing tags when you submit a bookmark.

This chapter really reminded me just how simple it is to add new, useful features to existing code using Django and Python. I was thoroughly impressed at how easy it was to add the AJAX functionality mentioned above. Auto-completing the tags as you type, while jQuery and friends did most of the work, was very easy to implement. It made me happy.

Chapter seven introduced some code that allowed users to share their bookmarks with others. Along with this, the ability to vote on shared bookmarks was added. Another feature that was added in this chapter was the ability for users to comment on various bookmarks.

The ridiculously amazing Django Administration utility was first introduced in chapter eight. It kinda surprised me that it took 150 pages before this feature was brought to the user's attention. In my opinion, this is one of the most useful selling points when one is considering a Web framework for a project. When I first encountered Django, the admin interface was one of maybe three deciding factors in our company's decision to become a full-on Django shop.

Bring on the Web 2.0

Anyway, in chapter nine, we added a handful of useful "Web 2.0" features. RSS feeds were introduced. We learned about pagination to enhance usability and performance. We also improved the search engine in our project. At this stage, the magical Q objects were mentioned. The power behind the Q objects was discussed very well, in my opinion.

In chapter 10, we were taught how we can create relationships between members on the site. We made it possible for users to become "friends" so they can see the latest bookmarks posted by their friends. We also added an option for users to be able to invite some of their other friends to join the site via email, complete with activation links. Finally, we improved the user interface by providing a little bit of feedback to the user at various points using the messages framework that is part of the django.contrib.auth package in Django 1.0.

More advanced topics, such as internationalization and caching, were discussed in chapter 11. Django's special unit testing features were also introduced in chapter 11. This section actually kinda frustrated me. Caching was discussed immediately before unit testing. In the caching section, we learned how to enable site-wide caching. This actually broke the unit tests. They failed because the caching system was "read only" while running the tests. Anyway, it's probably more or less a moot point.

Chapter 11 also briefly introduced things to pay attention to when you deploy your Django projects into a production environment. This portion was mildly disappointing, but I don't know what else would have made it better. There are so many functional ways to deploy Django projects that you could write books just to describe the minutia involved in deployment.

The twelfth and final chapter discussed some of the other things that Django has to offer, such as enhanced functionality in templates using custom template tags and filters and model managers. Generic views were mentioned, and some of the other useful things in django.contrib were brought up. Ayman also offered a few ideas of additional functionality that the reader can implement on their own, using the things they learned throughout the book.

Afterthoughts

Overall, I felt that this book did a great job of introducing the power that lies in using Django as your framework of choice. I thought Ayman managed to break things up into logical sections, and that the iterations used to enhance existing functionality (from earlier chapters) were superbly executed. I think that this book, while it does assume some prior Python knowledge, would be a fine choice for those who are curious to dig into Django quickly and easily.

Some of the beefs I have with this book deal mostly with the editing. There were a lot of strange things that I found while reading through the book. However, the biggest sticking point for me has to do with "pluggable" applications. Earlier I mentioned that the built-in Django admin was one of only a few deciding factors in my company's choice to become a Django shop. Django was designed to allow its applications to be very "pluggable."

You may be asking, "What do I mean by 'pluggable'?" Well, say you decide to build a website that includes a blog, so you build a Django project and create an application specific to blogging. Then, at some later time, you need to build another site that also has blog functionality. Do you want to rewrite all of the blogging code for the second site? Or do you want to use the same code that you used in the first site (without copying it)? If you're anything like me and thousands of other developers out there, you would probably rather leverage the work you had already done. Django allows you to do this if you build your Django applications properly.

This book, however, makes no such effort to teach the reader how to turn all of their hard work on the social bookmarking features into something they could reuse over and over with minimal effort in the future. Application-specific templates are placed directly into the global templates directory. Application-specific URLconfs are placed in the root urls.py file. I would have liked to see at least some effort to make the bookmarking application have the potential to be reused.

Finally, the most obvious gripe is that the book is outdated. That's understandable, though! Anything in print media will likely be outdated the second it is printed if the book has anything to do with computers. However, with the understanding that this book was written specifically for Django 1.0 and not Django 1.1 or 1.2 alpha, it does an excellent job at hitting the mark.

Tip: easy_install / pip

With all of the exciting updates to Mercurial recently, I've been on a rampage, updating various boxes everywhere I go. I'm in the habit of using easy_install and/or pip to install most of my Python-related packages. It's pretty easy to install packages that are in well-known locations (like PyPI or on Google Code, for example). It's also pretty easy to update packages using either utility. Both take a -U parameter, which, to my knowledge, tells it to actually check for updates and install the latest version.

That's all fine and dandy, but what happens when you want to install an "unofficial" version of some package? I mean, what if your favorite project all of the sudden includes some feature that you will die unless you can have access to it and the next official version is weeks or months in the future? There are typically a few avenues you can take to satisfy your needs, but I wanted to bring up something that I think not many people are aware of: easy_install and pip can both understand URLs to installable Python packages.

What do I mean by that, you ask? Well, when you get down to the basics of what both utilities do, they just take care of downloading some Python package and installing it with the setup.py file contained therein. In many cases, these utilities will search various package repositories, such as PyPI, to download whatever package you specify. If the package is found, it will be downloaded and extracted.

In most cases, you can do all of that yourself:

$ wget http://pypi.python.org/someproject/somepackage.tar.gz
$ tar zxf somepackage.tar.gz
$ cd somepackage
$ python setup.py install

Both easy_install and pip obviously do a lot of other magic, but that is perhaps the most basic way to understand what they do. To answer that last question, you can help your utility of choice out by specifying the exact URL to the specific package you want it to install for you:

$ easy_install http://pypi.python.org/someproject/somepackage.tar.gz
$ pip install http://pypi.python.org/someproject/somepackage.tar.gz

For me, this feature comes in very handy with projects that are hosted on BitBucket, for example, because you can always get any revision of the project in a tidy .tar.gz file. So when I'm updating Mercurial installations, I can do this to get the latest stable revision:

$ easy_install http://selenic.com/repo/hg-stable/archive/tip.tar.gz

It's pretty slick. Here's a full example:

[user@web ~]$ hg version
Mercurial Distributed SCM (version 1.2.1)

Copyright (C) 2005-2009 Matt Mackall <mpm@selenic.com> and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[user@web ~]$ easy_install http://selenic.com/repo/hg-stable/archive/tip.tar.gz
Downloading http://selenic.com/repo/hg-stable/archive/tip.tar.gz
Processing tip.tar.gz
Running Mercurial-stable-branch--8bce1e0d2801/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Gnk2c9/Mercurial-stable-branch--8bce1e0d2801/egg-dist-tmp--2VAce
zip_safe flag not set; analyzing archive contents...
mercurial.help: module references __file__
mercurial.templater: module references __file__
mercurial.extensions: module references __file__
mercurial.i18n: module references __file__
mercurial.lsprof: module references __file__
Removing mercurial unknown from easy-install.pth file
Adding mercurial 1.4.1-4-8bce1e0d2801 to easy-install.pth file
Installing hg script to /home/user/bin

Installed /home/user/lib/python2.5/mercurial-1.4.1_4_8bce1e0d2801-py2.5-linux-i686.egg
Processing dependencies for mercurial==1.4.1-4-8bce1e0d2801
Finished processing dependencies for mercurial==1.4.1-4-8bce1e0d2801
[user@web ~]$ hg version
Mercurial Distributed SCM (version 1.4.1+4-8bce1e0d2801)

Copyright (C) 2005-2009 Matt Mackall <mpm@selenic.com> and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Notice the version change from 1.2.1 to 1.4.1+4-8bce1e0d2801. w00t.

Edit: devov pointed out that pip is capable of installing packages directly from its repository. I've never used this functionality, but I'm interested in trying it out sometime! Thanks devov!

Mercurial 1.4.1 Released

I just noticed that Mercurial 1.4.1 was released today. Most of the changes are pretty minor, but I wanted to voice my appreciation for a new extension that is included with this release: schemes.

This extension basically makes your life easier by shortening redundant URLs for you. For example, you can now use the following command to snag my simple Mercurial extensions repo from BitBucket:

hg clone bb://codekoala/hgext

Without hgext.schemes, that command would be something like one of the following commands:

hg clone http://bitbucket.org/codekoala/hgext
hg clone ssh://hg@bitbucket.org/codekoala/hgext

Not the most ground-breaking of extensions, but still pretty slick!

Automatic Config Replication With Mercurial

I've done a lot of neat things since I started my new job earlier this month. I'm really excited about the things I've learned and experimented with, and I would like to share some of the concepts with my visitors.

At work we use a lot of virtual machines in our individual development environments. Most of these virtual machines use very similar configuration settings, but the settings are not a standard part of the installation. That is because we build our virtual machines using the same installation tools that our customers would use. The configuration I'm talking about is just stuff specific to our development environment.

Creating and configuring these virtual machines is one of the first things my mentor showed me how to do my first day on the job. He commented on how quickly I would probably start learning all of the configuration tasks because we tend to setup our development VMs several times a month. That was all fine and dandy, and I did get a pretty good feel for what needed to go into a development VM that first day.

However, after doing it so many times, I realized how much time I was using just trying to get the VM set up just right. It wasn't hard to configure--it was just time-consuming. It wasn't long before I started thinking of ways to optimize the process.

One of the ideas I came up with, which seems to be serving my purposes perfectly, is that of using Mercurial to quickly and easily get the exact same configuration from one box to another. It also has the added benefit of keeping a history of the changes I make to my configuration as time goes on.

I won't go into exact detail on how I have things setup at work, but I would like to try to describe a similar scenario that should illustrate my goal just as well.

Getting Started

One of the first things I would encourage you to do is follow along. It will make the concept sink in much faster, and you will probably see other applications very quickly. Please note, however, that if you're following along exactly, it could be a very time-consuming process. I will be using 3 virtual machines as I write this, but you could just as easily use 5, 10, or 100,000. Likewise, you could eliminate the virtual machines altogether if you're in an environment with several physical computers.

One virtual machine will act as the "master" server, or the one that will be configured first. The other virtual machines will act as "slave" servers, which will simply receive configuration updates that happen on the master server. We will also modify this behavior to be a bit more interesting toward the end of the article.

Virtual Machines Galore!

First off, I will create some basic virtual machines using the net install version of Debian 5.0.3. I really only need to create 1 VM and then clone it a couple of times. I am willing to furnish my virtual machines to those who are interested in using them. I will install some additional software in the VM to make sure the demo works smoothly. Among the packages that I will install are:

  • Python
  • Mercurial
  • OpenSSH server

Initialize a Repository

Once I have all of that set up in my virtual machines, I will initialize a Mercurial repository on the master server to maintain the configuration files that I am interested in. Let's just use the /etc directory for the time being. There's a pretty good chance that most of our system-wide configuration will all be contained somewhere beneath /etc.

cd /etc
hg init

Now let's have a gander at the files that we can have Mercurial manage for us:

hg st

Wow! That is quite a set of files, isn't it? Thankfully, they should mostly be plain text files. Mercurial is very efficient at managing text files. Let's now add all of the files in /etc to our repository, so they can be tracked and easily pushed out to other systems.

hg add

That command will happily add everything that hg st printed. Obviously, we can get a little more picky about what we do and do not add to our repository, but that's not the goal of this article. Now, this step merely tells Mercurial that it needs to pay attention to changes in these files. The files have not yet been committed to the repo. Let's do that, so we have a backup of our configuration files in their pristine state:

hg ci -m "Initial import"

The -m "Initial import" is just a comment, to describe what happened to warrant a commit to the repository. It is for your use and the use of anyone who has access to your repo.

Clone The Configuration

Now let's try to push the configuration we just committed on the master server to one of the slave servers. Since my virtual machines are all essentially in the same state, there should be no conflicts, right? Try running the following command on the master server:

hg push ssh://root@slave1//etc
root@slave1's password:
remote: abort: There is no Mercurial repository here (.hg not found)!
abort: no suitable response from remote hg!

Blast! We can't simply push the configuration files out to another computer. For that to work, we'd first have to have the repository itself exist on the slave server. Let's try this another way. One the slave server, run this command:

hg clone ssh://root@master//etc /etc
root@master's password:
abort: destination '/etc/' is not empty

Doh! Mercurial won't let us clone the repository from the master server! That's because Mercurial wants to clone to a new directory, with nothing already in it. One way to get around this hairball of a show-stopper is to just copy the repo using conventional UNIX utilities. Execute this command on one of your slave servers:

scp -r root@master:/etc/.hg /etc/

The .hg directory contains all of the repository information, and it's really all we need to snag in order to clone the repository. This might not be the most elegant solution in the world, but it will suffice for the time being. Once the scp command completes, we should have a full copy of the configuration file repository. Run this command to verify:

hg st

If your setup is anything like mine, you'll probably have a few files that are listed as being modified. Chances are that these files will vary from host to host anyway, and they are probably not worth keeping in a version control system. That would just be begging for conflicts.

I wrote an extension for Mercurial that should make this part of my tutorial a little less hacky. On your other slave server, run the following commands:

hg clone http://bitbucket.org/codekoala/hgext /root/hgext
echo "[extensions]" >> /root/.hgrc
echo "neclone = /root/hgext/neclone.py" >> /root/.hgrc

This extension gives you a new Mercurial command called neclone (N. E. Clone, or "not empty clone"). As we saw earlier, Mercurial doesn't let us clone a repository into a directory that is not empty. This extension allows us to do that. It works almost identically to the regular clone command... takes the same options and everything.

Still on your second slave server, run these additional commands:

hg neclone ssh://root@master//etc /etc
cd /etc
hg up -C

The last step is optional, and soon to be included as part of the extension. It will update your working copy to the latest revision in the repository. Beware that it overwrites any uncommitted changes you may have made to files that are tracked by Mercurial.

So now both slave servers should have a clone of the configuration repository from the master server.

Being Picky

Let's start to be a little picky about the files we are tracking in our repository. Some of the files appears as being modified on my slave server after copying the .hg directory from the master server are:

  • adjtime
  • alternatives/pager
  • alternatives/pager.1.gz
  • mailcap
  • network/run/ifstate
  • udev/rules.d/70-persistent-net.rules

I think it's safe to remove these from the repository, to avoid conflicts with other systems. To tell Mercurial to stop tracking files it is tracking, without actually deleting the file from the filesystem, you can use the following command:

hg forget adjtime
hg forget mailcap

And so on. Go ahead and do that for each of the files that appeared to be modified on your slave server immediately after copying the .hg directory. I'm going to add /etc/hostname to the list of files to forget too.

After doing that, each of those files should appear as being marked for removal when you run hg st. Don't worry, this is normal. The files will not be deleted from the filesystem, but they will be deleted from the repository. Go ahead and commit those changes to the repository on your slave server.

hg ci -Am "Removed some files from version control"

Now let's push those changes out to the master server:

hg push
abort: repository default-push not found!

Since we copied the .hg directory directly using scp, our slave won't know where the changes need to go when we run the push command with no explicit destination repository. To fix that, let's create a file in /etc/.hg/ called hgrc on the slave server. In that file, put the following text:

[paths]
default = ssh://root@master//etc

The hg push command should now push directly to the master server. Yay! The problem we face now is that every other slave server in the group is out of date. How can we fix that? We'll use Mercurial hooks.

Automating Config Replication

Mercurial offers some very useful hooks that we can use to automatically push configuration changes out to each of our slave servers. We will use the commit and changegroup hooks to do the magic. Let's create a script that will live on the master server to take care of pushing our changes out to each slave server. Create a new file in /etc/ on the master server called propagate.sh:

#!/bin/bash
hg up
for node in 'slave1' 'slave2'
do
    ssh root@$node "cd /etc; hg pull -u"
done

Let's also make sure this script is executable:

chmod +x /etc/propagate.sh

This script assumes that your /etc/hosts file or your nameserver are configured appropriately to allow slave1 and slave2 to be resolved to IP addresses. The reason we're SSH'ing into each slave server and using hg pull instead of simply using hg push ssh://root@$node//etc is because you can't force an update on a remote server using push. You can, however, request an update when you're using pull.

Obviously, this script is not the most sophisticated of scripts. It might work well for my demonstration, with only a few servers, but once you get beyond that it would be a nightmare to maintain the list of servers the script has to connect to. You can use whatever means you'd like to keep track of the servers you want to replicate your configuration to. I don't want to bother with all of the crap I'd get for suggesting one thing over another, so it's now your call.

Now it's time to configure the Mercurial hook to execute that script when the master server sees a changeset get into its repository. Open up /etc/.hg/hgrc on the master server, or create it if it doesn't exist. Make sure it has at least the following in it:

[hooks]
commit.propagate = /etc/propagate.sh
changegroup.propagate = /etc/propagate.sh

Let's try it out! Run these commands on your master server:

echo "" >> /etc/hosts
hg ci -m "Added a blank line to the hosts file"
root@slave1's password:
remote: Permission denied, please try again.
remote: Permission denied, please try again.
remote: Permission denied (publickey,password).
abort: no suitable response from remote hg!
Connection closed by slave2
warning: commit.propagate hook exited with status 255

Blast! The script failed because it wanted us to type in a password, but it was not in interactive mode. Let's fix that with a little preshared key magic. I won't go into the details about how this works, but the following commands on your master server should get us rolling:

ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
scp -r ~/.ssh root@slave1:~
scp -r ~/.ssh root@slave2:~

Warning

Keep in mind this is not secure and should probably not be how your production machines are configured, especially with the root user.

For simplicity's sake, just accept all of the details and don't set a passphrase. These commands enable us to SSH into our slave servers without using a password. If you get an error such as:

remote: Host key verification failed.
abort: no suitable response from remote hg!

...it just means you need to manually log into your master server from the slave machine that threw that error. When doing so, you will have to answer "yes" to a question about the authenticity of the host you're logging into.

Testing It Out

It is now time to see if we can make a configuration change on one slave server and have it show up on the other slave server. Let's update the hosts file a little bit. Let's add the following line on the second slave server:

10.0.0.5        nonexistanthost

Now let's commit the change and push it off to the master server:

hg ci -m "Added a dumb line to the hosts file"
hg push

My system actually told me that that it had copied the change out to another host. I know because I saw these lines:

remote: pulling from ssh://root@master//etc
remote: searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

Now when I look at the first slave server, I should see that new line in my /etc/hosts file. Also, the log on each server should have the same entry that I just made about adding "a dumb line to the hosts file."

Seem Like A Lot of Work?

A lot of what we just did probably seemed like more work that it is worth, right? Well, being a nerd typically comes with a few qualities. One quality which I have observed many a time in my most geeky of friends is that they will spend hours and hours up front on a program or script just so they can save 2 minutes in the future. They work hard to be lazy.

There is a lot of boilerplate configuration that takes place in this particular scenario. I realize that. What I haven't shared with you, though, is how I automated the boilerplate configuration as well as the propagation of configuration. I'm tired of putting this article off, so I will have to leave those details for another article. Sorry!

Why?! There's a Better Way (tm)

There is always a better way. Always. Go ahead and use whatever you feel is the most efficient method for keeping configuration files in sync across several computers. This is just one more option to add to your toolkit. Don't worry, I won't be offended if you don't like it or don't use it. It works perfect for me and it's free, and I just wanted to share!

Bulk Update With Mercurial

Some of you may well know that I was previously an subversion user, more out of comfort than necessity. SVN was the first version control system that I became well acquainted with, so it just seemed like a natural choice for me when I thought I needed version control.

Several months ago I read a blog article by a buddy, in which he briefly discussed Mercurial. I had been meaning to give some distributed version control systems a shot after some disasters related to the centralized nature of SVN. This blog article prompted me to take a stab at Mercurial and some others.

Within a few days I was sold on Mercurial. I won't go into details simply because I'm not one for religious wars that way. Let's just say that Mercurial seemed to be perfect for my wants and needs.

There were, however, a few things about using Mercurial that I miss from the SVN world. One such thing is that you can update several "working copies" of something in SVN with a single command. For example, I keep a lot of my 3rd party Django applications in one directory. Many of these applications use SVN. Sometimes I'll just run a command like this:

svn up /path/to/third/party/apps/*

Each project that uses SVN will automatically be updated without much fuss with such a command. However, with Mercurial, it appears that you need to be in an actual Mercurial repository in order to update it. There are extensions to get around this problem, but I was looking for something a little different.

Since I use Linux almost exclusively, I didn't feel bad about just using the power within to do the work. The following command does everything I need it to:

find -name ".hg" -type d | xargs -t -i bash -c "(cd {}; hg pull; hg up)"

This command finds any directories called .hg anywhere under your current location on the filesystem. Any matches will be used in the command at the end: cd {}; hg pull; hg up

So far I haven't had any problems with this command, but your mileage may vary. To make things even easier, I made an alias for this rather long command:

alias hgupall='find -name ".hg" -type d | xargs -t -i bash -c "(cd {}; hg pull; hg up)"'

I put that line in my ~/.bashrc script, which is executed each time I log into my computer. With that in place, all I need to do is something like this:

cd /path/to/third/party/apps
hgupall

And the aliased command handles the rest. Pretty slick stuff. Hooray for Mercurial and Linux!

Mercurial 1.3 Released

Today marks the official release of Mercurial 1.3, an awesome distributed version control system. This release comes with several nifty features, including the following, straight from the What's New wiki page:

Major Changes

  • experimental support for sub-repositories
  • Python 2.3 is no longer supported; now requires Python 2.4-2.6

Commands

  • merge: add -P/--preview option
  • update: don't unlink added files when -C/--clean is specified
  • update: added -c/--check option to abort on local changes
  • update: allow merges going backwards
  • push: improved handling of named branches
  • branches/heads: add a -c/--closed option to show closed branches
  • help: new extensions topic

General

  • add patch.eol config setting to work with cross-platform patches
  • fixed support for SSL through proxies
  • add ability to load hooks from arbitrary Python modules
  • hide passwords for HTTP repositories in error and log output
  • fix Python 2.6 support in the Windows installer
  • add mechanism for specifying HTTP authentication details in hgrc
  • prompts and choices are now shown even in non-interactive mode
  • performance improvements, especially on Windows
  • much improved zsh completion
  • improved Danish, Japanese, Italian and simplified Chinese translations
  • new German, French, Greek, Brazilian Portuguese and traditional Chinese translations

Web interface

  • read configuration data from webdir configs
  • add branches page to hgweb
  • pluggable templater engine support
  • refresh hgwebdir configuration periodically
  • let web.encoding override ui.encoding setting
  • deal with dicts/lists like webdir config paths

I'm quite stoked about this release :) For additional information, please check the project's wiki.

Google Code + Mercurial = Many Happies

Last night I noticed that Google Code is actually offering the Mercurial project hosting that they promised back in April. I guess it's been around for most of May, but I never saw any news to suggest that it was actually public. As soon as I noticed it, I converted one of my less-known, less-used SVN projects to Mercurial. I'm really liking it.

I need to do a bit more work on this particular project before I announce it to the world, but it's out there, and it's Mercurial powered now babay. I think I will be leaving most of my other projects in SVN so I don't upset all of the other people who actually use them.

Oh, I also noticed that the project quotas were bumped up quite a bit. Now each project seems to get a whopping 1GB of space for free!!! What do you have to say about that, BitBucket/GitHub/Assembla/[insert dirty, rotten free open source project hosting host name here]?!

Hooray for Google Code!

Groovy One-Liner

It's been a while since I wrote a blog article, so I'm using this one-liner as an excuse. In case you're new here, I do a lot of Python development. In the world of Python, you need to have a special file in a directory before you can use Python code within that directory. Yeah, yeah... that's not exactly the clearest way to explain things, but it'll have to do.

This special file is called __init__.py. Having this file in a directory that contains Python code turns that directory into what's called a "python package." We like Python packages. They make our lives so much fun!

Anyhoo, I was working on a project last night, and I wanted to create a bunch of placeholder directories that I plan to use later on. I plan on keeping Python code in these directories, so putting the special __init__.py file in them is what I was looking to do. I didn't want to have to create the __init__.py file in each directory manually, or copy/paste the file all over the place, so I investigated a way to do it quickly from the command line.

One of my buddies brought an interesting command to my attention recently: xargs. I had seen it before in various tutorials online, but I never bothered to learn about it. This seemed like as good a time as any, so I started playing. The result of my efforts follows:

find . -type d | xargs -I {} touch {}/__init__.py

What it does is:

  • recursively finds (find) all directories (-type d) within the current directory (.)
  • pipes (|) each directory to xargs, which makes sure that the __init__.py file exists in each one (touch {}/__init__.py)
  • the -I {} tells xargs what to use as a placeholder when considering each directory found by the find command

Turns out that xargs can be used for all sorts of good stuff. My friend brought it up as a way to get rid of those nasty .svn directories on his path to "Mercurial bliss."

find . -name ".svn" -type d | xargs -I {} rm -Rf {}

How beautiful!

Slackware 12.1 on an Asus EeePC 701

Attention!

This article has a follow-up for Slackware 12.2.

The following are the steps I took to install Slackware 12.1 on my EeePC this past weekend. I hope you find them complete and helpful!

Installing Slackware 12.1 on an Asus EeePC 701

  1. Burn DVD .iso to disc
  2. Turn on EeePC
  3. Hit F2 to run setup
  4. Go to the Advanced tab, and set "OS Installation" to "Start"
  5. Go to the Boot tab, and ensure that the external DVD drive will be used for booting before the internal SSD
  6. Exit and save changes
  7. Just hit enter after rebooting from BIOS configuration when the Slackware boot screen shows up
  8. Unless you want to use a different keymap for whatever reason, hit enter when asked to select a keyboard map
  9. Login as root
  10. Run fdisk or cfdisk on /dev/hdc
  11. Remove all partitions (unless you know what you're doing)
    1. fdisk: d to delete (you may have to select multiple partitions to delete if you have more than one for some reason)
    2. cfdisk: Select all partitions individually with up/down arrow keys and use the left/right arrow keys to select delete from the menu at the bottom. Hit enter to run the delete command when it's highlighted.
  12. Create one partition that takes the whole SSD (again, unless you know what you're doing)
    1. fdisk: n (for new); enter; p (for primary); enter; 1 (for the first primary partition); enter; enter (to start at the beginning of the drive); enter (to select the end of the drive)
    2. cfdisk: Select the new command with the left/right arrow keys and hit enter when it's selected. Make it a primary parition, and have it take the whole SSD (3997.49MB in my case).
  13. Set the type of the new partition to be Linux
    1. fdisk: t (for type); enter; 83 (for Linux); enter
    2. cfdisk: Use the left/right arrow keys to select the type command at the bottom and hit enter when it's selected. Choose 83.
  14. Set the new partition (or the first, if you decided to make more than one) to be bootable
    1. fdisk: a (for bootable); enter; 1 (for primary partition 1); enter
    2. cfdisk: Select the bootable command from the bottom using the left/right arrow keys. Hit enter when it's selected.
  15. Write the changes to the partition table and quit
    1. fdisk: w
    2. cfdisk: Use the left/right arrow keys to select the write command from the bottom. Hit enter when it's selected. Type 'yes' to verify your intent, acknowledging that your previous data will be "gone". Then select the quit command.
  16. Run setup
  17. Select TARGET to specify where you will be installing
  18. Select /dev/hdc1
  19. Format the partition
  20. To reduce write cycles, many people suggest formatting with ext2, which is a non-journaling filesystem. However, many people claim that the limited number write cycles of SSD is not something to worry about. Use your best judgement on this one. Hit OK after the format is complete.
  21. Select where you plan to install Slackware from. In my case, it's the DVD. I usually tell it to find the media automatically. Select manual if you know which device your DVD drive is. Mine was /dev/sr0.
  22. Select the packages you wish to install. This is where your installation will likely differ greatly from mine because of personal preferences. I do a lot of development, so I will keep a lot of things for that. Here's what I selected to install:
    1. Base Linux System
    2. Various Applications that do not need X
    3. Program Development (C, C++, Lisp, Perl, etc.)
    4. Linux kernel source
    5. Qt and the K Desktop Environment for X
    6. System Libraries (needed by KDE, GNOME, X, and more)
    7. Networking (TCP/IP, UUCP, Mail, News)
    8. Tcl/Tk script languages
    9. X Window System
    10. X Applications
    11. Games
  23. Choose whether or not you want to be picky about your software. To save a little extra disk space, I'm going to manually choose what I don't want. This includes:
    1. A: cpio, cryptsetup, cups, floppy, genpower, jfsutils, mdadm, mt-st, mtx, quota, reiserfsprogs, rpm2tgz, tcsh, xfsprogs
    2. AP: amp, cdparanoia, hplip, gutenprint, jed, joe, jove, ksh93, mysql, rpm, xfsdump, zsh
    3. D: gcc-gfortran, gcc-gnat, gcc-java, mercurial, p2c
    4. N: elm, epic4, httpd, mailx, mutt, netatalk, pine, popa3d, proftpd, rp-pppoe, samba, slrn, tin, trn, vsftpd
    5. TCL: hfsutils
    6. X: anthy, bdftopcf, beforelight, libhangul, sazanami-fonts-ttf, sinhala_lklug-font-ttf, tibmachuni-font-ttf, wqy-zenhei-font-ttf
    7. XAP: audacious, audacious-plugins, gftp, mozilla-thunderbird, pan, seamonkey
  24. Wait for the installation to complete. It took almost a full hour with my package selection, leaving me with 485.4MB free on my 4GB SSD.
  25. Choose whether or not you want to make a bootable USB... I skipped it.
  26. Choose how you wish to install LILO. I chose simple.
  27. Choose your frame buffer mode for the console. I chose 640x480x256.
  28. Specify any optional kernal parameters. I left this blank, originally, but later learned that having 'hdc=noprobe' increased my disk access speed by about 13 times.
  29. Specify whether you wish to use UTF-8 on the console. I chose no.
  30. Specify where to install LILO. I chose MBR.
  31. Specify your mouse type. I chose imps2.
  32. Specify whether or not you wish to have gpm run at boot, which allows you to use your mouse in the console. I chose yes.
  33. Configure your network.
  34. Give your eeepc a hostname. This can be whatever you'd like.
  35. Specify the domain for your network. This can be whatever you'd like as well.
  36. Configure your IP address information. I just chose DHCP.
  37. Set the DHCP hostname. I left this blank.
  38. Review and confirm your network settings.
  39. Choose which services you wish to have running immediately after booting.
  40. See if you want to try custom screen fonts. I usually don't bother.
  41. Specify whether your hardware clock is set to local time or UTC.
  42. Choose your timezone.
  43. Select your preferred window manager. I chose KDE.
  44. Set the root password.
  45. Slackware has been installed! Exit the setup program and reboot.
  46. Hit F2 to enter the BIOS again.
  47. Set OS Installation to "Finished" and exit the BIOS, saving changes.
  48. Reboot into Slackware! The first boot takes a while because of all the initial setup. It is faster on subsequent reboots, assuming you don't add new services (like apache and mysql) at boot.

Change a few settings around.

  1. vi /etc/inittab
  2. (set default runlevel to 4)
  3. vi /etc/lilo.conf
  4. add 'compact' somewhere to make it boot faster
  5. change the boot delay so it's not 120 seconds

Now for installing various drivers.

  1. Install the ethernet driver: http://people.redhat.com/csnook/atl2/atl2-2.0.4.tar.bz2
    1. wget http://people.redhat.com/csnook/atl2/atl2-2.0.4.tar.bz2
    2. tar jxf atl2-2.0.4.tar.bz2
    3. cd atl2-2.0.4
    4. make
    5. cp atl2.ko /lib/modules/2.6.24.5-smp/kernel/drivers/net/
    6. depmod -a
    7. modprobe atl2
    8. ifconfig
  2. Install the drivers for the wireless: http://snapshots.madwifi.org/special/madwifi-nr-r3366+ar5007.tar.gz
    1. wget http://snapshots.madwifi.org/special/madwifi-nr-r3366+ar5007.tar.gz
    2. tar zxvf madwifi-nr-r3366+ar5007.tar.gz
    3. cd madwifi-nr-r3366+ar5007.tar.gz
    4. scripts/madwifi-unload
    5. scripts/find-madwifi-modules.sh uname -r
    6. make && make install
    7. modprobe ath_pci

I kind of stopped taking notes after I realized how much fun it was to have Slackware on my EeePC. If you have questions, just add a comment below.