Arduino-Powered Webcam Mount

Earlier this month, I completed yet another journey around our nearest star. Some of my beloved family members thought this would be a good occasion to send me some cash, and I also got a gift card for being plain awesome at work. Even though we really do need a bigger car and whatnot, my wife insisted that I spend this money only on myself and whatever I wanted.

Little did she know the can of worms she had just opened.

I took pretty much all of the money and blew it on stuff for my electronics projects. Up to this point, my projects have all been pretty boring simply because nothing ever moved--it was mostly just lights turning on and off or changing colors. Sure, that's fun, but things really start to get interesting when you actually interact with the physical world. With the birthday money, I was finally able to buy a bunch of servos to begin living out my childhood dream of building robots.

My first project since getting all of my new toys was a motorized webcam mount. My parents bought me a Logitech C910 for my birthday because they were tired of trying to see their grandchildren with the crappy webcam that is built into my laptop. It was a perfect opportunity to use SparkFun's tutorial for some facial tracking (thanks to OpenCV) using their Pan/Tilt Servo Bracket.

It took a little while to get everything set up properly, but SparkFun's tutorial explains perfectly how to get everything working if you want to repeat this project.

The problem I had with the SparkFun tutorial, though, is that it basically only gives you a standalone program that does the facial tracking and displays your webcam feed. What good is that? I actually wanted to use this rig to chat with people! So I set out to figure out how to make that happen.

While the Processing sketch ran absolutely perfectly on Windows, it didn't want to work on my Arch Linux system due to some missing dependencies that I didn't know how (or care) to satisfy. As such, I opted to rewrite the sketch in Python so I could do the facial tracking in Linux.

This is still a work in progress, but here's the current facial tracking program which tells the Arduino where the webcam should be pointing, along with the Arduino sketch.

Now that I could track a face and move my webcam in Linux, I still faced the same problem as before: how can I use my face-tracking, webcam-moving program during a chat with my mom? I had no idea how to accomplish this. I figured I would have to either intercept the webcam feed as it was going to Skype or the Google Talk Plugin, or I'd have to somehow consume the webcam feed and proxy it back out as a V4L2 device that the Google Talk Plugin could then use.

Trying to come up with a way of doing that seemed rather impossible (at least in straight Python), but I eventually stumbled upon a couple little gems.

The GStreamer tutorial walks you step-by-step through different ways of using the gst-launch utility, and I found this information very useful. I learned that you can use the tee element to split a webcam feed and do two different things with it. I wondered if it would be possible to split one webcam feed and send it to two other V4L2 devices.

Enter v4l2loopback.

I was able to install this module from Arch's AUR, and using it was super easy (you should be root for this):

modprobe v4l2loopback devices=2
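
If you're curious which device nodes the module created, a quick ls will reveal them (the numbers will almost certainly differ on your system):

ls /dev/video*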

Loading the module created two new /dev/video* devices on my system, which happened to be /dev/video4 and /dev/video5 (yeah... I've been playing with a lot of webcams and whatnot). One device, video4, is for consumption by my face-tracking program. The other, video5, is for VLC, Skype, Google+ Hangouts, etc. After creating those devices, I simply ran the following command as a regular user:

gst-launch-0.10 v4l2src device=/dev/video1 ! \
    'video/x-raw-yuv,width=640,height=480,framerate=30/1' ! \
    tee name=t_vid ! queue ! \
    v4l2sink sync=false device=/dev/video4 t_vid. ! \
    queue ! videorate ! 'video/x-raw-yuv,framerate=30/1' ! \
    v4l2sink device=/dev/video5

There's a whole lot of stuff going on in that command that I honestly do not understand. All I know is that it made it so both my face-tracking Python program AND VLC can consume the same video feed via two different V4L2 devices! A co-worker of mine agreed to have a quick Google+ Hangout with me to test this setup under "real" circumstances (thx man). It worked :D Objective reached!
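
Based on what I've pieced together from the GStreamer tutorial, though, here's roughly what each piece of the pipeline seems to be doing (my best guess, so don't quote me on it):

# v4l2src device=/dev/video1  -> read frames from the physical webcam
# 'video/x-raw-yuv,...'       -> caps filter: ask for 640x480 YUV at 30 fps
# tee name=t_vid              -> split the stream into two branches
# queue ! v4l2sink device=/dev/video4 -> branch 1: buffered feed for my face tracker
# t_vid. ! queue ! videorate  -> branch 2: smooth out the frame timing...
# v4l2sink device=/dev/video5 -> ...and hand those frames to VLC/Skype/Hangouts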

I had really hoped to find a way to handle this stuff inside Python, but I have to admit that this is a pretty slick setup. A lot of things are still hardcoded, but I do plan on making things a little more generic soon enough.

So here's my little rig (why yes, I did mount it on top of an old Kool-Aid powder thingy lid):

And a video of it in action. Please excuse the subject of the webcam video, I'm not sure where that guy came from or why he's playing with my webcam.

Python And Execution Context

I recently found myself in a situation where knowing the execution context of a function became necessary. It took me several hours to figure out how to get at that information, despite many cleverly crafted Google searches. So, being the generous person I am, I want to share my findings.

My particular use case required that a function behave differently depending on whether it was called in an exec call. Specifics beyond that are not important for this article. Here's an example of how I was able to get my desired behavior.

import inspect

def is_exec():
    # grab the frame of whatever called this function
    caller = inspect.currentframe().f_back
    # code that is run via exec has no module tied to its frame
    module = inspect.getmodule(caller)

    if module is None:
        print "I'm being run by exec!"
    else:
        print "I'm being run by %s" % module.__name__

def main():
    is_exec()

    exec "is_exec()"

if __name__ == '__main__':
    main()

The output of such a script would look like this:

$ python is_exec.py
I'm being run by __main__
I'm being run by exec!
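
For what it's worth, the same trick appears to carry over to Python 3, where print and exec are both functions. Here's a quick sketch of what the equivalent might look like:

import inspect

def is_exec():
    # same idea: code run via exec() has no module tied to its frame
    caller = inspect.currentframe().f_back
    module = inspect.getmodule(caller)

    if module is None:
        print("I'm being run by exec!")
    else:
        print("I'm being run by %s" % module.__name__)

def main():
    is_exec()
    exec("is_exec()")

if __name__ == '__main__':
    main()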

It's also interesting to note that when you're using the Python interactive interpreter, calling the is_exec function from the code above will tell you that you are indeed using exec.

Some may argue that modifying behavior as I needed to is dirty, and that if your system requires such code, you're doing it wrong. Well, you could apply this sort of code to situations that have nothing to do with exec. Perhaps you want to determine which part of your product is using a specific function the most. Perhaps you want to get additional debugging information that isn't immediately obvious.

Just like always, I want to add the disclaimer that there are probably other ways to do this. However, this is the way that worked for me. I'd still be interested to hear about other solutions you may have encountered for this problem.

On a side note, if you're up for some slightly advanced Python mumbo jumbo, I suggest diving into the inspect documentation.

Network Manager, Cisco VPN, And Internet

Those of us out on the eastern side of the United States are currently experiencing quite a snow storm. While this sort of storm probably wouldn't have even made the local news in Rexburg (where my wife and I attended university), everyone is making a big deal about it around here. Part of that big deal included the option, and even the recommendation, that we work from home on Friday, using the company VPN to take care of our tasks.

I was pretty excited at the idea of working from home once again (my last job was almost exclusively a work-at-home gig), so I made sure I was able to connect to the VPN a few days ago, after receiving the credentials. It took a few tries to get everything right in Windows, but eventually it started working quite well. Then I tried connecting from Linux, using the awesomeness known as Network Manager.

Since I'm currently on Fedora 12, all I had to do was make sure that I had network-manager-vpnc installed, and I could then configure a connection using the same credentials I used in Windows. I had a successful connection on the very first try, and it was working fabulously. I had access to all of my development machines and all of the tools I use on a daily basis.

It didn't take long, however, for me to notice a big problem: no Internet access. I could get to any machine I dang well pleased on the company network, but nothing on the Internet. Quite frustrating, to say the least.

I decided to leave the investigation as to why I had no Internet access and how to fix it for another night. Here I am now, tinkering with it again. I found out what I needed to change:

  • Right click on the Network Manager icon in the system tray, and select "Edit Connections..."
  • Click on the VPN tab
  • Edit your VPN connection
  • Click on the "IPv4 Settings" tab
  • Click the "Routes..." button
  • Make sure that the "Use this connection only for resources on its network" option is checked
  • Connect to your VPN, and enjoy access to the devices there as well as on the Internet!

Hopefully this saves someone else's sanity (Jeremy?)

Site-Wide Caching in Django

My last article about caching RSS feeds in a Django project generated a lot of interest. My original goal was to help other people who have tried to cache QuerySet objects and received a funky error message. Many of my visitors offered helpful advice in the comments, making it clear that I was going about caching my feeds the wrong way.

I knew my solution was wrong before I even produced it, but I couldn't get Django's site-wide caching middleware to work in my production environment. Site-wide caching worked wonderfully in my development environment, and I tried all sorts of things to make it work in my production setup. It wasn't until one "Jacob" offered a beautiful pearl of wisdom that things started to make more sense:

This doesn't pertain to feeds, but one rather large gotcha with the cache middleware is that any javascript you are running that plants a cookie will affect the cache key. Google analytics, for instance, has that effect. A workaround is to use a middleware to strip out the offending cookies from the request object before the cache middleware looks at it.

The minute I read that comment, I realized just how logical it was! If Google Analytics, or any other JavaScript used on my site, was setting a cookie and changing it on each request, then every single request would produce a different cache key--meaning no page was ever actually served from the cache! Thank you so much, Jacob, for helping me get past the frustration of not having site-wide caching in my production environment.

How To Setup Site-Wide Caching

While most of this can be gleaned from the official documentation, I will repeat it here in an effort to provide a complete "HOWTO". For further information, hit up the official caching documentation.

The first step is to choose a caching backend for your project. Built-in options include:

  • Memcached
  • Database caching
  • Filesystem caching
  • Local-memory caching
  • A "dummy" cache that doesn't actually cache anything (useful for development)

To specify which backend you want to use, define the CACHE_BACKEND variable in your settings.py. The definition for each backend is different, so check out the official documentation for details.
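
For example, a local Memcached instance might be configured with something like this (the host and port are just Memcached's usual defaults--adjust them for your setup):

CACHE_BACKEND = 'memcached://127.0.0.1:11211/'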

Next, install a couple of middleware classes, and pay attention to where the classes are supposed to appear in the list:

  • django.middleware.cache.UpdateCacheMiddleware - This should be the first middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.
  • django.middleware.cache.FetchFromCacheMiddleware - This should be the last middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.

Finally, you must define the following variables in your settings.py file:

  • CACHE_MIDDLEWARE_SECONDS - The number of seconds each page should be cached
  • CACHE_MIDDLEWARE_KEY_PREFIX - If the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don't care
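
In settings.py, that might look something like this (the values here are just examples):

CACHE_MIDDLEWARE_SECONDS = 600  # cache each page for ten minutes
CACHE_MIDDLEWARE_KEY_PREFIX = 'mysite'  # any string unique to this site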

If you don't use anything like Google Analytics that sets/changes cookies on each request to your site, you should have site-wide caching enabled now. If you only want pages to be cached for users who are not logged in, you may add CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True to your settings.py file--its meaning should be fairly obvious.

If, however, your site-wide caching doesn't appear to work (as it didn't for me for a long time), you can create a special middleware class to strip those dirty cookies from the request, so the caching middleware can do its work.

import re

class StripCookieMiddleware(object):
    """Ganked from http://2ze.us/Io"""

    # matches cookies whose names begin with an underscore, such as
    # Google Analytics' __utm* cookies
    STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')

    def process_request(self, request):
        # strip the offending cookies before the cache middleware
        # uses them to build its cache key
        cookie = self.STRIP_RE.sub('', request.META.get('HTTP_COOKIE', ''))
        request.META['HTTP_COOKIE'] = cookie

Edit: Thanks to Tal for the regex suggestion!

Once you do that, you need only install the new middleware class. Be sure to install it somewhere between the UpdateCacheMiddleware and FetchFromCacheMiddleware classes, not first or last in the tuple. When all of that is done, your site-wide caching should really work! That is, of course, unless your offending cookies are not found by that STRIP_RE regular expression.
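
To make the ordering concrete, here's a sketch of how the relevant part of MIDDLEWARE_CLASSES might look; the dotted path to StripCookieMiddleware is hypothetical and depends on where you saved the class:

MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',     # must be first
    'django.middleware.common.CommonMiddleware',
    'myproject.middleware.StripCookieMiddleware',        # hypothetical path
    'django.middleware.cache.FetchFromCacheMiddleware',  # must be last
)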

Thanks again to Jacob and "nf", the original author of the middleware class I used to solve all of my problems! Also, I'd like to thank "JaredKuolt" for the django-staticgenerator on his github account. It made me happy for a while as I was working toward real site-wide caching.

Auto-Generating Documentation Using Mercurial, ReST, and Sphinx

I often find myself taking notes about various aspects of my job that I feel I would forget as soon as I moved on to another project. I've gotten into the habit of taking my notes using reStructuredText, which shouldn't come as a surprise to any of my regular visitors. On several occasions, some of the other guys in the company have asked me for clarification on things I had taken notes on. Lucky for me, I had taken some nice notes!

However, these individuals probably wouldn't appreciate reading ReST markup as much as I do, so I decided to do something nice for them. I set up Sphinx to prettify my documentation. I then wrote a small Web server in Python, so people within the company network could access the latest version of my notes without much hassle.

Just like I take notes to remind myself of stuff at work, I want to do that again for this automated ReST->HTML magic--I want to be able to do this in the future! I figured I would make my notes even more public this time, so you all can enjoy similar bliss.

Platform Dependence

I am writing this article with UNIX-like operating systems in mind. Please forgive me if you're a Windows user and some of this is not consistent with what you're seeing. Perhaps one day I'll try to set this sort of thing up on Windows.

Installing Sphinx

The first step that we want to take is installing Sphinx. This is the project that Python itself uses to generate its online documentation. It's pretty dang awesome. Feel free to skip this section if you have already installed Sphinx.

Depending on your environment of choice, you may or may not have a package manager that offers python-sphinx or something along those lines. I personally prefer to install it using pip or easy_install:

$ sudo pip install sphinx

Running that command will likely respond with a bunch of output about downloading Sphinx and various dependencies. When I ran it in my sandbox VM, I saw it install the following packages:

  • pygments
  • jinja2
  • docutils
  • sphinx

It should be a pretty speedy installation.

Installing Mercurial

We'll be using Mercurial to keep track of changes to our ReST documentation. Mercurial is a distributed version control system that is built using Python. It's wonderful! Just like with Sphinx, if you have already installed Mercurial, feel free to skip to the next section.

I personally prefer to install Mercurial using pip or easy_install--it's usually more up-to-date than what you would have in your package repositories. To do that, simply run a command such as the following:

$ sudo pip install mercurial

This will go out and download and install the latest stable Mercurial. You may need python-dev or something like that for your platform in order for that command to work. However, if you're on Windows, I highly recommend TortoiseHg. The installer for TortoiseHg will install a graphical Mercurial client along with the command line tools.

Create A Repository

Now let's create a brand new Mercurial repository to house our notes/documentation. Open a terminal/console/command prompt to the location of your choice on your computer and execute the following commands:

$ hg init mydox
$ cd mydox

Configure Sphinx

The next step is to configure Sphinx for our project. Sphinx makes this very simple:

$ sphinx-quickstart

This is a wizard that will walk you through the configuration process for your project. It's pretty safe to accept the defaults, in my opinion. Here's the output of my wizard:

$ sphinx-quickstart
Welcome to the Sphinx quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Enter the root path for documentation.
> Root path for the documentation [.]:

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/N) [n]: y

Inside the root directory, two more directories will be created; "_templates"
for custom HTML templates and "_static" for custom stylesheets and other static
files. You can enter another prefix (such as ".") to replace the underscore.
> Name prefix for templates and static dir [_]:

The project name will occur in several places in the built documentation.
> Project name: My Dox
> Author name(s): Josh VanderLinden

Sphinx has the notion of a "version" and a "release" for the
software. Each version can have multiple releases. For example, for
Python the version is something like 2.5 or 3.0, while the release is
something like 2.5.1 or 3.0a1.  If you don't need this dual structure,
just set both to the same value.
> Project version: 0.0.1
> Project release [0.0.1]:

The file name suffix for source files. Commonly, this is either ".txt"
or ".rst".  Only files with this suffix are considered documents.
> Source file suffix [.rst]:

One document is special in that it is considered the top node of the
"contents tree", that is, it is the root of the hierarchical structure
of the documents. Normally, this is "index", but if your "index"
document is a custom template, you can also set this to another filename.
> Name of your master document (without suffix) [index]:

Please indicate if you want to use one of the following Sphinx extensions:
> autodoc: automatically insert docstrings from modules (y/N) [n]:
> doctest: automatically test code snippets in doctest blocks (y/N) [n]:
> intersphinx: link between Sphinx documentation of different projects (y/N) [n]:
> todo: write "todo" entries that can be shown or hidden on build (y/N) [n]:
> coverage: checks for documentation coverage (y/N) [n]:
> pngmath: include math, rendered as PNG images (y/N) [n]:
> jsmath: include math, rendered in the browser by JSMath (y/N) [n]:
> ifconfig: conditional inclusion of content based on config values (y/N) [n]:

A Makefile and a Windows command file can be generated for you so that you
only have to run e.g. `make html' instead of invoking sphinx-build
directly.
> Create Makefile? (Y/n) [y]:
> Create Windows command file? (Y/n) [y]: n

Finished: An initial directory structure has been created.

You should now populate your master file ./source/index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.

If you followed the same steps I did (I separated the source and build directories), you should see three new items in your mydox repository:

  • build/
  • Makefile
  • source/

We'll do our work in the source directory.

Get Some ReST

Now is the time when we start writing some ReST that we want to turn into HTML using Sphinx. Create a file, such as first_doc.rst, and put some ReST in it. If nothing comes to mind, or you're not familiar with ReST syntax, try the following:

=========================
This Is My First Document
=========================

Yes, this is my first document.  It's lame.  Deal with it.

Save the file (keep in mind that it should be within the source directory if you used the same settings I did). Now it's time to add it to the list of files that Mercurial will pay attention to. While we're at it, let's add the other files that were created by the Sphinx configuration wizard:

$ hg add
adding ../Makefile
adding conf.py
adding first_doc.rst
adding index.rst
$ hg st
A Makefile
A source/conf.py
A source/first_doc.rst
A source/index.rst

Don't worry that we don't see all of the directories in the output of hg st--Mercurial tracks files, not directories.

Automate HTML-ization

Here comes the magic in automating the conversion from ReST to HTML: Mercurial hooks. We will use the precommit hook to fire off a command that tells Sphinx to translate our ReST markup into HTML.

Edit your mydox/.hg/hgrc file. If the file does not yet exist, go ahead and create it. Add the following content to it:

[hooks]
precommit.sphinxify = ~/bin/sphinxify_docs.sh

I've opted to call a Bash script instead of using an inline Python call. Now let's create the Bash script, ~/bin/sphinxify_docs.sh:

#!/bin/bash
# regenerate the HTML docs from the ReST sources (fired by the precommit hook)
cd $HOME/mydox
sphinx-build source/ docs/

Notice that I used the $HOME environment variable. This means that I created the mydox directory at /home/myusername/mydox. Adjust that line according to your setup. You'll probably also want to make that script executable:

$ chmod +x ~/bin/sphinxify_docs.sh

Three, Two, One...

You should now be at a stage where you can safely commit changes to your repository and have Sphinx build your HTML documentation. Execute the following command somewhere under your mydox repository:

$ hg ci -m "Initial commit"

If your setup is anything like mine, you should see some output similar to this:

$ hg ci -m "Initial commit"
Making output directory...
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... not found
building [html]: targets for 2 source files that are out of date
updating environment: 2 added, 0 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... /home/jvanderlinden/mydox/source/first_doc.rst:: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded, 1 warning.
$ hg st
? docs/.buildinfo
? docs/.doctrees/environment.pickle
? docs/.doctrees/first_doc.doctree
? docs/.doctrees/index.doctree
? docs/_sources/first_doc.txt
? docs/_sources/index.txt
? docs/_static/basic.css
? docs/_static/default.css
? docs/_static/doctools.js
? docs/_static/file.png
? docs/_static/jquery.js
? docs/_static/minus.png
? docs/_static/plus.png
? docs/_static/pygments.css
? docs/_static/searchtools.js
? docs/first_doc.html
? docs/genindex.html
? docs/index.html
? docs/objects.inv
? docs/search.html
? docs/searchindex.js

If you see something like that, you're in good shape. Go ahead and take a look at your new mydox/docs/index.html file in the Web browser of your choosing.

Not very exciting, is it? Notice how your first_doc.rst doesn't appear anywhere on that page? That's because we didn't tell Sphinx to put it there. Let's do that now.

Customizing Things

Edit the mydox/source/index.rst file that was created during Sphinx configuration. In the section that starts with .. toctree::, let's tell Sphinx to include everything we ReST-ify:

.. toctree::
   :maxdepth: 2
   :glob:

   *

That should do it. Now, I don't know about you, but I don't really want to include the output HTML, images, CSS, JS, or anything in my documentation repository. It would just take up more space each time we change an .rst file. Let's tell Mercurial to not pay attention to the output HTML--it'll just be static and always up-to-date on our filesystem.

Create a new file called mydox/.hgignore. In this file, put the following content:

syntax: glob
docs/

Save the file, and you should now see something like the following when running hg st:

$ hg st
M source/index.rst
? .hgignore

Let's include the .hgignore file in the list of files that Mercurial will track:

$ hg add .hgignore
$ hg st
M source/index.rst
A .hgignore

Finally, let's commit one more time:

$ hg ci -m "Updating the index to include our .rst files"
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... done
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded.

Tada!! The first_doc.rst should now appear on the index page.

Serving Your Documentation

Who seriously wants to have HTML files that are hard to get to? How can we make it easier to access those HTML files? Perhaps we can create a simple static file Web server? That might sound difficult, but it's really not--not when you have access to Python!

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

def main():
    try:
        # serve the current directory on port 80 (this is why the script
        # must run as root--use a port above 1024 to avoid that)
        server = HTTPServer(('', 80), SimpleHTTPRequestHandler)
        server.serve_forever()
    except KeyboardInterrupt:
        server.socket.close()

if __name__ == '__main__':
    main()

I created this simple script and put it in my ~/bin/ directory, also making it executable. Once that's done, you can navigate to your mydox/docs/ directory and run the script. Since I called the script webserver.py, I just do this:

$ cd ~/mydox/docs
$ sudo webserver.py

This makes it possible for you to visit http://localhost/ on your own computer, or to use your computer's IP in place of localhost to access your documentation from a different computer on your network. Pretty slick, if you ask me.
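
As an aside, if you don't need to serve on port 80, Python's standard library can do the same job with no script at all:

$ cd ~/mydox/docs
$ python -m SimpleHTTPServer 8080

That serves the current directory on port 8080, no root required.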

I suppose there's more I could add, but that's all I have time for tonight. Enjoy!

Automatic Config Replication With Mercurial

I've done a lot of neat things since I started my new job earlier this month. I'm really excited about the things I've learned and experimented with, and I would like to share some of the concepts with my visitors.

At work we use a lot of virtual machines in our individual development environments. Most of these virtual machines use very similar configuration settings, but the settings are not a standard part of the installation. That is because we build our virtual machines using the same installation tools that our customers would use. The configuration I'm talking about is just stuff specific to our development environment.

Creating and configuring these virtual machines is one of the first things my mentor showed me how to do on my first day on the job. He commented on how quickly I would probably learn all of the configuration tasks, because we tend to set up our development VMs several times a month. That was all fine and dandy, and I did get a pretty good feel for what needed to go into a development VM that first day.

However, after doing it so many times, I realized how much time I was using just trying to get the VM set up just right. It wasn't hard to configure--it was just time-consuming. It wasn't long before I started thinking of ways to optimize the process.

One of the ideas I came up with, which seems to be serving my purposes perfectly, is that of using Mercurial to quickly and easily get the exact same configuration from one box to another. It also has the added benefit of keeping a history of the changes I make to my configuration as time goes on.

I won't go into exact detail on how I have things set up at work, but I would like to try to describe a similar scenario that should illustrate my goal just as well.

Getting Started

One of the first things I would encourage you to do is follow along. It will make the concept sink in much faster, and you will probably see other applications very quickly. Please note, however, that if you're following along exactly, it could be a very time-consuming process. I will be using 3 virtual machines as I write this, but you could just as easily use 5, 10, or 100,000. Likewise, you could eliminate the virtual machines altogether if you're in an environment with several physical computers.

One virtual machine will act as the "master" server, or the one that will be configured first. The other virtual machines will act as "slave" servers, which will simply receive configuration updates that happen on the master server. We will also modify this behavior to be a bit more interesting toward the end of the article.

Virtual Machines Galore!

First off, I will create some basic virtual machines using the net install version of Debian 5.0.3. I really only need to create 1 VM and then clone it a couple of times. I am willing to furnish my virtual machines to those who are interested in using them. I will install some additional software in the VM to make sure the demo works smoothly. Among the packages that I will install are:

  • Python
  • Mercurial
  • OpenSSH server

Initialize a Repository

Once I have all of that set up in my virtual machines, I will initialize a Mercurial repository on the master server to maintain the configuration files that I am interested in. Let's just use the /etc directory for the time being. There's a pretty good chance that most of our system-wide configuration will all be contained somewhere beneath /etc.

cd /etc
hg init

Now let's have a gander at the files that we can have Mercurial manage for us:

hg st

Wow! That is quite a set of files, isn't it? Thankfully, they should mostly be plain text files. Mercurial is very efficient at managing text files. Let's now add all of the files in /etc to our repository, so they can be tracked and easily pushed out to other systems.

hg add

That command will happily add everything that hg st printed. Obviously, we can get a little more picky about what we do and do not add to our repository, but that's not the goal of this article. Now, this step merely tells Mercurial that it needs to pay attention to changes in these files. The files have not yet been committed to the repo. Let's do that, so we have a backup of our configuration files in their pristine state:

hg ci -m "Initial import"

The -m "Initial import" is just a comment, to describe what happened to warrant a commit to the repository. It is for your use and the use of anyone who has access to your repo.

Clone The Configuration

Now let's try to push the configuration we just committed on the master server to one of the slave servers. Since my virtual machines are all essentially in the same state, there should be no conflicts, right? Try running the following command on the master server:

hg push ssh://root@slave1//etc
root@slave1's password:
remote: abort: There is no Mercurial repository here (.hg not found)!
abort: no suitable response from remote hg!

Blast! We can't simply push the configuration files out to another computer. For that to work, the repository itself would first have to exist on the slave server. Let's try this another way. On the slave server, run this command:

hg clone ssh://root@master//etc /etc
root@master's password:
abort: destination '/etc/' is not empty

Doh! Mercurial won't let us clone the repository from the master server! That's because Mercurial wants to clone to a new directory, with nothing already in it. One way to get around this hairball of a show-stopper is to just copy the repo using conventional UNIX utilities. Execute this command on one of your slave servers:

scp -r root@master:/etc/.hg /etc/

The .hg directory contains all of the repository information, and it's really all we need to snag in order to clone the repository. This might not be the most elegant solution in the world, but it will suffice for the time being. Once the scp command completes, we should have a full copy of the configuration file repository. Run this command to verify:

hg st

If your setup is anything like mine, you'll probably have a few files that are listed as being modified. Chances are that these files will vary from host to host anyway, and they are probably not worth keeping in a version control system. That would just be begging for conflicts.

I wrote an extension for Mercurial that should make this part of my tutorial a little less hacky. On your other slave server, run the following commands:

hg clone http://bitbucket.org/codekoala/hgext /root/hgext
echo "[extensions]" >> /root/.hgrc
echo "neclone = /root/hgext/neclone.py" >> /root/.hgrc

This extension gives you a new Mercurial command called neclone (N. E. Clone, or "not empty clone"). As we saw earlier, Mercurial doesn't let us clone a repository into a directory that is not empty. This extension allows us to do exactly that. It works almost identically to the regular clone command--it takes the same options and everything.

Still on your second slave server, run these additional commands:

hg neclone ssh://root@master//etc /etc
cd /etc
hg up -C

The last step is optional, and soon to be included as part of the extension. It will update your working copy to the latest revision in the repository. Beware that it overwrites any uncommitted changes you may have made to files that are tracked by Mercurial.

So now both slave servers should have a clone of the configuration repository from the master server.

Being Picky

Let's start to be a little picky about the files we are tracking in our repository. Some of the files that appear as modified on my slave server after copying the .hg directory from the master server are:

  • adjtime
  • alternatives/pager
  • alternatives/pager.1.gz
  • mailcap
  • network/run/ifstate
  • udev/rules.d/70-persistent-net.rules

I think it's safe to remove these from the repository, to avoid conflicts with other systems. To tell Mercurial to stop tracking files it is tracking, without actually deleting the file from the filesystem, you can use the following command:

hg forget adjtime
hg forget mailcap

And so on. Go ahead and do that for each of the files that appeared to be modified on your slave server immediately after copying the .hg directory. I'm going to add /etc/hostname to the list of files to forget too.
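
If you have more than a couple of these, a quick shell loop saves some typing (run it from /etc, and adjust the file list to match whatever showed up as modified on your system):

for f in adjtime mailcap network/run/ifstate hostname; do
    hg forget "$f"
done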

After doing that, each of those files should appear as being marked for removal when you run hg st. Don't worry, this is normal. The files will not be deleted from the filesystem, but they will be deleted from the repository. Go ahead and commit those changes to the repository on your slave server.

hg ci -Am "Removed some files from version control"

Now let's push those changes out to the master server:

hg push
abort: repository default-push not found!

Since we copied the .hg directory directly using scp, our slave won't know where the changes need to go when we run the push command with no explicit destination repository. To fix that, let's create a file in /etc/.hg/ called hgrc on the slave server. In that file, put the following text:

[paths]
default = ssh://root@master//etc
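
You can verify that Mercurial picked up the new path by running hg paths:

hg paths
default = ssh://root@master//etc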

The hg push command should now push directly to the master server. Yay! The problem we face now is that every other slave server in the group is out of date. How can we fix that? We'll use Mercurial hooks.

Automating Config Replication

Mercurial offers some very useful hooks that we can use to automatically push configuration changes out to each of our slave servers. We will use the commit and changegroup hooks to do the magic. Let's create a script that will live on the master server to take care of pushing our changes out to each slave server. Create a new file in /etc/ on the master server called propagate.sh:

#!/bin/bash
# update the master's working copy, then tell each slave to pull
# and apply the new changesets
hg up
for node in 'slave1' 'slave2'
do
    ssh root@$node "cd /etc; hg pull -u"
done

Let's also make sure this script is executable:

chmod +x /etc/propagate.sh

This script assumes that your /etc/hosts file or your nameserver is configured appropriately to allow slave1 and slave2 to be resolved to IP addresses. The reason we're SSH'ing into each slave server and using hg pull instead of simply using hg push ssh://root@$node//etc is that you can't force an update on a remote server using push. You can, however, request an update when you're using pull.

Obviously, this script is not the most sophisticated of scripts. It might work well for my demonstration, with only a few servers, but once you get beyond that it would be a nightmare to maintain the list of servers the script has to connect to. You can use whatever means you'd like to keep track of the servers you want to replicate your configuration to. I don't want to bother with all of the crap I'd get for suggesting one thing over another, so it's now your call.

Now it's time to configure the Mercurial hook to execute that script when the master server sees a changeset get into its repository. Open up /etc/.hg/hgrc on the master server, or create it if it doesn't exist. Make sure it has at least the following in it:

[hooks]
commit.propagate = /etc/propagate.sh
changegroup.propagate = /etc/propagate.sh

Let's try it out! Run these commands on your master server:

echo "" >> /etc/hosts
hg ci -m "Added a blank line to the hosts file"
root@slave1's password:
remote: Permission denied, please try again.
remote: Permission denied, please try again.
remote: Permission denied (publickey,password).
abort: no suitable response from remote hg!
Connection closed by slave2
warning: commit.propagate hook exited with status 255

Blast! The script failed because it wanted us to type in a password, but it was not in interactive mode. Let's fix that with a little preshared key magic. I won't go into the details about how this works, but the following commands on your master server should get us rolling:

ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2
scp -r ~/.ssh root@slave1:~
scp -r ~/.ssh root@slave2:~

Warning

Keep in mind this is not secure and should probably not be how your production machines are configured, especially with the root user.

For simplicity's sake, just accept all of the defaults and don't set a passphrase. These commands enable us to SSH into our slave servers without using a password. If you get an error such as:

remote: Host key verification failed.
abort: no suitable response from remote hg!

...it just means you need to manually log into your master server from the slave machine that threw that error. When doing so, you will have to answer "yes" to a question about the authenticity of the host you're logging into.

Testing It Out

It is now time to see if we can make a configuration change on one slave server and have it show up on the other slave server. Let's add the following line to the hosts file on the second slave server:

10.0.0.5        nonexistanthost

Now let's commit the change and push it off to the master server:

hg ci -m "Added a dumb line to the hosts file"
hg push

My system actually told me that it had copied the change out to another host. I know because I saw these lines:

remote: pulling from ssh://root@master//etc
remote: searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

Now when I look at the first slave server, I should see that new line in my /etc/hosts file. Also, the log on each server should have the same entry that I just made about adding "a dumb line to the hosts file."

Seem Like A Lot of Work?

A lot of what we just did probably seemed like more work than it is worth, right? Well, being a nerd typically comes with a few qualities. One quality which I have observed many a time in my most geeky of friends is that they will spend hours and hours up front on a program or script just so they can save 2 minutes in the future. They work hard to be lazy.

There is a lot of boilerplate configuration that takes place in this particular scenario. I realize that. What I haven't shared with you, though, is how I automated the boilerplate configuration as well as the propagation of configuration. I'm tired of putting this article off, so I will have to leave those details for another article. Sorry!

Why?! There's a Better Way (tm)

There is always a better way. Always. Go ahead and use whatever you feel is the most efficient method for keeping configuration files in sync across several computers. This is just one more option to add to your toolkit. Don't worry, I won't be offended if you don't like it or don't use it. It works perfectly for me, it's free, and I just wanted to share!

My VIM Adventures

Along with my recent adventures with Fedora 11, I decided to force myself to become more proficient with VIM. For those of you who do not know, VIM is an improved version of vi, perhaps one of the oldest surviving text editors around today. There are often religious-grade battles between those who believe in VIM and those who believe in Emacs, another long-surviving text editor. I'm not trying to get into any debates about which is better, and I'm not interested in why I should not be using VIM. If you still feel like I need to be set straight, please use the contact me form instead of the comments section.

Anyway, most people who use these editors fall into 1 of 3 categories (there are probably more categories actually):

  1. They're familiar with it enough to get the job done, but they're not exactly proficient. Therefore, they don't care about evangelizing the editor.
  2. They're proficient with the editor, but they're afraid of the politics involved in religious wars relating to text editors, so they don't evangelize.
  3. They're proficient with the editor and feel that the whole world would be better off if everyone used their preferred text editor. As such, they cannot shut up about the dang thing and drive all of their friends, coworkers, and acquaintances mad.

A few of you will probably agree with what I'm about to say. I fear I have transitioned from stage 1 to stage 3 fairly rapidly. I can't stop talking about VIM all of a sudden! You'd think it's the best thing since sliced bread, the way I've been blabbering about it. And here I am, writing an article about it. Hah.

Ever since I first started using Linux, I have been using vi to handle most of my text editing when I was in a terminal. I knew enough to get around. Basic things like navigation and inserting text were pretty much all I knew how to do. I dabbled with a tutorial here and there, but it wasn't long before the things I learned were lost, since I usually preferred a graphical text editor over VIM.

My recent experimentation with VIM has proved to be very fruitful, if I do say so myself. I am no longer tied down to some editor that is slow and bulky, I don't have much to worry about when I switch computers (chances are that VIM is on any computer I use regularly), and I don't even need to be sitting at the computer I'm using VIM on! In fact, today I was doing most of my work over an SSH session to my netbook. I felt more productive today than I have in a very long time.

It's been a long time since I've enjoyed using a mouse to perform basic tasks on my computer. Using VIM allows me to rid myself of the mouse entirely for my text editing tasks, and I don't feel at all limited in my capabilities. Things that used to be quite sketchy operations using my favorite graphical editors end up being very simple with VIM.

I also love the obscurity factor of it all.

Examples

I wish I could just keep adding stuff to this list! There are so many neat things I want to share with everyone about VIM! I'm sure there are more efficient ways to do some of the things I have been learning with VIM, but this works very well for me.

Laziness

I do a lot of reStructuredText for various things. In fact, I'm writing this article in VIM right now. ReST is fantastic, but it's horrible to write in an editor that isn't set up with a monospaced font. I like to see things nicely lined up (I'm a Python developer, after all). I also like my section headings to have an underline that is as long as the heading itself. For example, the heading just above this looks like this:

Examples
========

In this particular instance, it's not a big deal to hold down the equals key long enough to underline the word "Examples". However, sometimes I get some pretty lengthy section titles. The lazy side of me doesn't want my finger to hang around on the same key for very long (or tap it dozens of times, for that matter). Also, trying to figure out how many characters are in a section title without a monospaced font is very annoying.

The/a solution? Say I have a section heading that is 50 characters long. To underline it, all I have to do is type 50i= and hit the escape key.

Cutting Text Mid-Line

Another neat thing is being able to cut text from the cursor to a particular character somewhere later on (or earlier on!) in the same line. Say I have a hyperlink whose address I wish to change:

<a href="http://www.somelong.com/that/I/want/to/change/">Link Text</a>

Instead of using the mouse to highlight the href attribute's value (or highlight it using shift on the keyboard), I just position my cursor on the h in http and type dt". VIM will lop that address right out of there (and you can paste it elsewhere if you'd like). I used this particular shortcut countless times today as I replaced things like {% url some-named-url with,some,parameters %} with {{ some_object.get_absolute_url }} in some Django templates.

Search & Replace

And I cannot neglect the classic search and replace functionality in VIM. You can use fancy regular expressions in VIM to replace some text with something else. I was trying to do a little refactoring today, and I came up with a command like this:

:s/something/lambda (a,b,c): \0(a,b,c)/g

That sort of command works great to replace all occurrences of "something" on the current line with "lambda (a,b,c): something(a,b,c)". Fantastic. What about a global search and replace, instead of just the current line? Stash a % at the front of the command (:%s/something/lambda (a,b,c): \0(a,b,c)/g) and you're in business.

Now what if you only wanted to perform that search and replace over a certain group of lines instead of a single line or the whole file? This is one I'm particularly thrilled about:

:.,.+9 s/something/lambda (a,b,c): \0(a,b,c)/g

That little beauty will perform the search and replace on the current line and the following 9 lines. How awesome is that?

Moving & Deleting Words

Sometimes as I am writing something, I decide I would like to reword a sentence as I near the end. Sometimes this involves simply deleting a word or two. Sometimes it means chopping a few words out of the beginning part of a sentence to put them back at the end somewhere. Whatever the case, VIM seems to handle my needs perfectly well.

Say I have this sentence (from the Vimperator Web site): "Writing efficient user interfaces is the main maxim, here at Vimperator labs." If I want to move the "here at Vimperator labs" to the beginning of the sentence, assuming I just finished typing it, I would place my cursor over the period at the end, type dT,, hit ( to go to the beginning of the sentence, hit P to insert what I just copied, and then handle the rest of the clean up (capitalization, fixing the comma, etc). I could have also done something like, 4db instead of dT,.

If I want to cut/delete an entire word, or to the end of whatever word my cursor is currently on, I could use dw. For more than one word, just put a number before the command. It's great stuff!

Taking It Too Far

I've gotten so carried away with all of this VIM business. I really have. I installed vimperator in Firefox. This extension gives Firefox a VIM-like interface. Now I can do pretty much all of my regular surfing without using the mouse. Some may argue that this is absolutely impractical because it would take much longer to get to the right link on a page using the keyboard than it would with the mouse. That may be true. I dunno, but I still think it's awesome that I really don't need my mouse to browse the Internet now.

As I was playing with vimperator tonight, one of my buddies pointed out another useful extension called It's All Text. This extension allows you to use your preferred text editing program in regular old text boxes in Firefox. It is this extension which has just made writing my blog articles 200x more efficient. Now I can quickly and easily write my articles right here in VIM without having to copy and paste all over the place. Pretty dang incredible.

Oh yes, I'd like to thank Chad Hansen and Jonathan Geddes for helping me out as I explore the depths of VIM. You guys rock!

My Fedora 11 Adventures: Part IV

Well, here I am in 64-bit Fedora 11. Only this time I'm using GNOME. I followed pretty much the same steps as before to make my computer conform to my preferences (background image, font sizes, etc). Things seem to be rolling much more smoothly with GNOME than they did with KDE 4. To top it all off, I haven't had to do a hard reset of my machine since I noticed that Firefox was the culprit that always seemed to be loading when my GUI became unresponsive.

Wireless Networking

I found a pretty good, straightforward tutorial for getting my wireless adapter running in Fedora 11. You can find it here. I used b43-fwcutter to get hooked up. Very easy, though not quite as easy as recent versions of Ubuntu and several other distros, which detected and used my wireless adapter automatically.

Installing Software

I have learned that the Software Management tool I was trying to use before simply is not the way to install software in Fedora. It doesn't work worth beans for me. Instead, I've been dropping to a root terminal whenever I need to install something. The yum utility has been treating me pretty well. It's found every package I look for.

That is, until I tried to install vlc. When I went to the VLC website to grab a Fedora package, I learned about something I had never heard of before: RPM Fusion.

I guess it is just another software repository with all of the goods that aren't in the stock Fedora repositories. VLC was easily installed after following this guide.

I really need to get some work done, so I will have to wait a little while longer before I post any further information about my adventures.

Giving OpenSUSE 11.1 An Honest Chance

I've decided that if I ever want to really understand Linux, I'll have to give as many distributions as possible a chance. In the past, I've tried to use OpenSUSE on my HP Pavilion dv8000 laptop, but it never seemed quite as robust or useful as many other distributions that I've tried on the same machine.

With the recent release of OpenSUSE 11.1, I downloaded the final 32-bit DVD ISO as I normally do for newly released distributions (even if I don't plan on using them--it's an addiction). I proceeded to install the GNOME version of it in a virtual machine to see what all the hubbub was about. Evaluating an operating system within a virtual machine is not the most effective way to do things, but everything seemed fairly solid. As such, and since I have always had difficulties keeping any RPM-based distro around for any length of time, I plan on using OpenSUSE 11.1 through March 2009 (perhaps longer if it grows on me). If it hoses my system, I will go back to something better. If it works, I will learn to use and appreciate it better.

The Installation

The first step when the installation program starts is to choose what language to use, after which you choose the type of installation you're going to be doing. Your choices are:

  • New Installation
  • Update
  • Repair Installed System

You also have the option of installing "Add-On Products" from separate media. At this step, I chose to do a new installation.

Next, you get to choose your time zone. The interface is very intuitive. You get a map of the world, and you click on the region you want to zoom in on. Once you're zoomed in, you can select a city that is near you to specify your time zone. Alternatively, you can choose your region and time zone from a couple of drop down lists.

After setting your time zone, you get to choose which desktop environment you want to install. Your choices are:

  • GNOME 2.24.1
  • KDE 4.1.3
  • KDE 3.5.10
  • XFCE 4.4
  • Minimal X Window
  • Minimal Server Selection (Text Mode)

I will choose to install GNOME because it seems to be the desktop of the future, especially with the hideous beast that KDE has become in the 4.x series...

Now you get to play with the partitioning. Usually the installer's first guess is pretty good, but I've got a different arrangement for my partitions, so I'm going to customize things a bit.

The next step is to create a regular, unprivileged user account for your day-to-day computing needs. This screen is pretty self-explanatory if you've ever registered for an e-mail address or installed any other operating system.

One thing that seems to have been added to OpenSUSE 11.1 is the option to use your regular user password as the root password. This is probably a nice addition for a lot of people, but I'd rather feel like my computer is a little more secure by having a different password for administrative tasks.

You're also given a few other options, such as being able to receive system mail, logging in automatically, and modifying your authentication settings. Other than the administrative password option, I left everything the same. If you're like me and choose to have a different administrative password, you will be prompted to enter the new password at the next step.

Finally, you're shown a summary of the installation tasks that will take place. I'm going to customize my software selection just a bit so I don't have to do it manually after the installation is complete. For example, while I do like GNOME to a degree, I prefer to use KDE 3.5.x, so I will choose to install that environment as well just in case I need the comfort of KDE programs. Also, since I like to use the command line interface for a lot of things, I will choose to install the "Console Tools" package, just because it sounds useful. Lastly, I will choose to install a few development packages, such as C/C++, Java, Python, and Tcl/Tk. These changes bumped up my installation size from about 2.8GB to just over 4GB.

After reviewing the remaining tasks, all you need to do is hit the "Install" button. You will be prompted to verify your desire to install OpenSUSE, after which the package installation will begin. While the installation is taking place, you have the option of watching a brain-washing slideshow, viewing the installation details as it progresses, or reading the release notes.

The actual installation took nearly 40 minutes on my laptop. While this isn't necessarily a great improvement over past releases, I'm sure the story would have been much different had I not customized the software I wanted to have installed. The introduction of installation images a few releases ago drastically improved installation times. If you don't customize your package selection, you'll probably notice the speed difference.

When all of the packages have been installed, the installation program begins to configure your newly installed OpenSUSE for your computer, with a "reboot" in between. This is when all of your hardware, such as your network adapters, graphics adapter, sound card, and printers, is probed and configured. Strangely enough, this step seems to take a lot longer than it does in Windows, which is usually not the case with Linux. What is OpenSUSE up to, I wonder?

When all is said and done, the installation program finishes on its own and loads up your desktop.

Annoyances

There are a couple things that really annoyed me right off the bat about OpenSUSE 11.1. The first was that the loading screen and installation program didn't use my laptop's native resolution. My screen is capable of 1680x1050. The installation program chopped off about 1.25 inches of screen real estate on either side. I don't know if this was intentional or not. It seems like the artwork in the installer may have been limited to a non-widescreen resolution. If so, that's completely ridiculous. I'd like to think that more computer users these days have a widescreen monitor than not, at least the ones who would be playing with Linux.

The second annoyance was that the installation program wouldn't use my external USB DVD drive, which I like to think is more reliable than my internal DVD drive. Everything would start up fine--I got the boot menu, the installation program loaded, and things seemed like they would work--right up until the package repositories (on the DVD) were being built. Then the USB drive just kept spinning and spinning. Once I popped the disc into my internal drive, the program proceeded as expected.

Your Desktop

I thought it was interesting that, even though I chose GNOME as my primary desktop, the installer booted me into KDE 3.5.10 (which I had installed alongside it) once the installation completed. No real complaints, though, since I prefer KDE anyway. Nonetheless, I switched back to GNOME to stretch my limits all the more. At least the desktop took up the full resolution my screen can handle, unlike the installation program and boot screen.

Things seem fairly responsive... nothing like Slackware though. I just received a little popup notification with an excuse for the lag I might be experiencing: the daily indexing has commenced and should be finished soon. Whatever it's up to, it's taking up a consistent 100% of my CPU. How nice. I hope whatever it's indexing ends up being useful.

Sound worked right from the get-go, which is nice. Hardware acceleration for my Radeon Xpress 200M doesn't work out of the box, though, and neither does my Broadcom wireless card. Both get fixed below.

The Wireless

It looks like the most important step in getting my wireless to work was executing these commands as root:

/usr/sbin/install_bcm43xx_firmware
modprobe b43

I tried a lot of things to get my wireless to work, but nothing did the trick until I executed those two commands. Also, to make the wireless available each time you reboot without requiring the modprobe b43 command, you need to edit your sysconfig.

To do that, open up YaST and find the "/etc/sysconfig Editor" option. Expand the "System" node, and navigate to Kernel > MODULES_LOADED_ON_BOOT. Then put b43 in the value box. Apply the changes. The next time you reboot your computer, the wireless should be available from the get-go.
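
If you'd rather not click through YaST, the same setting should live in /etc/sysconfig/kernel (that path is my assumption based on where the sysconfig editor files its kernel options), where the relevant line would look like:

MODULES_LOADED_ON_BOOT="b43"

Either way, the b43 module should load automatically on subsequent boots.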

The Video Card

This section only really applies to folks with ATI graphics adapters.

I found a tutorial on ubuntuforums.org, strangely enough, which described the process for getting ATI drivers to work on OpenSUSE 11.1. The first step is to download the official ATI drivers for Linux. Each of these commands should be executed as root:

wget https://a248.e.akamai.net/f/674/9206/0/www2.ati.com/drivers/\
linux/ati-driver-installer-8-12-x86.x86_64.run

Next, you need to download the kernel source and ensure that you have a few other utilities required for compiling a kernel module:

zypper in kernel-source gcc make patch

Now you should be able to run through the ATI driver installation utility, accepting all of the defaults:

sh ati-driver-installer-8-12-x86.x86_64.run

If you're on 64-bit OpenSUSE, you need to take an extra step to make the driver available:

rm /usr/lib/dri/fglrx_dri.so && ln -s /usr/lib64/dri/fglrx_dri.so \
/usr/lib/dri/fglrx_dri.so

Back up your existing xorg.conf configuration file and configure Xorg to use the new driver:

cp /etc/X11/xorg.conf /etc/X11/xorg.conf.orig
aticonfig --initial -f

Finally, configure Sax2 with the ATI driver:

sax2 -r -m 0=fglrx

Upon rebooting your computer, you should be able to use the hardware-accelerated 3D capabilities of your ATI card. To verify that things are up and running, execute fglrxinfo as a normal user. This command renders the following output on my system:

display: :0.0  screen: 0
OpenGL vendor string: ATI Technologies Inc.
OpenGL renderer string: ATI Radeon Xpress Series
OpenGL version string: 2.1.8304 Release

Other Thoughts

After having played with OpenSUSE 11.1 for a couple hours, I think I might be able to keep it around for a little while. Even though it lacks the speed of other Linux distributions, the "stability" that OpenSUSE seems to offer is attractive to me. It will likely take some time to get used to RPMs over DEBs for package management.

How bad can it be? I mean, it comes with OpenOffice 3.0.0, which is nice. It can handle dual-head mode on my laptop thanks to Xinerama, which no other distro to date has been able to do. This gives me a little more screen real estate to work with, which helps out a lot when I'm developing a Web site or working in an IDE. The package managers are slow, but how often do you really install software anyway?

Again, we'll just have to see how things pan out. Let's hope it turns out to be a positive experience.

Using Django to Design Your Database Schema

Last night I had a buddy of mine ask me how I would approach a particular database design problem. I get similar questions from my peers quite often--which suggests there is something important lacking from the database classes out there. Instead of answering him directly, I decided to come up with this tutorial for using Django to design your database schema.

For those of you new to Django, this article might seem a bit advanced. In time I will have more introductory-level Django tutorials, but I hope this one is easy enough to follow.

Create a Django Project

The first step is to create a Django project. If you already have a project that you can play with, you can skip this step. To create a project, go to a place where you want to keep your code (like C:\projects or /home/me/projects) in a command prompt/terminal and run the following command:

django-admin.py startproject myproject

This will create a new directory in your current location called myproject (you can replace myproject with whatever you'd like so long as you're consistent). This new directory will contain a few files:

  • __init__.py
  • manage.py
  • settings.py
  • urls.py

If you get an error message when running the above command, you might not have Django installed properly. See Step-by-Step: Installing Django for details on installing Django.

Create An Application

Once you have a Django project set up, you should create a new application.

Note: If you're using Windows, you will probably need to omit the ./ on the ./manage.py commands. I will include them here for everyone else who's using Linux or a Mac.

cd myproject
./manage.py startapp specialapp

This will create a new directory in your myproject directory. This new directory will contain three files: __init__.py, models.py, and views.py. We are only concerned with the models.py file in this article.

Create Your Models

Models are usually a direct representation of what your database will be. Django makes creating these models extremely easy, and Python's syntax makes them quite readable. The Django framework asks for models to be defined in the models.py file that was created in the last step. Here's an example (for my buddy who prompted the creation of this article):

from django.db import models

class Component(models.Model):
    # A single part that any number of projects can use
    item_number = models.CharField(max_length=20)
    name = models.CharField(max_length=50)
    size = models.CharField(max_length=10)
    quantity = models.IntegerField(default=1)
    price = models.DecimalField(max_digits=8, decimal_places=2)

class Project(models.Model):
    # A project is built from many components, and a component can
    # belong to many projects--hence the many-to-many relationship
    name = models.CharField(max_length=50)
    components = models.ManyToManyField(Component)
    instructions = models.TextField()

(for more information about models, see the Django Model API Reference)

I don't know about you, but that code seems pretty straightforward to me. I'll spare you all the details about what's going on (that can be a future article).
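
To make the relationship a bit more concrete, here's a quick sketch of how you might exercise these models from a ./manage.py shell session (the item number, names, and price are all made up for illustration):

from decimal import Decimal
from specialapp.models import Component, Project

# Create a component and a project, then relate the two
servo = Component.objects.create(item_number='SRV-100', name='Micro servo',
                                 size='small', price=Decimal('3.50'))
mount = Project.objects.create(name='Webcam mount',
                               instructions='Attach the servo to the bracket.')
mount.components.add(servo)    # adds a row to the intermediate table

mount.components.all()         # [<Component: Component object>]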

Install Your New Application

Once you have your models set up, you need to add specialapp to the list of INSTALLED_APPS so that Django registers your models. To do that, open up settings.py in your myproject directory and go toward the bottom of the file, until you see something like

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
)

When you find that, add specialapp to the list:

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'specialapp'
)

Setup Your Database

Now you need to let Django know what kind of database you're using. Django currently supports MySQL, SQLite3, PostgreSQL, and Oracle natively, but you can get third-party tools that allow you to use other databases (like SQL Server).

Still in your settings.py, go to the top until you see DATABASE_ENGINE and DATABASE_NAME. Set those to match the database you're using:

DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = 'myproject.db'
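
If you're using a server-backed database instead of SQLite, a few more settings come into play. Here's a sketch for MySQL (the database name and credentials are placeholders, and the database itself must already exist):

DATABASE_ENGINE = 'mysql'
DATABASE_NAME = 'myproject'
DATABASE_USER = 'myuser'        # placeholder credentials
DATABASE_PASSWORD = 'secret'
DATABASE_HOST = ''              # empty means localhost
DATABASE_PORT = ''              # empty means the default port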

Save your settings.py and go back to your command prompt/terminal.

Get Django's Opinion For Your Schema

Make sure you're in your myproject directory and run the following command:

./manage.py sqlall specialapp

This command will examine the models that we created previously and will generate the appropriate SQL to create the tables for your particular database. For SQLite, we get something like this for output:

BEGIN;
CREATE TABLE "specialapp_component" (
      "id" integer NOT NULL PRIMARY KEY,
      "item_number" varchar(20) NOT NULL,
      "name" varchar(50) NOT NULL,
      "size" varchar(10) NOT NULL,
      "quantity" integer NOT NULL,
      "price" decimal NOT NULL
)
;
CREATE TABLE "specialapp_project" (
      "id" integer NOT NULL PRIMARY KEY,
      "name" varchar(50) NOT NULL,
      "instructions" text NOT NULL
)
;
CREATE TABLE "specialapp_project_components" (
      "id" integer NOT NULL PRIMARY KEY,
      "project_id" integer NOT NULL REFERENCES "specialapp_project" ("id"),
      "component_id" integer NOT NULL REFERENCES "specialapp_component" ("id"),
      UNIQUE ("project_id", "component_id")
)
;
COMMIT;

Notice how Django does all sorts of nifty things, like wrapping the table creation queries in a transaction, setting up indexes, unique keys, and defining relationships between tables. The output also offers a solution to the original problem my buddy had: an intermediate table that just keeps track of relationships between projects and components (the specialapp_project_components table).

Note that the SQL above is tailored to SQLite; Django generates slightly different DDL for other database servers.

Enhancing The Intermediate Table

After my buddy reviewed this article, he asked a very interesting and valid question: What if a project needs 3 of one component? In response, I offer the following models (this requires a modern version of Django--it doesn't work on Django 0.96.1 or earlier):

from django.db import models

class Component(models.Model):
    item_number = models.CharField(max_length=20)
    name = models.CharField(max_length=50)
    size = models.CharField(max_length=10)
    quantity = models.IntegerField(default=1)
    price = models.DecimalField(max_digits=8, decimal_places=2)

class Project(models.Model):
    name = models.CharField(max_length=50)
    components = models.ManyToManyField(Component, through='ProjectComponent')
    instructions = models.TextField()

class ProjectComponent(models.Model):
    # The explicit intermediate table: one row per project/component
    # pair, with room for extra data such as the quantity
    project = models.ForeignKey(Project)
    component = models.ForeignKey(Component)
    quantity = models.PositiveIntegerField()

    class Meta:
        unique_together = ['project', 'component']
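
One behavioral note: once a through model is in place, Django no longer lets you call mount.components.add(); you create the intermediate rows yourself. A quick sketch, again with made-up data:

from specialapp.models import Component, Project, ProjectComponent

mount = Project.objects.get(name='Webcam mount')
screw = Component.objects.get(item_number='SCR-4')

# Relate the two and record how many screws the project needs
ProjectComponent.objects.create(project=mount, component=screw, quantity=3)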

Running ./manage.py sqlall specialapp now generates the following SQL:

BEGIN;
CREATE TABLE "specialapp_component" (
    "id" integer NOT NULL PRIMARY KEY,
    "item_number" varchar(20) NOT NULL,
    "name" varchar(50) NOT NULL,
    "size" varchar(10) NOT NULL,
    "quantity" integer NOT NULL,
    "price" decimal NOT NULL
)
;
CREATE TABLE "specialapp_project" (
    "id" integer NOT NULL PRIMARY KEY,
    "name" varchar(50) NOT NULL,
    "instructions" text NOT NULL
)
;
CREATE TABLE "specialapp_projectcomponent" (
    "id" integer NOT NULL PRIMARY KEY,
    "project_id" integer NOT NULL REFERENCES "specialapp_project" ("id"),
    "component_id" integer NOT NULL REFERENCES "specialapp_component" ("id"),
    "quantity" integer unsigned NOT NULL,
    UNIQUE ("project_id", "component_id")
)
;
CREATE INDEX "specialapp_projectcomponent_project_id" ON "specialapp_projectcomponent" ("project_id");
CREATE INDEX "specialapp_projectcomponent_component_id" ON "specialapp_projectcomponent" ("component_id");
COMMIT;

As you can see, most of the SQL is the same. The main difference is that the specialapp_project_components table has become specialapp_projectcomponent and it now has a quantity column. This can be used to keep track of the quantity of each component that a project requires. You can add however many fields you want to this new intermediate table's model.
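
Reading the quantities back out is just a matter of querying the intermediate model. For example, something like this should print a project's bill of materials (projectcomponent_set is the default reverse name Django derives from the ForeignKey):

from specialapp.models import Project

mount = Project.objects.get(name='Webcam mount')
for pc in mount.projectcomponent_set.all():
    print pc.component.name, pc.quantity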

Using This SQL

There are several ways you can use the SQL generated by Django. If you want to make your life really easy, you can have Django create the tables for you directly. Assuming that you have specified all of the appropriate database information in your settings.py file, you can simply run the following command:

./manage.py syncdb

This will execute the queries generated earlier directly on your database, creating the tables (if they don't already exist). Please note that this command currently will not update your schema if the table exists but is missing a column or two. You must either do that manually or drop the table in question and then execute the syncdb command.
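
If memory serves, Django also ships a reset command that drops and recreates a single app's tables in one go--it destroys that app's data, so use it with care:

./manage.py reset specialapp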

Another option, if you want to keep your DDL (Data Definition Language) in a separate script--maybe so you can keep it under some sort of version control--is something like:

./manage.py sqlall specialapp > specialapp-ddl-080813.sql

This just puts the output of the sqlall command into a file called specialapp-ddl-080813.sql for later use.
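
If you later want to apply that script to a fresh database by hand, the sqlite3 command-line tool (assuming it's installed) makes it a one-liner:

sqlite3 myproject.db < specialapp-ddl-080813.sql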

Benefits of Using Django To Create Your Schema

  • Simple: I personally find the syntax of Django models to be very simple and direct. There is a comprehensive API that explains and demonstrates what Django models are capable of.
  • Fast: Being that the syntax is so simple, I find that it makes designing and defining your schema much faster than trying to do it with raw SQL or using a database administration GUI.
  • Understandable: Looking at the model code in Django is not nearly as intimidating as similar solutions in other frameworks (think about Java Persistence API models).
  • Intelligent: Using the same model code, Django can generate proper Data Definition Language SQL for several popular database servers. It handles indexes, keys, relationships, transactions, etc. and can tell the difference between server types.

Downfalls of Using Django To Create Your Schema

  • The Table Prefix: Notice how all of the tables in the SQL above were prefixed with specialapp_. That's Django's safe way of making sure models from different applications in the same Django project do not interfere with each other. However, if you don't plan on using Django for your end project, the prefix could be a major annoyance. There are a couple solutions:
    • A simple "search and replace" before executing the SQL in your database
    • Define the db_table option in your models (see the sketch after this list)
  • Another Technology: Django (or even Python) may or may not be in your organization's current development stack. If it's not, using the methods described in this article would just become one more thing to support.
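
For that db_table option, a minimal sketch (the replacement table name is just an example):

class Component(models.Model):
    # ... fields as before ...

    class Meta:
        db_table = 'component'  # use this instead of specialapp_component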

Other Thoughts

I first thought about doing the things mentioned in this article when I was working on a personal Java application. I like to use JPA when developing database-backed applications in Java because it abstracts away a lot of the database operations. However, I don't like coming up with the model classes directly, so I usually reverse engineer them from existing database tables.

Before thinking about the things discussed in this article, I created the tables by hand, making several modifications to the schema before I was satisfied with my JPA models. This proved to be quite bothersome and time-consuming.

After using Django to develop my tables, the JPA models turned out to be a lot more reliable, and they were usually designed properly from the get-go. I haven't created tables manually since.

If you find yourself designing database schemas often, and you find that you have to make several changes to your tables before you/the project requirements are satisfied, you might consider using Django to do the grunt work. It's worked for me, and I'm sure it will work for you too.

Good luck!