Monitor Multiple Remote Files Using Multitail

There comes a time in each of our individual lives that we just learn to love log files. We learn to love utilities like tail and grep as we pore over countless lines of information, seeking out the stuff that really matters. We like to show off our debugging prowess as innocent bystanders look on in absolute wonderment.

While that's all fine and dandy, I'm always on the lookout for utilities to make my log monitoring less painful. A few weeks ago, my supervisor introduced me to a program that he's been using for quite some time: multitail. In essence, it's tail with some really neat features, such as the ability to:

  • "tail" multiple files (or commands, like netstat) independently in the same terminal
  • highlight text using regular expressions
  • search log messages and see only the matching lines
  • merge multiple files into one log window
  • scroll back through the history of a log file
  • apply highlighting "themes"

I've been using multitail for a couple of weeks now (it took me a while to warm up to it after my supervisor introduced it), and I'm quite satisfied with it. One thing I really, really like about multitail is that I can kinda sorta almost monitor multiple remote files. What does that mean, you ask?

Well, my development environment includes at least 5 virtual machines, each of which will be logging different but equally important information. I want to be able to "tail" a specific log file on each of the virtual machines in one window. Now, it took me a while to learn how to do this, which is why I'm sharing the information with you.

And here comes my usual disclaimer: this may not be the most efficient way to do what I want to do, but it's currently working for me. I'm open to other solutions too!

Anyway, I can run a command like the following to monitor multiple remote log files:

multitail -l 'ssh user@host1 "tail -f /path/to/log/file"' -l 'ssh user@host2 "tail -f /path/to/log/file"'

Such a command would ssh into two computers, host1 and host2, and run tail -f /path/to/log/file on each. Multitail allows you to monitor the output of both tail commands in a single window, reducing clutter on your desktop. You can also arrange the files/commands you're "tailing" into various rows and columns. I tend to have a 2x2 grid of log files when I use multitail at work.
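If the list of hosts grows, typing that command by hand gets old fast. Here is a minimal Python sketch that assembles the same kind of command line from a list of hosts. The function name build_multitail_command and the user, host, and path values are placeholders of mine, not anything that ships with multitail.

```python
import shlex


def build_multitail_command(hosts, log_path, user="user"):
    """Return a multitail command that tails log_path on every host over ssh."""
    parts = ["multitail"]
    for host in hosts:
        # Each -l argument is a complete command that multitail runs
        # and shows in its own window.
        remote = 'ssh {0}@{1} "tail -f {2}"'.format(user, host, log_path)
        parts.extend(["-l", shlex.quote(remote)])
    return " ".join(parts)


print(build_multitail_command(["host1", "host2"], "/path/to/log/file"))
```

The printed line can be pasted into a shell or stored in an alias. Note that this sketch does not try to handle log paths containing spaces or quotes.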

I've also started using multitail to monitor the access and error logs for my Django sites on WebFaction. I simply ssh into my account, run an alias for a ridiculous multitail command, and watch as both log files scroll on by.

Again, this is just another aspect of my work environment that is fun and useful to me, and I wanted to spread the joy. Multitail may or may not be a utility you like to use, but it suits my current needs and desires quite well. YMMV. And, once again, I'm always on the look-out for other tools to make my work life more interesting and productive!

Afloat: Window Management For OSX

Today I wanted to be able to watch the PyCon Live Stream while using OS X at work. A quick Google search returned an awesome program: Afloat. It lets me change the window settings for just about any OS X application. Right now I've got the live feed just lingering there in the background, pinned to my desktop. Loving it!

Network Manager, Cisco VPN, And Internet

Those of us out on the eastern side of the United States are currently experiencing quite a snow storm. While this sort of storm would probably not even have made the local news in Rexburg (where my wife and I attended university), everyone is making a big deal about it around here. Part of that big deal included the option, and even recommendation, that we work from home on Friday, using the company VPN to take care of our tasks.

I was pretty excited at the idea of working from home once again (my last job was almost exclusively a work-at-home gig), so I made sure I was able to connect to the VPN a few days ago, after receiving the credentials. It took a few tries to get everything right in Windows, but eventually it started working quite well. Then I tried connecting from Linux, using the awesomeness known as Network Manager.

Since I'm currently on Fedora 12, all I had to do was make sure that I had network-manager-vpnc installed, and I could then configure a connection using the same credentials I used in Windows. I had a successful connection on the very first try, and it was working fabulously. I had access to all of my development machines and all of the tools I use on a daily basis.

It didn't take long, however, for me to notice a big problem: no Internet access. I could get to any machine I dang well pleased on the company network, but nothing on the Internet. Quite frustrating, to say the least.

I decided to leave the investigation as to why I had no Internet access and how to fix it for another night. Here I am now, tinkering with it again. I found out what I needed to change:

  • Right click on the Network Manager icon in the system tray, and select "Edit Connections..."
  • Click on the VPN tab
  • Edit your VPN connection
  • Click on the "IPv4 Settings" tab
  • Click the "Routes..." button
  • Make sure that the "Use this connection only for resources on its network" option is checked
  • Connect to your VPN, and enjoy access to the devices there as well as on the Internet!

Hopefully this saves someone else's sanity (Jeremy?)

How I Have A Mobile & Desktop Site With Django

Part of the latest version of my site included the deployment of a mobile-friendly site. Up until recently, I hadn't even attempted to create a mobile site because I thought it would take more time than it was worth. I wanted something beyond just using CSS to hide certain elements on the page. I wanted to be able to break down the content of my site into its most basic pieces and only include what was necessary. Also, I wanted to figure it out on my own (instead of reusing wheels other people had invented before me--horrible, I know).

With these requirements, I was afraid it would require more resources than I could spare on my shared Web host. My initial impression was that I would have to leverage the django.contrib.sites framework in a fashion that would essentially require two distinct instances of my site running in RAM. Despite these feelings, I decided to embark on a mission to create a mobile-friendly site while still offering a full desktop-friendly site. It was surprisingly simple. This may not be the best way to do it, but it sure works for me, and I'm very satisfied. So satisfied, in fact, that I am going to share my solution with all of my Django-loving friends.

The first step is to add a couple of new settings to your settings.py file:

import os
DIRNAME = os.path.abspath(os.path.dirname(__file__))

TEMPLATE_DIRS = (
    os.path.join(DIRNAME, 'templates'),
)

MOBILE_TEMPLATE_DIRS = (
    os.path.join(DIRNAME, 'templates', 'mobile'),
)
DESKTOP_TEMPLATE_DIRS = (
    os.path.join(DIRNAME, 'templates', 'desktop'),
)

For those of you not used to seeing that os.path.join stuff, it's just a (very efficient) way to make your Django project more portable between different computers and even operating systems. The new variables are MOBILE_TEMPLATE_DIRS and DESKTOP_TEMPLATE_DIRS, and their respective meanings should be fairly obvious. Basically, this tells Django that it can look for templates in your_django_project/templates, your_django_project/templates/mobile, and your_django_project/templates/desktop.
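If you want to see what those settings resolve to, here is a small sketch you can run outside of Django. The base directory is a made-up location used only for illustration; in the real settings.py it comes from __file__.

```python
import os

# Hypothetical project location, used only for illustration.
DIRNAME = os.path.join(os.sep, 'srv', 'your_django_project')

TEMPLATE_DIRS = (os.path.join(DIRNAME, 'templates'),)
MOBILE_TEMPLATE_DIRS = (os.path.join(DIRNAME, 'templates', 'mobile'),)

# Tuples concatenate, so putting the mobile tuple first places the mobile
# directory ahead of the shared one in the search order.
search_order = MOBILE_TEMPLATE_DIRS + TEMPLATE_DIRS
for directory in search_order:
    print(directory)
```

The same concatenation with DESKTOP_TEMPLATE_DIRS gives the search order for the desktop site.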

Next, we need to install a middleware that takes care of determining which directory Django should pay attention to when rendering pages, between mobile and desktop. You can put this into your_django_project/middleware.py:

from django.conf import settings

class MobileTemplatesMiddleware(object):
    """Determines which set of templates to use for a mobile site"""

    ORIG_TEMPLATE_DIRS = settings.TEMPLATE_DIRS

    def process_request(self, request):
        # sets are used here, you can use other logic if you have an older version of Python
        MOBILE_SUBDOMAINS = set(['m', 'mobile'])
        domain = set(request.META.get('HTTP_HOST', '').split('.'))

        if len(MOBILE_SUBDOMAINS & domain):
            settings.TEMPLATE_DIRS = settings.MOBILE_TEMPLATE_DIRS + self.ORIG_TEMPLATE_DIRS
        else:
            settings.TEMPLATE_DIRS = settings.DESKTOP_TEMPLATE_DIRS + self.ORIG_TEMPLATE_DIRS
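
The host check in that middleware is easy to try on its own, without a running Django project. The following is a rough sketch of just that check. The function name is_mobile_host is mine, and stripping the port and lowercasing the host are small additions that the middleware above does not do.

```python
MOBILE_SUBDOMAINS = frozenset(['m', 'mobile'])


def is_mobile_host(http_host):
    """Return True if any dot-separated label of the host is a mobile subdomain."""
    # Drop an optional ":port" suffix before splitting into labels.
    hostname = http_host.split(':')[0].lower()
    labels = set(hostname.split('.'))
    return bool(MOBILE_SUBDOMAINS & labels)


print(is_mobile_host('m.example.com'))
print(is_mobile_host('www.example.com'))
```

Because every label of the host name is checked, a host such as example.m.com would also count as mobile. That matches the behavior of the set intersection in the middleware.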

Now you need to install the new middleware. Back in your settings.py, find the MIDDLEWARE_CLASSES variable, and insert a line like the following:

'your_django_project.middleware.MobileTemplatesMiddleware',

Finally, if you already have a base.html template in your your_django_project/templates directory, rename it to something else, such as site_base.html. Now create two new directories: your_django_project/templates/mobile and your_django_project/templates/desktop. In both of those directories, create a new base.html template that extends site_base.html.

Example site_base.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>{% block base_title %}Code Koala{% endblock %} - {% block title %}Welcome{% endblock %}</title>
<link href="{{ MEDIA_URL }}css/common.css" rel="stylesheet" type="text/css" media="screen" />
{% block extra-head %}{% endblock %}

</head>
<body>
<div id="page-wrapper">
    {% block header %}
    <div id="logo">
        <h1><a href="/">Code Koala</a></h1>
    </div>
    <div id="header">
        <div id="menu">
            <ul>
                <li><a href="/" class="first">Home</a></li>
                <li><a href="/blog/">Blog</a></li>
                <li><a href="/about/">About</a></li>
                <li><a href="/contact/">Contact</a></li>
            </ul>
        </div>
    </div>
    {% endblock %}
    <div id="page">
        <div id="content">
            {% block content %}{% endblock %}
        </div>
        <div id="sidebar">
            {% block sidebar %}
            Stuff
            {% endblock %}
        </div>
    </div>
    <div id="footer">
        {% block footer %}
        Footer stuff
        {% endblock %}
    </div>
</div>
</body>
</html>

Example desktop/base.html

{% extends 'site_base.html' %}

{% block extra-head %}
<!-- stylesheets -->
<link href="{{ MEDIA_URL }}css/desktop.css" rel="stylesheet" type="text/css" media="screen" />

<!-- JavaScripts -->
<script type="text/javascript" src="{{ MEDIA_URL }}js/jquery.js"></script>
{% endblock %}

Example mobile/base.html

{% extends 'site_base.html' %}

{% block extra-head %}
<!-- stylesheets -->
<link href="{{ MEDIA_URL }}css/mobile.css" rel="stylesheet" type="text/css" media="screen" />
{% endblock %}

{% block sidebar %}{% endblock %}

Please forgive me if the HTML or whatever is incorrect--I butchered the actual templates I use on Code Koala for the examples. There are some neat things you can do in your pages to make them more mobile friendly, such as including something like <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0" /> in your <head> element. This tells your visitor's browser to lay the page out at the device's width instead of shrinking a desktop-sized page to fit on the screen. You can find a lot of other such tips elsewhere on the interwebs, and I'm sure they'll be better explained elsewhere too. You can also find scripts to handle redirecting your visitors to a mobile site and whatnot. Google is your friend.

As for the Django side of things, that should be just about it. If you have other templates you want to customize based on the version of your site that visitors are viewing, simply add those templates to the your_django_project/templates/mobile or your_django_project/templates/desktop directories as necessary. For example, if you have an application called blog, and you want to override the entry_detail.html template for the mobile site, so it doesn't pull in a bunch of unnecessary information to save bandwidth, you could save your modified copy in your_django_project/templates/mobile/blog/entry_detail.html.
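
The reason an override like that works is that templates are looked up directory by directory, in order, and the first match wins. Here is a simplified model of that lookup. It is not Django's actual loader code, and find_template is a name I made up for the sketch.

```python
import os


def find_template(name, template_dirs):
    """Return the first existing path for name, searching template_dirs in order."""
    for directory in template_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With the mobile directory listed first, blog/entry_detail.html resolves to the copy under templates/mobile when it exists, and falls back to the shared copy under templates otherwise.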

With this setup, all you have to do is point your main domain and a subdomain such as m.yourdomain.com to the same Django application, and the middleware will take care of the "heavy lifting". No need for an additional instance of your Django project just for the mobile site. No hackish hiding of elements using CSS. If you find this article useful and decide to use these techniques on your site, please let me know how it works in your environment and if you ran into any snags so I can update the information!

Site-Wide Caching in Django

My last article about caching RSS feeds in a Django project generated a lot of interest. My original goal was to help other people who have tried to cache QuerySet objects and received a funky error message. Many of my visitors offered helpful advice in the comments, making it clear that I was going about caching my feeds the wrong way.

I knew my solution was wrong before I even produced it, but I couldn't get Django's site-wide caching middleware to work in my production environment. Site-wide caching worked wonderfully in my development environment, and I tried all sorts of things to make it work in my production setup. It wasn't until one "Jacob" offered a beautiful pearl of wisdom that things started to make more sense:

This doesn't pertain to feeds, but one rather large gotcha with the cache middleware is that any javascript you are running that plants a cookie will affect the cache key. Google analytics, for instance, has that effect. A workaround is to use a middleware to strip out the offending cookies from the request object before the cache middleware looks at it.

The minute I read that comment, I realized just how logical it was! If Google Analytics, or any other JavaScript used on my site, was setting a cookie, and it changed that cookie on each request, then the caching engine would effectively have a different page to cache for each request! Thank you so much, Jacob, for helping me get past the frustration of not having site-wide caching in my production environment.
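
To make the problem concrete, here is a toy illustration. It is not Django's real cache key algorithm. It just assumes that the raw Cookie header ends up as part of the key, which is effectively what happens when responses vary on cookies.

```python
import hashlib


def toy_cache_key(path, cookie_header):
    """Build a cache key from the URL path and the raw Cookie header."""
    raw = '{0}|{1}'.format(path, cookie_header)
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


# Same page, same session, but the analytics cookie changed between requests.
first = toy_cache_key('/blog/', '__utma=1.100; sessionid=abc')
second = toy_cache_key('/blog/', '__utma=1.200; sessionid=abc')
print(first == second)
```

The two keys differ, so the second request misses the cache even though the page content would be identical.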

How To Set Up Site-Wide Caching

While most of this can be gleaned from the official documentation, I will repeat it here in an effort to provide a complete "HOWTO". For further information, hit up the official caching documentation.

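The first step is to choose a caching backend for your project. Built-in options include:

  • Memcached
  • Database caching
  • Filesystem caching
  • Local-memory caching
  • Dummy caching (for development)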

To specify which backend you want to use, define the CACHE_BACKEND variable in your settings.py. The definition for each backend is different, so check out the official documentation for details.

Next, install a couple of middleware classes, and pay attention to where the classes are supposed to appear in the list:

  • django.middleware.cache.UpdateCacheMiddleware - This should be the first middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.
  • django.middleware.cache.FetchFromCacheMiddleware - This should be the last middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.

Finally, you must define the following variables in your settings.py file:

  • CACHE_MIDDLEWARE_SECONDS - The number of seconds each page should be cached
  • CACHE_MIDDLEWARE_KEY_PREFIX - If the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don't care

If you don't use anything like Google Analytics that sets/changes cookies on each request to your site, you should have site-wide caching enabled now. If you only want pages to be cached for users who are not logged in, you may add CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True to your settings.py file--its meaning should be fairly obvious.
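
Put together, the relevant part of settings.py might look something like the following sketch. The backend choice, the timeout, the key prefix, and the middleware classes in the middle of the tuple are example values I picked, so adjust them to your own project.

```python
# settings.py (fragment)

# Local-memory backend; the other built-in backends use the same URI style.
CACHE_BACKEND = 'locmem://'

MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',     # must be first
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',  # must be last
)

CACHE_MIDDLEWARE_SECONDS = 600           # cache each page for 10 minutes
CACHE_MIDDLEWARE_KEY_PREFIX = 'mysite'   # hypothetical site name
CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True   # optional: skip logged-in users
```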

If, however, your site-wide caching doesn't appear to work (as it didn't for me for a long time), you can create a special middleware class to strip those dirty cookies from the request, so the caching middleware can do its work.

import re

class StripCookieMiddleware(object):
    """Ganked from http://2ze.us/Io"""

    STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')

    def process_request(self, request):
        cookie = self.STRIP_RE.sub('', request.META.get('HTTP_COOKIE', ''))
        request.META['HTTP_COOKIE'] = cookie

Edit: Thanks to Tal for the regex suggestion!

Once you do that, you need only install the new middleware class. Be sure to install it somewhere between the UpdateCacheMiddleware and FetchFromCacheMiddleware classes, not first or last in the tuple. When all of that is done, your site-wide caching should really work! That is, of course, unless your offending cookies are not found by that STRIP_RE regular expression.
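
A quick way to check whether STRIP_RE catches your cookies is to run it against a sample Cookie header outside of Django. The header below is invented for the example, and strip_underscore_cookies is just a wrapper name I chose.

```python
import re

STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')


def strip_underscore_cookies(cookie_header):
    """Remove every cookie whose name starts with an underscore."""
    return STRIP_RE.sub('', cookie_header)


header = '__utma=123.456; sessionid=abc; __utmz=789'
print(strip_underscore_cookies(header))
```

Cookies such as sessionid are left alone because their names do not begin with an underscore. If the cookies that break your caching are named differently, the pattern needs to be adjusted to match them.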

Thanks again to Jacob and "nf", the original author of the middleware class I used to solve all of my problems! Also, I'd like to thank "JaredKuolt" for the django-staticgenerator on his github account. It made me happy for a while as I was working toward real site-wide caching.

Syndication Caching in Django

I've recently been working on some performance enhancements on my site. Apparently some of my latest articles are a little too popular for my shared hosting plan. The surge of traffic to my site took down several sites on the same server as my own.

My response to the fiasco was to, among other things, implement caching on my site. It seems like the caching has helped a lot. I've noticed that my RSS feeds are hit almost as hard as real articles on my site, and I noticed that they weren't being cached the way I had expected. I tried a couple of things that I thought would work, but nothing seemed to do the trick.

After doing some brief research into the idea of caching my RSS feeds using Django's built-in caching mechanisms, I came up empty. It occurred to me to implement caching in the feed classes themselves. I tried something like this:

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = Article.objects.active().order_by('-publish_date')[:10]
            cache.set('latest_articles', articles)

        return articles

    ...

This code doesn't work! When I would try to retrieve one of my RSS feeds with such "caching" in place, I got the following traceback:

Traceback (most recent call last):

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 280, in run
    self.result = application(self.environ, self.start_response)

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 674, in __call__
    return self.application(environ, start_response)

  File "/home/wheaties/dev/django/core/handlers/wsgi.py", line 241, in __call__
    response = self.get_response(request)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 143, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 101, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 36, in __call__
    return self.decorator(self.func)(*args, **kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 86, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 215, in feed
    feedgen = f(slug, request).get_feed(param)

  File "/home/wheaties/dev/django/contrib/syndication/feeds.py", line 37, in get_feed
    return super(Feed, self).get_feed(obj, self.request)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 134, in get_feed
    for item in self.__get_dynamic_attr('items', obj):

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 69, in __get_dynamic_attr
    return attr()

  File "/home/wheaties/dev/articles/feeds.py", line 22, in items
    cache.set(key, articles)

  File "/home/wheaties/dev/django/core/cache/backends/filebased.py", line 72, in set
    pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <class 'django.utils.functional.__proxy__'>: attribute lookup django.utils.functional.__proxy__ failed

This error took me by surprise. I didn't expect anything like this. I tried a few things to get around it, but then I actually stopped to consider what was happening to cause such an error. My Article objects are definitely serializable, which is why the error didn't make sense.

Then it hit me: the object I was actually attempting to cache was a QuerySet, not a list or tuple of Article objects. I changed the code to wrap the Article.objects.active() call with list():

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = list(Article.objects.active().order_by('-publish_date')[:10])
            cache.set('latest_articles', articles)

        return articles

    ...

And that one worked. I would prefer to cache the actual XML version of the RSS feed, but I will settle with a few hundred fewer hits to my database each day by caching the list of articles. If anyone has better suggestions, I'd love to hear about them. Until then, I hope my experience will help others out there who are in danger of taking down other sites on their shared hosting service!
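
The same pattern can be tried without Django at all. In this sketch, PickleCache is a stand-in I wrote that pickles values the way a file-based backend would, and a generator stands in for any lazy object. Neither is Django code.

```python
import pickle


class PickleCache(object):
    """Tiny dict-backed cache that pickles values before storing them."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        data = self._store.get(key)
        return None if data is None else pickle.loads(data)

    def set(self, key, value):
        self._store[key] = pickle.dumps(value, pickle.HIGHEST_PROTOCOL)


def cached_list(cache, key, produce):
    """Return the cached list for key, building it with produce() on a miss."""
    items = cache.get(key)
    if items is None:
        # list() forces evaluation so only plain data reaches the cache.
        items = list(produce())
        cache.set(key, items)
    return items
```

Calling cached_list(cache, 'latest', make_items) twice runs make_items only once. Handing the lazy object straight to cache.set, without list(), is what blows up during pickling.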

Auto-Generating Documentation Using Mercurial, ReST, and Sphinx

I often find myself taking notes about various aspects of my job that I feel I would forget as soon as I moved onto another project. I've gotten into the habit of taking my notes using reStructured Text, which shouldn't come as any surprise to any of my regular visitors. On several occasions, I had some of the other guys in the company ask me for some clarification on some things I had taken notes on. Lucky for me, I had taken some nice notes!

However, these individuals probably wouldn't appreciate reading ReST markup as much as I do, so I decided to do something nice for them. I set up Sphinx to prettify my documentation. I then wrote a small Web server using Python, so people within the company network could access the latest version of my notes without much hassle.

Just like I take notes to remind myself of stuff at work, I want to do that again for this automated ReST->HTML magic--I want to be able to do this in the future! I figured I would make my notes even more public this time, so you all can enjoy similar bliss.

Platform Dependence

I am writing this article with UNIX-like operating systems in mind. Please forgive me if you're a Windows user and some of this is not consistent with what you're seeing. Perhaps one day I'll try to set this sort of thing up on Windows.

Installing Sphinx

The first step that we want to take is installing Sphinx. This is the project that Python itself uses to generate its online documentation. It's pretty dang awesome. Feel free to skip this section if you have already installed Sphinx.

Depending on your environment of choice, you may or may not have a package manager that offers python-sphinx or something along those lines. I personally prefer to install it using pip or easy_install:

$ sudo pip install sphinx

Running that command will likely respond with a bunch of output about downloading Sphinx and various dependencies. When I ran it in my sandbox VM, I saw it install the following packages:

  • pygments
  • jinja2
  • docutils
  • sphinx

It should be a pretty speedy installation.

Installing Mercurial

We'll be using Mercurial to keep track of changes to our ReST documentation. Mercurial is a distributed version control system that is built using Python. It's wonderful! Just like with Sphinx, if you have already installed Mercurial, feel free to skip to the next section.

I personally prefer to install Mercurial using pip or easy_install--it's usually more up-to-date than what you would have in your package repositories. To do that, simply run a command such as the following:

$ sudo pip install mercurial

This will go out and download and install the latest stable Mercurial. You may need python-dev or something like that for your platform in order for that command to work. However, if you're on Windows, I highly recommend TortoiseHg. The installer for TortoiseHg will install a graphical Mercurial client along with the command line tools.

Create A Repository

Now let's create a brand new Mercurial repository to house our notes/documentation. Open a terminal/console/command prompt to the location of your choice on your computer and execute the following commands:

$ hg init mydox
$ cd mydox

Configure Sphinx

The next step is to configure Sphinx for our project. Sphinx makes this very simple:

$ sphinx-quickstart

This is a wizard that will walk you through the configuration process for your project. It's pretty safe to accept the defaults, in my opinion. Here's the output of my wizard:

$ sphinx-quickstart
Welcome to the Sphinx quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Enter the root path for documentation.
> Root path for the documentation [.]:

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/N) [n]: y

Inside the root directory, two more directories will be created; "_templates"
for custom HTML templates and "_static" for custom stylesheets and other static
files. You can enter another prefix (such as ".") to replace the underscore.
> Name prefix for templates and static dir [_]:

The project name will occur in several places in the built documentation.
> Project name: My Dox
> Author name(s): Josh VanderLinden

Sphinx has the notion of a "version" and a "release" for the
software. Each version can have multiple releases. For example, for
Python the version is something like 2.5 or 3.0, while the release is
something like 2.5.1 or 3.0a1.  If you don't need this dual structure,
just set both to the same value.
> Project version: 0.0.1
> Project release [0.0.1]:

The file name suffix for source files. Commonly, this is either ".txt"
or ".rst".  Only files with this suffix are considered documents.
> Source file suffix [.rst]:

One document is special in that it is considered the top node of the
"contents tree", that is, it is the root of the hierarchical structure
of the documents. Normally, this is "index", but if your "index"
document is a custom template, you can also set this to another filename.
> Name of your master document (without suffix) [index]:

Please indicate if you want to use one of the following Sphinx extensions:
> autodoc: automatically insert docstrings from modules (y/N) [n]:
> doctest: automatically test code snippets in doctest blocks (y/N) [n]:
> intersphinx: link between Sphinx documentation of different projects (y/N) [n]:
> todo: write "todo" entries that can be shown or hidden on build (y/N) [n]:
> coverage: checks for documentation coverage (y/N) [n]:
> pngmath: include math, rendered as PNG images (y/N) [n]:
> jsmath: include math, rendered in the browser by JSMath (y/N) [n]:
> ifconfig: conditional inclusion of content based on config values (y/N) [n]:

A Makefile and a Windows command file can be generated for you so that you
only have to run e.g. `make html' instead of invoking sphinx-build
directly.
> Create Makefile? (Y/n) [y]:
> Create Windows command file? (Y/n) [y]: n

Finished: An initial directory structure has been created.

You should now populate your master file ./source/index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.

If you followed the same steps I did (I separated the source and build directories), you should see three new entries in your mydox repository:

  • build/
  • Makefile
  • source/

We'll do our work in the source directory.

Get Some ReST

Now is the time when we start writing some ReST that we want to turn into HTML using Sphinx. Open a new file, such as first_doc.rst, and put some ReST in it. If nothing comes to mind, or you're not familiar with ReST syntax, try the following:

=========================
This Is My First Document
=========================

Yes, this is my first document.  It's lame.  Deal with it.

Save the file (keep in mind that it should be within the source directory if you used the same settings I did). Now it's time to add it to the list of files that Mercurial will pay attention to. While we're at it, let's add the other files that were created by the Sphinx configuration wizard:

$ hg add
adding ../Makefile
adding conf.py
adding first_doc.rst
adding index.rst
$ hg st
A Makefile
A source/conf.py
A source/first_doc.rst
A source/index.rst

Don't worry that we don't see all of the directories in the output of hg st--Mercurial tracks files, not directories.

Automate HTML-ization

Here comes the magic in automating the conversion from ReST to HTML: Mercurial hooks. We will use the precommit hook to fire off a command that tells Sphinx to translate our ReST markup into HTML.

Edit your mydox/.hg/hgrc file. If the file does not yet exist, go ahead and create it. Add the following content to it:

[hooks]
precommit.sphinxify = ~/bin/sphinxify_docs.sh

I've opted to call a Bash script instead of using an inline Python call. Now let's create the Bash script, ~/bin/sphinxify_docs.sh:

#!/bin/bash
# Build the HTML docs; stop if the repository directory is missing.
cd "$HOME/mydox" || exit 1
sphinx-build source/ docs/

Notice that I used the $HOME environment variable. This means that I created the mydox directory at /home/myusername/mydox. Adjust that line according to your setup. You'll probably also want to make that script executable:

$ chmod +x ~/bin/sphinxify_docs.sh
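
If you would rather keep the hook in Python, a small script can do the same job as the Bash one. This is only a sketch of an equivalent script, not the inline Python hook mentioned above. build_docs and its runner parameter are names I made up, and the runner exists only so the function can be exercised without Sphinx installed.

```python
import os
import subprocess


def build_docs(repo_dir, runner=subprocess.call):
    """Run sphinx-build for the repository and return its exit status."""
    source = os.path.join(repo_dir, 'source')
    output = os.path.join(repo_dir, 'docs')
    # Same arguments as the Bash script: source directory, then output directory.
    return runner(['sphinx-build', source, output])
```

A script built around this would call build_docs(os.path.expanduser('~/mydox')) and exit with the returned status. Keep in mind that a non-zero exit status from a precommit hook stops the commit.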

Three, Two, One...

You should now be at a stage where you can safely commit changes to your repository and have Sphinx build your HTML documentation. Execute the following command somewhere under your mydox repository:

$ hg ci -m "Initial commit"

If your setup is anything like mine, you should see some output similar to this:

$ hg ci -m "Initial commit"
Making output directory...
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... not found
building [html]: targets for 2 source files that are out of date
updating environment: 2 added, 0 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... /home/jvanderlinden/mydox/source/first_doc.rst:: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded, 1 warning.
$ hg st
? docs/.buildinfo
? docs/.doctrees/environment.pickle
? docs/.doctrees/first_doc.doctree
? docs/.doctrees/index.doctree
? docs/_sources/first_doc.txt
? docs/_sources/index.txt
? docs/_static/basic.css
? docs/_static/default.css
? docs/_static/doctools.js
? docs/_static/file.png
? docs/_static/jquery.js
? docs/_static/minus.png
? docs/_static/plus.png
? docs/_static/pygments.css
? docs/_static/searchtools.js
? docs/first_doc.html
? docs/genindex.html
? docs/index.html
? docs/objects.inv
? docs/search.html
? docs/searchindex.js

If you see something like that, you're in good shape. Go ahead and take a look at your new mydox/docs/index.html file in the Web browser of your choosing.

Not very exciting, is it? Notice how your first_doc.rst doesn't appear anywhere on that page? That's because we didn't tell Sphinx to put it there. Let's do that now.

Customizing Things

Edit the mydox/source/index.rst file that was created during Sphinx configuration. In the section that starts with .. toctree::, let's tell Sphinx to include everything we ReST-ify:

.. toctree::
   :maxdepth: 2
   :glob:

   *

That should do it. Now, I don't know about you, but I don't really want to include the output HTML, images, CSS, JS, or anything in my documentation repository. It would just take up more space each time we change an .rst file. Let's tell Mercurial to not pay attention to the output HTML--it'll just be static and always up-to-date on our filesystem.

Create a new file called mydox/.hgignore. In this file, put the following content:

syntax: glob
docs/

Save the file, and you should now see something like the following when running hg st:

$ hg st
M source/index.rst
? .hgignore

Let's include the .hgignore file in the list of files that Mercurial will track:

$ hg add .hgignore
$ hg st
M source/index.rst
A .hgignore

Finally, let's commit one more time:

$ hg ci -m "Updating the index to include our .rst files"
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... done
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded.

Tada!! The first_doc.rst should now appear on the index page.
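If you're curious why the * entry picked up first_doc.rst but didn't list index.rst inside itself, here's a small approximation in Python. It is not Sphinx's actual implementation: it only handles a flat source directory, and the glob_toctree name is something I invented for the example.

```python
from fnmatch import fnmatch


def glob_toctree(docnames, pattern="*", current="index"):
    """Approximate which documents a toctree glob entry would pull in."""
    # Docnames are source paths without the .rst suffix, e.g. "first_doc".
    # The document that holds the toctree itself is left out.
    return sorted(d for d in docnames if d != current and fnmatch(d, pattern))


print(glob_toctree(["index", "first_doc"]))
```

With only index and first_doc in the source directory, the one document left over is first_doc.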

Serving Your Documentation

Who seriously wants to have HTML files that are hard to get to? How can we make it easier to access those HTML files? Perhaps we can create a simple static file Web server? That might sound difficult, but it's really not--not when you have access to Python!

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

def main():
    # Serve whatever directory the script is started from, on port 80
    server = HTTPServer(('', 80), SimpleHTTPRequestHandler)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        # Ctrl+C shuts the server down
        server.socket.close()

if __name__ == '__main__':
    main()

I created this simple script and put it in my ~/bin/ directory, also making it executable. Once that's done, you can navigate to your mydox/docs/ directory and run the script. Since I called the script webserver.py, I just do this:

$ cd ~/mydox/docs
$ sudo webserver.py

This makes it possible for you to visit http://localhost/ on your own computer, or to use your computer's IP address in place of localhost to access your documentation from a different computer on your network. The sudo is there because binding to port 80 requires root privileges. Pretty slick, if you ask me.
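The script above uses the Python 2 module names. On Python 3 those modules were merged into http.server, so a similar server might look like the sketch below. The make_server and serve_in_background helpers are names I made up, and I've assumed port 8000 so that no sudo is needed.

```python
import functools
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_server(directory=".", port=8000):
    """Build a static file server for `directory` without starting it."""
    # The directory= argument needs Python 3.7 or newer.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("", port), handler)


def serve_in_background(server):
    """Run the server in a daemon thread so the caller keeps control."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

For a foreground server, something like make_server("docs").serve_forever() should do; stop it with Ctrl+C. With this port you'd visit http://localhost:8000/ instead.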

I suppose there's more I could add, but that's all I have time for tonight. Enjoy!

Contextual Grepping

One of the tools I find myself using more and more each day is the amazing grep. It helps me narrow down the list of potential problem children in my code. Sometimes it can even tell me exactly where I need to look if my parameters are specific enough.

For example, the other day, I had a problem where some Python code was attempting to call isdigit() on an integer, when the variable was supposed to be a string. I could have scoured the code manually for all occurrences of the word "isdigit", or I could have used a "search in files" sort of feature in any useful text editor. There are likely other options too. However, I opted to use grep to find what I was looking for.

In the process of fixing this bug, I learned that grep offers the option of displaying a few lines of context around your matching text. There are a few ways you can tell grep to give you some context:

  • -A NUM, --after-context=NUM

    Print NUM lines of trailing context after matching lines. Places a line containing -- between contiguous groups of matches.

  • -B NUM, --before-context=NUM

    Print NUM lines of leading context before matching lines. Places a line containing -- between contiguous groups of matches.

  • -C NUM, --context=NUM

    Print NUM lines of output context. Places a line containing -- between contiguous groups of matches.
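To make the context idea concrete, here's a minimal Python sketch of what the options do to a list of lines. It imitates the numbered output of grep -n (a colon after the number for matching lines, a dash for context lines), but it's only an illustration and grep_context is a name I made up.

```python
import re


def grep_context(lines, pattern, before=0, after=0):
    """Return numbered matches plus surrounding context, like grep -n -B/-A."""
    regex = re.compile(pattern)
    keep, matched = set(), set()
    for i, line in enumerate(lines):
        if regex.search(line):
            matched.add(i)
            # Remember the match and its neighbours, clipped to the file.
            keep.update(range(max(0, i - before), min(len(lines), i + after + 1)))

    out, prev = [], None
    for i in sorted(keep):
        # A "--" line separates groups that are not directly adjacent.
        if prev is not None and i != prev + 1:
            out.append("--")
        sep = ":" if i in matched else "-"
        out.append("%d%s%s" % (i + 1, sep, lines[i]))
        prev = i
    return out
```

Passing the same number for before and after behaves like -C NUM.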

I thought this was so useful that I wrote a small shell script to wrap up my common options for grepping--recursive search, display line numbers, and (now) showing some context. Eventually I got around to cleaning up the output by dirtying up the script. Cleaning up the output involved only displaying a matching filename one time, with the line numbers for the context and matching lines below it. I also thought it would be easier to find matching lines if I could colorize the matched text. Here's my script as of noon today.

#!/bin/bash
# Recursively greps for some text in files in the current directory with some
# context lines.

# Highlight escape code: bold black text on a red background
HIGHLIGHT=$(echo -e '\033[41;30;1m')
NORMAL=$(echo -e '\033[0m')
FIND=$1
# Quote the exclude patterns so the shell doesn't expand them before grep does
grep --exclude='*.svn*' --exclude='*.swp' -rnC 5 "$FIND" * | \
    awk '{split($1, a, "-"); split(a[1], b, ":"); \
    if (b[1] != file) { file=b[1]; print file; } \
    sub(file, "", $0); print $0; }' | \
    sed -e "s/$FIND/$HIGHLIGHT&$NORMAL/g;s/^[-\:]//g"

I'm sure there are ways to make this more elegant, but I'm sure happy with it. This little dandy assisted me just this morning in helping a friend resolve some Django bugs!

Here's a screenshot:

[Screenshot: cgrep script in action]
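For anyone who reads Python more easily than awk and sed, the two cleanup steps (print each filename once, then colorize the matched text) could be sketched like this. It assumes input lines in the file:NUM:text and file-NUM-text shapes that grep -rn produces, and like the shell version it can get confused by filenames that themselves contain something like -12-. The function names are my own.

```python
import re

HIGHLIGHT = "\033[41;30;1m"  # same escape code the shell script uses
NORMAL = "\033[0m"


def colorize(line, pattern):
    """Wrap every match of `pattern` in the highlight escape codes."""
    return re.sub(pattern, lambda m: HIGHLIGHT + m.group(0) + NORMAL, line)


def group_by_file(grep_lines):
    """Emit each filename once, followed by its NUM:text / NUM-text lines."""
    out, current = [], None
    for line in grep_lines:
        m = re.match(r"(.*?)[:-](\d+[:-].*)$", line)
        if not m:
            # Anything that doesn't parse (e.g. the "--" separator) passes through.
            out.append(line)
            continue
        name, rest = m.groups()
        if name != current:
            current = name
            out.append(name)
        out.append(rest)
    return out
```

Running colorize over each line that group_by_file returns gets you close to the output of the shell pipeline.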

Announcing: Clip2Zeus

Sometime last year, I embarked on a mission to create my own TinyURL or bit.ly. This project had no real purpose other than to help me learn how to use Google's AppEngine. All of the URL-shortening services I had tried up to that point were perfectly satisfactory for my needs, but I wanted to explore a little.

It didn't take long for me to come up with the site that is now 2ze.us. I learned some neat things about AppEngine, and the site worked well enough for my needs (just like the others). Eventually I wrote a Firefox extension to make it easier to use the site. It offers the ability to quickly shorten "any" URL, and it also has a preview utility. This allows you to hover your cursor over a 2ze.us link and learn various bits of information about it--target domain name, the target page's title, number of hits, etc.

Toward the end of 2009, I started writing the same sort of extension for Chrome/Chromium. It offers pretty much the same sort of functionality as its Firefox brother, minus keyboard shortcuts.

Before long, I found myself embarking on another 2zeus-related endeavor. This new project is one that I am actually quite proud of and satisfied with. I wrote a program that will run in the background on your computer. I call it "Clip2Zeus". This program will periodically poll your clipboard, looking for URLs in whatever text you currently have on it. If any URLs are found, the program will run out to 2ze.us and try to shorten them. Once a valid result comes back from 2ze.us, your clipboard is automatically updated with the original URLs replaced by the shortened version.
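The heart of that idea fits in a few lines of Python. What follows is a simplified sketch and not the actual Clip2Zeus code: the URL pattern is deliberately naive, and the shorten, get_text, and set_text callables are placeholders for the real 2ze.us request and the platform's clipboard access.

```python
import re

# Naive pattern: grabs everything up to whitespace, quotes, or angle brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def shorten_urls(text, shorten, skip=()):
    """Replace each URL in `text` with whatever `shorten(url)` returns."""
    def replace(match):
        url = match.group(0)
        if url.startswith(tuple(skip)):
            return url  # e.g. links that are already short
        try:
            return shorten(url) or url
        except Exception:
            return url  # keep the original if the service fails
    return URL_RE.sub(replace, text)


def refresh_clipboard(get_text, set_text, shorten, skip=()):
    """One polling step: rewrite the clipboard only if something changed."""
    old = get_text()
    new = shorten_urls(old, shorten, skip)
    if new != old:
        set_text(new)
    return new
```

A background program would call something like refresh_clipboard on a timer. The skip argument is there so that already-shortened links don't get shortened again on the next poll.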

It doesn't stop there, though. You can control the program using a couple of interfaces. One interface is a Tk GUI, which allows you to set the polling interval or turn off polling altogether. Should you choose to do that, you can click a button in the GUI any time you explicitly want to shorten URLs in your clipboard. The other is a command-line interface that offers the same sort of functionality.

I've been using this program on several computers for a couple of weeks, and I haven't noticed any memory/performance problems at all. It works just as well on Windows and OS X as it does on Linux. It just sits there silently until you give it a URL. It works with any program that can access the standard clipboard mechanism for whatever OS you're using.

You can download and install it using easy_install or pip. Or you can download it and install it directly from http://pypi.python.org/pypi/Clip2Zeus/

PyPI Download Stats

Every so often I find myself in need of a small ego boost (or reality check). One of the things I've done in the past to satisfy such a need is go to PyPI and see how many downloads my packages have. Depending on how much time I have or how much effort I want to put into my pride, I may or may not check the download stats for all releases of each package.

A couple of weeks ago, I was in the mood for an ego boost. It was actually an everyday thing for nearly a week! So, instead of wasting a lot of time checking download stats for each version of each package I have on PyPI, I wrote a script to do it for me. It uses the XML-RPC API that PyPI offers.

Here she is!

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Calculates the total number of downloads that a particular PyPI package has
received across all versions tracked by PyPI
"""

from datetime import datetime
import locale
import sys
import xmlrpclib

locale.setlocale(locale.LC_ALL, '')

class PyPIDownloadAggregator(object):

    def __init__(self, package_name, include_hidden=True):
        self.package_name = package_name
        self.include_hidden = include_hidden
        self.proxy = xmlrpclib.Server('http://pypi.python.org/pypi')
        self._downloads = {}

        self.first_upload = None
        self.first_upload_rel = None
        self.last_upload = None
        self.last_upload_rel = None

    @property
    def releases(self):
        """Retrieves the release number for each uploaded release"""

        result = self.proxy.package_releases(self.package_name, self.include_hidden)

        if len(result) == 0:
            # no matching package--search for possibles, and limit to 15 results
            results = self.proxy.search({
                'name': self.package_name,
                'description': self.package_name
            }, 'or')[:15]

            # make sure we only get unique package names
            matches = []
            for match in results:
                name = match['name']
                if name not in matches:
                    matches.append(name)

            # if only one package was found, return it
            if len(matches) == 1:
                self.package_name = matches[0]
                return self.releases

            error = """No such package found: %s

Possible matches include:
%s
""" % (self.package_name, '\n'.join('\t- %s' % n for n in matches))

            sys.exit(error)

        return result

    @property
    def downloads(self):
        """Calculate the total number of downloads for the package"""

        # only hit PyPI the first time; later accesses reuse the cached counts
        if len(self._downloads) == 0:
            for release in self.releases:
                urls = self.proxy.release_urls(self.package_name, release)
                self._downloads[release] = 0
                for url in urls:
                    # upload times
                    uptime = datetime.strptime(url['upload_time'].value, "%Y%m%dT%H:%M:%S")
                    if self.first_upload is None or uptime < self.first_upload:
                        self.first_upload = uptime
                        self.first_upload_rel = release

                    if self.last_upload is None or uptime > self.last_upload:
                        self.last_upload = uptime
                        self.last_upload_rel = release

                    self._downloads[release] += url['downloads']

        return self._downloads

    def total(self):
        return sum(self.downloads.values())

    def average(self):
        return self.total() / len(self.downloads)

    def max(self):
        return max(self.downloads.values())

    def min(self):
        return min(self.downloads.values())

    def stats(self):
        """Prints a nicely formatted list of statistics about the package"""

        self.downloads # explicitly call, so we have first/last upload data
        fmt = locale.nl_langinfo(locale.D_T_FMT)
        sep = lambda s: locale.format('%d', s, 3)
        val = lambda dt: dt and dt.strftime(fmt) or '--'

        params = (
            self.package_name,
            val(self.first_upload),
            self.first_upload_rel,
            val(self.last_upload),
            self.last_upload_rel,
            sep(len(self.releases)),
            sep(self.max()),
            sep(self.min()),
            sep(self.average()),
            sep(self.total()),
        )

        print """PyPI Package statistics for: %s

    First Upload: %40s (%s)
    Last Upload:  %40s (%s)
    Number of releases: %34s
    Most downloads:    %35s
    Fewest downloads:  %35s
    Average downloads: %35s
    Total downloads:   %35s
""" % params

def main():
    if len(sys.argv) < 2:
        sys.exit('Please specify at least one package name')

    for pkg in sys.argv[1:]:
        PyPIDownloadAggregator(pkg).stats()

if __name__ == '__main__':
    main()

Usage is pretty simple. All you need to do is call the script (I called it pypi_downloads.py) with the name or names of the package(s) you want download stats for:

bash-4.0$ ./pypi_downloads.py clip2zeus
PyPI Package statistics for: Clip2Zeus

    First Upload:             Sun 10 Jan 2010 03:25:30 AM  (0.1)
    Last Upload:              Mon 18 Jan 2010 06:58:42 PM  (0.9d)
    Number of releases:                                 12
    Most downloads:                                     41
    Fewest downloads:                                   21
    Average downloads:                                  28
    Total downloads:                                   342

And there you have it!
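If you only care about the arithmetic behind those numbers, here's a small standalone sketch that works on a plain dictionary of per-release download counts, with no network access involved. The summarize and thousands names are mine, and the counts in the example are made up.

```python
def summarize(downloads):
    """Compute the figures the script prints from {release: download_count}."""
    counts = list(downloads.values())
    if not counts:
        return None  # nothing to average, so avoid dividing by zero
    total = sum(counts)
    return {
        "releases": len(counts),
        "most": max(counts),
        "fewest": min(counts),
        "average": total // len(counts),  # whole-number average
        "total": total,
    }


def thousands(n):
    """Group digits with commas, independent of the system locale."""
    return format(n, ",")


stats = summarize({"0.1": 41, "0.2": 21, "0.3": 28})
print(stats["total"], stats["average"])
```

The whole-number average is why the sample run above reports 28 for 342 downloads spread over 12 releases.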