Syndication Caching in Django

I've recently been working on some performance enhancements on my site. Apparently some of my latest articles are a little too popular for my shared hosting plan. The surge of traffic to my site took down several sites on the same server as my own.

My response to the fiasco was to, among other things, implement caching on my site, and it seems to have helped a lot. I've noticed that my RSS feeds are hit almost as hard as the articles themselves, but they weren't being cached the way I had expected. I tried a couple of things that I thought would work, but nothing seemed to do the trick.

After doing some brief research into the idea of caching my RSS feeds using Django's built-in caching mechanisms, I came up empty. It occurred to me to implement caching in the feed classes themselves. I tried something like this:

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = Article.objects.active().order_by('-publish_date')[:10]
            cache.set('latest_articles', articles)

        return articles

    ...

This code doesn't work! When I tried to retrieve one of my RSS feeds with such "caching" in place, I got the following traceback:

Traceback (most recent call last):

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 280, in run
    self.result = application(self.environ, self.start_response)

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 674, in __call__
    return self.application(environ, start_response)

  File "/home/wheaties/dev/django/core/handlers/wsgi.py", line 241, in __call__
    response = self.get_response(request)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 143, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 101, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 36, in __call__
    return self.decorator(self.func)(*args, **kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 86, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 215, in feed
    feedgen = f(slug, request).get_feed(param)

  File "/home/wheaties/dev/django/contrib/syndication/feeds.py", line 37, in get_feed
    return super(Feed, self).get_feed(obj, self.request)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 134, in get_feed
    for item in self.__get_dynamic_attr('items', obj):

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 69, in __get_dynamic_attr
    return attr()

  File "/home/wheaties/dev/articles/feeds.py", line 22, in items
    cache.set(key, articles)

  File "/home/wheaties/dev/django/core/cache/backends/filebased.py", line 72, in set
    pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <class 'django.utils.functional.__proxy__'>: attribute lookup django.utils.functional.__proxy__ failed

This error took me by surprise; I didn't expect anything like it. I tried a few things to get around it, but then I actually stopped to consider what was causing it. My Article objects are definitely serializable, which is why the error didn't make sense.

Then it hit me: the object I was actually attempting to cache was a QuerySet, not a list or tuple of Article objects. Wrapping the QuerySet in a call to list() solved the problem:

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = list(Article.objects.active().order_by('-publish_date')[:10])
            cache.set('latest_articles', articles)

        return articles

    ...

And that one worked. I would prefer to cache the actual XML version of the RSS feed, but I will settle for a few hundred fewer hits to my database each day by caching the list of articles. If anyone has better suggestions, I'd love to hear about them. Until then, I hope my experience will help others out there who are in danger of taking down other sites on their shared hosting service!
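For the record, if I ever get around to caching the rendered XML itself, I suspect something along these lines would do the trick. This is only a rough sketch that I haven't tested: the URL pattern, feed dictionary, and timeout below are made up for illustration, and cache_page would store the fully rendered response rather than just the list of articles.

# urls.py -- untested sketch; the pattern, feed slugs, and timeout are illustrative
from django.conf.urls.defaults import patterns
from django.contrib.syndication.views import feed
from django.views.decorators.cache import cache_page

from articles.feeds import LatestEntries

feeds = {'latest': LatestEntries}

urlpatterns = patterns('',
    # cache the rendered feed XML for 15 minutes
    (r'^feeds/(?P<url>.*)/$', cache_page(feed, 60 * 15), {'feed_dict': feeds}),
)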

Contextual Grepping

One of the tools I find myself using more and more each day is the amazing grep. It helps me narrow down the list of potential problem children in my code. Sometimes it can even tell me exactly where I need to look if my parameters are specific enough.

For example, the other day, I had a problem where some Python code was attempting to call isdigit() on an integer, when the variable was supposed to be a string. I could have scoured the code manually for all occurrences of the word "isdigit", or I could have used a "search in files" sort of feature in any useful text editor. There are likely other options too. However, I opted to use grep to find what I was looking for.

In the process of fixing this bug, I learned that grep offers the option of displaying a few lines of context around your matching text. There are a few ways you can tell grep to give you some context:

  • -A NUM, --after-context=NUM

    Print NUM lines of trailing context after matching lines. Places a line containing -- between contiguous groups of matches.

  • -B NUM, --before-context=NUM

    Print NUM lines of leading context before matching lines. Places a line containing -- between contiguous groups of matches.

  • -C NUM, --context=NUM

    Print NUM lines of output context. Places a line containing -- between contiguous groups of matches.

I thought this was so useful that I wrote a small shell script to wrap up my common grep options--recursive search, line numbers, and (now) a few lines of context. Eventually I got around to cleaning up the output by dirtying up the script. Cleaning up the output meant displaying each matching filename only once, with the line numbers for the context and matching lines below it. I also thought it would be easier to spot matching lines if the matched text were colorized. Here's my script as of noon today.

#!/bin/bash
# Recursively greps for some text in files in the current directory with some
# context lines.

GREEN=`echo -e '\033[41;30;1m'`
NORMAL=`echo -e '\033[0m'`
FIND=$1
grep --exclude=*.svn* --exclude=*.swp -rnC 5 "$FIND" * | \
    awk '{split($1, a, "-"); split(a[1], b, ":"); \
    if (b[1] != file) { file=b[1]; print file; } \
    sub(file, "", $0); print $0; }' | \
    sed -e "s/$FIND/$GREEN&$NORMAL/g;s/^[-\:]//g"

I'm sure there are ways to make this more elegant, but I'm quite happy with it. This little dandy helped me track down some Django bugs for a friend just this morning!

Here's a screenshot:

cgrep script in action

Announcing: Clip2Zeus

Sometime last year, I embarked on a mission to create my own TinyURL or bit.ly. This project had no real purpose other than to help me learn how to use Google's AppEngine. All of the URL-shortening services I had tried up to that point were perfectly satisfactory for my needs, but I wanted to explore a little.

It didn't take long for me to come up with the site that is now 2ze.us. I learned some neat things about AppEngine, and the site worked well enough for my needs (just like the others). Eventually I wrote a Firefox extension to make it easier to use the site. It offers the ability to quickly shorten "any" URL, and it also has a preview utility. This allows you to hover your cursor over a 2ze.us link and learn various bits of information about it--target domain name, the target page's title, number of hits, etc.

Toward the end of 2009, I started writing the same sort of extension for Chrome/Chromium. It offers pretty much the same sort of functionality as its Firefox brother, minus keyboard shortcuts.

Before long, I found myself embarking on another 2ze.us-related endeavor. This new project is one that I am actually quite proud of and satisfied with. I wrote a program that runs in the background on your computer; I call it "Clip2Zeus". It periodically polls your clipboard, looking for URLs in whatever text you currently have on it. If any URLs are found, the program reaches out to 2ze.us and tries to shorten them. Once a valid result comes back from 2ze.us, your clipboard is automatically updated with the original URLs replaced by their shortened versions.
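To give you an idea of how simple the core of the approach is, here's a rough sketch of the clipboard-polling loop in Python. This is not the actual Clip2Zeus code: the shorten() function is a hypothetical stand-in for the call out to 2ze.us, the URL regex is deliberately naive, and Tkinter is used only as a convenient cross-platform way to read and write the clipboard.

# A bare-bones illustration of the polling idea -- not the real Clip2Zeus code
import re
import time
import Tkinter

URL_RE = re.compile(r'https?://\S+')

def shorten(url):
    """Hypothetical stand-in for the call out to the 2ze.us API."""
    return url

def poll_clipboard(interval=5):
    root = Tkinter.Tk()
    root.withdraw()  # no window needed; we only want clipboard access

    while True:
        try:
            text = root.clipboard_get()
        except Tkinter.TclError:
            text = ''  # clipboard is empty or holds non-text data

        # replace each URL in the clipboard text with its shortened form
        new_text = URL_RE.sub(lambda m: shorten(m.group(0)), text)
        if new_text != text:
            root.clipboard_clear()
            root.clipboard_append(new_text)

        time.sleep(interval)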

It doesn't stop there, though. You can control the program through a couple of interfaces. One is a Tk GUI, which allows you to set the polling interval or turn off polling altogether. Should you choose the latter, you can click a button in the GUI any time you explicitly want to shorten the URLs in your clipboard. There is also a command-line interface that offers the same sort of functionality.

I've been using this program on several computers for a couple of weeks, and I haven't noticed any memory or performance problems at all. It works just as well on Windows and OSX as it does on Linux. It just sits there silently until you give it a URL, and it works with any program that can access the standard clipboard mechanism for whatever OS you're using.

You can install it using easy_install or pip, or you can download it directly from http://pypi.python.org/pypi/Clip2Zeus/

PyPI Download Stats

Every so often I find myself in need of a small ego boost (or reality check). One of the things I've done in the past to satisfy such a need is go to the PyPI and see how many downloads my packages have. Depending on how much time I have or how much effort I want to put into my pride, I may or may not check the download stats for all releases of each package.

A couple of weeks ago, I was in the mood for an ego boost. It was actually an everyday thing for nearly a week! So, instead of wasting a lot of time checking download stats for each version of each package I have on PyPI, I wrote a script to do it for me. It uses the XML-RPC API that PyPI offers.

Here she is!

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
Calculates the total number of downloads that a particular PyPI package has
received across all versions tracked by PyPI
"""

from datetime import datetime
import locale
import sys
import xmlrpclib

locale.setlocale(locale.LC_ALL, '')

class PyPIDownloadAggregator(object):

    def __init__(self, package_name, include_hidden=True):
        self.package_name = package_name
        self.include_hidden = include_hidden
        self.proxy = xmlrpclib.Server('http://pypi.python.org/pypi')
        self._downloads = {}

        self.first_upload = None
        self.first_upload_rel = None
        self.last_upload = None
        self.last_upload_rel = None

    @property
    def releases(self):
        """Retrieves the release number for each uploaded release"""

        result = self.proxy.package_releases(self.package_name, self.include_hidden)

        if len(result) == 0:
            # no matching package--search for possibles, and limit to 15 results
            results = self.proxy.search({
                'name': self.package_name,
                'description': self.package_name
            }, 'or')[:15]

            # make sure we only get unique package names
            matches = []
            for match in results:
                name = match['name']
                if name not in matches:
                    matches.append(name)

            # if only one package was found, return it
            if len(matches) == 1:
                self.package_name = matches[0]
                return self.releases

            error = """No such package found: %s

Possible matches include:
%s
""" % (self.package_name, '\n'.join('\t- %s' % n for n in matches))

            sys.exit(error)

        return result

    @property
    def downloads(self, force=False):
        """Calculate the total number of downloads for the package"""

        if len(self._downloads) == 0 or force:
            for release in self.releases:
                urls = self.proxy.release_urls(self.package_name, release)
                self._downloads[release] = 0
                for url in urls:
                    # upload times
                    uptime = datetime.strptime(url['upload_time'].value, "%Y%m%dT%H:%M:%S")
                    if self.first_upload is None or uptime < self.first_upload:
                        self.first_upload = uptime
                        self.first_upload_rel = release

                    if self.last_upload is None or uptime > self.last_upload:
                        self.last_upload = uptime
                        self.last_upload_rel = release

                    self._downloads[release] += url['downloads']

        return self._downloads

    def total(self):
        return sum(self.downloads.values())

    def average(self):
        return self.total() / len(self.downloads)

    def max(self):
        return max(self.downloads.values())

    def min(self):
        return min(self.downloads.values())

    def stats(self):
        """Prints a nicely formatted list of statistics about the package"""

        self.downloads # explicitly call, so we have first/last upload data
        fmt = locale.nl_langinfo(locale.D_T_FMT)
        sep = lambda s: locale.format('%d', s, 3)
        val = lambda dt: dt and dt.strftime(fmt) or '--'

        params = (
            self.package_name,
            val(self.first_upload),
            self.first_upload_rel,
            val(self.last_upload),
            self.last_upload_rel,
            sep(len(self.releases)),
            sep(self.max()),
            sep(self.min()),
            sep(self.average()),
            sep(self.total()),
        )

        print """PyPI Package statistics for: %s

    First Upload: %40s (%s)
    Last Upload:  %40s (%s)
    Number of releases: %34s
    Most downloads:    %35s
    Fewest downloads:  %35s
    Average downloads: %35s
    Total downloads:   %35s
""" % params

def main():
    if len(sys.argv) < 2:
        sys.exit('Please specify at least one package name')

    for pkg in sys.argv[1:]:
        PyPIDownloadAggregator(pkg).stats()

if __name__ == '__main__':
    main()

Usage is pretty simple. All you need to do is call the script (I called it pypi_downloads.py) with the name or names of the package(s) you want download stats for:

bash-4.0$ ./pypi_downloads.py clip2zeus
PyPI Package statistics for: Clip2Zeus

    First Upload:             Sun 10 Jan 2010 03:25:30 AM  (0.1)
    Last Upload:              Mon 18 Jan 2010 06:58:42 PM  (0.9d)
    Number of releases:                                 12
    Most downloads:                                     41
    Fewest downloads:                                   21
    Average downloads:                                  28
    Total downloads:                                   342

And there you have it!

Review: Django 1.0 Web Site Development

Introduction

Several months ago, a UK-based book publisher, Packt Publishing, contacted me to ask if I would be willing to review one of their books about Django. I gladly jumped at the opportunity, and I received a copy of the book in the mail a couple of weeks later. This happened at the beginning of September 2009. It just so happened that I was in the process of being hired on by ScienceLogic right when all of this took place. The subsequent weeks were filled to the brim with visitors, packing, moving, finding an apartment, and commuting to my new job. It was pretty stressful.

Things are finally settling down, so I've taken the time to actually review the book I was asked to review. I should mention right off the bat that this is indeed a solicited review, but I am in no way influenced to write a good or bad review. Packt Publishing simply wants me to offer an honest review of the book, and that is what I intend to do. While reviewing the book, I decided to follow along and write the code the book introduced. I made sure that I was using the official Django 1.0 release instead of using trunk like I tend to do for my own projects.

The title of the book is Django 1.0 Web Site Development, written by Ayman Hourieh, and it's only 250 pages long. Ayman described the audience of the book as follows:

This book is for web developers who want to learn how to build a complete site with Web 2.0 features, using the power of a proven and popular development system--Django--but do not necessarily want to learn how a complete framework functions in order to do this. Basic knowledge of Python development is required for this book, but no knowledge of Django is expected.

Ayman introduced Django piece by piece, using the end goal of a social bookmarking site, a la del.icio.us and reddit. In the first chapter of the book, Ayman discussed the history of Django and why Python and Django are a good platform upon which to build Web applications. The second chapter offers a brief guide to installing Python and Django and getting your first project set up. Not much to comment on here.

Digging In

Chapter three is where the reader was introduced to the basic structure of a Django project, and the initial data models were described. Chapter four discussed user registration and management. We made it possible for users to create accounts, log into them, and log out again. As part of those additions, the django.forms framework was introduced.

In chapter five, we made it possible for bookmarks to be tagged. Along with that, we built a tag cloud, restricted access to certain pages, and added a little protection against malicious data input. Next up was the section where things actually started getting interesting for me: enhancing the interface with fancy effects and AJAX. The fancy effects include live searching for bookmarks, being able to edit a bookmark in place (without loading a new page), and auto-completing tags when you submit a bookmark.

This chapter really reminded me just how simple it is to add new, useful features to existing code using Django and Python. I was thoroughly impressed at how easy it was to add the AJAX functionality mentioned above. Auto-completing the tags as you type, while jQuery and friends did most of the work, was very easy to implement. It made me happy.

Chapter seven introduced some code that allowed users to share their bookmarks with others. Along with this, the ability to vote on shared bookmarks was added. Another feature that was added in this chapter was the ability for users to comment on various bookmarks.

The ridiculously amazing Django Administration utility was first introduced in chapter eight. It kinda surprised me that it took 150 pages before this feature was brought to the user's attention. In my opinion, this is one of the most useful selling points when one is considering a Web framework for a project. When I first encountered Django, the admin interface was one of maybe three deciding factors in our company's decision to become a full-on Django shop.

Bring on the Web 2.0

Anyway, in chapter nine, we added a handful of useful "Web 2.0" features. RSS feeds were introduced. We learned about pagination to enhance usability and performance. We also improved the search engine in our project. At this stage, the magical Q objects were mentioned. The power behind the Q objects was discussed very well, in my opinion.
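For readers who haven't bumped into them yet, Q objects let you combine lookup conditions with OR (among other things) instead of the implicit AND you get from plain filter() arguments, which is exactly what a simple search needs. Here's a tiny illustration of the idea; the Bookmark model and its fields are placeholders of my own, not code from the book.

# Q objects in a nutshell -- the Bookmark model and field names are placeholders
from django.db.models import Q

from bookmarks.models import Bookmark

def search_bookmarks(query):
    # match bookmarks whose title OR URL contains the search term
    return Bookmark.objects.filter(
        Q(title__icontains=query) | Q(url__icontains=query)
    )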

In chapter 10, we were taught how we can create relationships between members on the site. We made it possible for users to become "friends" so they can see the latest bookmarks posted by their friends. We also added an option for users to be able to invite some of their other friends to join the site via email, complete with activation links. Finally, we improved the user interface by providing a little bit of feedback to the user at various points using the messages framework that is part of the django.contrib.auth package in Django 1.0.

More advanced topics, such as internationalization and caching, were discussed in chapter 11. Django's special unit testing features were also introduced in chapter 11. This section actually kinda frustrated me. Caching was discussed immediately before unit testing. In the caching section, we learned how to enable site-wide caching. This actually broke the unit tests. They failed because the caching system was "read only" while running the tests. Anyway, it's probably more or less a moot point.

Chapter 11 also briefly introduced things to pay attention to when you deploy your Django projects into a production environment. This portion was mildly disappointing, but I don't know what else would have made it better. There are so many functional ways to deploy Django projects that you could write books just to describe the minutiae involved in deployment.

The twelfth and final chapter discussed some of the other things that Django has to offer, such as enhanced functionality in templates using custom template tags and filters and model managers. Generic views were mentioned, and some of the other useful things in django.contrib were brought up. Ayman also offered a few ideas of additional functionality that the reader can implement on their own, using the things they learned throughout the book.

Afterthoughts

Overall, I felt that this book did a great job of introducing the power that lies in using Django as your framework of choice. I thought Ayman managed to break things up into logical sections, and that the iterations used to enhance existing functionality (from earlier chapters) were superbly executed. I think that this book, while it does assume some prior Python knowledge, would be a fine choice for those who are curious to dig into Django quickly and easily.

Some of the beefs I have with this book deal mostly with the editing. There were a lot of strange things that I found while reading through the book. However, the biggest sticking point for me has to do with "pluggable" applications. Earlier I mentioned that the built-in Django admin was one of only a few deciding factors in my company's choice to become a Django shop. Django was designed to allow its applications to be very "pluggable."

You may be asking what I mean by "pluggable." Well, say you decide to build a website that includes a blog, so you build a Django project and create an application specific to blogging. Then, at some later time, you need to build another site that also has blog functionality. Do you want to rewrite all of the blogging code for the second site? Or do you want to use the same code that you used in the first site (without copying it)? If you're anything like me and thousands of other developers out there, you would probably rather leverage the work you have already done. Django allows you to do this if you build your Django applications properly.

This book, however, makes little effort to teach the reader how to turn all of their hard work on the social bookmarking features into something they could reuse over and over with minimal effort in the future. Application-specific templates are placed directly into the global templates directory. Application-specific URLconfs are placed in the root urls.py file. I would have liked to see at least some effort to make the bookmarking application reusable.
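To illustrate what I mean, making an application pluggable is largely a matter of letting it carry its own URLconf (and templates) and then wiring it into whichever project needs it with include(). The snippet below is only a sketch of that pattern using the Django 1.0-era URLconf syntax; the module, view, and pattern names are hypothetical rather than anything from the book.

# bookmarks/urls.py -- the app ships its own URLconf (view names are hypothetical)
from django.conf.urls.defaults import patterns

urlpatterns = patterns('bookmarks.views',
    (r'^$', 'bookmark_list'),
    (r'^tag/(?P<tag>[-\w]+)/$', 'bookmarks_by_tag'),
)

# urls.py (project root) -- any project can now plug the app in wherever it likes
from django.conf.urls.defaults import patterns, include

urlpatterns = patterns('',
    (r'^bookmarks/', include('bookmarks.urls')),
)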

Finally, the most obvious gripe is that the book is outdated. That's understandable, though! Anything in print media will likely be outdated the second it is printed if the book has anything to do with computers. However, with the understanding that this book was written specifically for Django 1.0 and not Django 1.1 or 1.2 alpha, it does an excellent job at hitting the mark.

Another Bash Tip

I just learned yet another goodie about the Bash shell that I must share with you. This trick made my day on so many levels.

You know how annoying it is when you get those ridiculously long commands in a terminal window? You know how much more annoying it is when you generally can't Ctrl+arrow around the command to change bits and pieces when you're on OSX? If you've ever been in that boat, this tip is for you.

Bash allows you to hit Ctrl+x Ctrl+e to edit your current command in your "preferred" editor. Your "preferred" editor is determined from the EDITOR environment variable. Since I'm a fan of VIM, all I need to do is make sure I've got export EDITOR=vim in my .bashrc or something along those lines. Once I do that, I can hit Ctrl+x Ctrl+e anytime I am using Bash and have a smelly, long command I want to manipulate.

See it in action.

Tip: easy_install / pip

With all of the exciting updates to Mercurial recently, I've been on a rampage, updating various boxes everywhere I go. I'm in the habit of using easy_install and/or pip to install most of my Python-related packages. It's pretty easy to install packages that live in well-known locations (like PyPI or Google Code, for example). It's also pretty easy to update packages using either utility: both take a -U parameter, which, to my knowledge, tells them to check for updates and install the latest version.

That's all fine and dandy, but what happens when you want to install an "unofficial" version of some package? I mean, what if your favorite project suddenly includes some feature that you simply can't live without, and the next official release is weeks or months away? There are typically a few avenues you can take to satisfy your needs, but I wanted to bring up something that I think not many people are aware of: easy_install and pip can both understand URLs to installable Python packages.

What do I mean by that, you ask? Well, when you get down to the basics of what both utilities do, they just take care of downloading some Python package and installing it with the setup.py file contained therein. In many cases, these utilities will search various package repositories, such as PyPI, to download whatever package you specify. If the package is found, it will be downloaded and extracted.

In most cases, you can do all of that yourself:

$ wget http://pypi.python.org/someproject/somepackage.tar.gz
$ tar zxf somepackage.tar.gz
$ cd somepackage
$ python setup.py install

Both easy_install and pip obviously do a lot of other magic, but that is perhaps the most basic way to understand what they do. To get at that unofficial version, you can help your utility of choice out by specifying the exact URL of the package you want it to install for you:

$ easy_install http://pypi.python.org/someproject/somepackage.tar.gz
$ pip install http://pypi.python.org/someproject/somepackage.tar.gz

For me, this feature comes in very handy with projects that are hosted on BitBucket, for example, because you can always get any revision of the project in a tidy .tar.gz file. So when I'm updating Mercurial installations, I can do this to get the latest stable revision:

$ easy_install http://selenic.com/repo/hg-stable/archive/tip.tar.gz

It's pretty slick. Here's a full example:

[user@web ~]$ hg version
Mercurial Distributed SCM (version 1.2.1)

Copyright (C) 2005-2009 Matt Mackall <mpm@selenic.com> and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[user@web ~]$ easy_install http://selenic.com/repo/hg-stable/archive/tip.tar.gz
Downloading http://selenic.com/repo/hg-stable/archive/tip.tar.gz
Processing tip.tar.gz
Running Mercurial-stable-branch--8bce1e0d2801/setup.py -q bdist_egg --dist-dir /tmp/easy_install-Gnk2c9/Mercurial-stable-branch--8bce1e0d2801/egg-dist-tmp--2VAce
zip_safe flag not set; analyzing archive contents...
mercurial.help: module references __file__
mercurial.templater: module references __file__
mercurial.extensions: module references __file__
mercurial.i18n: module references __file__
mercurial.lsprof: module references __file__
Removing mercurial unknown from easy-install.pth file
Adding mercurial 1.4.1-4-8bce1e0d2801 to easy-install.pth file
Installing hg script to /home/user/bin

Installed /home/user/lib/python2.5/mercurial-1.4.1_4_8bce1e0d2801-py2.5-linux-i686.egg
Processing dependencies for mercurial==1.4.1-4-8bce1e0d2801
Finished processing dependencies for mercurial==1.4.1-4-8bce1e0d2801
[user@web ~]$ hg version
Mercurial Distributed SCM (version 1.4.1+4-8bce1e0d2801)

Copyright (C) 2005-2009 Matt Mackall <mpm@selenic.com> and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Notice the version change from 1.2.1 to 1.4.1+4-8bce1e0d2801. w00t.

Edit: devov pointed out that pip is capable of installing packages directly from a project's source repository. I've never used this functionality, but I'm interested in trying it out sometime! Thanks devov!

Mercurial 1.4.1 Released

I just noticed that Mercurial 1.4.1 was released today. Most of the changes are pretty minor, but I wanted to voice my appreciation for a new extension that is included with this release: schemes.

This extension basically makes your life easier by letting you abbreviate long, repetitive repository URLs. For example, you can now use the following command to snag my simple Mercurial extensions repo from BitBucket:

hg clone bb://codekoala/hgext

Without hgext.schemes, that command would be something like one of the following commands:

hg clone http://bitbucket.org/codekoala/hgext
hg clone ssh://hg@bitbucket.org/codekoala/hgext

Not the most ground-breaking of extensions, but still pretty slick!

Bash Time Saver

The other day I was helping a friend get their Django site back online, and I found myself running a couple of very similar commands when backing up the databases:

$ pg_dump -U some_database -f some_database_backup.sql

I use Bash as my shell in Linux, and apparently it's capable of doing some neat string substitution. I wanted to try something I had seen a few days prior to save myself a few keystrokes. Here's what I was trying to do:

$ pg_dump -U some_database -f some_database_backup.sql
$ ^some_database^some_other_database

Which, I had hoped, would replace all instances of some_database with some_other_database in the previously executed command. It turns out that the ^search-for^replace-with syntax is only good for one substitution. That means that my sneaky attempt at saving keystrokes just overwrote the first database's backup with the backup of the second database. In other words, I was getting this:

$ pg_dump -U some_database -f some_database_backup.sql
$ pg_dump -U some_other_database -f some_database_backup.sql

...when I wanted this...

$ pg_dump -U some_database -f some_database_backup.sql
$ pg_dump -U some_other_database -f some_other_database_backup.sql

I did a little more digging today, and it turns out that I should have been using the following syntax:

$ pg_dump -U some_database -f some_database_backup.sql
$ !!:gs/some_database/some_other_database/

The !! is shorthand for executing the previous command, and the :gs is used for global string replacement. And there you have it!

Out of curiosity, does anyone know of similar functionality in other shells?

OSX, Growl, And Subversion

Today I found myself trying to figure out how to make a terminal window stay permanently on my desktop or dashboard in OSX, similar to what I've done in the past with Linux. I just wanted to have the terminal window monitoring things in the background for me. Actually, all I wanted to do was keep track of when my local working copy of our Subversion repository was out of sync. I wanted a solution that would stay out of my way, but I also wanted it to be easy.

My search for a solution seemed short-lived when a Google search suggested a dashboard widget for the Terminal application. The problem with it was that the download server was dead or simply blocked by my company's Internet filter. One way or another, it wasn't long before I went in search of another solution.

At that very instant, I received a Growl notification from some program. That's when it dawned on me--I could tell Growl to tell me when my working copy was out of sync. I had done stuff like that in the past, so I set out to write my solution. This is what I came up with:

#!/bin/bash
# Compare the local working copy revision with the repository HEAD and send
# a Growl notification (over SSH) when the two differ.
MY_BOX=[my IP address]
DEV_ROOT='/path/to/svn/working copy'
cd "$DEV_ROOT"

# most recent revision in the local working copy
MY_REV=`svn log --limit 1 | awk '/^r/ {print $1}' | sed 's/[^0-9]//g'`
# latest revision in the repository itself
SVN_REV=`svn log --limit 1 -r HEAD | awk '/^r/ {print $1}' | sed 's/[^0-9]//g'`

if [[ $MY_REV != $SVN_REV ]]; then
    ssh username@$MY_BOX "growlnotify -s -d47111 -n 'iTerm' -t 'Out Of Sync' -m 'Your working copy is out of sync.  Repository is at revision $SVN_REV, and your working copy is at $MY_REV.'"
fi

Now, a little bit about my environment. As I've mentioned before, all of our development really takes place on Linux-powered virtual machines. We simply use our Macs as the system to interact with those virtual machines. That is why there's the ssh line in that script.

Basically, this script just checks the most recent revision in your local working copy. Then it checks the latest revision in the repository itself. It compares the two revision numbers, and if it finds a difference, it will SSH into my OSX box to send me a Growl notification. On the OSX side, I have Growl and growlnotify installed. Here's a summary of the options to growlnotify:

  • -s: make the notification sticky--don't hide the notification until the user specifically closes it.
  • -d47111: a unique identifier for the notification. This makes it so you can send the same message over and over and it would update any existing notifications with that ID instead of creating a new notification (unless one doesn't exist already).
  • -n 'iTerm': I believe this was supposed to be the "source" application. I don't remember right now.
  • -t 'Out Of Sync': The title for the notification.
  • -m 'Your working copy...': The message to send to my Mac.

This is a fabulous little reminder to me. I have it set up as a cronjob that runs every minute on my Linux-powered development virtual machine. Hopefully this will help others!