Simple Weighted Sort in Python

Last night I found myself in need of a simple weighted sort function in Python. I had a list of integers which represented object IDs in my project. Some of the objects needed to be processed before the others while iterating over the list of integers, and I already knew which object IDs those were. The order the rest of the object IDs were processed didn't matter at all. I just wanted the special object IDs to arrive at the beginning of the list, and the remaining object IDs could be in any order.

I was surprised at how simple it was to produce such a weighted sort. Here's an example of what I did:

import random
object_ids = [random.randint(0, 100) for i in range(20)]
special_ids = [random.choice(object_ids) for i in range(5)]
print 'Object IDs:', object_ids
print 'Special IDs:', special_ids

object_ids.sort(key=special_ids.__contains__, reverse=True)
print 'Object IDs:', object_ids

And some sample output:

Object IDs: [13, 97, 67, 5, 77, 58, 24, 99, 29, 20, 29, 75, 100, 31, 79, 5, 27, 11, 6, 1]
Special IDs: [13, 1, 27, 6, 67]
Object IDs: [13, 67, 27, 6, 1, 97, 5, 77, 58, 24, 99, 29, 20, 29, 75, 100, 31, 79, 5, 11]

Notice that each of the "special" IDs have shifted from their original position in the object_ids list to be at the beginning of the list after the sort.

The Python documentation for sort says that the key argument "specifies a function of one argument that is used to extract a comparison key from each list element." I'm using it to check to see if a given element in the list is in my special_ids list. If the element is present in the special_ids list, it will be shifted to the left because of the way the special_ids.__contains__ works.

In sorting, a value of 1 (or other positive integer) out of a comparison function generally means "this belongs to the right of the other element." A value of -1 (or other negative integer) means "this belongs to the left of the other element." A value of 0 means "these two elements are equal" (for the purposes of sorting). I'm assuming it works similarly with the key argument. Please correct me if I'm wrong!

As lqc states in the comments below, the key argument works differently. It creates a new sequence of values which is then sorted. Before lqc jumped in, I was using key=int(i in special_ids) * -2 + 1 to do the sorting, which is pretty dumb. Using key=special_ids.__contains__ is much more appropriate. Thanks lqc!!

This sort of weighted sort might not be just right for your needs, but hopefully it will give you a place to start to build your customized weighted sort!

Selenium Unit Test Reuse

Yesterday, one of the QA guys at work approached me with a question that turned out to be much more interesting to me than I think he had planned. He's been doing some unit testing using Selenium, exporting his test cases to Python. His question was this: how can I run the same unit tests using multiple browsers and multiple target servers?

I'm pretty sure he expected a simple 3-step answer or something like that. Instead, he got my crazy wide-eyed "ohhh... that's something I want to experiment with!" look. I started rambling on about inheritance, dynamic class creation, and nested for loops. His eyes started to look a little worried. He didn't really appreciate the nerdy lingo that much. I told him to pull up a chair and get comfortable.

Since I already had some other work I needed to pay attention to, I didn't want to spend too much time trying to figure out a good way to solve his problem. After about 20 minutes of devilish chuckles and frantic rustling through Python documentation, I came up with the following code:

from types import ClassType
from selenium import selenium
import unittest

IPS = ['192.168.0.1', '192.168.0.2']
BROWSERS = ['safari', 'chrome']

class SomeUnitTest(object):

    def test_something(self):
        sel = self.selenium
        # test code

def main(base):
    suites = []
    results = unittest.TestResult()

    for iidx, ip in enumerate(IPS):
        for bidx, browser in enumerate(BROWSERS):
            def setUp(self):
                self.verificationErrors = []
                self.selenium = selenium("localhost", 4444, "*%s" % self.browser, "http://%s/" % self.ip)
                self.selenium.start()

            def tearDown(self):
                self.selenium.stop()
                self.assertEqual([], self.verificationErrors)

            ut = ClassType('UT_%i_%i' % (iidx, bidx), (unittest.TestCase, base), {'ip': ip, 'browser': browser})
            ut.setUp = setUp
            ut.tearDown = tearDown

            suites.append(unittest.TestLoader().loadTestsFromTestCase(ut))

    unittest.TestSuite(suites)(results)
    for obj, error in results.errors:
        print 'In: ', obj
        print error

if __name__ == "__main__":
    main(SomeUnitTest)

I know, I know... it's got some dirty rotten tricks in it, and there are probably more efficient ways of doing what I've done. If the code offends you, look up at my previous disclaimer: I had other things I needed to be working on, so I didn't spend much time refining this. One thing I'm almost certain could be done better is not monkey patching the dynamic classes with the setUp and tearDown methods. Also, the output at the end of the test execution could definitely use some love. Oh well. Perhaps another day I'll get around to that.

Basically, you just set the servers you need to test and the browsers you want Selenium to run the tests in. Those are at the top of the script: IPS and BROWSERS. Then a new unittest.TestCase class is created for each combination of IP/server+browser. Finally, each of the test cases is thrown into a TestSuite, and the suite is processed. If there were any errors during the tests, they'll be printed out. We weren't really concerned with printing out other information, but you can certainly make other meaningful feedback appear.

Anyway, I thought that someone out there might very well benefit from my little experiment on my co-worker's question. Feel free to comment on your personal adventures with some variation of the code if you find it useful!

Python And Execution Context

I recently found myself in a situation where knowing the execution context of a function became necessary. It took me several hours to learn about this functionality, despite many cleverly-crafted Google searches. So, being the generous person I am, I want to share my findings.

My particular use case required that a function behave differently depending on whether it was called in an exec call. Specifics beyond that are not important for this article. Here's an example of how I was able to get my desired behavior.

import inspect

def is_exec():
    caller = inspect.currentframe().f_back
    module = inspect.getmodule(caller)

    if module is None:
        print "I'm being run by exec!"
    else:
        print "I'm being run by %s" % module.__name__

def main():
    is_exec()

    exec "is_exec()"

if __name__ == '__main__':
    main()

The output of such a script would look like this:

$ python is_exec.py
I'm being run by __main__
I'm being run by exec!

It's also interesting to note that when you're using the Python interactive interpreter, calling the is_exec function from the code above will tell you that you are indeed using exec.

Some may argue that modifying behavior as I needed to is dirty, and that if your system requires such code, you're doing it wrong. Well, you could apply this sort of code to situations that have nothing to do with exec. Perhaps you want to determine which part of your product is using a specific function the most. Perhaps you want to get additional debugging information that isn't immediately obvious.

Just like always, I want to add the disclaimer that there may be other ways to do this and there probably are. However, this is the way that worked for me. I'd still be interested to here about other solutions you may have encountered for this problem.

On a side note, if you're up for some slightly advanced Python mumbo jumbo, I suggest diving into the inspect documentation.

Site-Wide Caching in Django

My last article about caching RSS feeds in a Django project generated a lot of interest. My original goal was to help other people who have tried to cache QuerySet objects and received a funky error message. Many of my visitors offered helpful advice in the comments, making it clear that I was going about caching my feeds the wrong way.

I knew my solution was wrong before I even produced it, but I couldn't get Django's site-wide caching middleware to work in my production environment. Site-wide caching worked wonderfully in my development environment, and I tried all sorts of things to make it work in my production setup. It wasn't until one "Jacob" offered a beautiful pearl of wisdom that things started to make more sense:

This doesn't pertain to feeds, but one rather large gotcha with the cache middleware is that any javascript you are running that plants a cookie will affect the cache key. Google analytics, for instance, has that effect. A workaround is to use a middleware to strip out the offending cookies from the request object before the cache middleware looks at it.

The minute I read that comment, I realized just how logical it was! If Google Analytics, or any other JavaScript used on my site, was setting a cookie, and it changed that cookie on each request, then the caching engine would effectively have a different page to cache for each request! Thank you so much, Jacob, for helping me get past the frustration of not having site-wide caching in my production environment.

How To Setup Site-Wide Caching

While most of this can be gleaned from the official documentation, I will repeat it here in an effort to provide a complete "HOWTO". For further information, hit up the official caching documentation.

The first step is to choose a caching backend for your project. Built-in options include:

To specify which backend you want to use, define the CACHE_BACKEND variable in your settings.py. The definition for each backend is different, so check out the official documentation for details.

Next, install a couple of middleware classes, and pay attention to where the classes are supposed to appear in the list:

  • django.middleware.cache.UpdateCacheMiddleware - This should be the first middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.
  • django.middleware.cache.FetchFromCacheMiddleware - This should be the last middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.

Finally, you must define the following variables in your settings.py file:

  • CACHE_MIDDLEWARE_SECONDS - The number of seconds each page should be cached
  • CACHE_MIDDLEWARE_KEY_PREFIX - If the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don't care

If you don't use anything like Google Analytics that sets/changes cookies on each request to your site, you should have site-wide caching enabled now. If you only want pages to be cached for users who are not logged in, you may add CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True to your settings.py file--its meaning should be fairly obvious.

If, however, your site-wide caching doesn't appear to work (as it didn't for me for a long time), you can create a special middleware class to strip those dirty cookies from the request, so the caching middleware can do its work.

import re

class StripCookieMiddleware(object):
    """Ganked from http://2ze.us/Io"""

    STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')

    def process_request(self, request):
        cookie = self.STRIP_RE.sub('', request.META.get('HTTP_COOKIE', ''))
        request.META['HTTP_COOKIE'] = cookie

Edit: Thanks to Tal for regex the suggestion!

Once you do that, you need only install the new middleware class. Be sure to install it somewhere between the UpdateCacheMiddleware and FetchFromCacheMiddleware classes, not first or last in the tuple. When all of that is done, your site-wide caching should really work! That is, of course, unless your offending cookies are not found by that STRIP_RE regular expression.

Thanks again to Jacob and "nf", the original author of the middleware class I used to solve all of my problems! Also, I'd like to thank "JaredKuolt" for the django-staticgenerator on his github account. It made me happy for a while as I was working toward real site-wide caching.

Auto-Generating Documentation Using Mercurial, ReST, and Sphinx

I often find myself taking notes about various aspects of my job that I feel I would forget as soon as I moved onto another project. I've gotten into the habit of taking my notes using reStructured Text, which shouldn't come as any surprise to any of my regular visitors. On several occasions, I had some of the other guys in the company ask me for some clarification on some things I had taken notes on. Lucky for me, I had taken some nice notes!

However, these individuals probably wouldn't appreciate reading ReST markup as much as I do, so I decided to do something nice for them. I setup Sphinx to prettify my documentation. I then wrote a small Web server using Python, so people within the company network could access the latest version of my notes without much hassle.

Just like I take notes to remind myself of stuff at work, I want to do that again for this automated ReST->HTML magic--I want to be able to do this in the future! I figured I would make my notes even more public this time, so you all can enjoy similar bliss.

Platform Dependence

I am writing this article with UNIX-like operating systems in mind. Please forgive me if you're a Windows user and some of this is not consistent with what you're seeing. Perhaps one day I'll try to set this sort of thing up on Windows.

Installing Sphinx

The first step that we want to take is installing Sphinx. This is the project that Python itself uses to generate its online documentation. It's pretty dang awesome. Feel free to skip this section if you have already installed Sphinx.

Depending on your environment of choice, you may or may not have a package manager that offers python-sphinx or something along those lines. I personally prefer to install it using pip or easy_install:

$ sudo pip install sphinx

Running that command will likely respond with a bunch of output about downloading Sphinx and various dependencies. When I ran it in my sandbox VM, I saw it install the following packages:

  • pygments
  • jinja2
  • docutils
  • sphinx

It should be a pretty speedy installation.

Installing Mercurial

We'll be using Mercurial to keep track of changes to our ReST documentation. Mercurial is a distributed version control system that is built using Python. It's wonderful! Just like with Sphinx, if you have already installed Mercurial, feel free to skip to the next section.

I personally prefer to install Mercurial using pip or easy_install--it's usually more up-to-date than what you would have in your package repositories. To do that, simply run a command such as the following:

$ sudo pip install mercurial

This will go out and download and install the latest stable Mercurial. You may need python-dev or something like that for your platform in order for that command to work. However, if you're on Windows, I highly recommend TortoiseHg. The installer for TortoiseHg will install a graphical Mercurial client along with the command line tools.

Create A Repository

Now let's create a brand new Mercurial repository to house our notes/documentation. Open a terminal/console/command prompt to the location of your choice on your computer and execute the following commands:

$ hg init mydox
$ cd mydox

Configure Sphinx

The next step is to configure Sphinx for our project. Sphinx makes this very simple:

$ sphinx-quickstart

This is a wizard that will walk you through the configuration process for your project. It's pretty safe to accept the defaults, in my opinion. Here's the output of my wizard:

$ sphinx-quickstart
Welcome to the Sphinx quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Enter the root path for documentation.
> Root path for the documentation [.]:

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/N) [n]: y

Inside the root directory, two more directories will be created; "_templates"
for custom HTML templates and "_static" for custom stylesheets and other static
files. You can enter another prefix (such as ".") to replace the underscore.
> Name prefix for templates and static dir [_]:

The project name will occur in several places in the built documentation.
> Project name: My Dox
> Author name(s): Josh VanderLinden

Sphinx has the notion of a "version" and a "release" for the
software. Each version can have multiple releases. For example, for
Python the version is something like 2.5 or 3.0, while the release is
something like 2.5.1 or 3.0a1.  If you don't need this dual structure,
just set both to the same value.
> Project version: 0.0.1
> Project release [0.0.1]:

The file name suffix for source files. Commonly, this is either ".txt"
or ".rst".  Only files with this suffix are considered documents.
> Source file suffix [.rst]:

One document is special in that it is considered the top node of the
"contents tree", that is, it is the root of the hierarchical structure
of the documents. Normally, this is "index", but if your "index"
document is a custom template, you can also set this to another filename.
> Name of your master document (without suffix) [index]:

Please indicate if you want to use one of the following Sphinx extensions:
> autodoc: automatically insert docstrings from modules (y/N) [n]:
> doctest: automatically test code snippets in doctest blocks (y/N) [n]:
> intersphinx: link between Sphinx documentation of different projects (y/N) [n]:
> todo: write "todo" entries that can be shown or hidden on build (y/N) [n]:
> coverage: checks for documentation coverage (y/N) [n]:
> pngmath: include math, rendered as PNG images (y/N) [n]:
> jsmath: include math, rendered in the browser by JSMath (y/N) [n]:
> ifconfig: conditional inclusion of content based on config values (y/N) [n]:

A Makefile and a Windows command file can be generated for you so that you
only have to run e.g. `make html' instead of invoking sphinx-build
directly.
> Create Makefile? (Y/n) [y]:
> Create Windows command file? (Y/n) [y]: n

Finished: An initial directory structure has been created.

You should now populate your master file ./source/index.rst and create other documentation
source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.

If you followed the same steps I did (I separated the source and build directories), you should see three new files in your mydox repository:

  • build/
  • Makefile
  • source/

We'll do our work in the source directory.

Get Some ReST

Now is the time when we start writing some ReST that we want to turn into HTML using Sphinx. Open some file, like first_doc.rst and put some ReST in it. If nothing comes to mind, or you're not familiar with ReST syntax, try the following:

=========================
This Is My First Document
=========================

Yes, this is my first document.  It's lame.  Deal with it.

Save the file (keep in mind that it should be within the source directory if you used the same settings I did). Now it's time to add it to the list of files that Mercurial will pay attention to. While we're at it, let's add the other files that were created by the Sphinx configuration wizard:

$ hg add
adding ../Makefile
adding conf.py
adding first_doc.rst
adding index.rst
$ hg st
A Makefile
A source/conf.py
A source/first_doc.py
A source/index.rst

Don't worry that we don't see all of the directories in the output of hg st--Mercurial tracks files, not directories.

Automate HTML-ization

Here comes the magic in automating the conversion from ReST to HTML: Mercurial hooks. We will use the precommit hook to fire off a command that tells Sphinx to translate our ReST markup into HTML.

Edit your mydox/.hg/hgrc file. If the file does not yet exist, go ahead and create it. Add the following content to it:

[hooks]
precommit.sphinxify = ~/bin/sphinxify_docs.sh

I've opted to call a Bash script instead of using an inline Python call. Now let's create the Bash script, ~/bin/sphinxify_docs.sh:

#!/bin/bash
cd $HOME/mydox
sphinx-build source/ docs/

Notice that I used the $HOME environment variable. This means that I created the mydox directory at /home/myusername/mydox. Adjust that line according to your setup. You'll probably also want to make that script executable:

$ chmod +x ~/bin/sphinxify_docs.sh

Three, Two, One...

You should now be at a stage where you can safely commit changes to your repository and have Sphinx build your HTML documentation. Execute the following command somewhere under your mydox repository:

$ hg ci -m "Initial commit"

If your setup is anything like mine, you should see some output similar to this:

$ hg ci -m "Initial commit"
Making output directory...
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... not found
building [html]: targets for 2 source files that are out of date
updating environment: 2 added, 0 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... /home/jvanderlinden/mydox/source/first_doc.rst:: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded, 1 warning.
$ hg st
? docs/.buildinfo
? docs/.doctrees/environment.pickle
? docs/.doctrees/first_doc.doctree
? docs/.doctrees/index.doctree
? docs/_sources/first_doc.txt
? docs/_sources/index.txt
? docs/_static/basic.css
? docs/_static/default.css
? docs/_static/doctools.js
? docs/_static/file.png
? docs/_static/jquery.js
? docs/_static/minus.png
? docs/_static/plus.png
? docs/_static/pygments.css
? docs/_static/searchtools.js
? docs/first_doc.html
? docs/genindex.html
? docs/index.html
? docs/objects.inv
? docs/search.html
? docs/searchindex.js

If you see something like that, you're in good shape. Go ahead and take a look at your new mydox/docs/index.html file in the Web browser of your choosing.

Not very exciting, is it? Notice how your first_doc.rst doesn't appear anywhere on that page? That's because we didn't tell Sphinx to put it there. Let's do that now.

Customizing Things

Edit the mydox/source/index.rst file that was created during Sphinx configuration. In the section that starts with .. toctree::, let's tell Sphinx to include everything we ReST-ify:

.. toctree::
   :maxdepth: 2
   :glob:

   *

That should do it. Now, I don't know about you, but I don't really want to include the output HTML, images, CSS, JS, or anything in my documentation repository. It would just take up more space each time we change an .rst file. Let's tell Mercurial to not pay attention to the output HTML--it'll just be static and always up-to-date on our filesystem.

Create a new file called mydox/.hgignore. In this file, put the following content:

syntax: glob
docs/

Save the file, and you should now see something like the following when running hg st:

$ hg st
M source/index.rst
? .hgignore

Let's include the .hgignore file in the list of files that Mercurial will track:

$ hg add .hgignore
$ hg st
M source/index.rst
A .hgignore

Finally, let's commit one more time:

$ hg ci -m "Updating the index to include our .rst files"
Running Sphinx v0.6.4
No builder selected, using default: html
loading pickled environment... done
building [html]: targets for 1 source files that are out of date
updating environment: 0 added, 1 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
writing additional files... genindex search
copying static files... done
dumping search index... done
dumping object inventory... done
build succeeded.

Tada!! The first_doc.rst should now appear on the index page.

Serving Your Documentation

Who seriously wants to have HTML files that are hard to get to? How can we make it easier to access those HTML files? Perhaps we can create a simple static file Web server? That might sound difficult, but it's really not--not when you have access to Python!

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler

def main():
    try:
        server = HTTPServer(('', 80), SimpleHTTPRequestHandler)
        server.serve_forever()
    except KeyboardInterrupt:
        server.socket.close()

if __name__ == '__main__':
    main()

I created this simple script and put it in my ~/bin/ directory, also making it executable. Once that's done, you can navigate to your mydox/docs/ directory and run the script. Since I called the script webserver.py, I just do this:

$ cd ~/mydox/docs
$ sudo webserver.py

This makes it possible for you to visit http://localhost/ on your own computer, or to use your computer's IP in place of localhost to access your documentation from a different computer on your network. Pretty slick, if you ask me.

I suppose there's more I could add, but that's all I have time for tonight. Enjoy!

Send E-mails When You Get @replies On Twitter

I just had a buddy of mine ask me to write a script that would send an e-mail to you whenever you get an "@reply" on Twitter. I've recently been doing some work on a Twitter application, so I feel relatively comfortable with the python-twitter project to access Twitter. It didn't take very long to come up with this script, and it appears to work fine for us (using a cronjob to run the script periodically).

I thought others on the Internets might enjoy the script as well, so here it is!

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
A simple script to check your Twitter account for @replies and send you an email
if it finds any new ones since the last time it checked.  It was developed using
python-twitter 0.5 and Python 2.5.  It has been tested on Linux only, but it
should work fine on other platforms as well.  This script is intended to be
executed by a cron manager or scheduled task manager.

Copyright (c) 2009, Josh VanderLinden
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

- Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
- Neither the name of the organization nor the names of its contributors may
be used to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""

import twitter
import ConfigParser
import os
import sys
from datetime import datetime
import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText
from email.Utils import formatdate

# get the user's "home" directory
DIRNAME = os.path.expanduser('~')
CONFIG = os.path.join(DIRNAME, '.twitter_email_replies.conf')
FORMAT = '%a %b %d %H:%M:%S +0000 %Y'
REPLY_TEMPLATE = """%(author)s said: %(text)s
Posted on %(created_at)s
Go to http://twitter.com/home?status=@%(screen_name)s%%20&in_reply_to_status_id=%(id)s&in_reply_to=%(screen_name)s to post a reply
"""

# sections
AUTH = 'credentials'
EXEC = 'exec_info'
EMAIL = 'email_info'

# make the code a bit "cleaner"
O = lambda s: sys.stdout.write(s + '\n')
E = lambda s: sys.stderr.write(s + '\n')
str2dt = lambda s: datetime.strptime(s, FORMAT)

def get_dict(status):
    my_dict = status.AsDict()
    my_dict['screen_name'] = my_dict['user']['screen_name']
    my_dict['author'] = my_dict['user']['name']
    return my_dict

def main():
    O('Reading configuration from %s' % CONFIG)
    parser = ConfigParser.SafeConfigParser()
    config = parser.read(CONFIG)

    # make sure we have the proper sections
    if not parser.has_section(AUTH): parser.add_section(AUTH)
    if not parser.has_section(EMAIL): parser.add_section(EMAIL)
    if not parser.has_section(EXEC): parser.add_section(EXEC)

    try:
        # get some useful settings from the configuration file
        username = parser.get(AUTH, 'username')
        password = parser.get(AUTH, 'password')

        to_address = parser.get(EMAIL, 'to_address')
        from_address = parser.get(EMAIL, 'from_address')
        smtp_server = parser.get(EMAIL, 'smtp_server')
        smtp_user = parser.get(EMAIL, 'smtp_user')
        smtp_pass = parser.get(EMAIL, 'smtp_pass')

        if '' in [username, password, to_address, from_address, smtp_server]:
            raise Exception('Not configured')
    except Exception:
        E('Please configure your credentials and e-mail information in %s!' % CONFIG)

        # create some placeholders in the configuration file to make it easier
        sections = {
            AUTH: ('username', 'password'),
            EMAIL: ('to_address', 'from_address', 'smtp_server', 'smtp_user', 'smtp_pass')
        }

        for section in sections.keys():
            for opt in sections[section]:
                if not parser.has_option(section, opt):
                    parser.set(section, opt, '')
    else:
        # determine the last time we checked for replies
        try:
            last_check = str2dt(parser.get(EXEC, 'last_run'))
        except ConfigParser.NoOptionError:
            last_check = datetime.utcnow()
        last_check_str = last_check.strftime(FORMAT)

        info = 'Fetching updates for %s since %s' % (username,
                                                       last_check_str)
        O(info)

        # attempt to connect to Twitter
        api = twitter.Api(username=username, password=password)

        # not using the `since` parameter for more backward-compatibility
        timeline = api.GetReplies()
        new_replies = []
        for reply in timeline:
            post_time = str2dt(reply.GetCreatedAt())
            if post_time > last_check:
                new_replies.append(reply)

        count = len(new_replies)
        if count:
            # send out an email for this user
            O('Found %i new replies... sending e-mail to %s' % (count, to_address))
            reply_list = '\n\n'.join([REPLY_TEMPLATE % get_dict(r) for r in new_replies])
            is_are = 'is'
            plural = 'y'
            if count != 1:
                is_are = 'are'
                plural = 'ies'

            params = {
                'is_are': is_are,
                'count': count,
                'replies': plural,
                'username': username,
                'reply_list': reply_list,
                'last_check': last_check_str
            }

            text = """There %(is_are)s %(count)i new @repl%(replies)s for %(username)s on Twitter since %(last_check)s:

%(reply_list)s""" % params

            # compose the e-mail
            msg = MIMEMultipart()
            msg['From'] = from_address
            msg['To'] = to_address
            msg['Date'] = formatdate(localtime=True)
            msg['Subject'] = 'New @Replies for %s' % username
            msg.attach(MIMEText(text))

            # try to send the e-mail message out
            email = smtplib.SMTP(smtp_server)
            if smtp_user and smtp_pass:
                email.login(smtp_user, smtp_pass)
            email.sendmail(from_address,
                           to_address,
                           msg.as_string())
            email.close()

        # save the current time so we know where to pick up next time
        parser.set(EXEC, 'last_run', datetime.utcnow().strftime(FORMAT))

    # write the config
    O('Saving settings...')
    out = open(CONFIG, 'wb')
    parser.write(out)
    out.close()

if __name__ == '__main__':
    main()

Feel free to copy this script and modify it to your desires. Also, please comment if you have issues using it.

Pluggable Django Apps and Django's Admin

There are a lot of us out there who build these reusable, pluggable Django applications and share them with the world. Some of the applications are more useful than others. Personally, I think my applications tend to fall in the less useful category, but I like to build and release them nonetheless.

The Background

The other day, I received a suggestion from one of the users of django-pendulum for how to make it a little more user-friendly. The problem was that he had not yet configured Pendulum for his current Site. As soon as he tried to access his site after installing django-pendulum, he was greeted with a nasty Django exception page, admonishing him to configure Pendulum for the site. It was pretty dirty and effective, but not very pleasant at all.

This user suggested that, instead of displaying the nasty exception page, I redirect the user to a place where they could in fact configure Pendulum. I thought this was a grand idea, and I set out to make it happen last night.

The Dilemma

That's when I realized I had a problem. django-pendulum is intended to be a reusable Django application. Most of the time it might be safe to assume that everyone uses /admin/ to get to the Django administration utility. However, there's also a good chance that this will not be the prefix used to access the administration utility. What do you do when you hardcode your applications to redirect a user to /admin/ and they use something more secure like /milwaukee/ for their administration utility?

So the question is this: how do you determine what URL will bring a user to the Django administration page for a particular Django model based on the specific site's URLconf setup?

The Solution

It turns out that one solution (not sure if it's safe to say "the solution" necessarily) with Django 1.1+ is to have something like this:

from django.core.urlresolvers import reverse
admin_url = reverse('admin_pendulum_pendulumconfiguration_add')

The named URLconf used in the reverse function should ensure that the appropriate administration URL prefix is used for whatever site your application is installed on.

I did a bit of browsing through the Django documentation, but I've never seen it stated anywhere what the names are for administration URLs actually are. I found out about this little nugget by examining the output of the django.contrib.admin.ModelAdmin.get_urls method (see get_urls(self)). From the looks of it, pretty much any Django model registered with your Django administration will have URLconf items similar to the following:

[<RegexURLPattern admin_pendulum_pendulumconfiguration_changelist ^$>,
 <RegexURLPattern admin_pendulum_pendulumconfiguration_add ^add/$>,
 <RegexURLPattern admin_pendulum_pendulumconfiguration_history ^(.+)/history/$>,
 <RegexURLPattern admin_pendulum_pendulumconfiguration_delete ^(.+)/delete/$>,
 <RegexURLPattern admin_pendulum_pendulumconfiguration_change ^(.+)/$>]

In other words, you get 5 URLconf patterns per model: changelist, add, history, delete, and change. These patterns are named, and they all appear to be prefixed with admin_[app_label]_[model]_. In the case of django-pendulum, the application is called pendulum so [app_label] becomes pendulum. The model in question is called PendulumConfiguration so the [model] becomes pendulumconfiguration. I assume this same pattern will be followed for any registered model. Please correct me if I'm wrong.

More Details

For those of you interested in more details, django-pendulum uses a middleware class to redirect users to the screen where they can configure Pendulum for the site in question (unless it's already been configured). Here's my middleware as of revision 18:

from django.contrib.auth.views import login
from django.contrib.sites.models import Site
from django.core.urlresolvers import reverse
from django.http import HttpResponseRedirect
from django.conf import settings
from pendulum.models import PendulumConfiguration

SITE = Site.objects.get_current()
admin_url = reverse('admin_pendulum_pendulumconfiguration_add')
login_url = getattr(settings, 'LOGIN_URL', '/accounts/login/')

class PendulumMiddleware:
    """
    This middleware ensures that anyone trying to access Pendulum must be
    logged in.  If Pendulum hasn't been configured for the current site, the
    staff users will be redirected to the page to configure it.
    """
    def process_request(self, request):
        try:
            SITE.pendulumconfiguration
        except PendulumConfiguration.DoesNotExist:
            # this will force the user to configure pendulum if they're staff
            if request.user.has_perm('add_pendulumconfiguration') and \
                request.path not in (admin_url, login_url):
                # leave the user a message
                request.user.message_set.create(message='Please configure Pendulum for %s' % SITE)

                return HttpResponseRedirect(admin_url)
        else:
            entry_url = reverse('pendulum-entries')

            if request.path[:len(entry_url)] == entry_url and request.user.is_anonymous():
                if request.POST:
                    return login(request)
                else:
                    return HttpResponseRedirect('%s?next=%s' % (login_url, request.path))

This checks to see if Pendulum has been configured for the site. If not, the middleware will check to see if the current user has permission to add a configuration for Pendulum. If so, they will be redirected to the appropriate configuration screen.

If Pendulum has been configured for the site or the current user does not have permission to add a configuration, the user should be unaffected. If the user is trying to access Pendulum on the site's front-end, the middleware will ensure that they're actually logged in.

As always, suggestions are welcome!

Syntax Highlighting, ReST, Pygments, and Django

Some of you regulars out there may have noticed an interesting change in the presentation of some of my articles: source code highlighting. I've been interested in doing this for quite some time, I just never really got around to implementing it until last night.

I found this implementation process to be a bit more complicatd than I had anticipated. For my own benefit as well as for anyone else who wants to do the same thing, I thought I'd document my findings in a thorough article for how to add syntax highlighting to an existing Django- and reStructuredText-powered Web site.

The power behind the syntax highlighting is:

Python is a huge player in this feature because reStructuredText (ReST) was built for Python, Pygments is the source highlighter (written in Python), and Django is written in Python (and my site is powered by Django). Some of you may recall that I converted all of my articles to ReST not too long ago because it suited my needs better than Textile, my previous markup processor. At the time, I was not aware that the conversion to ReST would make it all the easier for me to implement the syntax highlighting, but last night I figured out that that conversion probably saved me a lot of frustration. Cascading Stylesheets (CSS) are responsible for making the source code actually look good, while Pygments takes care of assigning classes to various parts of the designated source code and generating the CSS.

So, the first set of requirements, which I will not document in this article, are that you already have a Django site up and running and that you're familiar with ReST syntax. If you have the django.contrib.flatpages application installed already, you can type up some ReST documents there and apply the concepts discussed in this article.

Next, you should ensure that you have Pygments installed. There are a variety of ways to install this. Perhaps the easiest and most platform-independent method is to use easy_install:

$ easy_install pygments

This command should work essentially the same on Windows, Linux, and Macintosh computers. If you don't have it installed, you can get it from its website. If you're using a Debian-based distribution of Linux, such as Ubuntu, you could do something like this:

$ sudo apt-get install python-pygments

...and it should take care of downloading and installing Pygments. Alternatively, you can download it straight from the PyPI page and install it manually.

Now we need to install the Pygments ReST directive. A ReST directive is basically like a special command to the ReST processor. I think this part was the most difficult aspect of the implementation, simply because I didn't know where to find the Pygments directive or how to write my own. Eventually, I ended up downloading the Pygments-1.0.tar.gz file from PyPI, opening the Pygments-1.0/external/rst-directive.py file from the archive, and copying the stuff in there into a new file within my site.

For my own purposes, I made some small adjustments to the directive over what come with the Pygments distribution. I think it would save us all a lot of hassle if I just copied and pasted the directive, as I currently have it, so you can see it first-hand.

"""
    The Pygments reStructuredText directive
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    This fragment is a Docutils_ 0.4 directive that renders source code
    (to HTML only, currently) via Pygments.

    To use it, adjust the options below and copy the code into a module
    that you import on initialization.  The code then automatically
    registers a ``code-block`` directive that you can use instead of
    normal code blocks like this::

    .. code:: python

            My code goes here.

    If you want to have different code styles, e.g. one with line numbers
    and one without, add formatters with their names in the VARIANTS dict
    below.  You can invoke them instead of the DEFAULT one by using a
    directive option::

    .. code:: python
       :number-lines:

            My code goes here.

    Look at the `directive documentation`_ to get all the gory details.

    .. _Docutils: http://docutils.sf.net/
    .. _directive documentation:
       http://docutils.sourceforge.net/docs/howto/rst-directives.html

    :copyright: 2007 by Georg Brandl.
    :license: BSD, see LICENSE for more details.
"""

# Options
# ~~~~~~~

# Set to True if you want inline CSS styles instead of classes
INLINESTYLES = False

from pygments.formatters import HtmlFormatter

# The default formatter
DEFAULT = HtmlFormatter(noclasses=INLINESTYLES)

# Add name -> formatter pairs for every variant you want to use
VARIANTS = {
    'linenos': HtmlFormatter(noclasses=INLINESTYLES, linenos=True),
}


from docutils import nodes
from docutils.parsers.rst import directives

from pygments import highlight
from pygments.lexers import get_lexer_by_name, TextLexer

def pygments_directive(name, arguments, options, content, lineno,
                       content_offset, block_text, state, state_machine):
    try:
        lexer = get_lexer_by_name(arguments[0])
    except ValueError:
        # no lexer found - use the text one instead of an exception
        lexer = TextLexer()
    # take an arbitrary option if more than one is given
    formatter = options and VARIANTS[options.keys()[0]] or DEFAULT
    parsed = highlight(u'\n'.join(content), lexer, formatter)
    parsed = '<div class="codeblock">%s</div>' % parsed
    return [nodes.raw('', parsed, format='html')]

pygments_directive.arguments = (1, 0, 1)
pygments_directive.content = 1
pygments_directive.options = dict([(key, directives.flag) for key in VARIANTS])

directives.register_directive('code-block', pygments_directive)

I won't explain what that code means, because, quite frankly, I'm still a little hazy on the inner workings of ReST directives myself. Suffice it to say that this snippet allows you to easily highlight blocks of code on ReST-powered pages.

The question now is: where do I put this snippet? As far as I'm aware, this code can be located anywhere so long as it is loaded at one point or another before you start your ReST processing. For the sake of simplicity, I just stuffed it in the __init__.py file of my Django site. This is the __init__.py file that lives in the same directory as manage.py and settings.py. Putting it in that file just makes sure it's loaded each time you start your Django site.

To make Pygments highlight a block of code, all you need to do is something like this:

.. code:: python

    print 'Hello world!'

...which would look like...

print 'Hello world!'

If you have a longer block of code and would like line numbers, use the :number-lines: option:

.. code:: python
    :number-lines:

    for i in range(100):
        print i

...which should look like this...

for i in range(100):
    print i

That's all fine and dandy, but it probably doesn't look like the code is highlighted at all just yet (on your site, not mine). It's just been marked up by Pygments to have some pretty CSS styles applied to it. But how do you know which styles mean what?

Luckily enough, Pygments takes care of generating the CSS files for you as well. There are several attractive styles that come with Pygments. I would recommend going to the Pygments demo to see which one suits you best. You can also roll your own styles, but I haven't braved that yet so I'll leave that for another day.

Once you choose a style (I chose native for Code Koala), you can run the following commands:

$ pygmentize -S native -f html > native.css
$ cp native.css /path/to/site/media/css

(obviously, you'd want to replace native with the name of the style you like the most) Finally, add a line to your HTML templates to load the newly created CSS file. In my case, it's something like this:

<link rel="stylesheet" type="text/css" href="/static/styles/native.css" />

Now you should be able to see nicely-formatted source code on your Web pages (assuming you've already got ReST processing your content).

If you haven't been using ReST to generate nicely-formatted pages, you should make sure a couple of things are in place. First, you must have the django.contrib.markup application installed. Second, your templates should be setup to process ReST markup into HTML. Here's a sample templates/flatpages/default.html:

{% extends 'base.html' %}
{% load markup %}

{% block title %}{{ flatpage.title }}{% endblock %}

{% block content %}
<h2>{{ flatpage.title }}</h2>

{{ flatpage.content|restructuredtext }}
{% endblock %}

So that short template should allow you to use ReST markup for your flatpages, and it should also take care of the magic behind the .. code:: python directive.

I should also note that Pygments can handle a TON of languages. Check out the Pygments demo for a list of languages it knows how to highlight.

I think that about does it. Hopefully this article will help some other poor chap who is currently in the same situation as I was last night, and hopefully it will save you a lot more time than it took me to figure out all this junk. If it looks like I've missed something, or maybe that something needs further clarification, please comment and I'll see what I can do.

Sad Day for Religious Nerds

I recently decided it would be prudent for me to request permission from the Church of Jesus Christ of Latter-day Saints to redistribute a copy of the scriptures with the PyScriptures program that I have been working on recently. I didn't want to be held liable for any copyright infringement or other legal violations. So I decided to contact them a couple of weeks ago. Here is my message requesting permission:

To whom it may concern:

I would like to request permission to redistribute and use a database which contains the LDS standard works (scriptures only, not the bible dictionary, topical guide, or even footnotes). One of my personal projects relies upon a local copy of the scriptures in order to be useful at all, and I would like people to be able to use this program without fear of any sort of copyright infringement charges to any party. The goal of the program is to offer users of platforms other than Microsoft Windows a way to peruse and study the scriptures without requiring an Internet connection. I have tested it on Windows XP, Linux, and MacOS X and it works great. There are still some parts that need some work, and things could probably be optimized a bit, but works well enough considering that I've only been working on it for about a week and a half.

To learn more about this project, please go to http://code.google.com/p/pyscriptures/ where you can even browse the Python source code for my application (though you might want to do so on an empty stomach). The database is called 'scriptures.db' (http://code.google.com/p/pyscriptures/source/browse/trunk/pyscriptures/scriptures.db) and was created using SQLite in case you are interested in examining and verifying its contents. One of my main reasons for creating this database is that I noticed the database that The Mormon Documentation Project offers is missing some things, particularly all of the answers in Section 77 of the Doctrine and Covenants. I didn't want to find out what else was missing or modified. Perhaps the reason those answers are missing is because that's part of the conditions of redistributing the scriptures?

Please notify me as soon as possible if there are any objections to the use and redistribution of this database. If that is the case, I will delete this project immediately. However, if I may be granted permission to redistribute the database containing the unmodified scriptures, I would greatly appreciate a response which states what my rights are in this matter.

Thank you for your time! Have a wonderful day.

Sincerely,

Josh VanderLinden

Developer

I got my reply today:

Dear Josh VanderLinden:

Thank you for your email dated 9 June 2008 requesting permission to use a database containing the LDS Standard works to redistribute.

After reviewing your request we have determined that we must deny your request for the use of this material. Church policy does not allow its intellectual properties to be used as you are requesting.

We appreciate your intentions, but trust you will understand and support the decision of the Church in this matter.

Thank you for checking with us.

Bobbie Reynolds, Paralegal

Intellectual Property Office

Somehow I could see this coming. I was curious, however, to determine if there was a way I could avoid litigation by modifying the way the program operates. So I responded with this message:

Hello Bobbie,

Thank you for your reply. I'm sorry that the verdict was not in my favor, but I do understand completely. Otherwise, I wouldn't have asked in the first place. Before I remove the project from the Internet, I would like to ask one more question. Is there a way I could somehow make the program legal? I've noticed that other some programs for the scriptures don't exactly redistribute the scriptures with the program itself. Instead they seem to download the scriptures from scriptures.lds.org (or elsewhere?) on an individual level, and the individual who runs the program has to specifically tell the program to do so. Would that be a more appropriate route for my project to take? Or would that also violate the rules?

Thanks again,

Josh VanderLinden

After only a few hours, I received the following reply:

If I understand you correctly, this also sounds like a reformatting project, which is restricted. Thank you for requesting further clarification and for your compliance to Church policy.

Bobbie

So it looks like I will have to discontinue this project, or at least not allow other people to use my work and keep it all to myself. In compliance with the message communicated to me by the legal representatives of the Church, I have removed the project from Google Code, and it shall remain a project for my own purposes. My apologies go out to everyone who actually found this program useful and promising. I can't blame the Church for being strict in its policies--neither should you.