Django-Tracking 0.3.5

I've finally gotten around to looking at a bunch of tickets that had been opened for django-tracking in the past year and a half or so. I feel horrible that it's really taken that long for me to get to them! Every time I got a ticket notification, I told myself, "Okay, I'll work on that this weekend." Many have weekends have passed without any work on any of my projects. I'm going to get better about that!

Anyway, several fixes have gone into the latest version of django-tracking. Some have to do with unicode problems (thanks ramusus!). Others have to do with overall performance, while yet others have to do with overall stability.

The first interesting change in this release is that django-tracking no longer relies on the GeoIP Python API. Instead it's now using django.contrib.gis.utils.GeoIP. I had hoped that this would remove the dependency on the GeoIP C API, but it appears that I was mistaken. Oh well.

Perhaps the biggest improvement in this new release is the use of caching. With caching in place, the middleware classes don't slam the database nearly as badly as they used to. There's still more that could be done with caching to improve performance, but I think what I've got now will be a big help.

Another noteworthy change, in my opinion, is the use of logging. I've sprinkled mildly useful logging messages throughout the code so you can learn when something bad happens that is silently handled. I hope that this will help me improve the quality of the code as it will allow anyone who uses the project (and pays attention to the log messages, of course) to tell me when bad things are happening.

Finally, the packaging code has been updated to be much more simple. Version 0.3.5 has been uploaded to PyPI and is available via pip or easy_install. If you prefer to have the latest copy of the code, the official code repositories are (in order of my personal preference):

I can't wait for your feedback!

More django-articles Updates

I've spent a little more time lately adding new features to django-articles. There are two major additions in the latest release (2.0.0-pre2).

  • Article attachments
  • Article statuses

That's right folks! You can finally attach files to your articles. This includes attachments to emails that you send, if you have the articles from email feature properly configured. To prove it, I'm going to attach a file to this article (which I'm posting via email).

Next, I've decided that it's worth allowing the user to specify different statuses for their articles. One of the neat things about this feature is that if you are a super user, you're logged in, and you save an article with a status that is designated as "non-live", you will still be able to see it on the site. This is a way for users to preview their work before making it live. Out of the box, there are only two statuses: draft and finished. You're free to add more statuses if you feel so inclined (they're in the database, not hardcoded).

The article status is still separate from the "is_active" flag when saving an article. Any article that is marked as inactive will not appear on the site regardless of the article's "status".

On a slightly less impressive note (although still important), this release includes some basic unit tests. Most of the tests currently revolve around article statuses and making sure that the appropriate articles appear on the site.

Site-Wide Caching in Django

My last article about caching RSS feeds in a Django project generated a lot of interest. My original goal was to help other people who have tried to cache QuerySet objects and received a funky error message. Many of my visitors offered helpful advice in the comments, making it clear that I was going about caching my feeds the wrong way.

I knew my solution was wrong before I even produced it, but I couldn't get Django's site-wide caching middleware to work in my production environment. Site-wide caching worked wonderfully in my development environment, and I tried all sorts of things to make it work in my production setup. It wasn't until one "Jacob" offered a beautiful pearl of wisdom that things started to make more sense:

This doesn't pertain to feeds, but one rather large gotcha with the cache middleware is that any javascript you are running that plants a cookie will affect the cache key. Google analytics, for instance, has that effect. A workaround is to use a middleware to strip out the offending cookies from the request object before the cache middleware looks at it.

The minute I read that comment, I realized just how logical it was! If Google Analytics, or any other JavaScript used on my site, was setting a cookie, and it changed that cookie on each request, then the caching engine would effectively have a different page to cache for each request! Thank you so much, Jacob, for helping me get past the frustration of not having site-wide caching in my production environment.

How To Setup Site-Wide Caching

While most of this can be gleaned from the official documentation, I will repeat it here in an effort to provide a complete "HOWTO". For further information, hit up the official caching documentation.

The first step is to choose a caching backend for your project. Built-in options include:

To specify which backend you want to use, define the CACHE_BACKEND variable in your settings.py. The definition for each backend is different, so check out the official documentation for details.

Next, install a couple of middleware classes, and pay attention to where the classes are supposed to appear in the list:

  • django.middleware.cache.UpdateCacheMiddleware - This should be the first middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.
  • django.middleware.cache.FetchFromCacheMiddleware - This should be the last middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.

Finally, you must define the following variables in your settings.py file:

  • CACHE_MIDDLEWARE_SECONDS - The number of seconds each page should be cached
  • CACHE_MIDDLEWARE_KEY_PREFIX - If the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don't care

If you don't use anything like Google Analytics that sets/changes cookies on each request to your site, you should have site-wide caching enabled now. If you only want pages to be cached for users who are not logged in, you may add CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True to your settings.py file--its meaning should be fairly obvious.

If, however, your site-wide caching doesn't appear to work (as it didn't for me for a long time), you can create a special middleware class to strip those dirty cookies from the request, so the caching middleware can do its work.

import re

class StripCookieMiddleware(object):
    """Ganked from http://2ze.us/Io"""

    STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')

    def process_request(self, request):
        cookie = self.STRIP_RE.sub('', request.META.get('HTTP_COOKIE', ''))
        request.META['HTTP_COOKIE'] = cookie

Edit: Thanks to Tal for regex the suggestion!

Once you do that, you need only install the new middleware class. Be sure to install it somewhere between the UpdateCacheMiddleware and FetchFromCacheMiddleware classes, not first or last in the tuple. When all of that is done, your site-wide caching should really work! That is, of course, unless your offending cookies are not found by that STRIP_RE regular expression.

Thanks again to Jacob and "nf", the original author of the middleware class I used to solve all of my problems! Also, I'd like to thank "JaredKuolt" for the django-staticgenerator on his github account. It made me happy for a while as I was working toward real site-wide caching.

Syndication Caching in Django

I've recently been working on some performance enhancements on my site. Apparently some of my latest articles are a little too popular for my shared hosting plan. The surge of traffic to my site took down several sites on the same server as my own.

My response to the fiasco was to, among other things, implement caching on my site. It seems like the caching has helped a lot. I've noticed that my RSS feeds are hit almost as hard as real articles on my site, and I noticed that they weren't being cached the way I had expected. I tried a couple of things that I thought would work, but nothing seemed to do the trick.

After doing some brief research into the idea of caching my RSS feeds using Django's built-in caching mechanisms, I came up empty. It occurred to me to implement caching in the feed classes themselves. I tried something like this:

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = Article.objects.active().order_by('-publish_date')[:10]
            cache.set('latest_articles', articles)

        return articles

    ...

This code doesn't work! When I would try to retrieve one of my RSS feeds with such "caching" in place, I got the following traceback:

Traceback (most recent call last):

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 280, in run
    self.result = application(self.environ, self.start_response)

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 674, in __call__
    return self.application(environ, start_response)

  File "/home/wheaties/dev/django/core/handlers/wsgi.py", line 241, in __call__
    response = self.get_response(request)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 143, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 101, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 36, in __call__
    return self.decorator(self.func)(*args, **kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 86, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 215, in feed
    feedgen = f(slug, request).get_feed(param)

  File "/home/wheaties/dev/django/contrib/syndication/feeds.py", line 37, in get_feed
    return super(Feed, self).get_feed(obj, self.request)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 134, in get_feed
    for item in self.__get_dynamic_attr('items', obj):

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 69, in __get_dynamic_attr
    return attr()

  File "/home/wheaties/dev/articles/feeds.py", line 22, in items
    cache.set(key, articles)

  File "/home/wheaties/dev/django/core/cache/backends/filebased.py", line 72, in set
    pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <class 'django.utils.functional.__proxy__'>: attribute lookup django.utils.functional.__proxy__ failed

This error took me by surprise. I didn't expect anything like this. I tried a few things to get around it, but then I actually stopped to consider what was happening to cause such an error. My Article objects are definitely serializable, which is why the error didn't make sense.

Then it hit me: the object I was actually attempting to cache was a QuerySet, not a list or tuple of Article objects. Changing the code to wrap the Article.objects.active() call with list().

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = list(Article.objects.active().order_by('-publish_date')[:10])
            cache.set('latest_articles', articles)

        return articles

    ...

And that one worked. I would prefer to cache the actual XML version of the RSS feed, but I will settle with a few hundred fewer hits to my database each day by caching the list of articles. If anyone has better suggestions, I'd love to hear about them. Until then, I hope my experience will help others out there who are in danger of taking down other sites on their shared hosting service!

Review: Django 1.0 Web Site Development

Introduction

Several months ago, a UK-based book publisher, Packt Publishing contacted me to ask if I would be willing to review one of their books about Django. I gladly jumped at the opportunity, and I received a copy of the book a couple of weeks later in the mail. This happened at the beginning of September 2009. It just so happened that I was in the process of being hired on by ScienceLogic right when all of this took place. The subsequent weeks were filled to the brim with visitors, packing, moving, finding an apartment, and commuting to my new job. It was pretty stressful.

Things are finally settling down, so I've taken the time to actually review the book I was asked to review. I should mention right off the bat that this is indeed a solicited review, but I am in no way influenced to write a good or bad review. Packt Publishing simply wants me to offer an honest review of the book, and that is what I indend to do. While reviewing the book, I decided to follow along and write the code the book introduced. I made sure that I was using the official Django 1.0 release instead of using trunk like I tend to do for my own projects.

The title of the book is Django 1.0 Web Site Development, written by Ayman Hourieh, and it's only 250 pages long. Ayman described the audience of the book as such:

This book is for web developers who want to learn how to build a complete site with Web 2.0 features, using the power of a proven and popular development system--Django--but do not necessarily want to learn how a complete framework functions in order to do this. Basic knowledge of Python development is required for this book, but no knowledge of Django is expected.

Ayman introduced Django piece by piece using the end goal of a social bookmarking site, a la del.icio.us and reddit. In the first chapter of the book, Ayman discussed the history of Django and why Python and Django are a good platform upon which to build Web applications. The second chapter offers a brief guide to installing Python and Django, and getting your first project setup. Not much to comment on here.

Digging In

Chapter three is where the reader was introduced to the basic structure of a Django project, and the initial data models were described. Chapter four discussed user registration and management. We made it possible for users to create accounts, log into them, and log out again. As part of those additions, the django.forms framework was introduced.

In chapter five, we made it possible for bookmarks to be tagged. Along with that, we built a tag cloud, restricted access to certain pages, and added a little protection against malicious data input. Next up was the section where things actually started getting interesting for me: enhancing the interface with fancy effects and AJAX. The fancy effects include live searching for bookmarks, being able to edit a bookmark in place (without loading a new page), and auto-completing tags when you submit a bookmark.

This chapter really reminded me just how simple it is to add new, useful features to existing code using Django and Python. I was thoroughly impressed at how easy it was to add the AJAX functionality mentioned above. Auto-completing the tags as you type, while jQuery and friends did most of the work, was very easy to implement. It made me happy.

Chapter seven introduced some code that allowed users to share their bookmarks with others. Along with this, the ability to vote on shared bookmarks was added. Another feature that was added in this chapter was the ability for users to comment on various bookmarks.

The ridiculously amazing Django Administration utility was first introduced in chapter eight. It kinda surprised me that it took 150 pages before this feature was brought to the user's attention. In my opinion, this is one of the most useful selling points when one is considering a Web framework for a project. When I first encountered Django, the admin interface was one of maybe three deciding factors in our company's decision to become a full-on Django shop.

Bring on the Web 2.0

Anyway, in chapter nine, we added a handful of useful "Web 2.0" features. RSS feeds were introduced. We learned about pagination to enhance usability and performance. We also improved the search engine in our project. At this stage, the magical Q objects were mentioned. The power behind the Q objects was discussed very well, in my opinion.

In chapter 10, we were taught how we can create relationships between members on the site. We made it possible for users to become "friends" so they can see the latest bookmarks posted by their friends. We also added an option for users to be able to invite some of their other friends to join the site via email, complete with activation links. Finally, we improved the user interface by providing a little bit of feedback to the user at various points using the messages framework that is part of the django.contrib.auth package in Django 1.0.

More advanced topics, such as internationalization and caching, were discussed in chapter 11. Django's special unit testing features were also introduced in chapter 11. This section actually kinda frustrated me. Caching was discussed immediately before unit testing. In the caching section, we learned how to enable site-wide caching. This actually broke the unit tests. They failed because the caching system was "read only" while running the tests. Anyway, it's probably more or less a moot point.

Chapter 11 also briefly introduced things to pay attention to when you deploy your Django projects into a production environment. This portion was mildly disappointing, but I don't know what else would have made it better. There are so many functional ways to deploy Django projects that you could write books just to describe the minutia involved in deployment.

The twelfth and final chapter discussed some of the other things that Django has to offer, such as enhanced functionality in templates using custom template tags and filters and model managers. Generic views were mentioned, and some of the other useful things in django.contrib were brought up. Ayman also offered a few ideas of additional functionality that the reader can implement on their own, using the things they learned throughout the book.

Afterthoughts

Overall, I felt that this book did a great job of introducing the power that lies in using Django as your framework of choice. I thought Ayman managed to break things up into logical sections, and that the iterations used to enhance existing functionality (from earlier chapters) were superbly executed. I think that this book, while it does assume some prior Python knowledge, would be a fine choice for those who are curious to dig into Django quickly and easily.

Some of the beefs I have with this book deal mostly with the editing. There were a lot of strange things that I found while reading through the book. However, the biggest sticking point for me has to do with "pluggable" applications. Earlier I mentioned that the built-in Django admin was one of only a few deciding factors in my company's choice to become a Django shop. Django was designed to allow its applications to be very "pluggable."

You may be asking, "What do I mean by 'pluggable'?" Well, say you decide to build a website that includes a blog, so you build a Django project and create an application specific to blogging. Then, at some later time, you need to build another site that also has blog functionality. Do you want to rewrite all of the blogging code for the second site? Or do you want to use the same code that you used in the first site (without copying it)? If you're anything like me and thousands of other developers out there, you would probably rather leverage the work you had already done. Django allows you to do this if you build your Django applications properly.

This book, however, makes no such effort to teach the reader how to turn all of their hard work on the social bookmarking features into something they could reuse over and over with minimal effort in the future. Application-specific templates are placed directly into the global templates directory. Application-specific URLconfs are placed in the root urls.py file. I would have liked to see at least some effort to make the bookmarking application have the potential to be reused.

Finally, the most obvious gripe is that the book is outdated. That's understandable, though! Anything in print media will likely be outdated the second it is printed if the book has anything to do with computers. However, with the understanding that this book was written specifically for Django 1.0 and not Django 1.1 or 1.2 alpha, it does an excellent job at hitting the mark.

Bash Time Saver

The other day I was helping a friend get their Django site back online, and I found myself doing a couple of very similar commands when backing up the databases:

$ pg_dump -U some_database -f some_database_backup.sql

I use Bash as my shell in Linux, and apparently it's capable of doing some neat string substitution. I wanted to try something I had seen a few days prior to save myself a few keystrokes. Here's what I was trying to do:

$ pg_dump -U some_database -f some_database_backup.sql
$ ^some_database^some_other_database

Which, I had hoped would replace all instances of some_database with some_other_database from the previously executed command. It turns out that the ^search-for^replace-with is only good for one substitution. That means that my sneaky attempt at saving keystrokes just overwrote the first database's backup with the backup of the second database. In other words, I was getting this:

$ pg_dump -U some_database -f some_database_backup.sql
$ pg_dump -U some_other_database -f some_database_backup.sql

...when I wanted this...

$ pg_dump -U some_database -f some_database_backup.sql
$ pg_dump -U some_other_database -f some_other_database_backup.sql

I did a little more digging today, and it turns out that I should have been using the following syntax:

$ pg_dump -U some_database -f some_database_backup.sql
$ !!:gs/some_database/some_other_database/

The !! is shorthand for executing the previous command, and the :gs is used for global string replacement. And there you have it!

Out of curiosity, does anyone know of similar functionality in other shells?

A Quick Django Tip: User Profiles

I thought I would share with all of you a little trick that I've been using for quite some time in my Django applications. Personally, I find it to be very convenient and simple.

Django allows you to specify an AUTH_PROFILE_MODULE setting if you wish to maintain information about a user beyond the basic username, password, email, etc. To access the profile for a given User instance, you must do something like:

from django.contrib.auth.models import User
user = User.objects.get(pk=1)
user.get_profile().additional_info_field

That seems all find and dandy, right? Just a simple call to get_profile() isn't that difficult. However, if there is not yet an instance of whatever you set AUTH_PROFILE_MODULE to for the user in question, you'll get an error about it when you call get_profile().

My simple-minded way around this is to do something like this:

from django.db import models
from django.contrib.auth.models import User

class UserProfile(models.Model):
    user = models.ForeignKey(User, unique=True)
    additional_info_field = models.CharField(max_length=50)

User.profile = property(lambda u: UserProfile.objects.get_or_create(user=u)[0])

The magic is in the property() and get_or_create. Using the property() feature in Python, means you can just do something like:

from django.contrib.auth.models import User
user = User.objects.get(pk=1)
user.profile.additional_info_field

(with no parentheses after profile) The get_or_create method tells Django to look for any UserProfile objects whose user attribute is the user from which you are accessing the profile property. If no matches are found, an instance of UserProfile is created for you. The lambda function returns the UserProfile instance in both cases.

This trick is very simple. It's also very effective in my experience. I'm sure there are other ways of doing the same thing, but this works for me, and it's just one line of code--no need to even specify the AUTH_PROFILE_MODULE setting! You can apply the same trick to pretty much anything if you'd like. It doesn't have to be just for user profiles. Enjoy!

Announcing django-ittybitty 0.1.0-pre2

I'd like to take this opportunity to officially announce my latest little side project: django-ittybitty! Some of you out there might not find this to be a useful application, but I hope others will enjoy it.

Many of you are familiar with the URL-shortening sites like http://tinyurl.com/, http://is.gd/, http://cli.gs/, and whole slew of others. These sites are all fine and dandy, right? Wrong! What happens when those sites have downtime and potential visitors to your site never get to your site because the URL-shortening site is down? You lose traffic. That's not good, in case you were unsure about it.

That is why I made this application. It allows you to have short URLs for any and every page on your Django site. No more need to rely on 3rd party servers to translate short URLs to real URLs on your site. So long as your pony-powered site is up and running, your visitors will be able to use URLs generated by this application to get anywhere on your site. All you need to do to make this work is download and install the application, add a middleware class to your MIDDLEWARE_CLASSES, and then use a simple template tag to generate a short URL for any given page.

django-ittybitty will keep track of the number of times a particular "itty bitty URL" has been used to access your site. I suppose some people will find that useful, but it's hardly a true metric for your "most popular" pages.

The algorithm behind this application is very simple, but it can potentially handle around 18,446,744,073,709,551,615 shortened URLs in 64 characters or fewer, neglecting the 'http://www.....' for your site (good luck getting your database to play well with that many records, much less storing them on a server :)).

For more information, please check out the project pages and enjoy:

For those who are interested, here are some code samples for how to use django-ittybitty:

{% extends 'base.html' %}
{% load ittybitty_tags %}

{% block content %}
<a href="{% ittybitty_url %}">Link to this page!</a>
{% endblock %}

or:

{% extends 'base.html' %}
{% load ittybitty_tags %}

{% block content %}
{% ittybitty_url as ittybitty %}
<a href="{{ ittybitty.get_shortcut }}">Link to this page!</a>
{% endblock %}

or:

{% extends 'base.html' %}
{% load ittybitty_tags %}

{% block content %}
{% ittybitty_url as ittybitty %}
{% with ittybitty.get_shortcut as short_url %}
<a href="{{ short_url }}">Link to this page!</a>
<a href="{{ short_url }}">Link to this page again!</a>
<a href="{{ short_url }}">Link to this page one more time!</a>
{% endwith %}
{% endblock %}

Enjoy!

Downtime and django-tracking 0.2.7

The Foul Side

Some of you may have noticed the ~11 hours of intermittent downtime that codekoala.com experienced from early on the 24th of January to just a little while ago. I was doing some work on my django-tracking application, which somehow seemed to break my site. CodeKoala.com uses PostgreSQL as the database backend, and as soon as I tried to apply the changes to django-tracking to my site, everything just seemed to die.

The weird thing was that the site would work if I put it on a sqlite or MySQL backend. I didn't change the database schema at all as part of my changes to django-tracking, so it made absolutely no sense. I was in touch with WebFaction's awesome support squad for a good deal of today trying to get things sorted out. We tried just about everything we could think of, short of porting the entire site to a different backend or restoring a recent backup.

Just as things were looking very grim, I tried this command: ./manage.py reset tracking. Voilà! The site started working again. I guess I just had some super funky junk in my tracking application's tables.

On the Brighter Side

As a result of all this work and toil, you all can now enjoy django-tracking 0.2.7! There were a lot of minor code optimizations that went into this release. The biggest change, however, is the fancy "active users map" that you see here.

This feature allows you to display a map of where your recently active users are likely to be based upon their IP address. A list is also available below the map with displays further information about each active visitor. The page updates itself every 5 seconds or so, which means that if a visitor hasn't been active for 10 minutes (or whatever your timeout happens to be), their marker will disappear from the map and their entry in the last will go away too! Pretty dang fancy if you ask me!

If you're interested in downloading and using django-tracking, please check out the links at the end of the article. The Google Code link explains what you need to do and how to configure things.

So folks!! Please play with it!

Django + Aggregation = w00t

One of my good friends has just notified me of some great news in Django land: aggregates are upon us! As of revision 9742, Django includes two new operations: annotate() and aggregate().

Ok, ok, so I haven't actually played with Django's new aggregation functionality yet, but I definitely will!! And I'm sure it will rock.

More information: