New Feature in django-articles: Articles From Email

One of the features that I really like about sites like posterous and tumblr is that they allow you to send email to a special email address and have it be posted as a blog article. This is a feature I've been planning to implement in django-articles pretty much since its inception way back when. I finally got around to working on it.

The latest release of django-articles allows you to configure a mailbox, either IMAP4 or POP3, to periodically check for new emails. A new management command check_for_articles_from_email can be used to process the messages found in the special mailbox. If any emails are found, they will be fetched, parsed, and posted based on your configuration values. Only articles whose sender matches an active user in your Django site will be turned into articles. You can configure the command to mark such articles from email as "inactive" so they don't appear on the site without moderation. The default behavior, actually, is to mark the articles inactive--you must explicitly configure django-articles to automatically mark the articles as active if you want this behavior.

One of the biggest things that you should keep in mind with this new feature, though, is that it does not currently take your attachments into account. In time I plan on implementing this functionality. For now, only the plain text content of your email will be posted. Please see the project's README for more information about this new feature.

Please keep in mind that this is brand new functionality and it's not been very well tested in a wide variety of situations. Right now, it's in the "it works for me" stage. If you find problems with it, please create a ticket or update any similar existing tickets using the ticket tracker on bitbucket.org.

You can install or update django-articles using the following utilities:

  • pip install -U django-articles
  • easy_install -U django-articles
  • hg clone http://bitbucket.org/codekoala/django-articles/ or just hg pull -u if you have already cloned it
  • git clone git://github.com/codekoala/django-articles.git

Enjoy!

P.S. This article was posted via email

2Ze.us Updates

There has been quite a bit of recent activity in my 2ze.us project since I first released it nearly a year ago. My intent was not to become a competitor with bit.ly, is.gd, or anyone else in the URL-shortening arena. I created the site as a way for me to learn more about Google's AppEngine. It didn't take very long to get it up and running, and it seemed to work fairly well.

AppEngine and Extensions

I was able to basically leave the site alone on AppEngine for several months--through about September 2009. In that time, I came up with a Firefox extension to make its use more convenient.

The extension allows you to quickly get a shortened URL for the page you're currently looking at, and a couple of context menu items let you get a short URL for things like specific images on a page. Also included in the extension is a preview for 2ze.us links. The preview can tell you the title and domain of the link's target. It can tell you how much smaller the 2ze.us URL is compared to the full URL. Finally, it displays how many times that particular 2ze.us link has been clicked.

That as all fine and dandy. It was the second Firefox extension I had ever written, and it's still running strong. In June or July of 2009, I started working on a little program to make it easier for me to interact with Twitter the way I wanted to. This was a great opportunity for me to incorporate 2ze.us into the application so any URL I wanted to post to Twitter would automatically be shortened for me, using my own shortener.

Porting to WebFaction And PHP

Anyway, around the end of September 2009, I noticed that there were a lot of problems with 2ze.us. It was slow and sometimes completely unresponsive. Certain URLs would redirect to their full URLs, while others wouldn't. The Firefox extension stopped working nicely. Oh yeah, and AppEngine rolled back to a previous revision of the code without me telling it to. That's when everything just died. It didn't take long for me to decide to migrate my project from AppEngine onto my awesome WebFaction hosting.

At this point, I was faced with a small dilemma: keep the code in Python, or port it to PHP. I opted to port it over to PHP, because I didn't want all of the overhead of a full Django instance for a site that needed to be very zippy. And I was unacquainted with other Python options.

By early October 2009, I had managed to turn the project into a PHP beast, running on Apache. It was a lot more responsive than AppEngine ever let 2ze.us be. There were a few bumps along the road, what with the extension and Twitter client relying on various parts of the site. Eventually it got to a point where I could just let it sit and work.

Chromium Extension

Sometime around the end of December, I decided to write another extension for 2ze.us, only for Google Chrome and Chromium this time. This extension isn't quite as feature-packed as its Firefox brother, but it gets the job done.

Clip2Zeus

Shortly after "completing" the Chromium extension, I had what seemed like a pretty original idea. Who knows if it really is, but I still haven't seen another tool quite like the one that I made as a result of this idea. I thought, "Now, why should I need to install an extension in each Web browser I use on each computer I use? Is there a better way?"

The answer came quickly: a standalone, desktop application. Write one program that handles shortening URLs for you. My laziness told me to make a program that monitors your system clipboard for URLs. If a URL is detected, try to shorten it, and update the clipboard contents in place. Boom. Done. All extensions become useless beyond things like the URL preview (which is very useful, imo).

The next question I asked was, "Do I make it platform-dependent? Should I stick it to the majority of computer users and write my tool for Linux only? For OSX only? For, uh... Windows only?" Again, an easy question to answer. Support them all or don't even bother writing the application.

A week's worth of midnight hacking saw the birth of Clip2Zeus 1.0a. It's a cross-platform compatible desktop application that does exactly what I just mentioned. When it's running and detects a URL on your system clipboard, it will try to shorten it and update it in your clipboard. If you copy a block of text, the application will only modify the URLs in that block of text--meaning the block of text will still be in your clipboard, but it will have shorter URLs.

I use the program every day at work (on OSX). It's been very fun for me to see a short URL any time I copy a nasty URL to my clipboard. Imagine that; I'm a big fan of my own work...

Tornado

Lately, I've noticed that the site was getting kind of slow again. Sometimes it would take several seconds for Clip2Zeus to shorten URLs in my clipboard, when it was normally instantaneous. Every once in a while, Clip2Zeus would completely fail to connect to the website.

One of my friends has asked me a lot of questions about the Tornado framework in the past months. I had read a few things about Tornado when it was open-sourced last year, but I didn't really feel the need to dabble with it. These questions prompted me to tinker a little.

Last night I re-ported 2ze.us to Python, using the Tornado framework this time. So far I'm very impressed with its responsiveness. The framework offers a lot of neat little utilities, and it is very fast (as reported by dozens of other reputable sources).

On top of the speed increase that came with the transition to Tornado, my RAM usage on WebFaction has come down by nearly 100MB. Just by turning off the one Apache-backed website. Now I'm nowhere near my RAM cap! Wahoo!!

Enough rambling. Like I said at the beginning of this article, a lot has been happening with this project in the past year. I didn't even think about all of the time I put into projects related to my simple little side project. Looking back, I'm quite satisfied with how things have unfolded.

Statistics

Here are some simple statistics for 2ze.us. Since March 2009...

  • 5,252 URLs have been shortened using 2ze.us
  • 2ze.us links have been clicked 198,267 times
  • 315,951 URL characters have been turned into 11,532 characters

In April 2009...

  • 217 URLs were shortened
  • 2ze.us links were clicked 617 times

In February 2010...

  • 1,182 URLs were shortened
  • 2ze.us links were clicked 32,830 times

Not too shabby for a side project.

On Security and Python's Exec

A recent project at work has renewed my aversion to Python's exec statement--particularly when you want to use it with arbitrary, untrusted code. The project requirements necessitated the use of exec, so I got to do some interesting experiments with it. I've got a few friends who, until I slapped some sense into them, were seemingly big fans of exec (in Django projects, even...). This article is for them and others in the same boat.

Take this example:

#!/usr/bin/env python

import sys

dirname = '/usr/lib/python2.6/site-packages'

print dirname, 'in path?', (dirname in sys.path)

exec """import sys

dirname = '/usr/lib/python2.6/site-packages'
print 'In exec path?', (dirname in sys.path)

sys.path.remove(dirname)

print 'In exec path?', (dirname in sys.path)"""

print dirname, 'in path?', (dirname in sys.path)

Take a second and examine what the script is doing. Done? Great... So, the script first makes sure that a very critical directory is in my PYTHONPATH: /usr/lib/python2.6/site-packages. This is the directory where all of the awesome Python packages, like PIL, lxml, and dozens of others, reside. This is where Python will look for such packages when I try to import and use them in my programs.

Next, a little Python snippet is executed using exec. Let's say this snippet comes from an untrusted source (a visitor to your website, for example). The snippet removes that very important directory from my PYTHONPATH. It might seem like it's relatively safe to do within an exec--maybe it doesn't change the PYTHONPATH that I was using before the exec?

Wrong. The output of this script on my personal system says it all:

$ python bad.py
/usr/lib/python2.6/site-packages in path? True
In exec path? True
In exec path? False
/usr/lib/python2.6/site-packages in path? False

From this example, we learn that Python code that is executed using exec runs in the same context as the code that uses exec. This is a critical concept to learn.

Some people might say, "Oh, there's an easy way around that. Give exec its own globals dictionary to work with, and all will be well." Wrong again. Here's a modified version of the above script.

#!/usr/bin/env python

import sys

dirname = '/usr/lib/python2.6/site-packages'

print dirname, 'in path?', (dirname in sys.path)

context = {'something': 'This is a special context for the exec'}

exec """import sys

print something
dirname = '/usr/lib/python2.6/site-packages'
print 'In exec path?', (dirname in sys.path)

sys.path.remove(dirname)

print 'In exec path?', (dirname in sys.path)""" in context

print dirname, 'in path?', (dirname in sys.path)

And here's the output:

$ python also_bad.py
/usr/lib/python2.6/site-packages in path? True
This is a special context for the exec
In exec path? True
In exec path? False
/usr/lib/python2.6/site-packages in path? False

How can you get around this glaring risk in the exec statement? One possible solution is to execute the snippet in its own process. Might not be the best way to handle things. Could be the absolute worst solution. But it's a solution, and it works:

#!/usr/bin/env python

import multiprocessing
import sys

def execute_snippet(snippet):
    exec snippet

dirname = '/usr/lib/python2.6/site-packages'

print dirname, 'in path?', (dirname in sys.path)

snippet = """import sys

dirname = '/usr/lib/python2.6/site-packages'
print 'In exec path?', (dirname in sys.path)

sys.path.remove(dirname)

print 'In exec path?', (dirname in sys.path)"""

proc = multiprocessing.Process(target=execute_snippet, args=(snippet,))
proc.start()
proc.join()

print dirname, 'in path?', (dirname in sys.path)

And here comes the output:

$ python better.py
/usr/lib/python2.6/site-packages in path? True
In exec path? True
In exec path? False
/usr/lib/python2.6/site-packages in path? True

So the PYTHONPATH is only affected by the sys.path.remove within the process that executes the snippet using exec. The process that spawns the subprocess is unaffected, and can continue with life, happily importing all of those wonderful packages from the site-packages directory. Yay.

With that said, exec isn't always bad. But my personal point of view is basically, "There is probably a better way." Unfortunately for me, that does not hold up in my current situation, and it might not work for your circumstances too. If no one is forcing you to use exec, you might investigate alternatives in all of that free time you've been wondering what to do with.

Python And Execution Context

I recently found myself in a situation where knowing the execution context of a function became necessary. It took me several hours to learn about this functionality, despite many cleverly-crafted Google searches. So, being the generous person I am, I want to share my findings.

My particular use case required that a function behave differently depending on whether it was called in an exec call. Specifics beyond that are not important for this article. Here's an example of how I was able to get my desired behavior.

import inspect

def is_exec():
    caller = inspect.currentframe().f_back
    module = inspect.getmodule(caller)

    if module is None:
        print "I'm being run by exec!"
    else:
        print "I'm being run by %s" % module.__name__

def main():
    is_exec()

    exec "is_exec()"

if __name__ == '__main__':
    main()

The output of such a script would look like this:

$ python is_exec.py
I'm being run by __main__
I'm being run by exec!

It's also interesting to note that when you're using the Python interactive interpreter, calling the is_exec function from the code above will tell you that you are indeed using exec.

Some may argue that modifying behavior as I needed to is dirty, and that if your system requires such code, you're doing it wrong. Well, you could apply this sort of code to situations that have nothing to do with exec. Perhaps you want to determine which part of your product is using a specific function the most. Perhaps you want to get additional debugging information that isn't immediately obvious.

Just like always, I want to add the disclaimer that there may be other ways to do this and there probably are. However, this is the way that worked for me. I'd still be interested to here about other solutions you may have encountered for this problem.

On a side note, if you're up for some slightly advanced Python mumbo jumbo, I suggest diving into the inspect documentation.

Monitor Multiple Remote Files Using Multitail

There comes a time in each of our individual lives that we just learn to love log files. We learn to love utilities like tail and grep as we pore over countless lines of information, seeking out the stuff that really matters. We like to show off our debugging prowess as innocent bystanders look on in absolute wonderment.

While that's all fine and dandy, I'm always on the lookout for utilities to make my log monitoring less painful. A few weeks ago, my supervisor introduced me to a program that he's been using for quite some time: multitail. In essence, it's tail with some really neat features, such as the ability to:

  • "tail" multiple files (or commands, like netstat) independently in the same terminal
  • highlight text using regular expressions
  • search log messages and see only the matching lines
  • merge multiple files into one log window
  • scrolling back in the history of a log file
  • highlighting "themes"

I've been using multitail for a couple of weeks now (it took me a while to warm up to it after my supervisor introduce it), and I'm quite satisfied with it. One thing I really, really like about multitail is that I can kinda sorta almost monitor multiple remote files. What does that mean, you ask?

Well, my development environment includes at least 5 virtual machines, each of which will be logging different but equally important information. I want to be able to "tail" a specific log file on each of the virtual machines in one window. Now, it took me a while to learn how to do this, which is why I'm sharing the information with you.

And here comes my usual disclaimer: this may not be the most efficient way to do what I want to do, but it's currently working for me. I'm open to other solutions too!

Anyway, I can run a command like the following to monitor multiple remote log files:

multitail -l 'ssh user@host1 "tail -f /path/to/log/file"' -l 'ssh user@host2 "tail -f /path/to/log/file"'

Such a command would ssh into two computers, host1 and host2, and run tail -f /path/to/log/file on each. Multitail allows you to monitor the output of both tail commands in a single window, reducing clutter on your desktop. You can also arrange the files/commands you're "tailing" into various rows and columns. I tend to have a 2x2 grid of log files when I use multitail at work.

I've also started using multitail to monitor the access and error logs for my Django sites on WebFaction. I simply ssh into my account, run an alias for a ridiculous multitail command, and watch as both log files scroll on by.

Again, this is just another aspect of my work environment that is fun and useful to me, and I wanted to spread the joy. Multitail may or may not be a utility you like to use, but it suits my current needs and desires quite well. YMMV. And, once again, I'm always on the look-out for other tools to make my work life more interesting and productive!

Afloat: Window Management For OSX

Today I wanted to be able to watch the PyCon Live Stream while using OS X at work. A quick Google search returned an awesome program: Afloat. It lets me change the window settings for just about any OS X application. Right now I've got the live feed just lingering there in the background, pinned to my desktop. Loving it!

How I Have A Mobile & Desktop Site With Django

Part of the latest version of my site included the deployment of a mobile-friendly site. Up until recently, I hadn't even attempted to create a mobile site because I thought it would take more time than it was worth. I wanted something beyond just using CSS to hide certain elements on the page. I wanted to be able to break down the content of my site into its most basic pieces and only include what was necessary. Also, I wanted to figure it out on my own (instead of reusing wheels other people had invented before me--horrible, I know).

With these requirements, I was afraid it would require more resources than I could spare on my shared Web host. My initial impression was that I would have to leverage the django.contrib.sites framework in a fashion that would essentially require two distinct instances of my site running in RAM. Despite these feelings, I decided to embark on a mission to create a mobile-friendly site while still offering a full desktop-friendly site. It was surprisingly simple. This may not be the best way to do it, but it sure works for me, and I'm very satisfied. So satisfied, in fact, that I am going to share my solution with all of my Django-loving friends.

The first step is to add a couple of new settings to your settings.py file:

import os
DIRNAME = os.path.abspath(os.path.dirname(__file__))

TEMPLATE_DIRS = (
    os.path.join(DIRNAME, 'templates'),
)

MOBILE_TEMPLATE_DIRS = (
    os.path.join(DIRNAME, 'templates', 'mobile'),
)
DESKTOP_TEMPLATE_DIRS = (
    os.path.join(DIRNAME, 'templates', 'desktop'),
)

For those of you not used to seeing that os.path.join stuff, it's just a (very efficient) way to make your Django project more portable between different computers and even operating systems. The new variables are MOBILE_TEMPLATE_DIRS and DESKTOP_TEMPLATE_DIRS, and their respective meanings should be fairly obvious. Basically, this tells Django that it can look for templates in your_django_project/templates, your_django_project/templates/mobile, and your_django_project/templates/desktop.

Next, we need to install a middleware that takes care of determining which directory Django should pay attention to when rendering pages, between mobile and desktop. You can put this into your_django_project/middleware.py:

from django.conf import settings

class MobileTemplatesMiddleware(object):
    """Determines which set of templates to use for a mobile site"""

    ORIG_TEMPLATE_DIRS = settings.TEMPLATE_DIRS

    def process_request(self, request):
        # sets are used here, you can use other logic if you have an older version of Python
        MOBILE_SUBDOMAINS = set(['m', 'mobile'])
        domain = set(request.META.get('HTTP_HOST', '').split('.'))

        if len(MOBILE_SUBDOMAINS & domain):
            settings.TEMPLATE_DIRS = settings.MOBILE_TEMPLATE_DIRS + self.ORIG_TEMPLATE_DIRS
        else:
            settings.TEMPLATE_DIRS = settings.DESKTOP_TEMPLATE_DIRS + self.ORIG_TEMPLATE_DIRS

Now you need to install the new middleware. Back in your settings.py, find the MIDDLEWARE_CLASSES variable, and insert a line like the following:

'your_django_project.middleware.MobileTemplatesMiddleware',

Finally, if you already have a base.html template in your your_django_project/templates directory, rename it to something else, such as site_base.html. Now create two new directories: your_django_project/templates/mobile and your_django_project/templates/desktop. In both of those directories, create a new base.html template that extends site_base.html.

Example site_base.html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>{% block base_title %}Code Koala{% endblock %} - {% block title %}Welcome{% endblock %}</title>
<link href="{{ MEDIA_URL }}css/common.css" rel="stylesheet" type="text/css" media="screen" />
{% block extra-head %}{% endblock %}

</head>
<body>
<div id="page-wrapper">
    {% block header %}
    <div id="logo">
        <h1><a href="/">Code Koala</a></h1>
    </div>
    <div id="header">
        <div id="menu">
            <ul>
                <li><a href="/" class="first">Home</a></li>
                <li><a href="/blog/">Blog</a></li>
                <li><a href="/about/">About</a></li>
                <li><a href="/contact/">Contact</a></li>
            </ul>
        </div>
    </div>
    {% endblock %}
    <div id="page">
        <div id="content">
            {% block content %}{% endblock %}
        </div>
        <div id="sidebar">
            {% block sidebar %}
            Stuff
            {% endblock %}
        </div>
    </div>
    <div id="footer">
        {% block footer %}
        Footer stuff
        {% endblock %}
    </div>
</div>
</body>
</html>

Example desktop/base.html

{% extends 'site_base.html' %}

{% block extra-head %}
<!-- stylesheets -->
<link href="{{ MEDIA_URL }}css/desktop.css" rel="stylesheet" type="text/css" media="screen" />

<!-- JavaScripts -->
<script type="text/javascript" src="{{ MEDIA_URL }}js/jquery.js"></script>
{% endblock %}

Example mobile/base.html

{% extends 'site_base.html' %}

{% block extra-head %}
<!-- stylesheets -->
<link href="{{ MEDIA_URL }}css/mobile.css" rel="stylesheet" type="text/css" media="screen" />
{% endblock %}

{% block sidebar %}{% endblock %}

Please forgive me if the HTML or whatever is incorrect--I butchered the actual templates I use on Code Koala for the examples. There are some neat things you can do in your pages to make them more mobile friendly, such as including something like <meta name="viewport" content="width=device-width; initial-scale=1.0; maximum-scale=1.0; user-scalable=0;" /> in your <head> tag. This is supposed to tell your visitor's browser to not scale your pages to make it all fit on the screen. You can find a lot of other such tips elsewhere on the interwebs, and I'm sure they'll be better explained elsewhere too. You can also find scripts to handle redirecting your visitors to a mobile site and whatnot. Google is your friend.

As for the Django side of things, that should be just about it. If you have other templates you want to customize based on the version of your site that visitors are viewing, simply add those templates to the your_django_project/templates/mobile or your_django_project/templates/desktop directories as necessary. For example, if you have an application called blog, and you want to override the entry_detail.html template for the mobile site, so it doesn't pull in a bunch of unnecessary information to save bandwidth, you could save your modified copy in your_django_project/templates/mobile/blog/entry_detail.html.

With this setup, all you have to do is point your main domain and a subdomain such as m.yourdomain.com to the same Django application, and the middleware will take care of the "heavy lifting". No need for an additional instance of your Django project just for the mobile site. No hackish hiding of elements using CSS. If you find this article useful and decide to use these techniques on your site, please let me know how it works in your environment and if you ran into any snags so I can update the information!

Site-Wide Caching in Django

My last article about caching RSS feeds in a Django project generated a lot of interest. My original goal was to help other people who have tried to cache QuerySet objects and received a funky error message. Many of my visitors offered helpful advice in the comments, making it clear that I was going about caching my feeds the wrong way.

I knew my solution was wrong before I even produced it, but I couldn't get Django's site-wide caching middleware to work in my production environment. Site-wide caching worked wonderfully in my development environment, and I tried all sorts of things to make it work in my production setup. It wasn't until one "Jacob" offered a beautiful pearl of wisdom that things started to make more sense:

This doesn't pertain to feeds, but one rather large gotcha with the cache middleware is that any javascript you are running that plants a cookie will affect the cache key. Google analytics, for instance, has that effect. A workaround is to use a middleware to strip out the offending cookies from the request object before the cache middleware looks at it.

The minute I read that comment, I realized just how logical it was! If Google Analytics, or any other JavaScript used on my site, was setting a cookie, and it changed that cookie on each request, then the caching engine would effectively have a different page to cache for each request! Thank you so much, Jacob, for helping me get past the frustration of not having site-wide caching in my production environment.

How To Setup Site-Wide Caching

While most of this can be gleaned from the official documentation, I will repeat it here in an effort to provide a complete "HOWTO". For further information, hit up the official caching documentation.

The first step is to choose a caching backend for your project. Built-in options include:

To specify which backend you want to use, define the CACHE_BACKEND variable in your settings.py. The definition for each backend is different, so check out the official documentation for details.

Next, install a couple of middleware classes, and pay attention to where the classes are supposed to appear in the list:

  • django.middleware.cache.UpdateCacheMiddleware - This should be the first middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.
  • django.middleware.cache.FetchFromCacheMiddleware - This should be the last middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.

Finally, you must define the following variables in your settings.py file:

  • CACHE_MIDDLEWARE_SECONDS - The number of seconds each page should be cached
  • CACHE_MIDDLEWARE_KEY_PREFIX - If the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don't care

If you don't use anything like Google Analytics that sets/changes cookies on each request to your site, you should have site-wide caching enabled now. If you only want pages to be cached for users who are not logged in, you may add CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True to your settings.py file--its meaning should be fairly obvious.

If, however, your site-wide caching doesn't appear to work (as it didn't for me for a long time), you can create a special middleware class to strip those dirty cookies from the request, so the caching middleware can do its work.

import re

class StripCookieMiddleware(object):
    """Ganked from http://2ze.us/Io"""

    STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')

    def process_request(self, request):
        cookie = self.STRIP_RE.sub('', request.META.get('HTTP_COOKIE', ''))
        request.META['HTTP_COOKIE'] = cookie

Edit: Thanks to Tal for regex the suggestion!

Once you do that, you need only install the new middleware class. Be sure to install it somewhere between the UpdateCacheMiddleware and FetchFromCacheMiddleware classes, not first or last in the tuple. When all of that is done, your site-wide caching should really work! That is, of course, unless your offending cookies are not found by that STRIP_RE regular expression.

Thanks again to Jacob and "nf", the original author of the middleware class I used to solve all of my problems! Also, I'd like to thank "JaredKuolt" for the django-staticgenerator on his github account. It made me happy for a while as I was working toward real site-wide caching.