Another I-Heart-Linux Story


Just had another experience that represents much of what I love about Linux and free software, and I thought I would share.

I just got my first bluray burner. I have it in a USB3-capable enclosure for use with my USB2-capable laptop. I knew burn speeds would be much lower than the drive is rated to handle, primarily because of this arrangement.

A good portion of yesterday included me trying to burn my first bluray in Linux. Unfortunately, bluray standards and Linux still don't jibe very well. My usual burning applications had no trouble at all detecting the burner or the media, but none of them seemed to want to actually burn anything. K3b made a coaster out of one disc; Brasero did nothing. I didn't try the command line burning tricks that I found on the interwebs.

I decided to give ImgBurn a shot. I installed it with WINE, and it ran beautifully. However, I failed to realize that I had left K3b's coaster in the drive, so I couldn't get a disc to burn. In my own defense, the coaster was still recognized by the computer as a blank BD-R. At that point, I rebooted into Windows just to verify that the drive itself could burn discs.

Windows obviously burned the first disc fine (after I realized the coaster was still in the drive and replaced it with a fresh disc). The media I have are only rated for 4x burn speeds, while the drive itself is rated for 14x burning. Windows only managed to achieve ~2x for the duration of the burn (using ImgBurn again). Also, Windows seemed to struggle to keep the buffers full. Every 70 seconds or so, ImgBurn would stop burning to allow the buffers to fill back up and the hard drive to settle down a bit. But it did end up working--the disc played in the bluray player that's hooked up to my TV!!! Success!

I decided to give ImgBurn in WINE another shot. I popped a fresh BD-R into the drive, specified the files I wanted to burn, and initiated the burn process. Right off the bat, ImgBurn under WINE was burning at 2.4x. I watched the process for 10 minutes or so before heading upstairs to let it finish. Not once was the main buffer depleted. The drive buffer (4MB) was up and down every so often, but it was never empty from the time the burn began until the time when I left my computer.

I realize that some of you out there will see the moral of this story as something like, "Linux doesn't support burning bluray discs very well, and Windows worked on the first try." If you find yourself in that boat, you're probably not the intended audience for this tale, likely because you have some bias against any operating system that isn't the one you're currently using. No, this story is intended for people who like to tinker and don't think it's a waste of time to go through exercises like this.

Anyway, the real reason I wrote this story was to illustrate one of my favorite things about Linux: it's fast! Burning a bluray disc with an application written for Windows, running under a (free, open source, community-driven) Windows compatibility layer on Linux, is still faster than the same program running natively on Windows. Sure, it might not always be the case, but in my ~12 years of hands-on experience with Linux, I've found that it usually is.

UPDATE

So I've burned another 5 blurays under Linux with ImgBurn. Here's the interesting part of my most recent burn log:

I 00:51:40 Operation Started!
I 00:51:40 Source File: -==/\/[BUILD IMAGE]\/\==-
I 00:51:40 Source File Sectors: 11,978,272 (MODE1/2048)
I 00:51:40 Source File Size: 24,531,501,056 bytes
I 00:51:40 Source File Volume Identifier: Stargate - Disc 6
I 00:51:40 Source File Volume Set Identifier: 4222067200B6C611
I 00:51:40 Source File Application Identifier: IMGBURN V2.5.7.0 - THE ULTIMATE IMAGE BURNER!
I 00:51:40 Source File Implementation Identifier: ImgBurn
I 00:51:40 Source File File System(s): ISO9660, UDF (1.02)
I 00:51:40 Destination Device: [3:0:0] HL-DT-ST BD-RE  WH14NS40 1.00
I 00:51:40 Destination Media Type: BD-R (Disc ID: PHILIP-R04-000)
I 00:51:40 Destination Media Supported Write Speeds: 4x, 6x, 8x
I 00:51:40 Destination Media Sectors: 12,219,392
I 00:51:40 Write Mode: BD
I 00:51:40 Write Type: DAO
I 00:51:40 Write Speed: MAX
I 00:51:41 Hardware Defect Management Active: No
I 00:51:41 BD-R Verify Not Required: Yes
I 00:51:41 Link Size: Auto
I 00:51:41 Lock Volume: Yes
I 00:51:41 Test Mode: No
I 00:51:41 OPC: No
I 00:51:41 BURN-Proof: Enabled
I 00:51:41 Write Speed Successfully Set! - Effective: 35,968 KB/s (8x)
I 00:52:00 Filling Buffer... (80 MB)
I 00:52:02 Writing LeadIn...
I 00:52:03 Writing Session 1 of 1... (1 Track, LBA: 0 - 11978271)
I 00:52:03 Writing Track 1 of 1... (MODE1/2048, LBA: 0 - 11978271)
I 01:18:33 Synchronising Cache...
I 01:18:34 Closing Track...
I 01:18:35 Finalising Disc...
I 01:18:50 Exporting Graph Data...
I 01:18:50 Graph Data File: C:\users\wheaties\Application Data\ImgBurn\Graph Data Files\HL-DT-ST_BD-RE_WH14NS40_1.00_WEDNESDAY-JANUARY-02-2013_12-51_AM_PHILIP-R04-000_MAX.ibg
I 01:18:50 Export Successfully Completed!
I 01:18:50 Operation Successfully Completed! - Duration: 00:27:09
I 01:18:50 Average Write Rate: 15,067 KB/s (3.4x) - Maximum Write Rate: 18,615 KB/s (4.1x)

As you can see, the last line suggests things are working quite nicely in Linux:

I 01:18:50 Average Write Rate: 15,067 KB/s (3.4x) - Maximum Write Rate: 18,615 KB/s (4.1x)

I should also note that two of the discs I've burned so far have had some problems being read on the computer afterwards. These discs do, however, work quite nicely in the bluray player for my TV. Might just be circumstance.


Site-Wide Caching in Django


My last article about caching RSS feeds in a Django project generated a lot of interest. My original goal was to help other people who have tried to cache QuerySet objects and received a funky error message. Many of my visitors offered helpful advice in the comments, making it clear that I was going about caching my feeds the wrong way.

I knew my solution was wrong before I even produced it, but I couldn't get Django's site-wide caching middleware to work in my production environment. Site-wide caching worked wonderfully in my development environment, and I tried all sorts of things to make it work in my production setup. It wasn't until one "Jacob" offered a beautiful pearl of wisdom that things started to make more sense:

This doesn't pertain to feeds, but one rather large gotcha with the cache middleware is that any javascript you are running that plants a cookie will affect the cache key. Google analytics, for instance, has that effect. A workaround is to use a middleware to strip out the offending cookies from the request object before the cache middleware looks at it.

The minute I read that comment, I realized just how logical it was! If Google Analytics, or any other JavaScript used on my site, was setting a cookie, and it changed that cookie on each request, then the caching engine would effectively have a different page to cache for each request! Thank you so much, Jacob, for helping me get past the frustration of not having site-wide caching in my production environment.

How To Setup Site-Wide Caching

While most of this can be gleaned from the official documentation, I will repeat it here in an effort to provide a complete "HOWTO". For further information, hit up the official caching documentation.

The first step is to choose a caching backend for your project. Built-in options include:

  • Memcached
  • Database caching
  • Filesystem caching
  • Local-memory caching
  • Dummy caching (for development)
To specify which backend you want to use, define the CACHE_BACKEND variable in your settings.py. The definition for each backend is different, so check out the official documentation for details.
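As a sketch, here is the URI-style syntax for a few of the built-in backends as of this era of Django (the host, port, table name, and path are all hypothetical -- adjust them for your own setup):

```python
# settings.py -- hypothetical values; define exactly one CACHE_BACKEND
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'    # Memcached
# CACHE_BACKEND = 'db://my_cache_table'           # Database caching
# CACHE_BACKEND = 'file:///var/tmp/django_cache'  # Filesystem caching
# CACHE_BACKEND = 'locmem://'                     # Local-memory caching
# CACHE_BACKEND = 'dummy://'                      # Dummy caching
```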

Next, install a couple of middleware classes, and pay attention to where the classes are supposed to appear in the list:

  • django.middleware.cache.UpdateCacheMiddleware - This should be the first middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.
  • django.middleware.cache.FetchFromCacheMiddleware - This should be the last middleware class in your MIDDLEWARE_CLASSES tuple in your settings.py.
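In settings.py, that ordering looks something like this (the middleware between the two cache classes is just whatever your project already uses; CommonMiddleware here is only illustrative):

```python
# settings.py -- a sketch of the required ordering
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',     # must be first
    'django.middleware.common.CommonMiddleware',         # your other middleware
    'django.middleware.cache.FetchFromCacheMiddleware',  # must be last
)
```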

Finally, you must define the following variables in your settings.py file:

  • CACHE_MIDDLEWARE_SECONDS - The number of seconds each page should be cached
  • CACHE_MIDDLEWARE_KEY_PREFIX - If the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don't care
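Together, the two variables might look like this (the values are hypothetical):

```python
# settings.py -- hypothetical values
CACHE_MIDDLEWARE_SECONDS = 60 * 15      # cache each page for 15 minutes
CACHE_MIDDLEWARE_KEY_PREFIX = 'mysite'  # or '' if you don't share the cache
```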

If you don't use anything like Google Analytics that sets/changes cookies on each request to your site, you should have site-wide caching enabled now. If you only want pages to be cached for users who are not logged in, you may add CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True to your settings.py file--its meaning should be fairly obvious.

If, however, your site-wide caching doesn't appear to work (as it didn't for me for a long time), you can create a special middleware class to strip those dirty cookies from the request, so the caching middleware can do its work.

import re

class StripCookieMiddleware(object):
    """Ganked from http://2ze.us/Io"""

    # Matches underscore-prefixed cookies (such as Google Analytics'
    # __utm* cookies) along with their values and trailing separators
    STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')

    def process_request(self, request):
        # Remove the offending cookies before the cache middleware
        # computes its cache key from the request headers
        cookie = self.STRIP_RE.sub('', request.META.get('HTTP_COOKIE', ''))
        request.META['HTTP_COOKIE'] = cookie

Edit: Thanks to Tal for the regex suggestion!
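To see what that STRIP_RE pattern actually strips, here's a quick sketch against a hypothetical Cookie header (Google Analytics cookies all begin with __utm):

```python
import re

# The same pattern used by StripCookieMiddleware above
STRIP_RE = re.compile(r'\b(_[^=]+=.+?(?:; |$))')

# Hypothetical header: two Google Analytics cookies around a session cookie
header = '__utma=1.2.3; sessionid=abc123; __utmz=4.5.6'
stripped = STRIP_RE.sub('', header)
# The underscore-prefixed cookies are removed; sessionid survives
```

Because the pattern keys on a leading underscore, ordinary cookies like sessionid pass through untouched -- which also means it will eat any of your own cookies that happen to start with an underscore.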

Once you do that, you need only install the new middleware class. Be sure to install it somewhere between the UpdateCacheMiddleware and FetchFromCacheMiddleware classes, not first or last in the tuple. When all of that is done, your site-wide caching should really work! That is, of course, unless your offending cookies are not found by that STRIP_RE regular expression.

Thanks again to Jacob and "nf", the original author of the middleware class I used to solve all of my problems! Also, I'd like to thank "JaredKuolt" for the django-staticgenerator on his github account. It made me happy for a while as I was working toward real site-wide caching.


Syndication Caching in Django


I've recently been working on some performance enhancements on my site. Apparently some of my latest articles are a little too popular for my shared hosting plan. The surge of traffic to my site took down several sites on the same server as my own.

My response to the fiasco was to, among other things, implement caching on my site, and the caching seems to have helped a lot. I noticed that my RSS feeds are hit almost as hard as the real articles on my site, and that they weren't being cached the way I had expected. I tried a couple of things that I thought would work, but nothing seemed to do the trick.

After doing some brief research into the idea of caching my RSS feeds using Django's built-in caching mechanisms, I came up empty. It occurred to me to implement caching in the feed classes themselves. I tried something like this:

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = Article.objects.active().order_by('-publish_date')[:10]
            cache.set('latest_articles', articles)

        return articles

    ...

This code doesn't work! When I would try to retrieve one of my RSS feeds with such "caching" in place, I got the following traceback:

Traceback (most recent call last):

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 280, in run
    self.result = application(self.environ, self.start_response)

  File "/home/wheaties/dev/django/core/servers/basehttp.py", line 674, in __call__
    return self.application(environ, start_response)

  File "/home/wheaties/dev/django/core/handlers/wsgi.py", line 241, in __call__
    response = self.get_response(request)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 143, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)

  File "/home/wheaties/dev/django/core/handlers/base.py", line 101, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 36, in __call__
    return self.decorator(self.func)(*args, **kwargs)

  File "/home/wheaties/dev/django/utils/decorators.py", line 86, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 215, in feed
    feedgen = f(slug, request).get_feed(param)

  File "/home/wheaties/dev/django/contrib/syndication/feeds.py", line 37, in get_feed
    return super(Feed, self).get_feed(obj, self.request)

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 134, in get_feed
    for item in self.__get_dynamic_attr('items', obj):

  File "/home/wheaties/dev/django/contrib/syndication/views.py", line 69, in __get_dynamic_attr
    return attr()

  File "/home/wheaties/dev/articles/feeds.py", line 22, in items
    cache.set(key, articles)

  File "/home/wheaties/dev/django/core/cache/backends/filebased.py", line 72, in set
    pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)

PicklingError: Can't pickle <class 'django.utils.functional.__proxy__'>: attribute lookup django.utils.functional.__proxy__ failed

This error took me by surprise. I didn't expect anything like this. I tried a few things to get around it, but then I actually stopped to consider what was happening to cause such an error. My Article objects are definitely serializable, which is why the error didn't make sense.

Then it hit me: the object I was actually attempting to cache was a QuerySet, not a list or tuple of Article objects. Wrapping the Article.objects.active() call with list() solved the problem:

from django.contrib.syndication.feeds import Feed
from django.core.cache import cache
from articles.models import Article

class LatestEntries(Feed):
    ...

    def items(self):
        articles = cache.get('latest_articles')

        if articles is None:
            articles = list(Article.objects.active().order_by('-publish_date')[:10])
            cache.set('latest_articles', articles)

        return articles

    ...

And that one worked! I would prefer to cache the actual XML version of the RSS feed, but I will settle for a few hundred fewer hits to my database each day by caching the list of articles. If anyone has better suggestions, I'd love to hear them. Until then, I hope my experience helps others out there who are in danger of taking down other sites on their shared hosting service!
