VirtualBox Slowing Down Mah Linux (gasp!)

As I booted up my laptop tonight, I noticed that things were extremely sluggish. It didn't take long to realize just how painfully slow things were. The problem didn't appear to be in the boot itself, since the boot sequence was about as fast as it usually is. No, the slowdown hit as X was loading.

It has been a few days since I had time to use my laptop, and I thought I might have had to do a hard shutdown the last time I used it (for whatever reason). This led me to wonder if the filesystem might have been corrupted. Usually a reboot solves this sort of problem for me, so that's what I chose to do. Unfortunately, it didn't do the trick. Upon rebooting, the laptop started up fine during boot, but as soon as X started up everything was slow again.

While I was trying to see why my computer was running so slowly, I pulled up the system monitor to see if any processes were obviously hoarding the CPU. There were a few processes that stood out, and they all started with VBox. The next thing I thought of was the last system update I did. It involved a new kernel, and that last system update was done the last time I used my laptop.

It dawned on me that I hadn't recompiled my VirtualBox drivers since I did my system update. I kicked off the usual /etc/init.d/vboxdrv setup command. As soon as it was done, my computer was all of a sudden very responsive--the way it usually is.

Django-Articles 2.1.1 Released

I've been working on some neat changes to django-articles recently, and I've just released version 2.1.1. The most noticeable feature in this release is Auto-Tagging. Since I feel like I've described the feature fairly well in the README, I'll just copy/paste that section here.

The auto-tagging feature allows you to easily apply any of your current tags to your articles. When you save an Article object with auto-tagging enabled for that article, django-articles will go through each of your existing tags to see if the entire word appears anywhere in your article's content. If a match is found, that tag will be added to the article.

For example, if you have tags "test" and "art", and you wrote a new auto-tagged Article with the text:

This is a test article.

django-articles would automatically apply the "test" tag to this article, but not the "art" tag. It will only apply the "art" tag automatically when the actual word "art" appears in the content.
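The whole-word matching can be sketched in a few lines of plain Python. The `auto_tag` function and its arguments here are hypothetical, just to illustrate the idea; the real implementation in django-articles works on Article and Tag model instances:

```python
import re

def auto_tag(content, existing_tags):
    """Return the existing tags that appear as whole words in content."""
    matched = []
    for tag in existing_tags:
        # \b ensures "art" matches only as a standalone word,
        # not as a fragment of "article"
        if re.search(r'\b%s\b' % re.escape(tag), content, re.IGNORECASE):
            matched.append(tag)
    return matched

print(auto_tag("This is a test article.", ["test", "art"]))  # → ['test']
```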

Auto-tagging does not remove any tags that are already assigned to an article. This means that you can still add tags the good, old-fashioned way in the Django Admin without losing them. Auto-tagging will only add to an article's existing tags (if needed).

Auto-tagging is enabled for all articles by default. If you want to disable it by default (and enable it on a per-article basis), set ARTICLES_AUTO_TAG to False in your settings.py file.

Auto-tagging does not attempt to produce any keywords that magically represent the content of your articles. Only existing tags are used!

I sure had fun programming this little feature. I know it will be particularly useful for my own site.

Another item I'd like to mention about this release: I've finally started using South migrations in this app. This is a move I've been planning to make for quite some time now.

Head on over to http://bitbucket.org/codekoala/django-articles or use pip install -U django-articles (or easy_install django-articles if you must)! Enjoy!

Vim Tip: Global Delete

Today I was asked to help debug a problem with our product's patcher. All of the debug information for the entire product goes into a single log file, and some processes are quite chatty. The log file that contained the information I was interested in for the patcher problems was some 26.5MB by the time I got it.

All of the lines I was interested in were very easy to find, because they contained specific strings (yay). The problem was that they were scattered throughout the log, in between debug output for other processes. At first, I tried to just delete lines that were meaningless for me, but that got old very quickly. This is how I made my life easier using Vim.

It's possible to do a "global delete" on lines that don't contain the stuff you are interested in. The lines I wanted to see contained one of two words, but I'll just use foo and bar for this example:

:g!/\v(foo|bar)/d

This command will look for any line that does not contain foo or bar and delete it. Here's the breakdown:

  • :g - This is the command for doing some other command on any line that matches a pattern
  • ! - Negate the match (perform the pending command on any line that does not contain the pattern)
  • /\v(foo|bar)/ - The regular expression pattern
    • \v - Use of \v means that in the pattern after it all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning (very magic). Basically, it removes the need to escape almost everything in your regex.
    • (foo|bar) - Find either foo or (|) bar
  • d - The command to perform on matching lines, which is delete in this case

So, executing that command in the Vim window with the log file wiped out all of the lines that didn't have my magical keywords in them.
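Outside of Vim, the same keep-only-matching-lines filter can be sketched in a few lines of Python (the sample log lines below are made up, and foo/bar are placeholders just as in the Vim command):

```python
import re

# Keep only the lines that contain foo or bar,
# mirroring what :g!/\v(foo|bar)/d leaves behind
keyword = re.compile(r'foo|bar')
log_lines = ["keep foo here", "drop this line", "bar stays too", "more noise"]
kept = [line for line in log_lines if keyword.search(line)]
print(kept)  # → ['keep foo here', 'bar stays too']
```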

When I showed my co-worker how awesome Vim was, he was mildly impressed, and then he asked, "What about multiline log messages?" My particular case didn't have any multiline messages, but I wanted to figure it out anyway. I haven't been able to figure out an exact method for deleting the lines that don't match, but I have found a way to show only the lines that match:

:g!/\v^".+(foo|bar)\_.{-}^"/p

This command is pretty close to the previous one.

  • :g - Global command on lines that match a pattern
  • ! - Negate the match (seems a little backward this time)
  • /\v^".+(foo|bar)\_.{-}^"/ - The regular expression pattern
    • \v - Very magic
    • ^" - Find a line that starts with a double quote ("). Each of our individual log messages starts with a double quote that is guaranteed to be at the beginning of the line, so this is specific to our environment.
    • .+ - One or more characters between the " and foo or bar
    • (foo|bar) - Find either foo or (|) bar
    • \_.{-}^" - Non-greedy multiline match. Matches any character, including newlines (because of the \_), and continues matching until it reaches the next line that begins with ^". Again, that double quote is specific to our environment. The {-} is what makes this a "non-greedy" match--it's like using *, but it matches as few of the preceding atom as possible.
  • p - The command to perform on matching lines, which is print in this case. This brings up a separate little window that displays each match (which is why I mentioned the negation seemed a bit backward to me). Navigation and whatnot in this window appears to be similar to less on the command line.
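For what it's worth, the multiline idea translates to Python's re module as well. This is only a sketch with made-up log content; the leading `"` mirrors the environment-specific convention described above, and the lazy `.*?` with a lookahead plays the role of Vim's \_.{-}^":

```python
import re

# Each log message starts with a double quote at the beginning of a line.
# Keep messages whose first line contains foo or bar, matching lazily up
# to the next quoted message (or the end of the input).
log = '"msg one foo\nmore detail\n"msg two baz\n"msg three bar\n'
pattern = re.compile(r'^"[^\n]*(?:foo|bar).*?(?=^"|\Z)',
                     re.MULTILINE | re.DOTALL)
matches = pattern.findall(log)
print(matches)
```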

And there you have it! I hope you find this information as useful as it has been for me!

Quick And Easy Execution Speed Testing

There have been many times when I've been programming, encountered a problem that probably involves a loop of some sort, and thought of two or more possible ways to achieve the same end result. At this point, I usually think about which one will probably be the fastest solution (execution-wise) while still being readable/maintainable. A lot of the time, the essentials of the problem can be tested in a few short lines of code.

A while back, I was perusing some Stack Overflow questions for work, and I stumbled upon what I consider one of the many hidden jewels in Python: the timeit module. Given a bit of code, this little guy will handle executing it in several loops and giving you the best time out of three trials (you can ask it to do more than 3 runs if you want). Once it completes its test, it will offer some very clean and useful output.

For example, today I encountered a piece of code that was making a comma-separated list of an arbitrary number of "%s". The code I saw essentially looked like this:

",".join(["%s"] * 50000)

Even though this code required no optimization, I thought, "Hey, that's neat... I wonder if a list comprehension could possibly be any faster." Here's an example of the contender:

",".join(["%s" for i in xrange(50000)])

I had no idea which would be faster, so timeit to the rescue!! Open up a terminal, type a couple one-line Python commands, and enjoy the results!

$ python -mtimeit 'l = ",".join(["%s"] * 50000)'
1000 loops, best of 3: 1.15 msec per loop
$ python -mtimeit 'l = ",".join(["%s" for i in xrange(50000)])'
100 loops, best of 3: 3.23 msec per loop

Hah, the list comprehension is certainly slower.
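The same comparison can also be run from inside Python using the timeit module's function API, which is handy when a snippet is awkward to quote on the command line. A quick sketch (the list sizes are shrunk so it finishes fast, and `range` stands in for the Python 2 `xrange` used above):

```python
import timeit

# Time each snippet 1000 times; timeit.timeit returns total seconds
t_mult = timeit.timeit('",".join(["%s"] * 5000)', number=1000)
t_comp = timeit.timeit('",".join(["%s" for i in range(5000)])', number=1000)
print("multiply: %.4fs  comprehension: %.4fs" % (t_mult, t_comp))
```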

Now, for other more in-depth tests of performance, you might consider using the cProfile module. As far as I can tell, simple one-liners can't be tested directly from the command line using cProfile--they apparently need to be in a script. You can use something like:

python -mcProfile script.py

...in such situations. Or you can wrap function calls using cProfile.run():

import cProfile

def function_a():
    pass  # something you want to profile

def function_b():
    pass  # an alternative version of function_a to profile

if __name__ == '__main__':
    cProfile.run('function_a()')
    cProfile.run('function_b()')

I've used this technique for tests that I'd like to have "hard evidence" for in the future. The output of such a cProfile test looks something like this:

3 function calls in 6.860 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    6.860    6.860 <string>:1(<module>)
     1    6.860    6.860    6.860    6.860 test_enumerate.py:5(test_enumerate)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This is useful when your code is calling other functions or methods and you want to find where your bottlenecks are. Hooray for Python!

What profiling techniques do you use?

Whoa! Another Reason To Love Vim

I've been struggling with some misconfigured appliances at work for the past couple of days, and I was getting tired of manually diff-ing things. On a whim, I decided to ask Google if there was a better way. Turns out there is, and it uses what I already know and love: Vim. Here's a command that lets you diff two remote files using vimdiff:

vimdiff scp://user@host//path/to/file scp://user@otherhost//path/to/file

This is going to save me so much time! I hope it is as useful to you all as it is to me.

SVN Commits By User

The other day at work, I found myself needing to see a list of Subversion commits by a specific user. I spent a few minutes looking at the svn log help, but nothing seemed to be designed to show commits by user. It took me a while to find something to do the trick, but this is it:

svn log | sed -n '/username/,/-----$/ p'

The sed range prints everything from a line containing the username through the next line of dashes that separates log entries. Gotta love sed!

Selenium Unit Test Reuse

Yesterday, one of the QA guys at work approached me with a question that turned out to be much more interesting to me than I think he had planned. He's been doing some unit testing using Selenium, exporting his test cases to Python. His question was this: how can I run the same unit tests using multiple browsers and multiple target servers?

I'm pretty sure he expected a simple 3-step answer or something like that. Instead, he got my crazy wide-eyed "ohhh... that's something I want to experiment with!" look. I started rambling on about inheritance, dynamic class creation, and nested for loops. His eyes started to look a little worried. He didn't really appreciate the nerdy lingo that much. I told him to pull up a chair and get comfortable.

Since I already had some other work I needed to pay attention to, I didn't want to spend too much time trying to figure out a good way to solve his problem. After about 20 minutes of devilish chuckles and frantic rustling through Python documentation, I came up with the following code:

from selenium import selenium
import unittest

IPS = ['192.168.0.1', '192.168.0.2']
BROWSERS = ['safari', 'chrome']

class SomeUnitTest(object):

    def test_something(self):
        sel = self.selenium
        # test code

def main(base):
    suites = []
    results = unittest.TestResult()

    for iidx, ip in enumerate(IPS):
        for bidx, browser in enumerate(BROWSERS):
            def setUp(self):
                self.verificationErrors = []
                self.selenium = selenium("localhost", 4444, "*%s" % self.browser, "http://%s/" % self.ip)
                self.selenium.start()

            def tearDown(self):
                self.selenium.stop()
                self.assertEqual([], self.verificationErrors)

            # build each TestCase subclass dynamically with type(), baking
            # the target IP and browser in as class attributes
            ut = type('UT_%i_%i' % (iidx, bidx), (unittest.TestCase, base), {'ip': ip, 'browser': browser})
            ut.setUp = setUp
            ut.tearDown = tearDown

            suites.append(unittest.TestLoader().loadTestsFromTestCase(ut))

    unittest.TestSuite(suites)(results)
    for obj, error in results.errors:
        print 'In: ', obj
        print error

if __name__ == "__main__":
    main(SomeUnitTest)

I know, I know... it's got some dirty rotten tricks in it, and there are probably more efficient ways of doing what I've done. If the code offends you, refer back to my disclaimer above: I had other things I needed to be working on, so I didn't spend much time refining this. One thing I'm almost certain could be done better is avoiding the monkey patching of the setUp and tearDown methods onto the dynamically created classes. Also, the output at the end of the test execution could definitely use some love. Oh well. Perhaps another day I'll get around to that.

Basically, you just set the servers you need to test and the browsers you want Selenium to run the tests in. Those are at the top of the script: IPS and BROWSERS. Then a new unittest.TestCase class is created for each combination of IP/server+browser. Finally, each of the test cases is thrown into a TestSuite, and the suite is processed. If there were any errors during the tests, they'll be printed out. We weren't really concerned with printing out other information, but you can certainly make other meaningful feedback appear.

Anyway, I thought that someone out there might very well benefit from my little experiment on my co-worker's question. Feel free to comment on your personal adventures with some variation of the code if you find it useful!

More django-articles Updates

I've spent a little more time lately adding new features to django-articles. There are two major additions in the latest release (2.0.0-pre2).

  • Article attachments
  • Article statuses

That's right folks! You can finally attach files to your articles. This includes attachments to emails that you send, if you have the articles from email feature properly configured. To prove it, I'm going to attach a file to this article (which I'm posting via email).

Next, I've decided that it's worth allowing the user to specify different statuses for their articles. One of the neat things about this feature is that if you are a super user, you're logged in, and you save an article with a status that is designated as "non-live", you will still be able to see it on the site. This is a way for users to preview their work before making it live. Out of the box, there are only two statuses: draft and finished. You're free to add more statuses if you feel so inclined (they're in the database, not hardcoded).

The article status is still separate from the "is_active" flag when saving an article. Any article that is marked as inactive will not appear on the site regardless of the article's "status".

On a slightly less impressive note (although still important), this release includes some basic unit tests. Most of the tests currently revolve around article statuses and making sure that the appropriate articles appear on the site.