Simple Weighted Sort in Python

Last night I found myself in need of a simple weighted sort function in Python. I had a list of integers which represented object IDs in my project. Some of the objects needed to be processed before the others while iterating over the list of integers, and I already knew which object IDs those were. The order the rest of the object IDs were processed didn't matter at all. I just wanted the special object IDs to arrive at the beginning of the list, and the remaining object IDs could be in any order.

I was surprised at how simple it was to produce such a weighted sort. Here's an example of what I did:

import random
object_ids = [random.randint(0, 100) for i in range(20)]
special_ids = [random.choice(object_ids) for i in range(5)]
print 'Object IDs:', object_ids
print 'Special IDs:', special_ids

object_ids.sort(key=special_ids.__contains__, reverse=True)
print 'Object IDs:', object_ids

And some sample output:

Object IDs: [13, 97, 67, 5, 77, 58, 24, 99, 29, 20, 29, 75, 100, 31, 79, 5, 27, 11, 6, 1]
Special IDs: [13, 1, 27, 6, 67]
Object IDs: [13, 67, 27, 6, 1, 97, 5, 77, 58, 24, 99, 29, 20, 29, 75, 100, 31, 79, 5, 11]

Notice that each of the "special" IDs have shifted from their original position in the object_ids list to be at the beginning of the list after the sort.

The Python documentation for sort says that the key argument "specifies a function of one argument that is used to extract a comparison key from each list element." I'm using it to check to see if a given element in the list is in my special_ids list. If the element is present in the special_ids list, it will be shifted to the left because of the way the special_ids.__contains__ works.

In sorting, a value of 1 (or other positive integer) out of a comparison function generally means "this belongs to the right of the other element." A value of -1 (or other negative integer) means "this belongs to the left of the other element." A value of 0 means "these two elements are equal" (for the purposes of sorting). I'm assuming it works similarly with the key argument. Please correct me if I'm wrong!

As lqc states in the comments below, the key argument works differently. It creates a new sequence of values which is then sorted. Before lqc jumped in, I was using key=int(i in special_ids) * -2 + 1 to do the sorting, which is pretty dumb. Using key=special_ids.__contains__ is much more appropriate. Thanks lqc!!

This sort of weighted sort might not be just right for your needs, but hopefully it will give you a place to start to build your customized weighted sort!

PIR Motion Sensor + LCD Screen + Arduino Uno

For one reason or another, I've recently had the urge to follow in my father's footsteps and start tinkering with electronics. He's basically a wizard. He has fixed a lot of appliances that others tossed out the door without the slightest bit of investigation. I think he did this with the monitor I had hooked up to my very first computer. He just popped open the case, found the problem, soldered some solution in there, and that monitor worked for probably a decade afterwards. Amazing stuff.

Anyway, I have an itch to create a laser trip wire (don't ask). I took a basic electronics class my freshman year in college, so I am already familiar with resistors, capacitors, ICs, breadboards, soldering, etc. It doesn't seem like a very daunting task to create that trip wire, but I'm starting fresh with electronics (it's been a good 8 or 9 years since I last soldered or anything like that).

One of my co-workers mentioned the Arduino as something to get me started on the trip wire, so I started doing some research here and there. The more I read, the more excited I got. I wanted soo badly to buy all of the junk that I'd need to start tinkering with an Arduino, but I didn't think my wife would appreciate that--especially around Christmas time.

Several of my awesome relatives sent me very generous gift cards for Christmas this year, which finally gave me the opportunity to buy a load of stuff for the Arduino without feeling bad. So I ordered loads of stuff. Most of it is here, some of it is still on its way. My Arduino Uno arrived around noon today, and I haven't been able to stop tinkering! It's really a lot of fun, and stupid easy even if you're not a programmer!

I started with the basic "oooh, blinking LEDs!" sort of projects (or "sketches" in Arduino parlance). Then I moved on to tweak those to work with multiple LEDs. There were a few different scenarios I ran though because two of my LEDs weren't lighting up very well. I guess I either hooked them up wrong or I didn't seat them properly... whatever.

The next project was when I hooked up a PIR (passive infrared) motion sensor ($9.99 at RadioShack) and installed a demo sketch that I found on the Arduino wiki. I wasn't quite sure how to wire everything for the PIR sensor, so I took a look at this Make Magazine video to see one way of setting up the circuit. That one simply lights up an LED and makes some noise. I just made mine light up and LED since I don't have a buzzer (yet).

Next I moved on to the LCD. The package came with a 20x4 green-on-black backlit LCD display, a 2.2k Ohm resistor, and a set of pins to connect the LCD to my breadboard or whatever. I actually soldered the pins onto the LCD display board (wow, was that even more difficult than I remember!) so I would have less of a hassle getting all of the contacts working. Getting the LCD to work was pretty easy after that.

Then I took those two projects a bit further. I'm certain others have done this years ago, but I didn't use a sketch that was completely written for me this time so I felt special enough to share :) I added the PIR circuit from before less the LED, merged pieces from both demo programs, and came up with a motion sensor that would flash "INTRUDER ALERT!!" on the LCD screen a few times when triggered. Based on my testing, the sensor works extremely well (once calibrated!!), and it will detect motion well across the open area of my apartment (maybe a bit further than 20 feet)!

I'm quite happy with my work, especially for not having "dealt with" electronics for such a long time. When I tried to share my success with some of my friends, they wanted a video to prove I'm awesome (I guess?). So here it is. Trust me, I know the video is horrible--I speak too quietly (baby sleeping in next room), I stumble over my words (that's just me), and I neglected to even offer you some music to rock out to as I blabber about this stuff.

Here's the program I wrote/modified:

 /*
  * //////////////////////////////////////////////////
  * //making sense of the Parallax PIR sensor's output
  * //////////////////////////////////////////////////
  *
  * Switches a LED according to the state of the sensors output pin. Determines
  * the beginning and end of continuous motion sequences.
  *
  * @author: Kristian Gohlke / krigoo (_) gmail (_) com / http://krx.at
  * @author: Josh VanderLinden / codekoala (.) gmail (@) com / http://www.codekoala.com
  * @date:   3 Jan 2011
  *
  * kr1 (cleft) 2006
  * Released under a creative commons "Attribution-NonCommercial-ShareAlike 2.0" license
  * http://creativecommons.org/licenses/by-nc-sa/2.0/de/
  *
  *
  * The Parallax PIR Sensor is an easy to use digital infrared motion sensor module.
  * (http://www.parallax.com/detail.asp?product_id=555-28027)
  *
  * The sensor's output pin goes to HIGH if motion is present. However, even if
  * motion is present it goes to LOW from time to time, which might give the
  * impression no motion is present. This program deals with this issue by
  * ignoring LOW-phases shorter than a given time, assuming continuous motion is
  * present during these phases.
  *
  */

 #include <LiquidCrystal.h>

 int calibrationTime = 10;   // seconds to calibrate PIR
 long unsigned int pause = 5000; // timeout before we "all" motion has ceased
 long unsigned int lowIn; // the time when the sensor outputs a low impulse

 boolean lockLow = true;
 boolean takeLowTime;

 int flashCnt = 4;  // number of times the LCD will flash when there's motion
 int flashDelay = 500; // number of ms to wait while flashing LCD
 int pirPin = 7;    // the digital pin connected to the PIR sensor's output
 int lcdPin = 13;   // pin connected to LCD
 LiquidCrystal lcd(12, 11, 10, 5, 4, 3, 2);

 void setup() {
     // Calibrates the PIR

     Serial.begin(9600);
     pinMode(pirPin, INPUT);
     pinMode(lcdPin, OUTPUT);

     digitalWrite(pirPin, LOW);

     clearLcd();

     // give the sensor some time to calibrate
     lcd.setCursor(0,0);
     lcd.print("Calibrating...");

     for(int i = 0; i < calibrationTime; i++){
         lcd.print(".");
         delay(1000);
     }

     lcd.print("done");
     delay(50);
 }

 void clearLcd() {
     // Clears the LCD, turns off the backlight
     lcd.begin(20, 4);
     lcd.clear();
     digitalWrite(lcdPin, LOW);
 }

 void alertLcd() {
     // Turns on the LCD backlight and notifies user of motion
     lcd.setCursor(2, 1);
     digitalWrite(lcdPin, HIGH);
     lcd.print("INTRUDER ALERT!!");
     lcd.setCursor(2, 2);
     lcd.print("================");
 }

 void loop() {
     // Main execution loop

     if(digitalRead(pirPin) == HIGH) {
         // flash an alert a few times
         for (int c = 0; c < flashCnt; c++) {
             alertLcd();
             delay(flashDelay);
             lcd.clear();
             delay(flashDelay);
         }

         if(lockLow) {
             // makes sure we wait for a transition to LOW before any further
             // output is made:
             lockLow = false;
             delay(50);
         }
         takeLowTime = true;
     }

     if (digitalRead(pirPin) == LOW) {
         clearLcd();

         if (takeLowTime) {
             // save the time of the transition from high to LOW
             // make sure this is only done at the start of a LOW phase
             lowIn = millis();
             takeLowTime = false;
         }

         // if the sensor is low for more than the given pause,
         // we assume that no more motion is going to happen
         if (!lockLow && millis() - lowIn > pause) {
             // makes sure this block of code is only executed again after
             // a new motion sequence has been detected
             lockLow = true;
             delay(50);
         }
     }
 }

Django-Tracking 0.3.5

I've finally gotten around to looking at a bunch of tickets that had been opened for django-tracking in the past year and a half or so. I feel horrible that it's really taken that long for me to get to them! Every time I got a ticket notification, I told myself, "Okay, I'll work on that this weekend." Many have weekends have passed without any work on any of my projects. I'm going to get better about that!

Anyway, several fixes have gone into the latest version of django-tracking. Some have to do with unicode problems (thanks ramusus!). Others have to do with overall performance, while yet others have to do with overall stability.

The first interesting change in this release is that django-tracking no longer relies on the GeoIP Python API. Instead it's now using django.contrib.gis.utils.GeoIP. I had hoped that this would remove the dependency on the GeoIP C API, but it appears that I was mistaken. Oh well.

Perhaps the biggest improvement in this new release is the use of caching. With caching in place, the middleware classes don't slam the database nearly as badly as they used to. There's still more that could be done with caching to improve performance, but I think what I've got now will be a big help.

Another noteworthy change, in my opinion, is the use of logging. I've sprinkled mildly useful logging messages throughout the code so you can learn when something bad happens that is silently handled. I hope that this will help me improve the quality of the code as it will allow anyone who uses the project (and pays attention to the log messages, of course) to tell me when bad things are happening.

Finally, the packaging code has been updated to be much more simple. Version 0.3.5 has been uploaded to PyPI and is available via pip or easy_install. If you prefer to have the latest copy of the code, the official code repositories are (in order of my personal preference):

I can't wait for your feedback!

Django-Articles 2.1.1 Released

I've been working on some neat changes to django-articles recently, and I've just released version 2.1.1. The most noticeable feature in this release is Auto-Tagging. Since I feel like I've described the feature fairly well in the README, I'll just copy/paste that section here.

The auto-tagging feature allows you to easily apply any of your current tags to your articles. When you save an Article object with auto-tagging enabled for that article, django-articles will go through each of your existing tags to see if the entire word appears anywhere in your article's content. If a match is found, that tag will be added to the article.

For example, if you have tags "test" and "art", and you wrote a new auto-tagged Article with the text:

This is a test article.

django-articles would automatically apply the "test" tag to this article, but not the "art" tag. It will only apply the "art" tag automatically when the actual word "art" appears in the content.

Auto-tagging does not remove any tags that are already assigned to an article. This means that you can still add tags the good, old-fashioned way in the Django Admin without losing them. Auto-tagging will only add to an article's existing tags (if needed).

Auto-tagging is enabled for all articles by default. If you want to disable it by default (and enable it on a per-article basis), set ARTICLES_AUTO_TAG to False in your settings.py file.

Auto-Tagging does not attempt to produce any keywords that magically represent the content of your articles. Only existing tags are used!!

I sure had fun programming this little feature. I know it will be particularly useful for my own site.

Another item I'd like to mention about this release: I've finally started using South migrations in this app. This is a move I've been planning to make for quite some time now.

Head on over to http://bitbucket.org/codekoala/django-articles or use pip install -U django-articles (or easy_install django-articles if you must)! Enjoy!

Vim Tip: Global Delete

Today I was asked to help debug a problem with our product's patcher. All of the debug information for the entire product goes into a single log file, and some processes are quite chatty. The log file that contained the information I was interested in for the patcher problems was some 26.5MB by the time I got it.

All of the lines I was interested in were very easy to find, because they contained specific strings (yay). The problem was that they were scattered throughout the log, in between debug output for other processes. At first, I tried to just delete lines that were meaningless for me, but that got old very quickly. This is how I made my life easier using Vim.

It's possible to do a "global delete" on lines that don't contain the stuff you are interested in. The lines I wanted to see contained one of two words, but I'll just use foo and bar for this example:

:g!/\v(foo|bar)/d

This command will look for any line that does not contain foo or bar and delete it. Here's the breakdown:

  • :g - This is the command for doing some other command on any line that matches a pattern
  • ! - Negate the match (perform the pending command on any line that does not contain the pattern)
  • /\v(foo|bar)/ - The regular expression pattern
    • \v - Use of \v means that in the pattern after it all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning (very magic). Basically, it removes the need to escape almost everything in your regex.
    • (foo|bar) - Find either foo or (|) bar
  • d - The command to perform on matching lines, which is delete in this case

So, executing that command in the Vim window with the log file wiped out all of the lines that didn't have my magical keywords in them.

When I showed my co-worker how awesome Vim was, he was mildly impressed, and then he asked, "What about multiline log messages?" My particular case didn't have any multiline messages, but I wanted to figure it out anyway. I haven't been able to figure out an exact method for deleting the lines that don't match, but I have found a way to show only the lines that match:

:g!/\v^".+(foo|bar)\_.{-}^"/p

This command is pretty close to the previous one.

  • :g - Global command on lines that match a pattern
  • ! - Negate the match (seems a little backward this time)
  • /\v^".+(foo|bar)\_.{-}^"/ - The regular expression pattern
    • \v - Very magic
    • ^" - Find a line that starts with a double quote ("). Each of our individual log messages starts with a double quote that is guaranteed to be at the beginning of the line, so this is specific to our environment.
    • .+ - One or more characters between the " and foo or bar
    • (foo|bar) - Find either foo or (|) bar
    • \_.{-}^" - Non-greedy multiline match. Matches any character, including newlines (because of the \_), and continues matching until it reaches the next line that begins with ^". Again, that double quote is specific to our environment. The {-} is what makes this a "non-greedy" match--it's like using *, but it matches matches as few as possible of the preceding atom.
  • p - The command to perform on matching lines, which is print in this case. This brings up a separate little window that displays each match (which is why I mentioned the negation seemed a bit backward to me). Navigation and whatnot in this window appears to be similar to less on the command line.

And there you have it! I hope you find this information as useful as it has been for me!

Quick And Easy Execution Speed Testing

There have been many times when I've been programming, encounter a problem that probably involves a loop of some sort, and I think of two or more possible ways to achieve the same end result. At this point, I usually think about which one will probably be the fastest solution (execution-wise) while still being readable/maintainable. A lot of the time, the essentials of the problem can be tested in a few short lines of code.

A while back, I was perusing some Stack Overflow questions for work, and I stumbled upon what I consider one of the many hidden jewels in Python: the timeit module. Given a bit of code, this little guy will handle executing it in several loops and giving you the best time out of three trials (you can ask it to do more than 3 runs if you want). Once it completes its test, it will offer some very clean and useful output.

For example, today I encountered a piece of code that was making a comma-separated list of an arbitrary number of "%s". The code I saw essentially looked like this:

",".join(["%s"] * 50000)

Even though this code required no optimization, I thought, "Hey, that's neat... I wonder if a list comprehension could possibly be any faster." Here's an example of the contender:

",".join(["%s" for i in xrange(50000)])

I had no idea which would be faster, so timeit to the rescue!! Open up a terminal, type a couple one-line Python commands, and enjoy the results!

$ python -mtimeit 'l = ",".join(["%s"] * 50000)'
1000 loops, best of 3: 1.15 msec per loop
$ python -mtimeit 'l = ",".join(["%s" for i in xrange(50000)])'
100 loops, best of 3: 3.23 msec per loop

Hah, the list comprehension is certainly slower.

Now, for other more in-depth tests of performance, you might consider using the cProfile module. As far as I can tell, simple one-liners can't be tested directly from the command line using cProfile--they apparently need to be in a script. You can use something like:

python -mcProfile script.py

...in such situations. Or you can wrap function calls using cProfile.run():

import cProfile

def function_a():
    # something you want to profile

def function_b():
    # an alternative version of function_a to profile

if __name__ == '__main__':
    cProfile.run('function_a()')
    cProfile.run('function_b()')

I've used this technique for tests that I'd like to have "hard evidence" for in the future. The output of such a cProfile test looks something like this:

3 function calls in 6.860 CPU seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    6.860    6.860 <string>:1(<module>)
     1    6.860    6.860    6.860    6.860 test_enumerate.py:5(test_enumerate)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

This is useful when your code is calling other functions or methods and you want to find where your bottlenecks are. Hooray for Python!

What profiling techniques do you use?

Whoa! Another Reason To Love Vim

I've been struggling with some misconfigured appliances at work for the past couple of days, and I was getting tired of manually diff-ing things. On a whim, I decided to ask Google if there is a better way. Turns out there is, and it uses what I already know and love: VIM. Here's a command that lets you diff two remote file using vimdiff:

vimdiff scp://user@host//path/to/file scp://user@otherhost//path/to/file

This is going to save me so much time! I hope it is as useful to you all as it is to me.

SVN Commits By User

The other day at work, I found myself needing to see a list of Subversion commits by a specific user. I spent a few minutes looking at the svn log help, but nothing seemed to be designed to show commits by user. It took me a while to find something to do the trick, but this is it:

svn log | sed -n '/username/,/-----$/ p'

Gotta love sed!

Selenium Unit Test Reuse

Yesterday, one of the QA guys at work approached me with a question that turned out to be much more interesting to me than I think he had planned. He's been doing some unit testing using Selenium, exporting his test cases to Python. His question was this: how can I run the same unit tests using multiple browsers and multiple target servers?

I'm pretty sure he expected a simple 3-step answer or something like that. Instead, he got my crazy wide-eyed "ohhh... that's something I want to experiment with!" look. I started rambling on about inheritance, dynamic class creation, and nested for loops. His eyes started to look a little worried. He didn't really appreciate the nerdy lingo that much. I told him to pull up a chair and get comfortable.

Since I already had some other work I needed to pay attention to, I didn't want to spend too much time trying to figure out a good way to solve his problem. After about 20 minutes of devilish chuckles and frantic rustling through Python documentation, I came up with the following code:

from types import ClassType
from selenium import selenium
import unittest

IPS = ['192.168.0.1', '192.168.0.2']
BROWSERS = ['safari', 'chrome']

class SomeUnitTest(object):

    def test_something(self):
        sel = self.selenium
        # test code

def main(base):
    suites = []
    results = unittest.TestResult()

    for iidx, ip in enumerate(IPS):
        for bidx, browser in enumerate(BROWSERS):
            def setUp(self):
                self.verificationErrors = []
                self.selenium = selenium("localhost", 4444, "*%s" % self.browser, "http://%s/" % self.ip)
                self.selenium.start()

            def tearDown(self):
                self.selenium.stop()
                self.assertEqual([], self.verificationErrors)

            ut = ClassType('UT_%i_%i' % (iidx, bidx), (unittest.TestCase, base), {'ip': ip, 'browser': browser})
            ut.setUp = setUp
            ut.tearDown = tearDown

            suites.append(unittest.TestLoader().loadTestsFromTestCase(ut))

    unittest.TestSuite(suites)(results)
    for obj, error in results.errors:
        print 'In: ', obj
        print error

if __name__ == "__main__":
    main(SomeUnitTest)

I know, I know... it's got some dirty rotten tricks in it, and there are probably more efficient ways of doing what I've done. If the code offends you, look up at my previous disclaimer: I had other things I needed to be working on, so I didn't spend much time refining this. One thing I'm almost certain could be done better is not monkey patching the dynamic classes with the setUp and tearDown methods. Also, the output at the end of the test execution could definitely use some love. Oh well. Perhaps another day I'll get around to that.

Basically, you just set the servers you need to test and the browsers you want Selenium to run the tests in. Those are at the top of the script: IPS and BROWSERS. Then a new unittest.TestCase class is created for each combination of IP/server+browser. Finally, each of the test cases is thrown into a TestSuite, and the suite is processed. If there were any errors during the tests, they'll be printed out. We weren't really concerned with printing out other information, but you can certainly make other meaningful feedback appear.

Anyway, I thought that someone out there might very well benefit from my little experiment on my co-worker's question. Feel free to comment on your personal adventures with some variation of the code if you find it useful!

More django-articles Updates

I've spent a little more time lately adding new features to django-articles. There are two major additions in the latest release (2.0.0-pre2).

  • Article attachments
  • Article statuses

That's right folks! You can finally attach files to your articles. This includes attachments to emails that you send, if you have the articles from email feature properly configured. To prove it, I'm going to attach a file to this article (which I'm posting via email).

Next, I've decided that it's worth allowing the user to specify different statuses for their articles. One of the neat things about this feature is that if you are a super user, you're logged in, and you save an article with a status that is designated as "non-live", you will still be able to see it on the site. This is a way for users to preview their work before making it live. Out of the box, there are only two statuses: draft and finished. You're free to add more statuses if you feel so inclined (they're in the database, not hardcoded).

The article status is still separate from the "is_active" flag when saving an article. Any article that is marked as inactive will not appear on the site regardless of the article's "status".

On a slightly less impressive note (although still important), this release includes some basic unit tests. Most of the tests currently revolve around article statuses and making sure that the appropriate articles appear on the site.