Follow-Up to Weighted Sorting in Python

The activity on my latest blog post has been tremendous! I never expected that much activity within an hour or two of posting the article.

The aim of this article is to provide an alternative solution to my weighted sort when you're after increased performance. It might just be useful to those who came here in search of a way to do weighted sorting in Python. I need to give a shout out to Jeremy Brown for suggesting this solution. He's so awesome :P

While the example I posted in my previous article addressed my needs just fine, it is definitely not the fastest option. A better solution would be to completely remove the special IDs from the object list altogether and just place them in front of the list:

import itertools
import random

object_ids = [random.randint(0, 100) for i in range(20)]
special_ids = [random.choice(object_ids) for i in range(5)]

not_special_ids = (i for i in object_ids if i not in special_ids)
for i in itertools.chain(special_ids, not_special_ids):
    # do stuff with each ID
    pass

This solution is quite different from my weighted sort, as there's no sorting going on at all, just a simple generator and using itertools to chain two collections together.

Here's a way you can benchmark see which solution is faster:

from copy import copy
import cProfile
import itertools
import random

object_ids = [random.randint(0, 100) for i in range(20)]
special_ids = [random.choice(object_ids) for i in range(5)]

ITERATIONS = 1000000

def sorting():
    for i in xrange(ITERATIONS):
        l = copy(object_ids)
        l.sort(key=lambda i: int(i in special_ids) * -2 + 1)
        for i in l:
            pass

def chaining():
    for i in xrange(ITERATIONS):
        l = (i for i in object_ids if i not in special_ids)
        for i in itertools.chain(special_ids, l):
            pass

cProfile.run('sorting()')
cProfile.run('chaining()')

Sample output on my box is:

$ python weighted_sort.py
         24000003 function calls in 18.411 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   18.411   18.411 <string>:1(<module>)
  1000000    0.580    0.000    0.580    0.000 copy.py:112(_copy_with_constructor)
  1000000    0.791    0.000    1.510    0.000 copy.py:65(copy)
        1    1.397    1.397   18.411   18.411 weighted_sort.py:11(sorting)
 20000000    8.907    0.000    8.907    0.000 weighted_sort.py:14(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  1000000    0.139    0.000    0.139    0.000 {method 'get' of 'dict' objects}
  1000000    6.597    0.000   15.503    0.000 {method 'sort' of 'list' objects}


         16000003 function calls in 7.381 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.381    7.381 <string>:1(<module>)
        1    2.744    2.744    7.381    7.381 weighted_sort.py:18(chaining)
 16000000    4.636    0.000    4.636    0.000 weighted_sort.py:20(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So, you can see that the chaining solution is easily twice as fast as the sorting solution over 1 million iterations. Both of these solutions work perfectly well for my purposes, and I will probably end up switching to the chaining solution sometime in the future.

EDIT After reading lqc's comment on my previous article, I've decided to update this one with more appropriate benchmarks. The information that lqc has shared makes the speed of these solutions much closer.

Here's my updated test script:

$ python weighted_sort.py
         4000003 function calls in 8.437 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    8.437    8.437 <string>:1(<module>)
  1000000    0.558    0.000    0.558    0.000 copy.py:112(_copy_with_constructor)
  1000000    0.741    0.000    1.431    0.000 copy.py:65(copy)
        1    1.319    1.319    8.437    8.437 weighted_sort.py:11(sorting)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  1000000    0.133    0.000    0.133    0.000 {method 'get' of 'dict' objects}
  1000000    5.688    0.000    5.688    0.000 {method 'sort' of 'list' objects}


         17000003 function calls in 7.545 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    7.545    7.545 <string>:1(<module>)
        1    2.818    2.818    7.545    7.545 weighted_sort.py:18(chaining)
 17000000    4.726    0.000    4.726    0.000 weighted_sort.py:20(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So if you only gain a second over 1 million iterations, I think I prefer the sort(key=special_ids.__contains__) solution! I hope these two articles will help you get started on your adventures with handling special objects before others!

Comments

Comments powered by Disqus

Meta

Published: March 8, 2011

Author: codekoala

Comments: 0

Word Count: 1,118

Next: Free Professional Web Development

Previous: Simple Weighted Sort in Python

Tags

articles blog open-source performance programming python work

Navigation

Recent Articles

Tag Cloud

adsense  apache  arduino  articles  auto-tagging  bash  bitbucket  blog  breadboard  c  cache  caching  chrome  cisco  command-line  css  database  death  design  desktop  diff  dillon  django  django-articles  django-tracking  documentation  docutils  downtime  driver  easy_install  exec  face-tracking  fedora  feed  firefox  fishing  freelance  fujifilm  git  github  gnome  google  gstreamer  hooks  how-to  howto  html  idiocy  imap4  internet  java  javascript  js  kara  kde  kernel  kurt  lcd  led  linux  logging  mac  macintosh  mail  matt  mercurial  middleware  mindy  mobile  motion  mouse  multiprocessing  network-manager  networking  news  novell  open-source  opencv  opensuse  optimization  osx  packt-publishing  performance  photography  php  picnic  pip  pir  pop3  profile  profiling  programming  projects  pycon  pygments  pypi  python  regex  regular-expression  resistor  rest  restructuredtext  review  rss  ruby  school  scm  scroll  security  sed  selenium  servo  site-wide  slackware  sled  soldering  sparkfun  sphinx  step-by-step  stupid  style  subversion  suse  svn  syndication  templates  terminal  testing  thanks  tips  tornado  tutorial  twitter  unit-testing  unix  updates  utilities  v4l2loopback  vcs  version-control  vim  virtualbox  vista  vpn  web  web-development  webcam  webfaction  windows  wireless  work  wxpython  xorg  xwindows