Ned Batchelder's blog

Competition inside corporations?.

Having observed Hewlett-Packard from the inside for almost 18 months now, I'm struck by a paradox: our economy is a chaotic marketplace of capitalist competition, practiced and championed by corporations, but internally, companies are run as top-down, centrally-planned dictatorships. Why is that? Why isn't a company simply a microcosm of the larger economy?

Take the case of IT services: inside HP, there is a large IT organization, and they provide services to the rest of the company. When my group joined HP, we had no choice about how to get, for example, email service. The IT group provided email, and we used it. When we need to buy a laptop, there is one group that provides that service. When we need servers hosted, we have only one place to turn.

I'm sure the reason for this is the efficiency gained by eliminating redundancy. If there were two groups providing email services, surely one group could do the job of both, with less total staff, equipment, and so on.

That's certainly true, but then why don't we apply the same logic to the larger economy? After all, HP's email group has a huge overlap with Dell's, IBM's, Sun's, Microsoft's, and so on. Couldn't our economy gain by eliminating the overlap? When these questions are considered at the national level, we tout the increased efficiency produced by competition. The economy as a whole gains from the pressure competition puts on each company. Without competition, there is no incentive to improve, no reason to do your best. In a centrally-planned nationalized economy, incompetence is not punished, incentives are mis-aligned, and apathy takes over. There's no reason to improve because your customers have nowhere else to turn, poor service will not lead to loss of business, there's no price pressure, and your existence is guaranteed by the state.

That's logic that every capitalist believes, and we laugh at economies that have tried central planning and failed. So why doesn't the same logic hold inside companies? Why are monopolies and lack of competition not just accepted, but enforced? Don't we believe the same forces will be at work? Is there any compelling reason to improve if you have no competition?

Why couldn't a company have three IT groups (call them Red, Green, and Blue)? Each is separate, and lives or dies based on its ability to attract business from the rest of the company. When my group needs servers hosted, we shop around. Maybe Red is the deluxe service, and Blue is economy, and we've heard from friends that Green has the best service. For whatever reason, we choose one of them, and spend our internal dollars with them. The groups will compete, and that competition will force them to optimize and find the best solutions for their customers. If they don't, they will go out of business.

I know it seems wasteful to have all that going on inside a company. There will be duplication. But remember the capitalist logic: without competition, there's no reason to do your best. Just as with the larger economy, the duplication will be worth it because of the increased efficiency forced by competition. And without competition, your only option will be a poor one.

Of course, not all work inside corporations could be run this way. For example, legal departments deal with the outside world, and the corporation must speak with one voice there. But couldn't competition be used in at least some parts of large companies?

Where's the flaw in this logic? Why isn't competition inside corporations a good idea?

Honda Civic hybrid.

I've just bought a new car: a Honda Civic hybrid. I don't buy cars that often. The car I just replaced was a 1994 Civic. To keep the same pace, I'll add an entry to my calendar for 2022 to buy my next car.

I like the Civic for its gas mileage, 45 mpg highway. The extra expense over a non-hybrid Civic is actually more than I'll save on gas over the life of the car, but I like being the change I want to see in the world.

One thing that surprised me about this car is how familiar it felt after having driven a 1994 Civic. Lots of extra bells and whistles that I'd gotten used to in my wife's larger cars are still absent in this car.

Features in the hybrid I didn't have in my 1994 Civic (other than the hybrid engine):

  • A temperature setting in the climate control
  • Front seat map lights
  • A chime to alert me that I've left my headlights on
  • An auxiliary jack for the stereo
  • Electronic dashboard with thermometer, etc

Things that work in the hybrid that used to work in the 1994 Civic, but no longer do:

  • Remote entry buttons
  • Reliable low-speed wipers
  • Rear left passenger door handle
  • Exhaust system. The last thing that failed on the 94 was the exhaust. For its last two days, it sounded like a four-door Harley.

Fancy features the Hybrid doesn't have that my wife's car does:

  • Motorized seat adjustments with memory
  • Heated seats
  • Lighted mirrors in visors
  • Fold-in side mirrors
  • Leather seats
  • Separate temperature settings for driver and passenger
  • Individual lights for rear passengers

I'm pleased to have a new car that just works, and especially one that does so well on gas.

Evil apple.

I really don't know what Apple is thinking. First they release a really cool phone, good. Then they release an SDK for it, also good. But developers aren't allowed to talk to each other about developing for the phone. That's bad: doesn't Apple realize how developers learn? Then Apple sets up a store and keeps control over what apps can be sold there. Partly good (no malware can pollute the ecosystem), but partly bad (no one knows how Apple will decide what can be sold).

Then Apple started to reject apps from the app store, which is bad, because app developers only find out they've been rejected after they've expended all the effort to build the app, and it can be hard to predict whether an app will be rejected or not, making it risky to build iPhone apps.

After this breathtaking descent into cluelessness, Apple has topped itself by deciding that app rejections are subject to the non-disclosure agreement, making it illegal for developers to talk about the fact that their app has been rejected! Is Apple actively trying to discourage app development? Is there any other company that could act this way without raising the ire of the development community? This is the company that used Gandhi in an ad? What exactly is Apple thinking?

Cisco minus t.

One of those simple typos that turns into an embarrassing public mistake: Cisco home page FAIL, where (it is theorized) a regex that should have had \t had only t, and as a result, all lowercase t's were removed from the page, breaking it completely.
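Here is a minimal Python sketch of how small that slip is. The sample markup and variable names are invented for illustration, and this is not Cisco's actual code: it only shows the difference between a pattern that matches a tab character and one that matches the letter t.

```python
import re

# Hypothetical page fragment, with a tab character after the opening tag.
page = "<title>\tRouters and switches</title>"

# Intended: strip tab characters. r"\t" is the regex escape for a tab.
intended = re.sub(r"\t", "", page)

# Typo: without the backslash, the pattern matches the letter "t" instead.
broken = re.sub(r"t", "", page)

print(intended)
print(broken)
```

The second result loses every lowercase t, including the ones inside the tag names, which is why the markup itself falls apart.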

A server memory leak.

We pushed new code to our production servers last week. There were a lot of changes, including our upgrade to Django 1.0. As soon as the servers restarted, they immediately suffered, with Python processes bloated to 2GB or more of memory each. Yikes! We reverted to the old code, and began the process of finding the leak.

These are details on what we (Dave, Peter, and I, mostly them) did to find and fix the problem.

We used Guppy, a very capable Python memory diagnostic tool. It showed that the Python heap was much smaller than the memory footprint of the server process, so the leak seemed to be in memory managed by C extensions.

We identified these C extensions:

We tried to keep these possibilities in mind as we worked through our next steps. PIL and PDFlib in particular seemed likely given how heavily we use them, and because they traffic in large data (high-res images).

We had some unit tests that showed fat memory behavior. We ran valgrind on them hoping they would demonstrate a leak that we could fix. Valgrind is a very heavy-weight tool, requiring re-compiling the Python interpreter to get good results, and even so, we were overwhelmed with data and noise. The tests took long enough to run that other techniques proved more productive.

Our staging server had been running the code for over a week, and showed no ill effects. We tried to reason out what the important difference between the staging server and the production server was. We figured the biggest difference was the traffic they each receive. We tried to load up the staging server with traffic. An aggressive test downloading many dynamic PDFs quickly ballooned the memory on the staging server, so we suspected PDFlib as the culprit.

Closely reading the relevant code, we realized we had a memory leak if an exception occurred:

p = PDF_new()
# Lots of stuff, including an exception
PDF_delete(p)   # Not called: leak!

We felt pretty good about finding that, and fixed it up with a lot of unfortunate try/finally clauses. We put the code on our staging server, and it behaved much better. Lots of PDF downloads would still cause the memory to grow, but when the requests were done, it would settle back down again. So we liked the theory that this was the fix. The only flaw in the theory was it didn't provide a reason why our old code was good and our new code was bad. We put the fixed code on the production server: boom, the app server processes ballooned immediately. Apparently as good as this exception fix was for our PDFlib code, it wasn't the real problem.
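The shape of that fix, as a minimal sketch. FakePDF, render, and the live counter are stand-ins invented for this example so that it runs without PDFlib; they are not PDFlib's API, and only the try/finally pattern is the point.

```python
class FakePDF(object):
    """Stand-in for a C-level PDF context (hypothetical, not PDFlib's API)."""
    live = 0    # number of contexts allocated and not yet freed

    def __init__(self):
        FakePDF.live += 1

    def delete(self):
        FakePDF.live -= 1


def render(fail):
    p = FakePDF()
    try:
        # Lots of stuff, possibly including an exception.
        if fail:
            raise ValueError("something went wrong while building the PDF")
        return "pdf bytes"
    finally:
        # Runs on success and on exception alike, so the context is freed.
        p.delete()


try:
    render(fail=True)
except ValueError:
    pass
render(fail=False)
print(FakePDF.live)
```

If the delete call sat after the try block instead of in the finally clause, the failing call would leave one context allocated.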

We tried chopping out functionality to isolate the problem. Certain subsets of URLs were removed from the URL map to remove the traffic from the server. We ran the code for short five-minute bursts to see the behavior under real traffic, and it was no better. To be sure it wasn't still PDFlib somehow, we tried removing PDFlib by raising an exception at the one place in our code where PDF contexts are allocated. Memory still exploded. We tried removing PIL by writing a dummy Image.py that raises exceptions unconditionally. It didn't help.

We tried logging requests and memory footprints, but correlations were elusive. We tried changing the process architecture to use only one thread per process, with no luck.

We tried reverting all the Django 1.0 changes, to move back to the Django version we had been using before. This changed back the Django code, and the adaptations we'd made to that code, but (in theory) left in place all of the feature work and bug fixes we had done.

We pushed that to the servers, and everything performed beautifully: the server processes used reasonable amounts of memory, and didn't grow and shrink. So now we knew the leak was either in the Django 1.0 code, or in our botched adaptation to it, or in some combination of the two. Many people are using Django 1.0, so it seemed unlikely to be as simple as a Django leak, and we focused on our Django-intensive code.

Now that we'd narrowed it down to the Django upgrade, how to find it? We went back to the request logs, examining them more closely for any clues. We found one innocuous-seeming URL that appeared near a number of the memory explosions.

We took one app server out of rotation, so that it wasn't serving any live requests. Our nginx load balancer is configured so that a URL parameter can direct a request to a particular app server. We used that to hit the isolated app server once with the suspect request. Sure enough, the process ballooned to 1GB, and stayed there. Then we killed that process, and did it again. The Python process grew to 1GB again. Yay! We had a single URL that reproduced the problem!

Now we could review the code that handled that URL, and eyeball everything for suspects. We found this:

@memoize()
def getRecentStories(num=5):
    """ Return num most recent stories. Only public stories are returned.
    """
    stories = Story.objects.published(access=kAccess.public). \
                exclude(type=kStoryType.personal). \
                order_by('-published_date')
    if num:
        stories = stories[:num]
    return stories

Our @memoize decorator here caches the result of the function, based on its argument values. The result of the function is a QuerySet. Most of the code that calls getRecentStories uses a specific num value, so it returns a QuerySet for a small number of stories, and the caller simply uses that value (for example, in a template context variable).
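For readers who haven't seen the pattern, a memoize decorator can be sketched roughly like this. This is not our actual decorator: ours stores results in a cache that requires pickling, while this in-process dictionary version (with the hypothetical names _cache, calls, and slow_square) only shows the idea of keying the cached result on the argument values.

```python
import functools

def memoize():
    """Return a decorator that caches results keyed on the call arguments."""
    def decorator(fn):
        _cache = {}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            key = (args, tuple(sorted(kwargs.items())))
            if key not in _cache:
                _cache[key] = fn(*args, **kwargs)
            return _cache[key]
        return wrapper
    return decorator


calls = []

@memoize()
def slow_square(n):
    calls.append(n)     # record each real execution
    return n * n

slow_square(4)
slow_square(4)          # served from the cache, the function body is skipped
slow_square(5)
print(calls)
```

Note that whatever the function returns is what gets stored, so a function that returns a lazy object hands that lazy object to the cache.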

However, in this case, the getRecentStories function is called like this:

next_story = getRecentStories(0).filter(published_date__lt=the_date)[0]

The QuerySet is left unlimited until after it is filtered by published_date, and only then is the first story sliced off.

Now we're getting to the heart of one of our mysteries: why was the old Django code good, and the new Django code bad? The Django ORM changed a great deal in 1.0, and one of the changes was in what happened when you pickle a QuerySet.

To cache a QuerySet, you have to pickle it. Django's QuerySets are lazy: they only actually query the database when they need to. For as long as possible, they simply collect up the parameters that define the query. In Django 0.96, pickling a QuerySet didn't force the query to execute; you simply got a pickled version of the query parameters. In Django 1.0, pickling the QuerySet causes it to query the database, and the results of the query are part of the pickle.

Looking at how the getRecentStories function is called, you see that it returns a QuerySet for all the public stories in the database, which is then narrowed by the caller first on the published_date, but more importantly, with the [0] slice.

In Django 0.96, the query wasn't executed against the database until the [0] had been applied, meaning the SQL query had a "LIMIT 1" clause added. In Django 1.0, the query is executed when cached, meaning we request a list of all public stories from the database, then cache that result list. Then the caller further filters the query, and executes it again to get just one result.

So in Django 0.96, this code resulted in one query to the database, with a LIMIT 1 clause included, but in Django 1.0, this code resulted in two queries. The first was executed when the result was cached by the @memoize decorator, the second when that result was further refined in the caller. The second query is the same one the old code ran, but the first query is new, and it returns a lot of results because it has no LIMIT clause at all.
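The one-query-versus-two behavior can be reproduced with a toy model. Everything here is invented for illustration (LazyQuery, ROWS, executed, cache_then_refine); it is not Django's QuerySet or its pickling code, just a small class whose pickling either stores the query parameters (old style) or runs the query and stores the rows (new style). To keep the sketch simple, __reduce__ pickles plain builtins rather than the object itself.

```python
import pickle

executed = []                   # log of queries sent to the pretend database
ROWS = list(range(1000))        # pretend table of 1000 public stories


class LazyQuery(object):
    """Toy lazy query (hypothetical; not Django's QuerySet)."""

    def __init__(self, limit=None, eager_pickle=True):
        self.limit = limit
        self.eager_pickle = eager_pickle

    def first(self, n):
        # Refining returns a new, still-unexecuted query.
        return LazyQuery(limit=n, eager_pickle=self.eager_pickle)

    def fetch(self):
        # The only place the pretend database is touched.
        if self.limit is None:
            executed.append("SELECT stories")
        else:
            executed.append("SELECT stories LIMIT %d" % self.limit)
        return ROWS[:self.limit]

    def __reduce__(self):
        if self.eager_pickle:
            # New style: pickling runs the query and stores the rows.
            return (list, (self.fetch(),))
        # Old style: pickling stores only the query parameters.
        return (dict, ({"limit": self.limit},))


def cache_then_refine(eager_pickle):
    del executed[:]
    query = LazyQuery(eager_pickle=eager_pickle)
    pickle.dumps(query)         # what caching the return value does
    query.first(1).fetch()      # what the caller does: slice, then evaluate
    return list(executed)


print(cache_then_refine(eager_pickle=False))
print(cache_then_refine(eager_pickle=True))
```

With lazy pickling the log holds only the LIMIT 1 query; with eager pickling it also holds the unlimited query that fetched every row.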

The fix to reduce the database query was to split getRecentStories into two functions: one that caches its result, and is used when the result will not be filtered further, and another uncached function to use when it will be filtered:

def getRecentStories(num=5):
    """ Return num most recent stories. Only public stories are returned.
        Use this function if you want to filter the results yourself.
        Otherwise use getCachedRecentStories.
    """
    stories = Story.objects.published(access=kAccess.public). \
                exclude(type=kStoryType.personal). \
                order_by('-published_date')
    if num:
        stories = stories[:num]
    return stories
    
@memoize()
def getCachedRecentStories(num=5):
    """ Return num most recent stories. Only public stories are returned.
        If you need to filter the results further, use getRecentStories.
    """
    return list(getRecentStories(num=num))

One last point about the Django change: should we have known this from reading the docs? Neither the QuerySet refactoring notes nor the 1.0 backwards incompatible changes pages mention this change, or address the question of pickled QuerySets directly. Interestingly, an older version of the docs does describe this exact behavior. This change was explicitly made and discussed, but seems to have been misplaced in the 1.0 doc refactoring. Of course, we may not have realized we had this behavior even if we had read about the change.

So we've found a big difference in the queries made using the old code and the new code. But why the leak? The theory is that MySQLdb has a leak which has been fixed on its trunk. Looking at the MySQLdb code, it's pretty clear that they've been developing for a while since releasing version 1.2.2. Unfortunately, the MySQLdb trunk doesn't work under Django yet, so we can't verify the theory that MySQLdb is the source of the leak.

Ironically, MySQLdb was not on our list of C extensions to look at. If it had been, we might have identified it as the culprit with a Google search. Since the MySQLdb trunk doesn't work under Django, I guess we would have hacked MySQLdb or Django to get them to work together. We would have run leak-free, but would be unknowingly executing the giant database query.

The last mystery: why didn't the problem appear on our staging server? Because it was running with a much smaller database than our production servers, so the "all public stories" query wasn't a big deal. We learned a lesson there: sometimes a subtle difference can make all the difference. We need to keep the staging server's database as current as we can to make sure it's replicating the production environment as much as possible. It's impossible to make them identical (for example, the staging server doesn't get traffic from search bots), but at times like this, it's important to understand what all the differences are, and minimize them where you can.

Switching python versions on windows.

I forget what software first set up these associations, but I have .py files registered with Windows so that they can execute directly. The registry defines .py as a Python.File, which has a shell open command of:

"C:\Python24\python.exe" "%1" %*

My PATHEXT environment variable includes .py, so the command prompt will attempt to execute .py files, using the registry associations to find the executable.

But: I wanted to switch from Python 2.4 to Python 2.5. That meant updating the registry in a handful of places. A Python script to the rescue!

""" Change the .py file extension to point to a different
    Python installation.
"""
import _winreg as reg
import sys

pydir = sys.argv[1]

todo = [
    ('Applications\\python.exe\\shell\\open\\command',
                '"PYDIR\\python.exe" "%1" %*'),
    ('Applications\\pythonw.exe\\shell\\open\\command',
                '"PYDIR\\pythonw.exe" "%1" %*'),
    ('Python.CompiledFile\\DefaultIcon',
                'PYDIR\\pyc.ico'),
    ('Python.CompiledFile\\shell\\open\\command',
                '"PYDIR\\python.exe" "%1" %*'),
    ('Python.File\\DefaultIcon',
                'PYDIR\\py.ico'),
    ('Python.File\\shell\\open\\command',
                '"PYDIR\\python.exe" "%1" %*'),
    ('Python.NoConFile\\DefaultIcon',
                'PYDIR\\py.ico'),
    ('Python.NoConFile\\shell\\open\\command',
                '"PYDIR\\pythonw.exe" "%1" %*'),
    ]

classes_root = reg.OpenKey(reg.HKEY_CLASSES_ROOT, "")
for path, value in todo:
    key = reg.OpenKey(classes_root, path, 0, reg.KEY_SET_VALUE)
    reg.SetValue(key, '', reg.REG_SZ, value.replace('PYDIR', pydir))

Invoke this with your desired Python installation directory, and the registry is updated to point to it.

Note that this doesn't affect what the command python means; that's determined by your PATH environment variable. These registry settings change which Python executable is found when you invoke a .py file as a command.
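To make the distinction concrete, here is a rough sketch of how a bare command name such as python gets resolved. It is a simplified model, not what the command prompt literally runs: it ignores the current directory, and find_on_path, the installed set, and the example PATH are all invented for illustration.

```python
import os

def find_on_path(command, path, pathext, exists=os.path.isfile, sep=";"):
    """Return the first file a bare command name would resolve to, or None.

    Walks the PATH directories in order and tries each PATHEXT extension.
    Simplified: the current directory is not considered.
    """
    for directory in path.split(sep):
        if not directory:
            continue
        for ext in pathext.split(sep):
            candidate = directory.rstrip("\\") + "\\" + command + ext.lower()
            if exists(candidate):
                return candidate
    return None


# Hypothetical machine with both versions installed.
installed = set(["C:\\Python24\\python.exe", "C:\\Python25\\python.exe"])
path = "C:\\Windows\\system32;C:\\Python24;C:\\Python25"

print(find_on_path("python", path, ".COM;.EXE;.BAT;.PY", installed.__contains__))
```

In this made-up setup the Python 2.4 directory comes first on PATH, so typing python would still start 2.4 even after the registry points .py files at 2.5.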

Python registry grepper.

In writing the python registry switcher, I needed to search the registry for references to my old Python version. Another good use for a Python script:

""" Search the Windows registry.
"""

import _winreg as reg
import itertools

RegRoots = {
    reg.HKEY_CLASSES_ROOT:   'HKEY_CLASSES_ROOT',
    reg.HKEY_CURRENT_USER:   'HKEY_CURRENT_USER',
    reg.HKEY_LOCAL_MACHINE:  'HKEY_LOCAL_MACHINE',
    reg.HKEY_USERS:          'HKEY_USERS',
    }

class RegKey:
    """ A handy wrapper around the raw stuff in the _winreg module.
    """
    def __init__(self, rawkey, root, path):
        self.key = rawkey
        self.root = root
        self.path = path
        
    def __str__(self):
        return "%s\\%s" % (RegRoots.get(self.root, hex(self.root)), self.path)
    
    def close(self):
        reg.CloseKey(self.key)

    def values(self):
        """ Enumerate the values in this key.
        """
        for ikey in itertools.count():
            try:
                yield reg.EnumValue(self.key, ikey)
            except EnvironmentError:
                break

    def subkey_names(self):
        """ Enumerate the names of the subkeys in this key.
        """
        for ikey in itertools.count():
            try:
                yield reg.EnumKey(self.key, ikey)
            except EnvironmentError:
                break
        
    def subkeys(self):
        """ Enumerate the subkeys in this key.
        """
        for subkey_name in self.subkey_names():
            if self.path:
                sub = self.path + '\\' + subkey_name
            else:
                sub = subkey_name
            yield OpenRegKey(self.root, sub)

def OpenRegKey(root, path):
    try:
        rawkey = reg.OpenKey(root, path)
    except Exception, e:
        #print "Couldn't open %r %r: %s" % (root, path, e)
        return None
    return RegKey(rawkey, root, path)

def grep_key(key, target):
    for name, value, typ in key.values():
        if isinstance(value, basestring) and target in value:
            print "%s\\%s = %r" % (key, name, value)

    for subkey in key.subkeys():
        if not subkey:
            continue
        grep_key(subkey, target)
        subkey.close()

def grep_registry(args):
    for root in RegRoots.keys():
        grep_key(OpenRegKey(root, ""), args[1])

if __name__ == '__main__':
    import sys
    grep_registry(sys.argv)

Most of this is a pythonic wrapper around the _winreg module, with a few simple functions at the end to actually search the registry.

Aptus 2.0.

Aptus 2.0, the latest version of my Mandelbrot explorer, is now available. It's got a lot of improvements over the previous version, including speed improvements, multiple top-level windows, tool windows for displaying information, and Julia set support.

[Image: fractal image from the Mandelbrot set]

It's built with wxPython, so it runs on Windows, Linux, and Mac.

Five thirty eight.

We are in full swing now in the presidential campaign, and we are constantly bombarded with poll numbers. Funny thing is, most of those polls are just national polls, a prediction of how the nation-wide popular vote will turn out. But as the 2000 election underscored, that doesn't matter at all: what matters is the electoral vote. To predict that, you'd have to track individual state-by-state polls to see who wins the popular vote in each state, and compute the electoral vote totals. Sounds like a lot of work, but FiveThirtyEight.com (Electoral Predictions Done Right) has done all the work already. They also run statistical simulations to predict the likelihood of various outcomes (for example: the chance of McCain losing the popular vote but winning the election is 1.7%).

Add extensive tables of data detailing the poll data, the simulations, their predictions, maps of outcomes, more of the same for congressional races, and so on, and you have a quantitative political junkie's dream site.

BTW, as of this moment, they predict an Obama win, with 339 electoral votes to McCain's 199.

And they aren't the only game in town: there's also Electoral-vote.com (currently predicting a 329 over 194 win for Obama), and Election Projection (364 to 174 for Obama).
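The basic tally these sites start from is simple enough to sketch. The numbers below are invented for three imaginary states and are not real polling data, and the function assumes every state is winner-take-all, which is a simplification. None of this reflects how any of the sites above actually compute their projections.

```python
def electoral_totals(state_polls, electoral_votes):
    """Sum electoral votes per candidate, winner-take-all in each state."""
    totals = {}
    for state, shares in state_polls.items():
        leader = max(shares, key=shares.get)
        totals[leader] = totals.get(leader, 0) + electoral_votes[state]
    return totals


# Invented numbers for three imaginary states; not real polling data.
state_polls = {
    "State A": {"Candidate X": 52.0, "Candidate Y": 45.0},
    "State B": {"Candidate X": 44.0, "Candidate Y": 51.0},
    "State C": {"Candidate X": 49.0, "Candidate Y": 47.0},
}
electoral_votes = {"State A": 10, "State B": 21, "State C": 4}

print(electoral_totals(state_polls, electoral_votes))
```

Candidate X leads in two of the three imaginary states but trails in electoral votes, the same kind of gap a national poll can hide.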

3 down, 47 to go.

Connecticut has joined the ranks of states allowing gay marriage, good for them. The process was similar to Massachusetts and California: couples sue for the right to marry, eventually the state Supreme Court finds that either existing laws don't preclude gay marriage, or the state constitution won't allow distinguishing between straight and gay couples. I for one am glad. I believe that eventually this will be accepted across the country, and people will wonder what the fuss was about. Those predicting the downfall of society will be proven wrong. We continue to have thriving families here in Massachusetts even after four years of gay marriage.

For a vibrant "debate" on the issue, check out the comments on Hot Air's post about the news. The post itself, while disagreeing with the decision, does a good job analyzing the legal arguments in it. The comments, though, consist mostly of people hurling invective at each other, no one being swayed by either side's arguments.

This decision will bring the usual complaints of judicial activism (actually, they were interpreting the constitution, that's their job), the collapse of morality (how exactly?), harm to families (by creating more of them? I don't get it), the disenfranchisement of the people (the whole point of judges is to decide independently of public opinion) and so on. To all of them I say, open your eyes and close your mouths. Everything is fine. The boogey-man of gay marriage simply doesn't exist.

10/12/2008; 9:38:29 PM Eastern.