Thursday, January 15, 2009

Real-time decision making using map-reduce

Recently Tim Bass expressed his delight that the Mahout project has taken on some important tasks that are often applied to real-time decision making.

I think that Tim's enthusiasm is justified, although Mahout is still a very new project and by no means a finished work. A number of important algorithms being worked on there could provide some very important capabilities for real-time decision makers.

The fact is, though, that map-reduce is no silver bullet. It won't make problems go away. It is an important technology for large-scale computing that lends itself to the batch training of real-time models, if not quite to high-availability real-time decisioning. For that, I tend to add systems like ZooKeeper and to build systems in the style of the Katta framework.

In my mind the really important change that needs to happen is that designers of real-time decisioning systems need to embrace what I call scale-free computation.

Scale-free computation is what you get when you don't let the software engineers include the scale of the process as a design parameter. Without that knowledge, they have to build systems that will not require changes when the scale changes. Map-reduce is a great example of this because the map and reduce functions are pure functions. The person who writes those functions has no idea how many machines will be used to compute them ... moreover, the framework is free to apply these functions to the data more than once if it finds the need.
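To make that concrete, here is a minimal single-machine sketch of the idea in Python. The `map_fn` and `reduce_fn` below are pure functions in exactly the sense described above; the toy `run` driver stands in for the framework, which could shuffle the work across any number of machines without either function changing:

```python
# A minimal sketch of scale-free map-reduce in plain Python.
# map_fn and reduce_fn are pure: they know nothing about how many
# machines (or how many times) the framework will run them.

from collections import defaultdict

def map_fn(line):
    # Emit (word, 1) pairs; no shared state, no notion of scale.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Combine all values for one key; again a pure function.
    return (word, sum(counts))

def run(lines):
    # A toy single-process "framework". A real one would shuffle
    # the intermediate pairs across any number of mappers and
    # reducers without changing map_fn or reduce_fn at all.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(run(["the cat sat", "the cat"]))
# → {'the': 2, 'cat': 2, 'sat': 1}
```

Because the functions are pure, the framework can also rerun a failed map task on another machine and simply discard duplicate output, which is where much of the fault tolerance comes from.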

Katta does something similar by allowing the manager of a search system to specify what should happen without specifying quite how. The master uses ZooKeeper to maintain state reliably; it examines the specification of what is desired and allocates tasks to search engines. The search engines detect changes in their to-do lists and download index shards as necessary. Clients examine the state in ZooKeeper and autonomously dispatch requests to search engines and combine the results.

The overall effect, again, is that the person implementing a new Katta search system has very little idea how many machines will be used to run the search. If the corpus is large, then many shards will be used. If throughput is required, then shards will be replicated many times. Neither requires any change to the implementation.
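The client side of this pattern can be sketched as a simple scatter-gather. This is an illustration of the idea, not Katta's actual API: the registry dict stands in for the shard-to-host state that Katta keeps in ZooKeeper, and `search_shard` stands in for a remote call to one search engine:

```python
# A hedged sketch of a Katta-style scatter-gather client.
# The `registry` dict stands in for shard state read from ZooKeeper;
# `search_shard` stands in for a remote call to one search engine.
# These names are illustrative, not part of Katta's real API.

def search_shard(host, query, docs_by_host):
    # Stand-in for a remote query against one shard's index.
    return [doc for doc in docs_by_host[host] if query in doc]

def search(query, registry, docs_by_host):
    # Scatter the query to one replica of every shard, then gather
    # and combine. The client never knows, or cares, how many
    # shards exist or how often each one is replicated.
    results = []
    for shard, hosts in registry.items():
        host = hosts[0]  # any replica will do; replicas add throughput
        results.extend(search_shard(host, query, docs_by_host))
    return sorted(results)

registry = {"shard-0": ["hostA"], "shard-1": ["hostB"]}
docs = {"hostA": ["big data", "small data"], "hostB": ["data mining"]}
print(search("data", registry, docs))
# → ['big data', 'data mining', 'small data']
```

Growing the corpus just adds entries to the registry; adding throughput just lengthens the replica lists. The `search` function is untouched either way, which is the scale-free property in miniature.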

Scale-free design patterns like this are required for modern, highly reliable decision systems. The change in mind-set required is non-trivial, and dealing with legacy code will be prodigiously difficult. As usual with major changes in technology, the best results will come after the old code is retired. Hopefully that can happen before we old coders retire!