I had a great visit today at the DIMA laboratory at TU Berlin. They are working on a system called Stratosphere, which provides an interesting generalization of map-reduce. Of particular interest is its run-time flexibility in adapting how a flow partitions or transfers data.
They accomplish this with a lower-level abstraction layer that supports a larger repertoire of basic operations beyond just map and reduce. These operations include match, cross product and co-group. Having a wider range of operations, and retaining some additional flow information at that level, allows them to select the detailed algorithm for each operation on the fly, based on statistics of the data and properties of the user-supplied functions.
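To make the three extra operators a bit more concrete, here is a minimal Python sketch of what they compute on in-memory (key, value) records, plus a toy version of the kind of run-time strategy choice described above. The function names `match`, `cross`, `cogroup` and `choose_match_strategy`, and the crude cost model, are hypothetical illustrations under simplifying assumptions, not Stratosphere's actual API or optimizer.

```python
from collections import defaultdict
from itertools import product

# Hypothetical helpers showing the *semantics* of match, cross and co-group
# on lists of (key, value) records. Not Stratosphere's real API.

def match(left, right, fn):
    """Call fn(key, lv, rv) for every pair of records that share a key
    (an equi-join), implemented here as a simple in-memory hash join."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [fn(k, lv, rv) for k, lv in left for rv in index.get(k, [])]

def cross(left, right, fn):
    """Call fn(l, r) on every pair in the Cartesian product of the inputs."""
    return [fn(l, r) for l, r in product(left, right)]

def cogroup(left, right, fn):
    """Group both inputs by key and call fn(key, left_group, right_group)
    once per key; either group may be empty. Keys are visited in sorted order."""
    lg, rg = defaultdict(list), defaultdict(list)
    for k, v in left:
        lg[k].append(v)
    for k, v in right:
        rg[k].append(v)
    return [fn(k, lg.get(k, []), rg.get(k, [])) for k in sorted(set(lg) | set(rg))]

def choose_match_strategy(left_size, right_size, num_workers):
    """Toy, assumed cost model for picking how a match moves data:
    broadcasting the smaller input costs roughly its size times the number
    of workers; repartitioning costs roughly the size of both inputs."""
    broadcast_cost = min(left_size, right_size) * num_workers
    repartition_cost = left_size + right_size
    return "broadcast" if broadcast_cost < repartition_cost else "repartition"
```

For example, with `left = [("a", 1), ("b", 2), ("b", 3)]` and `right = [("b", 10), ("c", 20)]`, `match` pairs only the `"b"` records, while `cogroup` also reports keys present on just one side. The strategy function shows why statistics matter: with one tiny input it pays to broadcast it, but with two large inputs repartitioning both is cheaper, and the user-supplied function is the same in either case.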
Here's a pic of me answering questions about startups and log-likelihood ratio tests.