In the third part of my talk, I spoke a bit about where Hadoop has come from and where it is going. Importantly, this involves a choice about where Hadoop and the related companies, products, and individuals might be able to take things.
Where we are and how we got here
My second section described the rough state of the Hadoop eco-system in a slightly provocative way. In particular, I described a time when I was on a British train and, in partial compensation for delays, the operators announced that "free beer would be on sale in the galley car". Free beer for sale is a wonderful analogy for the recent state of Hadoop and related software.
That said, there are serious problems brewing. The current world of Hadoop is largely based on the assumption that the current community is all that there is. This is a problem, however, because the current (Apache-based) community presumes interaction by individuals with a relatively common agenda. More and more, however, the presence of a fundable business opportunity means that this happy world of individuals building software for the greater good has been invaded by non-human, non-individual corporations. Corporations can't share the same agenda as the individuals involved in Apache, and Apache is constitutively unable to allow corporate entities as members.
This means that the current community can no longer be the current world. What we now have is not just a community with shared values but an eco-system with different kinds of entities, multiple agendas, direct competition and conflicting goals. The Apache community is one piece of this eco-system.
Our choice of roads
Much as Dante once described his own situation, Hadoop now finds itself in the middle of the road of its life in a dark wood. The members of the Apache community have a large voice in the future of Hadoop and related software.
As a darker option, the community can pretend that the eco-system that now exists of human and corporate participants is really a community. If so, it is likely that the recent problems in moving Hadoop forward will continue and even get worse. Commit wars and factionalization are likely to increase as corporate entities, denied a direct voice in Apache affairs, tend to gain influence indirectly. Paralysis in development will stall forward progress of Hadoop itself, leading to death by a thousand forks. Such a dark world would let alternative frameworks such as Azure gain footholds and possibly dominate.
In a brighter alternative future, I think that there are ways to create a larger forum in which corporate voices can be heard in their true form rather than via conflicts of interest. In this scenario, Apache would be stronger because it really can be a strong voice of the open source community. Rather than being the average of conflicting views, Apache would be free to express the shared values of open source developers. Corporations would be able to express their goals, some shared, some not, in a more direct form and would not need so much to pull the strings of Apache committers. Importantly, I would hope that Hadoop could become something analogous to a reference implementation and that commercial products derived from Hadoop would have a good way to honor their lineage without finding it difficult to differentiate themselves from the original. Hopefully in this world innovation would be welcomed, but users would be able to get a more predictable experience because they would be able to pick products offering whatever trade-off between innovation rate and stability they desire. Importantly, there would be many winners in such a world since different players would measure success in different terms.
We have a key task ahead of us to define just what kind of eco-system we want. It can be mercenary and driven entirely by corporate goals. This could easily happen if Apache doesn't somehow facilitate the creation of a forum for eco-system discussion. In such an eco-system, it is to be expected that the companies that have shown a strong talent for dominating standards processes and competing in often unethical ways will dominate. My first thought when I imagine such a company is Microsoft, but that is largely based on having been on the receiving end of their business practices. I have no illusions that talent for that kind of work is exclusively found in Redmond.
In my talk, I proposed some colorful cosmological metaphors for possible worlds, but the key question is how we can build a way for different kinds of entities to talk. It is important to recognize different values and viewpoints. Apache members need to understand that not everything is based on individual action, nor do corporations hold the same values. Companies need to take a strong stance to recognize the incredible debt owed to the Apache community for creating the opportunities we all see.
If we can do this, then Hadoop (and its offspring) really does have the potential to dominate business computing.