Monday, November 23, 2009

How to smooth rate estimates

I mentioned in a previous posting that I recommend smoothing rate estimates by adding offsets to the numerator (number of events) and the denominator (time period). While it is possible to build a principled prior distribution and do very detailed rate estimation based on it, I find that for "top-100" and "hot-100" lists it is more useful to adjust the prior with larger strategic goals in mind.
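To make the offsets concrete, here is the smoothed estimate written out. The symbols are my own notation, not from the earlier posting: k is the number of events observed, t is the elapsed time, α is the numerator offset and β is the denominator offset.

```latex
\hat{r} = \frac{k + \alpha}{t + \beta},
\qquad
\hat{r}\big|_{k = 0,\; t = 0} = \frac{\alpha}{\beta}
```

The offsets behave like α pseudo-events observed over β pseudo-time. This form also happens to be the posterior mean of a Poisson rate under a Gamma(α, β) prior (shape α, rate β), which is one way to see why it works, but here the point is to choose α and β for ranking purposes rather than to fit them.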

In particular, what I like to do is set the ratio of the offsets so that an item with no data whatsoever (event count = 0, time = 0) lands somewhere in the top half of all items, but generally not in the top 10%. This expresses a general optimism about new items: it lets them achieve high rankings with a very modest burst of enthusiasm from the audience, and it forces them to provide some proof that they are dogs before they drop down the list.
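One way to pick that ratio is to read it off the current distribution of per-item rates. The sketch below assumes you already have a list of observed rates (events per unit time) for established items; the function name `prior_rate` and the 0.75 default are hypothetical choices for illustration, not a prescribed value.

```python
def prior_rate(observed_rates, quantile=0.75):
    """Pick the no-data rate alpha / beta as a quantile of existing rates.

    Hypothetical helper. A brand-new item scores alpha / beta, so choosing
    that ratio near the 75th percentile puts it in the top half of items
    but (for reasonably large lists) outside the top 10%.
    """
    if not observed_rates:
        raise ValueError("need at least one observed rate")
    ranked = sorted(observed_rates)
    # Nearest-rank style index, clamped to the last element.
    index = min(int(quantile * len(ranked)), len(ranked) - 1)
    return ranked[index]
```

For ten items with rates 1 through 10, `prior_rate(list(range(1, 11)))` picks 8, so a new item ties the third-ranked item: in the top half, but not in the top 10%.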

Once this ratio is set, it remains to set the actual magnitudes. You do this by deciding how many events you want an item to have, or how long you want it to languish, before the data overcome the prior. If you want the first 10 events to have equal weight to the prior, set the numerator offset to 10. Alternatively, if you want the first day of data to have equal weight to the prior, set the denominator offset to 1 day.
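Fixing one offset and the ratio determines the other. Here is a minimal sketch, assuming the target no-data rate has already been chosen and time is measured in days; `RatePrior`, `prior_from_events`, `prior_from_time` and `smoothed_rate` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class RatePrior:
    alpha: float  # numerator offset: pseudo-events
    beta: float   # denominator offset: pseudo-time (days here)


def prior_from_events(target_rate, weight_events):
    """Hypothetical helper: the prior counts as much as `weight_events` events."""
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return RatePrior(alpha=weight_events, beta=weight_events / target_rate)


def prior_from_time(target_rate, weight_time):
    """Hypothetical helper: the prior counts as much as `weight_time` of data."""
    return RatePrior(alpha=target_rate * weight_time, beta=weight_time)


def smoothed_rate(events, elapsed, prior):
    """Offset-smoothed rate (events + alpha) / (elapsed + beta)."""
    return (events + prior.alpha) / (elapsed + prior.beta)


# Target of 2 events/day, with the prior worth 10 events: alpha = 10, beta = 5.
p = prior_from_events(2.0, 10)
print(smoothed_rate(0, 0, p))   # a brand-new item scores the target rate
print(smoothed_rate(10, 1, p))  # 10 events in a day: pulled well below the raw 10/day
```

Both helpers keep α / β equal to the target rate, so the position of a no-data item stays where you put it; the only difference is how quickly real data take over.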

Done. Works. Simple.

I have also gone to the much greater effort of building detailed prior models and estimating rates in a completely principled fashion, but I have found two things:

a) business people have agendas other than pure estimation, and they want a vote

b) the early estimates are pretty unreliable until the data dominate the prior (duh!). Thus, you may as well set up the prior to make (a) work.
