Thursday, March 24, 2011

Update on exponential time-embedded averages

I will be adding this code to Mahout shortly.

See https://issues.apache.org/jira/browse/MAHOUT-634 for status.

Also, when you are measuring rates, it is common for counts to be reported independently by multiple sources.  The combined average can be computed pretty easily in this same framework as long as the sources report often relative to the averaging time constant.  The simple implementation just treats each reported count as if it occurred in the interval since the most recent report from any source.  If the time constant is long relative to the reporting intervals, this can work out reasonably well as long as we are careful.
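To make the simple scheme concrete, here is a minimal sketch in Python.  It is not the Mahout code; the class name `SimpleRateAverager`, the parameters `tau` and `start`, and the choice to keep a decayed sum of counts and a decayed sum of elapsed time are all assumptions made for illustration.  It also assumes that reports arrive in time order.

```python
import math


class SimpleRateAverager:
    """Hypothetical helper: exponentially weighted rate over all sources.

    Each count is attributed to the interval since the most recent
    report from any source, so no per-source state is needed.
    """

    def __init__(self, tau, start):
        self.tau = tau          # averaging time constant
        self.last_t = start     # time of the most recent report from any source
        self.s = 0.0            # decayed sum of counts
        self.w = 0.0            # decayed sum of elapsed time

    def report(self, t, count):
        # Assumes t >= self.last_t (reports arrive in time order).
        dt = t - self.last_t
        decay = math.exp(-dt / self.tau)
        self.s = decay * self.s + count
        self.w = decay * self.w + dt
        self.last_t = t

    def rate(self):
        # Estimated events per unit time, summed over all sources.
        return self.s / self.w if self.w > 0 else 0.0


avg = SimpleRateAverager(tau=100.0, start=0.0)
for i in range(1, 201):
    t = 0.5 * i
    # Source A reports 2 events on the half second, source B reports 3 on the second.
    avg.report(t, 2 if i % 2 == 1 else 3)
print(avg.rate())
```

In this usage the two sources together produce 5 events per second, and the estimate lands close to that because the reports are frequent compared to `tau`.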

If reporting intervals are longer, the averaging is a bit trickier because we would really like to spread each reported count over the entire interval since the last report from the same source.  This means that we have to discount part of each count, because some of the events it covers are already old by the time the report arrives.
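One way to work out that discount, offered as a sketch rather than as the method that will go into Mahout: assume the count c was spread uniformly over the source's own reporting interval of length d.  Weighting each instant by exp(-age / tau) and integrating over the interval gives an effective count of c * tau * (1 - exp(-d / tau)) / d.  The class name `IntervalRateAverager`, the uniform-spread assumption, and the use of `start` as the beginning of each source's first interval are all assumptions of this example.

```python
import math


class IntervalRateAverager:
    """Hypothetical helper: rate average with per-source interval discounting."""

    def __init__(self, tau, start):
        self.tau = tau
        self.start = start
        self.last_t = start        # most recent report from any source
        self.last_by_source = {}   # most recent report time per source
        self.s = 0.0               # decayed, discounted sum of counts
        self.w = 0.0               # decayed elapsed time

    def report(self, source, t, count):
        # Age the running sums by the time since the last report from anyone.
        # Assumes reports arrive in time order.
        x = (t - self.last_t) / self.tau
        self.s *= math.exp(-x)
        self.w = math.exp(-x) * self.w + self.tau * (-math.expm1(-x))
        self.last_t = t

        # Spread this count over the interval since this source last reported
        # and discount the older part of it.
        d = (t - self.last_by_source.get(source, self.start)) / self.tau
        discount = -math.expm1(-d) / d if d > 0 else 1.0
        self.s += count * discount
        self.last_by_source[source] = t

    def rate(self):
        return self.s / self.w if self.w > 0 else 0.0


avg = IntervalRateAverager(tau=5.0, start=0.0)
events = [(t, "a", 8) for t in range(4, 121, 4)]     # 2 events per second
events += [(t, "b", 18) for t in range(6, 121, 6)]   # 3 events per second
for t, source, count in sorted(events):
    avg.report(source, float(t), count)
print(avg.rate())
```

Here the reporting intervals (4 and 6 seconds) are comparable to `tau`, which is the case where the discount matters.  Between reports the estimate will run a little low, since counts that a source has accumulated but not yet reported are not visible.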

More details shortly.
