Thursday, March 24, 2011

Exponentially weighted averaging for rates

In a previous post, I talked about how to produce exponentially weighted averages of time-embedded values.  This is fine for cases where you really want averages, but it doesn't work for rates.  Happily, the technique I presented for simple averages can be easily extended to work for rates as well.

To recap a bit, an on-line algorithm for computing a simple average works by accumulating the number of observations and the sum of the observations; time does not enter into it at all. For each observation $(x, t)$ (the time $t$ is simply ignored here) we update our state $(s, w)$ according to
\begin{eqnarray*}
s_{n} &=& s_{n-1} + x \\
w_{n} &=& w_{n-1} + 1
\end{eqnarray*}
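In code this is nothing more than two running sums. Here is a minimal Java sketch; the class and method names are my own illustration, not anything from Mahout:

    // Minimal sketch of an un-discounted on-line average (illustrative names).
    public class SimpleAverage {
      private double sum = 0;     // s: running sum of the observations
      private double weight = 0;  // w: count of the observations

      public void add(double x) {
        sum += x;
        weight += 1;
      }

      public double mean() {
        // undefined (NaN) until at least one observation has been added
        return sum / weight;
      }
    }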
The exponentially weighted average approach that I mentioned before uses the time of each observation to discount the previous state based on the time of the previous observation.  Thus, the state $(s,w,t)$ is updated according to
\begin{eqnarray*}
\pi &=& e^{-(t-t_{n-1})/\alpha} \\
s_{n} &=& \pi s_{n-1} + x \\
w_{n} &=& \pi w_{n-1} + 1 \\
t_{n} &=& t
\end{eqnarray*}
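A sketch of the discounted version keeps the time of the last observation and decays both accumulators before folding in each new value. Again, the names here are illustrative only, not Mahout's API:

    // Sketch of a time-discounted (exponentially weighted) average.
    public class ExponentialAverage {
      private final double alpha;   // time constant, in the same units as t
      private double sum = 0;       // discounted sum of the observations
      private double weight = 0;    // discounted count of the observations
      private double lastTime = Double.NaN;

      public ExponentialAverage(double alpha) {
        this.alpha = alpha;
      }

      public void add(double x, double t) {
        // no discount on the very first observation
        double pi = Double.isNaN(lastTime) ? 1 : Math.exp(-(t - lastTime) / alpha);
        sum = pi * sum + x;
        weight = pi * weight + 1;
        lastTime = t;
      }

      public double mean() {
        return sum / weight;
      }
    }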
If we were to compute simple rates without discounting, then instead of interpreting $w$ as a count, we would interpret it as a time interval. Thus we would update state $(s,w,t)$ with an observation $(x, t)$ according to
\begin{eqnarray*}
s_{n} &=& s_{n-1} + x \\
w_{n} &=& w_{n-1} + (t - t_{n-1}) \\
t_n &=& t
\end{eqnarray*}
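In code, the only change from the simple average is that the denominator accumulates elapsed time rather than a count. This sketch assumes the state is seeded with a start time so that the first interval is well defined:

    // Sketch of an un-discounted rate: total events over total elapsed time.
    public class SimpleRate {
      private double sum = 0;      // s: total of the reported counts
      private double elapsed = 0;  // w: total time covered by the reports
      private double lastTime;     // t: time of the last report

      public SimpleRate(double startTime) {
        this.lastTime = startTime;  // state starts at (0, 0, startTime)
      }

      public void add(double x, double t) {
        sum += x;
        elapsed += t - lastTime;
        lastTime = t;
      }

      public double rate() {
        return sum / elapsed;
      }
    }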
By analogy with the discounted version of the simple average, we can produce an exponentially weighted rate by updating the state according to
\begin{eqnarray*}
\pi &=& e^{-(t-t_{n-1})/\alpha} \\
s_{n} &=& \pi s_{n-1} + x \\
w_{n} &=& \pi w_{n-1} + (t-t_{n-1}) \\
t_{n} &=& t
\end{eqnarray*}
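Putting the two together gives the discounted rate accumulator. As before, this is only a sketch with made-up names, not the Mahout implementation referenced below:

    // Sketch of an exponentially weighted rate: both the event total and the
    // elapsed-time denominator are discounted by pi before each update.
    public class ExponentialRate {
      private final double alpha;  // time constant of the exponential weighting
      private double sum = 0;      // s: discounted total of the reported counts
      private double elapsed = 0;  // w: discounted total of the reporting intervals
      private double lastTime;     // t: time of the last report

      public ExponentialRate(double alpha, double startTime) {
        this.alpha = alpha;
        this.lastTime = startTime;
      }

      public void add(double x, double t) {
        double pi = Math.exp(-(t - lastTime) / alpha);
        sum = pi * sum + x;
        elapsed = pi * elapsed + (t - lastTime);
        lastTime = t;
      }

      public double rate() {
        return sum / elapsed;
      }
    }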
This approach has a defect in that each data point should really be considered to be a report of events spread out in an uncertain way over the time since the last report. As such, the interval and the events should probably both be discounted as if they had been reported uniformly over the period in question. In practice, this correction does not matter much since the averaging time constant $\alpha$ is typically large compared to the average reporting interval. Another consideration comes up when multiple sources are reporting concurrently. If we want to do proper discounting of each observed number, then we have to keep track of the time since the last report for each of the sources. Since this probably won't matter and since it considerably complicates the implementation, I would rather not do it.

See https://issues.apache.org/jira/browse/MAHOUT-634 for code. This will be in Mahout before long.

3 comments:

Dmitriy Lyubimov said...

is it $$\pi=e^{{t_{n-1}-t}/{\alpha}}$$ perhaps, assuming $$\alpha$$ usually chosen >0?

I think your formula assumes $$\alpha<0$$?

Ted Dunning ... apparently Bayesian said...

Yes. I am very prone to dropping negative signs. I prefer to say it as $\pi = \exp\left(-\frac{t-t_{n-1}}{\alpha}\right)$ to keep the difference in times positive, but this is equivalent to what you suggest.

And yes, $\alpha \le 0$ is not useful.

Dmitriy Lyubimov said...

Also, a combiner implementation really stalls with this approach and produces high errors.

Much more consistent MR combining of this rate summarizer seems to result if we use a function average for the time weight instead of just throwing in the last interval $$t_n - t_{n-1}$$.

The final update for the time weight in this situation would look like $$w_{n+1}=\pi_{n+1}w_{n}+\alpha\cdot(1-\pi_{n+1})$$.

A consistent combiner would then result from simply $$w=\max\left(w_{1},\pi w_{2}\right)$$, assuming $$t_1 > t_2$$.