The trick is that the measurements are only available at irregular intervals. If they were sampled regularly, then the standard mixing trick would work:

\[
m_{n+1} = \mu m_n + (1-\mu) x_{n+1}
\]

where $m$ is our current estimate of the mean, $x_n$ is the $n$-th sample and $\mu$ determines how much history to use.

With unequal sample times, things become a bit more complicated. If we get lots of measurements all at once, we want to give them nearly equal weight but if we have a long gap, we want to weight the very old samples much less.

In fact, we want to weight old samples according to how old they are with exponentially decreasing weight. If we sample values $\left \lbrace x_1 \ldots x_n \right \rbrace$ at times $t_1 \ldots t_n$ then we want the weighted mean defined by

\[
m_n = {\sum_{i=1}^n x_i e^{-(t_n - t_i)/\alpha} \over \sum_{i=1}^n e^{-(t_n - t_i)/\alpha} }
\]

Here $\alpha$ plays the same role as $\mu$ did before, but on a different scale. If the evenly sampled data comes at time intervals $\Delta t$, then $\mu = e^{-\Delta t / \alpha}$.
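A quick numerical sketch of that correspondence (my own check, not from the original post): with evenly spaced samples, the normalized exponentially weighted mean and the standard mixing rule with $\mu = e^{-\Delta t/\alpha}$ converge to each other after a burn-in, because the normalizing sum converges to $1/(1-\mu)$.

```python
import math
import random

alpha = 5.0   # decay time scale
dt = 1.0      # regular sampling interval
mu = math.exp(-dt / alpha)  # equivalent mixing factor

random.seed(42)
xs = [random.uniform(-1, 1) for _ in range(300)]

# Standard mixing rule, seeded with the first sample.
m_even = xs[0]
for x in xs[1:]:
    m_even = mu * m_even + (1 - mu) * x

# Normalized exponentially weighted mean over the same samples,
# with sample i taken at time i * dt.
n = len(xs)
num = sum(x * math.exp(-(n - 1 - i) * dt / alpha) for i, x in enumerate(xs))
den = sum(math.exp(-(n - 1 - i) * dt / alpha) for i in range(n))
m_weighted = num / den

# The discrepancy decays like mu**n, so after 300 steps the two
# estimates agree far below floating-point noise levels.
print(abs(m_even - m_weighted))
```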

Happily, there is a very simple recurrence relationship that allows us to keep only a few intermediate values while computing $m_1 \ldots m_n$ in an entirely on-line fashion as the $x_i$ values arrive.

To see this, define

\begin{eqnarray*}
\pi_n &=& e^{-(t_{n+1}-t_n)/\alpha} \\
w_{n+1} &=& \sum_{i=1}^{n+1} e^{-(t_{n+1} - t_i)/\alpha} = 1+e^{-(t_{n+1}-t_n)/\alpha} \sum_{i=1}^{n} e^{-(t_{n} - t_i)/\alpha} \\
&=& 1 + \pi_n w_n \\
s_{n+1} &=& \sum_{i=1}^{n+1} x_i e^{-(t_{n+1} - t_i)/\alpha} = x_{n+1}+e^{-(t_{n+1}-t_n)/\alpha} \sum_{i=1}^{n} x_i e^{-(t_{n} - t_i)/\alpha} \\
&=& x_{n+1} + \pi_n s_n
\end{eqnarray*}

Then note that

\[
m_{n+1} = {s_{n+1} \over w_{n+1}}
\]

This leads naturally to a procedure whose state consists of $t, w, s, m$, updated using each new pair $t_n, x_n$ according to

\begin{eqnarray*}
\pi &=& e^{-(t_{n}-t)/\alpha} \\
w &=& 1 + \pi w \\
s &=& x_n + \pi s \\
m &=& {s \over w} \\
t &=& t_{n}
\end{eqnarray*}
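That update procedure can be sketched in Python (the class name `IrregularEwma` is my own, not from the post), and checked against the direct weighted-mean definition:

```python
import math
import random

class IrregularEwma:
    """Online exponentially weighted mean for irregularly sampled data."""

    def __init__(self, alpha):
        self.alpha = alpha  # decay time scale
        self.t = None       # time of the most recent sample
        self.w = 0.0        # running normalizer
        self.s = 0.0        # running weighted sum
        self.m = None       # current estimate of the mean

    def add(self, t, x):
        # pi discounts the previous state by the elapsed time.
        pi = 1.0 if self.t is None else math.exp(-(t - self.t) / self.alpha)
        self.w = 1.0 + pi * self.w
        self.s = x + pi * self.s
        self.m = self.s / self.w
        self.t = t
        return self.m

# Check the recurrence against the direct weighted-mean definition
# on randomly spaced sample times.
random.seed(1)
ts = sorted(random.uniform(0, 50) for _ in range(20))
xs = [random.gauss(0, 1) for _ in range(20)]

ewma = IrregularEwma(alpha=5.0)
for t, x in zip(ts, xs):
    online = ewma.add(t, x)

t_n = ts[-1]
direct = (sum(x * math.exp(-(t_n - t) / 5.0) for t, x in zip(ts, xs))
          / sum(math.exp(-(t_n - t) / 5.0) for t in ts))
print(abs(online - direct))  # agrees to rounding error
```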

Isn't that a kick!

To do this right, however, we need a test. Here are some data vectors computed with $\alpha=5$:

| n | t | x | pi | w | s | m |
|---|----------|------------|-----------|----------|------------|------------|
| 1 | 11.35718 | 1.5992071 | 1.0000000 | 1.000000 | 1.5992071 | 1.5992071 |
| 2 | 21.54637 | -1.3577032 | 0.1303100 | 1.130310 | -1.1493105 | -1.0168100 |
| 3 | 28.91061 | -0.3405638 | 0.2292718 | 1.259148 | -0.6040683 | -0.4797436 |
| 4 | 33.03586 | 0.7048632 | 0.4382129 | 1.551775 | 0.4401527 | 0.2836447 |
| 5 | 39.57767 | 0.3020558 | 0.2702621 | 1.419386 | 0.4210124 | 0.2966159 |

## 7 comments:

Ted, pardon my bad math, but I don't understand how you are calculating pi in your example. I can't get to 0.13031 no matter what I do. ln 0.13031 is about -2, which doesn't seem to come from t1 - t0, which is about 10. What am I missing?

Thanks,

Nathan

Nathan,

The issue is that I dropped a division by $\alpha$ in the expression for $\pi$. For the example, $\alpha=5$, so for the second line $\pi = \exp\left(-{21.54637-11.35718 \over 5}\right) = e^{-2.0378} = 0.1303101$.

And, by the way, thanks for noticing that. There was also a missing minus sign on an earlier expression that I found as a result of looking for the problem you spotted.

Let me know if you have any other problems with this.

That worked, thank you much. Your CAPTCHA is nearly impossible, BTW. Or maybe I'm a robot and it is just doing its job. :)

Just as an FYI to anyone ending up here, there's a truly excellent paper with many useful online time-series operators for inhomogeneous data:

http://www.thalesians.com/archive/public/academic/finance/papers/Zumbach_2000.pdf

Excellent paper!

