Exponentially Weighted Moving Average, EWMA

The Exponentially Weighted Moving Average (EWMA) algorithm is the simplest discrete-time low-pass filter. In the i-th iteration, it generates an output $y_i$ that is a weighted combination of the current input $x_i$ and the previous output $y_{i-1}$.
$$\boxed{y_i=\begin{cases} 0, & i=0 \\ \alpha \cdot x_i + (1-\alpha) \cdot y_{i-1}, & i=1,2,\dots \end{cases}}$$
The smoothing factor, $\alpha \in [0,1]$, indicates the normalized weight of the new input in the output. For example, $\alpha=0.03$ implies that each new input contributes 3% to the output, while the previous output contributes 97%. The boundary values of the smoothing factor are 0 and 1, which imply $y_i=y_{i-1}$ and $y_i=x_i$, respectively. In the following sections, we analyze the algorithm from different points of view.
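As a quick illustration (a minimal sketch, not part of the original analysis; the function name `ewma` is our own), the recursion can be implemented in a few lines of Python:

```python
def ewma(inputs, alpha):
    """Exponentially Weighted Moving Average: y_i = alpha*x_i + (1-alpha)*y_{i-1}."""
    y = 0.0                                # zero initial condition, y_0 = 0
    outputs = []
    for x in inputs:
        y = alpha * x + (1.0 - alpha) * y  # new input weighted by alpha
        outputs.append(y)
    return outputs

# With alpha = 0.5, the output converges quickly towards a constant input
print(ewma([1.0, 1.0, 1.0, 1.0], alpha=0.5))  # [0.5, 0.75, 0.875, 0.9375]
```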

The EWMA could be considered an Auto Regressive Moving Average (ARMA) filter because it depends on the history of values of both the input and the output. However, if the EWMA recursion is expanded, the current output can be represented based only on the contributions of past inputs, i.e. as a Moving Average (MA) filter.
$$y_i = \alpha\cdot x_i + \alpha(1-\alpha)\cdot x_{i-1} + \alpha(1-\alpha)^2 \cdot x_{i-2} + \dots = \sum_{j=0}^{i} \alpha(1-\alpha)^{i-j} \cdot x_j = \sum_{j=0}^{i} w_{ij} \cdot x_j$$
In the i-th iteration, the output is a weighted sum of all previous input values $x_j$ with $j \in \{0,\dots,i\}$, where the scaling corresponds to the exponentially decaying coefficient $w_{ij}= \alpha \cdot (1- \alpha)^{i-j}$.
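To make the equivalence concrete, the following sketch (our own construction) checks numerically that the recursive (ARMA-like) form and the explicit weighted sum (MA form) produce the same output:

```python
alpha = 0.3
xs = [2.0, 1.0, 4.0, 3.0]

# Recursive form: y_i = alpha*x_i + (1-alpha)*y_{i-1}, with y_0 = 0
y = 0.0
for x in xs:
    y = alpha * x + (1 - alpha) * y

# Explicit MA form: y_i = sum_j alpha*(1-alpha)^(i-j) * x_j
i = len(xs) - 1
y_ma = sum(alpha * (1 - alpha) ** (i - j) * xs[j] for j in range(i + 1))

print(y, y_ma)  # both values coincide
```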

The impulse response $h(n)$ of its Linear Time Invariant (LTI) system equivalent has an infinite duration, which implies that the equivalent MA representation requires an infinite number of coefficients; nevertheless, the transfer function $H(z)$ admits a compact rational form. If the symbol $*$ represents the convolution operator and $u(n)$ corresponds to the unit step function, it can be stated:
$$y(n)=x(n)*h(n) \; \; \; \text{with} \; \; \; h(n)=\alpha(1-\alpha)^n \cdot u(n)$$
The EWMA algorithm corresponds to the simplest Infinite Impulse Response (IIR) discrete-time filter. The main advantage of IIR systems over FIR ones is their implementation efficiency. On the other hand, IIR systems are harder to analyze. In order to simplify the analysis, zero initial conditions are imposed. Thus, the general first-order IIR filter corresponds to:
$$y_i = \frac{1}{a_0} \cdot (a_1 \cdot y_{i-1} + b_0 \cdot x_i + b_1 \cdot x_{i-1} ) \rightarrow  H(z)=\frac{Y(z)}{X(z)} = \frac{b_0 \cdot z^{-0} + b_1 \cdot z^{-1}}{a_0 \cdot z^{-0} - a_1 \cdot z^{-1}} = \frac{b_0 + b_1 \cdot z^{-1}}{a_0-a_1 \cdot z^{-1}}$$
Fig 1. presents the simplified and complete Direct Form 1 (DF1) of the filter. In the case of the EWMA, the coefficients have fixed values in terms of the smoothing factor: $a_0=1, \; a_1=1-\alpha, \; b_0=\alpha, \; b_1=0$. Applying those constraints, the transfer function becomes:
$$\boxed{H(z)=\frac{\alpha}{1-(1-\alpha) \cdot z^{-1}} }$$
Fig 1. Block diagrams of the EWMA (left) and the full first-order IIR filter (right).
The inclusion of the previous outputs in the computation of the current output, $a_j\neq0$, creates a feedback loop that makes the filter an IIR; this feedback also means the system could be unstable. In order to study the stability of the system, the Bounded-Input Bounded-Output (BIBO) criterion is applied, which is satisfied when the Region of Convergence (ROC) includes the unit circle. The EWMA is a causal system; thus, the ROC must include infinity as well. For the ROC to include infinity and the unit circle without the poles, all poles must be located within the unit circle of the z-plane, i.e. all poles of the transfer function must have an absolute value smaller than one, which is satisfied when $0<\alpha<2$; the usual range $\alpha \in (0,1]$ therefore guarantees stability.
$$H(z) = \frac{\alpha}{1-(1-\alpha)\cdot z^{-1}} \rightarrow \infty \; \Leftrightarrow \; 1-(1-\alpha) \cdot z^{-1} = 0 \; \Leftrightarrow \; z=1-\alpha $$
$$|z_p|<1 \; \Leftrightarrow \; |1-\alpha|<1 \; \Leftrightarrow \; 0<\alpha<2$$
Fig 2. Representation of the EWMA algorithm for a smoothing factor of 0.3. The left sub-plot depicts the z-plane with poles (x) and zeros (o). The right sub-plot presents the magnitude and phase response over the normalized frequency.
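The pole-zero and frequency-response analysis shown in Fig 2 can be reproduced with standard tools; a sketch assuming SciPy is available (parameter choices are ours):

```python
from scipy import signal

alpha = 0.3
b, a = [alpha], [1.0, -(1.0 - alpha)]  # H(z) = alpha / (1 - (1-alpha)*z^-1)

# Pole location: z_p = 1 - alpha, inside the unit circle for 0 < alpha < 2
zeros, poles, gain = signal.tf2zpk(b, a)
print("poles:", poles)                 # -> [0.7]

# Magnitude and phase response over the normalized frequency
w, h = signal.freqz(b, a, worN=512)
print("DC gain:", abs(h[0]))           # -> 1.0 (unity gain at f = 0)
```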

In Fig 3., the left sub-plot presents the weighting coefficients of an EWMA with a history of 20 samples for different values of $\alpha$, namely 0.05 (blue), 0.1 (green) and 0.5 (red). A large smoothing factor gives larger weight to the latest inputs, i.e. high reactiveness. On the other hand, a small smoothing factor weights the whole history of inputs more equally, i.e. high stability. In this case, however, the memory can be too long, which implies that situations that occurred a long time ago still affect the system (high persistence) and that it reacts slowly to condition changes. Therefore, the choice of the optimum smoothing factor is a delicate task that must balance the reactiveness and the stability of the system. The right sub-plot of Fig 3. presents the EWMA output for a step function, i.e. the output of the system when the input rises from 0 to 1. Here, the reactiveness effect is shown more clearly: for a smoothing factor of 0.05, almost 100 samples/iterations are needed to converge.

Fig 3. Analysis of the EWMA algorithm for different smoothing factors. The left sub-plot presents the weighting coefficients for a history of 20 samples, where later inputs have larger indices. The right sub-plot shows the response to the step function.
The effect of each input on the final output decays exponentially until the system resolution hides the contribution of very old input values. This idea corresponds to the discrete-time equivalent of the first-order exponential decay, which is characterized by the time constant $\tau$: the time needed for a 63.21% decay in amplitude, i.e. a decay to $1/e$ of the initial value. The relation between the smoothing factor and the time constant can be expressed as: $\alpha=1-exp(-T/\tau) \rightarrow \tau/T=-1/Ln(1-\alpha)$, where $T$ is the sampling interval. After 4-5 time constants, a past input has decayed by 98.17%-99.33%; hence, its contribution to the current output can be neglected. For example, for $\alpha=0.05$, the time constant corresponds to $\tau/T=-1/Ln(1-0.05)\simeq 20$ samples.
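These relations are easy to verify numerically; a short sketch (our own), with the sampling interval normalized to $T=1$:

```python
import math

alpha = 0.05
tau = -1.0 / math.log(1.0 - alpha)    # time constant in samples (T = 1)
print(tau)                            # -> ~19.5, i.e. roughly 20 samples

# Decay of a past input's weight after 4 and 5 time constants
for k in (4, 5):
    n = round(k * tau)
    decay = 1.0 - (1.0 - alpha) ** n  # fraction of the weight already lost
    print(k, f"{decay:.2%}")          # -> ~98.2% and ~99.3%
```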

The EWMA can also be studied as a predictor-corrector model, where the current output corresponds to the previous output, $y_i = y_{i-1}$, plus a correction term $\Delta y_i$ that accounts for the behavior that is not modeled in the system, i.e. the surprise or generalization error. When $\alpha \rightarrow 0 \Rightarrow \Delta y_i \rightarrow 0 \Rightarrow y_i = y_{i-1}$, there is no surprise/error. This approach can be considered a simple special case of the Kalman filter.
$$y_i = y_{i-1} + \Delta y_i = y_{i-1} + \alpha \cdot (x_i - y_{i-1} )$$
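In streaming code, this predictor-corrector form is often the most natural implementation; a hedged sketch (names are ours) showing it is algebraically identical to the original recursion:

```python
def ewma_update(y_prev, x, alpha):
    """Predictor-corrector update: previous estimate plus a scaled surprise."""
    surprise = x - y_prev             # innovation: what the model did not predict
    return y_prev + alpha * surprise

y = 0.0
for x in [1.0, 1.0, 1.0, 1.0]:
    y = ewma_update(y, x, alpha=0.5)
print(y)  # 0.9375, identical to the direct form alpha*x + (1-alpha)*y
```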



Apart from the sampling interval, this system has only one tuning parameter, and it requires only one memory register. Hence, it is a low-pass filter that is easy to implement and requires few computational resources. Thus, it is well suited for smoothing data where low complexity is needed, e.g. in real-time applications.

Deterministic cash-flow

The cash flow refers to the variation of money over time. This variation may be due to the amount of money (price) or its purchasing power (value). For example, if we invest 1€ and we get 2€ after 100 years, our amount of money has doubled :) However, due to inflation, its value is likely to have decreased :( In other words, today we can buy one coffee for 1€; but, in 100 years, we will not be able to buy the same coffee for 2€.

The deterministic approach comes from the assumption that we know the value of the return at the end of the period. This assumption only holds when we talk about agreed loan conditions for the future or when we talk about the past. For example, if we put our 1€ in the bank with the condition that we will get 2€ at the end of the year without any risk, the return is deterministic. Similarly, if we analyze the wage increase during the Hungarian hyperinflation of the 40's, the historical values are recorded; thus, the cash-flow is also deterministic. On the other hand, in case the return depends on any stochastic variable, we talk about probabilistic cash-flow.



The initial price of the asset is $P_0$ and the price after n periods is $P_n$, where each period typically corresponds to one year. Thus, it is possible to define the magnitude and the ratio of the price change during the n-th period as:
 - Price difference: $DP_n=P_n-P_{n-1}$
 - Price gain: $GP_n=P_n / P_{n-1}$

Interest

The return/interest $R_n$ is the price difference $DP_n$ normalized with respect to the price at the beginning of the period, $P_{n-1}$.
$$R_n=\dfrac{P_n-P_{n-1}}{P_{n-1}}=\dfrac{DP_n}{P_{n-1}}=\dfrac{P_n}{P_{n-1}}-1=GP_{n}-1$$
If the earnings are not reinvested, the interest is simple:
 - Price after one period: $P_1=P_0+P_0 \cdot R_1$
 - Price after n-periods: $P_n=P_{0} \cdot (1+\sum_{i=1}^{n} R_i)$
 - Price after n-periods with constant interest: $P_n=P_{0} \cdot (1+n \cdot R) \; \; \; if \: R_i=R \; \; \; \forall \: i \in \{1,\dots,n\}$
If the earnings are reinvested to generate more earnings, the interest is compounded: 
 - Price after one period: $P_1=P_0+DP_1=P_0+P_0 \cdot R_1=P_0 \cdot (1+R_1) =P_0 \cdot GP_1$
 - Price after n-periods: $P_n=P_{n-1} \cdot (1+R_n)=P_0 \cdot \prod_{i=1}^{n} (1+R_i)$
 - Price after n-periods with constant interest: $P_n =P_0  \cdot (1+R)^n \; \; \; if  \: R_i=R \; \; \; \forall \: i \in \{1,\dots,n\}$
 - Price after n-periods with m compoundings per period and constant interest: $P_n=P_0 \cdot (1+R/m)^{n \cdot m}=P_0 \cdot (1+R_{eff})^n$, with $R_{eff}=(1+R/m)^m-1$
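A short sketch (function names are ours) contrasting the simple and compound formulas above, including m compoundings per period:

```python
def simple_interest(P0, R, n):
    """Simple interest: earnings are not reinvested."""
    return P0 * (1 + n * R)

def compound_interest(P0, R, n, m=1):
    """Compound interest with m compoundings per period."""
    return P0 * (1 + R / m) ** (n * m)

P0, R, n = 100.0, 0.05, 10
print(simple_interest(P0, R, n))          # -> 150.0   (linear growth)
print(compound_interest(P0, R, n))        # -> ~162.89 (exponential growth)
print(compound_interest(P0, R, n, m=12))  # -> ~164.70 (monthly compounding)

# Effective per-period rate for m compoundings: R_eff = (1 + R/m)^m - 1
print((1 + R / 12) ** 12 - 1)             # -> ~0.0512
```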

The compounding corresponds to a geometric average of the gross gains. In the limit of infinitely frequent compounding ($m \to \infty$), the final price corresponds to $\lim_{m \to \infty}P_n=P_0 \cdot e^{R \cdot n}$. Simple interest gives linear growth, but compounding gives exponential growth because the capital at the beginning of each period is itself growing.

Fig 1. Effect of the compounding for interest rates of 0.01 and 0.02.


Continuous Interest

The continuous return ($r_n$) is the natural logarithm of the price gain in the n-th period ($GP_n$), where the lower-case $p_n=Ln(P_n)$ denotes the log-price.
$$r_n=Ln(1+R_n)=Ln(GP_n)=Ln(P_n / P_{n-1})=Ln(P_n)-Ln(P_{n-1})=p_n-p_{n-1}$$
 - Price after one period: $p_1=p_0+r_1 \rightarrow P_1=P_0 \cdot exp(r_1)$
 - Price after n-periods:   $p_n=p_{n-1}+r_n =p_0+\sum_{i=1}^{n}r_i \rightarrow P_n=P_0 \cdot \prod_{i=1}^{n}exp(r_i) $
 - Price after n-periods with constant interest: $p_n=p_0+n \cdot r \; \; \; if \:  r_i=r \; \; \; \forall \: i \in \{1,\dots,n\}$

The continuous compounding corresponds to an arithmetic average, which allows one to express a cash-flow as a linear expression and greatly facilitates the operations, cf. the geometric average of non-continuous interest. The simple return is always greater than or equal to the continuous return, i.e. $R_n \geq r_n$. The equality is approximately satisfied for small returns, where the Taylor series of $Ln(1+R_n)$ is well approximated by its first term: if $R_n$ is below roughly 1-2%, then $r_n = Ln(1+R_n) \simeq R_n$.
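A sketch (our own numbers) showing the additivity of continuous returns and the small-return approximation:

```python
import math

gains = [1.02, 0.99, 1.03]                 # gross gains GP_n over three periods

# Continuous returns add up; the total gain is recovered by exponentiation
log_returns = [math.log(g) for g in gains]
total_gain = math.exp(sum(log_returns))
print(total_gain)                          # equals 1.02 * 0.99 * 1.03

# For small returns, r_n = Ln(1+R_n) ~ R_n
R = 0.01
print(R, math.log(1 + R))                  # -> 0.01 vs ~0.00995
```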

Portfolio

The portfolio is a collection of m assets/shares. The proportion of the invested capital in the j-th asset corresponds to $x_j$. Thus, if the total capital is $P$, the capital invested in the j-th asset corresponds to $P_j=x_j \cdot P$, where it is satisfied that $\sum_{j=1}^{m} x_j=1$ and $\sum_{j=1}^{m}P_j=P$. The aggregated portfolio return ($R_{pf}$, $r_{pf}$) follows from the weighted average of the individual simple returns.
$$R_{pf}=\sum_{j=1}^{m}x_j \cdot R_j \; , \; r_{pf}=Ln(1+\sum_{j=1}^{m}x_j \cdot R_j)$$ 
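For example, a sketch (weights and returns are made up) computing the aggregated return of a three-asset portfolio:

```python
import math

weights = [0.5, 0.3, 0.2]      # x_j, must sum to 1
returns = [0.04, -0.01, 0.10]  # simple returns R_j of each asset

R_pf = sum(x * R for x, R in zip(weights, returns))
r_pf = math.log(1 + R_pf)      # continuous portfolio return
print(R_pf, r_pf)              # -> 0.037 and ~0.0363
```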

Inflation

The inflation ($\Pi_n$, $\pi_n$) is the relative change of the Consumer Price Index (CPI) between the beginning and the end of the studied period. It represents the loss of purchasing power due to money devaluation.
$$\Pi_n=\frac{CPI_n-CPI_{n-1}}{CPI_{n-1}} \; , \; \pi_n=Ln \left ( \frac{CPI_n}{CPI_{n-1}} \right )$$
Thus, the real return is: $R_{n}^{real}=\dfrac{1+R_n}{1+\Pi_n}-1 \; , \; r_{n}^{real}=r_n-\pi_n$
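A final sketch (illustrative values) computing the real return in both conventions:

```python
import math

R, Pi = 0.05, 0.02                           # nominal simple return, inflation

R_real = (1 + R) / (1 + Pi) - 1              # simple real return
r_real = math.log(1 + R) - math.log(1 + Pi)  # continuous real return
print(R_real, r_real)                        # -> ~0.0294 and ~0.0290
```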