class: center, middle, inverse, title-slide

.title[ # Econometría (II / Práctica) ]
.subtitle[ ## Magíster en Economía<br>Topic 7: A Brief Introduction to Time Series (TS) ]
.author[ ### Prof. Luis Chancí ]
.date[ ###
www.luischanci.com
]

---
layout: true

<div style="position:absolute; left:60px; bottom:11px; font-size: 10pt; color:#DDDDDD;">Prof. Luis Chancí - Econometría (II / Práctica)</div>

---
# Introduction

Previously: we were interested in `\(\mathbb{E}(y_i|x)=f(x_i)\)` using cross-sectional data.

In this section: we will (briefly) introduce methods for time series analysis; that is, for a set of observations `\(y_1,...,y_t,...,y_T\)` with `\(t\)` as the time index. We will focus on discrete time series with a natural temporal ordering, `\(1 < ... < t < ... < T\)`, where `\(y_{t-1}\)` is realized when `\(y_t\)` is determined. Most data in macroeconomics and finance come in this form.

A researcher's interest typically lies in modeling, forecasting, and studying the effects of shocks (and whether these effects will dissipate).

<img src="econometria_7_files/figure-html/ts.sim-1.png" width="32%" height="32%" style="display: block; margin: auto;" />

---
# Time Series as a Stochastic Process

A time series is a stochastic process (a sequence of random variables `\(\{Y_t\}\)` ), where observations close in time are dependent. Its study requires a different distributional theory from the one we used in cross-sectional settings.

- While a deterministic process will always produce the same output from a given starting condition, a stochastic process has some indeterminacy in its future evolution.
- **Stochastic Process:** the probability law governing `\(\{Y_t\}\)`.
- **Realization:** one draw from the process (a sequence `\(\{y_t\}\)` ). For instance, 'if we could re-run history', one average would be the ensemble mean across realizations at a fixed date, `\(\sum_r{y_t^{(r)}}/R\)`, and another the time average along a single realization, `\(\sum_t{y_t^{(r)}}/T\)`,

`$$\begin{array}{lccccc} {\color{#6A5ACD}{\text{Stochastic process}}} & {\color{#6A5ACD}{Y_1}}, & \ldots & {\color{#6A5ACD}{Y_t}}, & \ldots & {\color{#6A5ACD}{Y_T}} \\ \hline \text{Realization 1:} & y_1^{(1)}, & \ldots & y_t^{(1)}, & \ldots & y_T^{(1)} \\ & \vdots & & \vdots & & \vdots \\ \text{Realization r:} & y_1^{(r)}, & \ldots & y_t^{(r)}, & \ldots & y_T^{(r)} \\ & \vdots & & \vdots & & \vdots \\ \text{Realization R:} & y_1^{(R)}, & \ldots & y_t^{(R)}, & \ldots & y_T^{(R)} \\ \end{array}$$`

---
# Introduction (cont.)

We will review:

1. Univariate Time Series (single, scalar observations recorded sequentially over equal time increments). For instance, `\(Y_t=0.7Y_{t-1}+u_t\)`.<br>
2. Non-Stationary Time Series (unit root). For instance, `\(Y_t=Y_{t-1}+u_t\)`.<br> ( + a note on models for the variance of a time series, ARCH/GARCH; for instance, `\(\sigma^2_t=0.2Y_{t-1}^2\)`).<br>
3. Vector Autoregressive models (VAR). <br> For instance, `\(Y_t=0.7Y_{t-1}+0.2X_{t-1}+u_t\)` and `\(X_t=0.3X_{t-1}+0.1Y_{t-1}+\nu_t\)`.

---
# Concepts

Before we start, some concepts:

- Conditional first moment: `\(\mathbb{E}(Y_t|Y_{t-1})\equiv f(y_{t-1})\)`.
- **Autocovariances:** `\(\gamma_{t,k}=cov(Y_t,Y_{t+k})\)`; `\(\gamma_{0}=Var(Y_t)=\mathbb{E}(Y_t-\mathbb{E}(Y_t))^2\)`.
- **Autocorrelations:** `\(\rho_{t,k}=cor(Y_t,Y_{t+k})\)`.
- **Strict Stationarity (strong):** The process is strictly stationary if the probability distribution of `\((Y_t,Y_{t+1},...,Y_{t+k})\)` is identical to the probability distribution of `\((Y_\tau,Y_{\tau+1},...,Y_{\tau+k})\)` `\(\forall\)` `\(t,\tau,k\)` (joint distributions are time invariant).
- **Covariance Stationarity (weak):** The process is covariance stationary if `\(\mathbb{E}(Y_t)=\mu_t=\mu=\text{const}\)` and `\(\gamma_{t,k}=\gamma_{k}\)` `\(\forall\,\,t,k\)` (mean and autocovariances are time invariant).
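To build intuition, a quick simulation contrast (a minimal R sketch; the AR(1) and random-walk processes used here are defined formally in the sections that follow): for a covariance-stationary series the sub-sample means are stable over time, while for a non-stationary one they need not be.

``` r
# Compare sub-sample means: stationary AR(1) vs. a random walk.
set.seed(1)
n   <- 1000
ar1 <- arima.sim(model = list(ar = 0.7), n = n)  # stationary AR(1)
rw  <- cumsum(rnorm(n))                          # random walk
c(mean(ar1[1:(n/2)]), mean(ar1[(n/2 + 1):n]))    # similar means
c(mean(rw[1:(n/2)]),  mean(rw[(n/2 + 1):n]))     # typically far apart
```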
<br>
.hi-bold[The central point will be whether the TS of interest are stationary or not (e.g., whether the series will return to its mean after a shock). This determines the technique to use.]

???
Stationarity: do the moments change with t?
- Weak: the mean and second moments (var, cov) do not change with t.
- Strong/strict: the joint distribution of the series does not depend on t.
- Example/case: a white noise process: there is no way to predict it using `\(\mathcal{I}_{t}\)` - under only the assumption E(eps)=0 it is an MDS.
- In modeling, the point of `\(y_t\)` minus E(Y) being white noise: all the information has already been used.
- Innovation: when it is orthogonal to the information set. An innovation is white noise. E(Y|Y_{t-1}) vs. E(Y|I_t) (innovation is the stronger notion).

---
# Concepts (cont.)

- **White noise.** Serially uncorrelated random variables with zero mean and finite variance. For example, the Gaussian white noise process, `\(\varepsilon_t\sim\mathcal{N}(0,\sigma^2)\)`, which implies `\(\mathbb{E}(\varepsilon_t)=\mathbb{E}(\varepsilon_t|\varepsilon_{t-1}, \varepsilon_{t-2}...)=0\)`, `\(\mathbb{E}(\varepsilon_t\varepsilon_{t-j})=Cov(\varepsilon_t,\varepsilon_{t-j})=0\)`, and `\(\mathbb{E}(\varepsilon_t^2)=Var(\varepsilon_t)=\sigma^2\)` (const.). A related idea is the concept of **innovation** (whether the information set is involved). <br>
- **Martingale:** `\(Y_t\)` follows a martingale process if `\(\mathbb{E}(Y_{t+1}|\mathcal{I}_t)=Y_{t}\)`, where `\(\mathcal{I}_t\)` is the information set at `\(t\)`. <br>
- **Martingale Difference Process:** `\(Y_t\)` follows a martingale difference process if `\(\mathbb{E}(Y_{t+1}|\mathcal{I}_t)=0\)`. `\(\{Y_t\}\)` is called a martingale difference sequence ('MDS'). A related concept is **Brownian Motion** (a continuous version of an MDS). <br>
- The Lag Operator `\(L\)` lags the elements of a sequence by one period: `\(Ly_t=y_{t-1}\)`; `\(L^2y_t=y_{t-2}\)`.

---
class: inverse, middle, mline, center

# 1. Univariate Time Series

---
# The ARMA Process

**Autoregressive Process (AR).** The present value of a time series is a linear function of previous observations,

`$$Y_t=\sum_{j=1}^p\phi_j Y_{t-j}+u_t$$`

or `\(a(L)Y_t=u_t\)` where `\(a(L)=(1-\phi_1L-\phi_2L^2-...-\phi_pL^p)\)` and `\(u_t\)` is sometimes called an innovation. <br> For instance, the AR(1) is `\(Y_t=\phi Y_{t-1}+u_t\)`.

**Moving Average process (MA).** The (weighted) sum of the current and previous errors,

`$$Y_t=\sum_{j=1}^q\theta_j u_{t-j}+u_t$$`

or `\(Y_t=b(L)u_t\)` where `\(b(L)=(1+\theta_1L+\theta_2L^2+...+\theta_qL^q)\)`. For instance, the MA(1) is `\(Y_t=\theta_1 u_{t-1}+u_t\)`. <br>

The **ARMA(p,q)** is `\(a(L)Y_t=b(L)u_t\)`. For instance, ARMA(1,1): `\((1-\phi L)Y_t=(1+\theta L)u_t\)`. This model was popularized by Box and Jenkins, who also developed a methodology that I will mention later.

---
# AR to MA

Let's explore an interesting link between the AR and MA models, which will be useful later in our discussion of stationarity. We'll start with the AR(1) model and then progress to the AR(2) to establish a more general case.

Notice that the AR(1), after repeated substitutions, can take the following form

`$$\begin{eqnarray} Y_t&=&\phi Y_{t-1}+u_t\\ &=&\phi (\phi Y_{t-2}+u_{t-1})+u_t\\ &=&\phi^{r} Y_{t-r}+\phi^{r-1} u_{t-r+1}+ ...+\phi u_{t-1}+u_t \end{eqnarray}$$`

therefore, if `\(|\phi|<1\)`,

- `\(\lim_{r\rightarrow\infty}\phi^{r} Y_{t-r}=0\)`
- and, hence,

`$$\left. Y_t=\sum_{j=0}^\infty\phi^j\,u_{t-j} \,\,\,\,\right\}\text{ MA}(\infty)$$`
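This AR(1)-to-MA( `\(\infty\)` ) mapping can be checked numerically; a minimal sketch using base R's `ARMAtoMA()`, which returns the implied MA weights:

``` r
# MA(infinity) weights implied by an AR(1) with phi = 0.7: they equal phi^j.
ARMAtoMA(ar = 0.7, lag.max = 8)
0.7^(1:8)  # same values, up to floating-point error
```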
---
# AR to MA (cont.)

Alternatively, using the lag operator, `\((1-\phi L)Y_t=u_t\)`, the question is whether

`$$Y_t\stackrel{?}{=}(1-\phi L)^{-1}u_t$$`

<br>

Thus, for `\(|\phi|<1\)`, and honoring Brook Taylor,

.pull-left[
`$$\begin{eqnarray} Y_t &=& (1-\phi L)^{-1}u_t\\ &=&(1+\phi L+\phi^2 L^2 +...)u_t\\ &=& \sum_{j=0}^\infty\phi^j\,u_{t-j} \hspace{2.7cm},\hspace{.2cm}\text{ MA}(\infty) \end{eqnarray}$$`
]
.pull-right[
.center[Brook Taylor (1685-1731)]
]

---
# AR to MA (cont.)

For the AR(2), `\((1-\phi_1 L - \phi_2 L^2)Y_t=u_t\)`, the term `\((1-\phi_1 L - \phi_2 L^2)\)` looks like a second-order polynomial; that is, because `\(L\)` is an operator, replacing `\(L\)` by `\(\psi\)`, we have `\((1-\phi_1 \psi - \phi_2 \psi^2)\)`. Factoring the second-degree polynomial,

`$$\begin{eqnarray} &(1-\phi_1 \psi - \phi_2 \psi^2)\equiv(1-\lambda_1 \psi)(1-\lambda_2 \psi)\\ &\lambda_1\cdot\lambda_2=-\phi_2\\ &\lambda_1+\lambda_2=\phi_1 \end{eqnarray}$$`

thus,

`$$Y_t=(1-\lambda_1 L)^{-1}(1-\lambda_2 L)^{-1}u_t$$`

and, therefore, for `\(|\lambda_i|<1,\)`

`$$\left. Y_t=\left( \sum_{j=0}^\infty\lambda_1^j L^j \right)\left( \sum_{j=0}^\infty\lambda_2^j L^j \right)\,u_{t} \,\,\,\,\right\}\text{ MA}(\infty)$$`

<br> In other words, instead of imposing conditions on `\(\phi\)`, the requirements are placed on `\(\lambda\)`. Generally speaking, the requirement is **eigenvalues with a modulus less than one**, which is equivalent, as we will state later, to all **the roots of the characteristic polynomial having a modulus greater than one.**

---
# AR to MA (cont.)

.pull-left[
.hi-bold[Example 1:]
`$$Y_t=0.6Y_{t-1}+0.2Y_{t-2}+u_t$$`
we have
`$$(1-0.6L-0.2L^2) Y_t=u_t$$`
Thus, to find the eigenvalues,
`$$(\lambda^2-0.6\lambda-0.2)=0$$`
which implies
`$$\lambda_i=\frac{-(-0.6) \pm \sqrt{(-0.6)^2-4(-0.2)}}{2}$$`
hence, `\(\lambda_1=0.84\)` and `\(\lambda_2=-0.24\)`.
]
.pull-right[
.hi-bold[Example 2:]
`$$Y_t=0.5Y_{t-1}-0.8Y_{t-2}+u_t$$`
we have
`$$(1-0.5L+0.8L^2) Y_t=u_t$$`
Thus,
`$$(\lambda^2-0.5\lambda+0.8)=0$$`
which implies
`$$\lambda_i=0.25\pm0.86 i$$`
hence,
`$$R=\sqrt{0.25^2+0.86^2}=0.9<1$$`
]

---
# Moments and Stationarity - MA

Let's begin with the MA(1)

`$$Y_t=\theta u_{t-1}+u_t \hspace{0.6cm},\hspace{0.6cm}u_t\sim(0,\sigma^2)$$`

one can show that:

- `\(\mathbb{E}(Y_t)=\theta \mathbb{E}(u_{t-1})+\mathbb{E}(u_t)=0\)`
- `\(\gamma_0=V(Y_t)=V(\theta u_{t-1})+V(u_t)+2Cov(\theta u_{t-1},u_t)=(\theta^2+1)\sigma^2\)`, which is a constant (does not depend on time).
- `\(\gamma_1=\mathbb{E}(Y_tY_{t-1})=\theta\sigma^2\)`, which is also a constant (does not depend on time).
- `\(\gamma_s=0\)` for `\(s>1\)`
- `\(\rho_1=\frac{\gamma_1}{\gamma_0}=\frac{\theta}{(1+\theta^2)}\)`
- `\(\rho_s=0\)` for `\(s>1\)`

Hence, **the MA(1) process is said to be covariance (weakly) stationary.**

---
# Moments and Stationarity - MA (cont.)

Now, for the MA(q) process:

`$$Y_t=\sum_{j=0}^q{\theta_j u_{t-j}} \hspace{0.6cm},\hspace{0.6cm}u_t\sim(0,\sigma^2)\hspace{0.3cm},\hspace{0.3cm}\theta_0=1$$`

one can show that

- `\(\mathbb{E}(Y_t)=0\)`,
- `\(\gamma_0=(1+\theta_1^2+\theta_2^2+...+\theta_q^2)\sigma^2\)`,
- and

`$$\gamma_s=\left\{ \begin{array}{lcl} \sigma^2(\theta_s+\theta_{s+1}\theta_1+...+\theta_{q}\theta_{q-s}) & if & s=1,...,q\\ 0 &if & s>q \end{array}\right.$$`

<br> In other words, **the MA(q) process is (weakly) stationary:** `\(Y_t\)` is a combination of stationary terms, where the mean and variance are constant, and the autocovariances depend on `\(s\)` but not on `\(t\)`.
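These moment formulas are easy to verify by simulation; a minimal sketch checking `\(\rho_1=\theta/(1+\theta^2)\)` for the MA(1) with `\(\theta=1.2\)` used in the simulations that follow:

``` r
# Sample vs. theoretical first autocorrelation of an MA(1), theta = 1.2.
set.seed(42)
y <- arima.sim(model = list(ma = 1.2), n = 1e5)
acf(y, lag.max = 1, plot = FALSE)$acf[2]  # sample rho_1
1.2 / (1 + 1.2^2)                         # theoretical rho_1 = 0.4918
```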
---
# Moments and Stationarity - MA (cont.)

Lastly, let's review the MA( `\(\infty\)` ),

`$$Y_t=\sum_{j=0}^\infty{\theta_j u_{t-j}} \hspace{0.6cm},\hspace{0.6cm}u_t\sim(0,\sigma^2)$$`

In this case,

- `\(\mathbb{E}(Y_t)=0\)`,
- `\(\gamma_0=\sigma^2(1+\theta_1^2+\theta_2^2+...)\equiv\sigma^2\sum_j^\infty\theta^2_j\)`,
- and `\(\gamma_s=\sigma^2\sum_j^\infty\theta_j\theta_{j+s}\)`

<br> Thus, **the process is covariance (weakly) stationary** under the following assumption: **Square Summability**, `\(\sum_j\theta_j^2<\infty\)`. An alternative (stronger) requirement would be: **Absolute Summability**, `\(\sum_j|\theta_j|<\infty\)`.

---
# Simulating MA models

Simulations of MA(q) processes, with `\(u_t\sim(0,0.8^2)\)`.

.pull-left[
.center[ MA(1): `\(Y_t=1.2u_{t-1}+u_t\)` ]
<img src="econometria_7_files/figure-html/MA.sim1-1.png" width="75%" height="75%" style="display: block; margin: auto;" />
]
.pull-right[
.center[ MA(2): `\(Y_t=1.2u_{t-1}+0.9u_{t-2}+u_t\)` ]
<img src="econometria_7_files/figure-html/MA.sim2-1.png" width="75%" height="75%" style="display: block; margin: auto;" />
]

---
# Moments and Stationarity - The AR Model

We just reviewed conditions for a MA process to be covariance stationary. Thus, the focus **for the AR model** will be on whether one can **write the infinite moving average representation**.

Let's start with the **AR(1) model**, `\(Y_t=\phi Y_{t-1}+u_t\)`.

- As shown, the process can be expressed as `\(Y_t=\sum_{j=0}^\infty\phi^j u_{t-j}\)` and, therefore, the **stationarity condition for the AR(1) is** `\(|\phi|<1\)`, so that the MA sum converges, `\(1+\phi+\phi^2+...\rightarrow\frac{1}{1-\phi}\)`.
- Similarly, for `\((1-\phi L)Y_t=u_t\)` the characteristic equation is `\((1-\phi \psi)=0\)`. Thus, its one characteristic root is `\(\psi=1/\phi\)`. Therefore, the series is stationary as long as `\(|\phi|<1\)`, which is the same condition as `\(|\psi|>1\)`. Thus,

`$$Y_t=(1-\phi L)^{-1}u_t$$`

---
# Moments and Stationarity - The AR Model (cont.)

Thus, for the **AR(1) model** `\(Y_t=\phi Y_{t-1}+u_t\)` with `\(|\phi|<1\)`, we can find that the first moments are as follows:

- the unconditional expectation is `\(\mathbb{E}(Y_t)=0\)`,
- the unconditional variance is `\(\gamma_0=V(u_t+\phi u_{t-1}+\phi^2 u_{t-2}+...)=(1+\phi^2+\phi^4+...)\sigma^2\rightarrow\frac{\sigma^2}{1-\phi^2}\)`,
- the autocovariances are
  - `\(\gamma_1=cov(Y_t,Y_{t-1})=(\phi\sigma^2+\phi^3\sigma^2+\phi^5\sigma^2+...)=\phi\gamma_0\)`
  - `\(\gamma_j=\phi^j\gamma_0\)`

---
# Moments and Stationarity - The AR Model (cont.)

For the AR(p) model,

`$$(1-\phi_1L-\phi_2L^2-....-\phi_pL^p)Y_t=u_t$$`

recall that `\(a(L)=(1-\phi_1L-\phi_2L^2-....-\phi_pL^p)\)` is a polynomial in `\(L\)`. Define the characteristic equation as `\(\phi(\psi)=(1-\phi_1\psi-\phi_2\psi^2-....-\phi_p\psi^p)=0\)`. Its `\(p\)` roots can be used to factorize the polynomial,

`$$\phi(\psi)=(1-\lambda_1\psi)(1-\lambda_2\psi)...$$`

The relationship between the eigenvalues or **inverse roots** `\(\lambda_j\)` and the **roots** `\(\psi_j\)` is `\(\psi_j=\lambda_j^{-1}\)`. Therefore, `\(\phi(\psi)\)` is invertible if each factor is invertible; that is, if `\(|\psi_j|>1\)` (roots outside the unit circle) or, equivalently, `\(|\lambda_j|<1\)` (inverse roots inside the unit circle). Notice that this condition allows some roots to be complex, `\(\lambda_j=r_j\pm c_j\sqrt{-1}\)`.

Hence, the condition can be stated as: **all the roots of the characteristic polynomial** ( `\(\psi_j\)` ) **have a modulus greater than one (lie outside the unit circle).** In such a case,

`$$Y_t=(1-\lambda_1 L)^{-1}(1-\lambda_2 L)^{-1}...(1-\lambda_p L)^{-1}u_t$$`
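The roots can be checked numerically; a minimal sketch using base R's `polyroot()` for Example 1 earlier:

``` r
# Roots of the characteristic polynomial 1 - 0.6*psi - 0.2*psi^2 (Example 1).
psi <- polyroot(c(1, -0.6, -0.2))
Mod(psi)  # approx. 1.19 and 4.19: both > 1 (outside the unit circle)
1 / psi   # inverse roots (eigenvalues): approx. 0.84 and -0.24
```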
---
# Simulating AR models

Simulations of AR(p) processes, with `\(u_t\sim(0,0.8^2)\)`.

.pull-left[
.center[ AR(1): `\(Y_t=0.8Y_{t-1}+u_t\)` ]
<img src="econometria_7_files/figure-html/AR.sim1-1.png" width="75%" height="75%" style="display: block; margin: auto;" />
.center[ Inverse root: `\(\lambda=0.8\)`. ]
]
.pull-right[
.center[ AR(2): `\(Y_t=0.6Y_{t-1}+0.2Y_{t-2}+u_t\)` ]
<img src="econometria_7_files/figure-html/AR.sim2-1.png" width="80%" height="80%" style="display: block; margin: auto;" />
.center[ Inverse roots: `\(\lambda_1=-0.24\)` and `\(\lambda_2=0.84\)`. ]
]

---
# Approaching Time Dependence

## Autocorrelation function (ACF)

Researchers (empirically) study time dependence through the correlation

`$$Corr (Y_t , Y_{t−h}) =\rho_{Y_t,Y_{t-h}}= \frac{Cov (Y_t , Y_{t−h})}{\sqrt{V(Y_t)\cdot V (Y_{t−h})}} \hspace{0.5cm};\hspace{0.3cm}h = ..., −2, −1, 0, 1, 2, ...$$`

As a function of `\(h\)`, `\(\rho(h)\)` is called a .hi-bold[correlogram]. Under stationarity, `\(V(Y_t)=V (Y_{t−h})\)`, and the function is known as .hi-bold[the autocorrelation function (ACF)]; thus `\(\rho_h=\gamma_h/\gamma_0\)`.

The population moments are replaced with sample moments,

`$$\widehat{Cov}(y_t,y_{t-h}) = \frac{1}{T-h}\sum_{t=h+1}^T{(y_t-\bar{y})(y_{t-h}-\bar{y})} \hspace{0.5cm};\hspace{0.3cm}h = ..., −2, −1, 0, 1, 2, ...$$`

---
# Approaching Time Dependence (cont.)

.hi-bold[The autocorrelation function (ACF)] for the simulated MA and AR processes, with `\(u_t\sim(0,0.8^2)\)`.

.pull-left[
.center[ MA(1): `\(Y_t=1.2u_{t-1}+u_t\)` ]
<img src="econometria_7_files/figure-html/MA.ACF-1.png" width="75%" height="75%" style="display: block; margin: auto;" />
]
.pull-right[
.center[ AR(1): `\(Y_t=0.8Y_{t-1}+u_t\)` ]
<img src="econometria_7_files/figure-html/AR.ACF-1.png" width="75%" height="75%" style="display: block; margin: auto;" />
]

---
# Approaching Time Dependence in Time Series Analysis

## Understanding the Partial Autocorrelation Function (PACF)

The Partial Autocorrelation Function (PACF) is an empirical tool for identifying the number of lags in the Autoregressive (AR) component of ARMA processes. It measures the correlation between observations in a time series separated by a certain number of lags ( `\(k\)` ), while controlling for the correlations at all shorter lags.

**Lag-by-Lag Analysis**: The PACF examines the correlation of a time series with its lagged values for various lags, one at a time, removing the effects of previous lags.

**Regression Approach**: Each lag in the PACF corresponds to a regression of the time series on its past values up to that lag:

`$$\begin{array}{ccc} \text{(Lag) }\,k & \text{Regression Equation} & \text{PACF} \\ \hline 1 & y_t = \beta_0 + \beta_1 y_{t-1} + u_t & \beta_1 \\ 2 & y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + u_t & \beta_2 \\ \vdots & \vdots & \vdots \end{array}$$`

The coefficient ( `\(\beta_k\)` ) on the highest lagged term in each regression represents the partial correlation at that lag. A significant spike in the PACF plot at lag `\(k\)` followed by non-significant values suggests an AR(k) model.
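Both functions are built into R; a minimal sketch on a simulated AR(2):

``` r
# ACF and PACF of a simulated AR(2): the PACF should cut off after lag 2.
set.seed(7)
y <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 500)
acf(y)   # decays gradually
pacf(y)  # significant spikes at lags 1 and 2 only
```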
---
# Approaching Time Dependence (cont.)

.hi-bold[The Partial Autocorrelation Function (PACF)] for the simulated MA and AR processes, with `\(u_t\sim(0,0.8^2)\)`.

.pull-left[
.center[ MA(1): `\(Y_t=1.2u_{t-1}+u_t\)` ]
<img src="econometria_7_files/figure-html/MA.PACF-1.png" width="75%" height="75%" style="display: block; margin: auto;" />
]
.pull-right[
.center[ AR(1): `\(Y_t=0.8Y_{t-1}+u_t\)` ]
<img src="econometria_7_files/figure-html/AR.PACF-1.png" width="75%" height="75%" style="display: block; margin: auto;" />
]

---
# The Box-Jenkins Approach

It is a systematic (empirical) methodology for analyzing and forecasting time series data, used primarily for forecasting.

.hi-bold[Steps:]

- **Model Identification.** Analyze the time series plot, along with the ACF and PACF plots. Identify the appropriate ARIMA (Autoregressive Integrated Moving Average) model (determine the order of differencing, the number of AR terms, and the number of MA terms). In the following subsection we review the meaning of 'Integrated'.
- **Estimation.** Estimate the parameters of the identified ARIMA(p, d, q) model.
- **Model Checking.** Validate the fitted model (i.e., checking for autocorrelation in the residuals, or using the Ljung-Box test to verify that there is no significant autocorrelation in the residuals).
- **Model Refinement.** If the model does not fit well, return to step 1 for re-identification.

---
class: inverse, middle, mline, center

# 2. Non-Stationary Time Series

---
# Non-Stationary Time Series

The main assumption on the time series data so far has been stationarity. However, many macroeconomic variables are trending.

<img src="econometria_7_files/figure-html/ts.trend-1.png" width="35%" height="35%" style="display: block; margin: auto;" />

Stationarity can be violated in different ways:

.pull-left[
- **Deterministic trends** - or trend stationarity.
- Unit roots - or **stochastic trends**
]
.pull-right[
- Level shifts - breaks
- Variance changes.
]

---
# Non-Stationary Time Series

## Deterministic Trend

Some examples are: (i) `\(Y_t=\psi+\beta \cdot t+\varepsilon_t\)`; or (ii) `\(X_t=\phi X_{t-1}+u_t\)`, where `\(|\phi|<1\)` and `\(Y_t=X_t+\psi+\beta\cdot t\)`

<center><figure>
<img alt="pics7/pib.png" src="pics7/pib.png" width="40%" height="40%" style="margin: 15px 0 0 0">
<figcaption>Figure: Quarterly GDP - Chile. 'Chained volume at previous-year prices, spliced series, seasonally adjusted, reference year 2018.' Source: author's construction using R and the Banco Central API.
</figcaption> </figure></center>
</br>
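A trend-stationary series can be made stationary by removing the fitted deterministic trend; a minimal sketch on simulated data (not the GDP series above):

``` r
# De-trend a simulated trend-stationary series Y_t = 2 + 0.5*t + e_t.
set.seed(3)
t <- 1:200
y <- 2 + 0.5 * t + rnorm(200)
detrended <- residuals(lm(y ~ t))  # stationary fluctuations around zero
plot.ts(detrended)
```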
---
# Non-Stationary Time Series

## Stochastic Trend: Random Walk

A random walk process is: `\(y_t = y_{t-1} + \varepsilon_t\)`

This structure would imply that the effect of a shock is permanent: `\(y_t = \sum_{\tau=1}^{t} \varepsilon_\tau\)`

.pull-left[
``` r
# R code to simulate the Random Walk
set.seed(1234)
t <- 1:100
y <- arima.sim(list(order = c(0, 1, 0)), n = length(t))
plot(y)
```
<img src="econometria_7_files/figure-html/ts.rw-1.png" width="60%" height="60%" style="display: block; margin: auto;" />
]
.pull-right[
<br><br> The effect of a shock ( `\(y_0=0\)`, `\(\varepsilon_1=1\)`, and `\(\varepsilon_2=\ldots=\varepsilon_T=0\)`) <br>
<img src="econometria_7_files/figure-html/ts.rw2-1.png" width="62%" height="62%" style="display: block; margin: auto;" />
]

---
# Non-Stationary Time Series

## Stochastic Trend: Random Walk with Drift

It is a random walk plus a constant term: `\(y_t = \mu + y_{t-1} + \varepsilon_t\)`

This structure would imply that shocks have permanent effects and are influenced by the drift: `\(y_t = \mu\cdot t +\sum_{\tau=1}^{t} \varepsilon_\tau\)`

.pull-left[
``` r
# R code to simulate the Random Walk with drift
set.seed(1234)
y <- arima.sim(model= list(order = c(0, 1, 0)),
               n=100, mean=1.3 )
plot(y)
```
<img src="econometria_7_files/figure-html/ts.rwd-1.png" width="60%" height="60%" style="display: block; margin: auto;" />
]
.pull-right[
<br><br> The effect of a shock ( `\(y_0=0\)`, `\(\mu=1.3\)`, `\(\varepsilon_1=1\)`, and `\(\varepsilon_2=\ldots=\varepsilon_T=0\)`)
<img src="econometria_7_files/figure-html/ts.rwd2-1.png" width="63%" height="63%" style="display: block; margin: auto;" />
]

---
# Non-Stationary Time Series

## Deterministic vs. Stochastic trend and De-trending

**Stochastic Trend.** There are important implications of `\(\phi=1\)` rather than `\(|\phi|<1\)` in `\(Y_t=\phi Y_{t-1}+\varepsilon_t\)`:

- The effect of the initial value stays in the process, `\(\mathbb{E}(Y_t | Y_0) = Y_0\)`.
- Shocks have permanent effects. They accumulate to a random walk component `\(\sum\varepsilon_t\)` called a stochastic trend.
- The variance increases, `\(V(\sum\varepsilon_t|Y_0)=t\cdot\sigma^2\)`.
- The covariance is `\(\mathbb{E}((Y_{t}-Y_0)(Y_{t-s}-Y_0)|Y_0)=(t-s)\sigma^2\)` and the autocorrelation is `\(\sqrt{(t-s)/t}\)` (which dies out very slowly with `\(s\)`)

**De-trending.** When dealing with non-stationary time series, a common approach is to transform the series to achieve stationarity. This transformation is often referred to as 'de-trending'.

**First Order Integration (I(1)).** If the first difference, `\(\Delta Y_t = (Y_t - Y_{t-1}) = \varepsilon_t\)`, is stationary, the series is called integrated of first order, denoted I(1); hence, it is named an **Integrated Series** with `\(d = 1\)`.

---
# Non-Stationary Time Series

## A caution in treating trends

1. Using the transformation `\((Y_t - \beta \cdot t)\)` for
   - **Deterministic Trend:** After the transformation, `\((Y_t - \beta \cdot t) = \psi + \varepsilon_t\)`, the series becomes stationary.
   - **Stochastic Trend:** After the transformation, `\((Y_t - \beta \cdot t) = Y_0 + \sum_{j=0}^t \varepsilon_j\)`, the series remains non-stationary.
2. Using the First Difference `\((Y_t - Y_{t-1})\)` for
   - **Deterministic Trend:** The first difference is `\((Y_t - Y_{t-1}) = (\beta \cdot t + \varepsilon_t) - (\beta \cdot (t-1) + \varepsilon_{t-1}) = \beta + (\varepsilon_t - \varepsilon_{t-1})\)`. This is akin to a Moving Average process with a kind of 'unit root.'
   - **Stochastic Trend:** The first difference is `\((Y_t - Y_{t-1}) = \varepsilon_t\)`. The first difference is stationary.
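The contrast is easy to see by simulation (a minimal sketch; compare the two transformed series):

``` r
# De-trending a random walk does NOT deliver stationarity; differencing does.
set.seed(9)
t <- 1:300
y <- cumsum(rnorm(300))       # stochastic trend (random walk)
det <- residuals(lm(y ~ t))   # still wanders around: non-stationary
dif <- diff(y)                # white noise: stationary
par(mfrow = c(1, 2)); plot.ts(det); plot.ts(dif)
```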
**It is crucial to conduct statistical tests to verify the presence of unit roots and correctly identify the nature of the trend (deterministic or stochastic) in the time series.**

---
# Non-Stationary Time Series

## Unit Root Testing

The key approach in unit root testing is to test for a unit root in an autoregressive model. To illustrate, let's review the .hi-bold[Dickey-Fuller Test] - for the AR(1) `\(Y_t = \phi Y_{t-1} + \varepsilon_t\)`:

- **Hypothesis Testing**. The null hypothesis `\((H_0)\)` is that the series has a unit root, specifically `\(H_0: \phi = 1\)`; it is tested against the alternative hypothesis `\((H_1)\)`, which implies stationarity.
- **Equivalent Formulation**. Reformulate it as `\(\Delta Y_t = \pi Y_{t-1} + \varepsilon_t\)`, where `\(\pi = \phi - 1\)`. Thus, the null hypothesis becomes `\(H_0: \pi = 0\)`, and the alternative is `\(H_1: -2 < \pi < 0\)`.
- **Dickey-Fuller Test Statistic**: The test statistic is calculated as the t-ratio: `\(\hat{\pi} / \text{se}(\hat{\pi})\)`. The asymptotic distribution of this test statistic follows the Dickey-Fuller distribution, not the standard normal distribution `\(N(0, 1)\)`.

---
# Non-Stationary Time Series

## Unit Root Testing: ADF

<br>

.hi-bold[Augmented Dickey–Fuller Test in the AR(p) Model.] The Augmented Dickey–Fuller (ADF) test extends the Dickey-Fuller test to higher-order autoregressive processes, `\(Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t\)`.

- The inclusion of lagged difference terms accounts for serial correlation and makes the test robust for higher-order AR processes.
- **ADF Test Formulation**: Reformulate the model to include lagged difference terms, `\(\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \delta_1 \Delta Y_{t-1} + \delta_2 \Delta Y_{t-2} + \dots + \delta_{p-1} \Delta Y_{t-p+1} + \varepsilon_t\)`.
- **Null and Alternative Hypotheses**: `\((H_0)\)` is that 'the time series has a unit root (non-stationary)', i.e., `\(\gamma = 0\)`. The alternative hypothesis is that the time series is stationary.
- **ADF Test Statistic**: The test statistic is `\(\text{t-statistic} = \hat{\gamma} / \text{se}(\hat{\gamma})\)`, and the critical values for this test are specific to the ADF distribution. Rejecting `\(H_0\)` suggests that the series is stationary.

Implementation Notes: The selection of `\(p\)` is crucial and can be determined based on information criteria like the AIC or BIC.

---
# Non-Stationary Time Series

## Unit Root Testing: ADF (cont.)

There are, however, some weaknesses of the ADF test:

- Assumption of i.i.d. residuals. Many time series exhibit, for instance, time-varying volatility or conditional heteroskedasticity, violating this assumption.
- Power Issues: The ADF test can suffer from low statistical power, especially when the series is close to being non-stationary but not exactly so (the test may fail to reject the null hypothesis even when the series is actually stationary).
- Small Samples: The test may have size distortion. The probability of rejecting `\(H_0\)` when it is true (type I error) can be higher than the nominal level.

---
# Non-Stationary Time Series

## Unit Root Testing: Alternative Unit Root Tests

- **Phillips-Perron Test** is a variation of the ADF test. It focuses on correcting for autocorrelation and heteroskedasticity in the error terms using non-parametric statistical methods.
- **Zivot-Andrews Test** accounts for the possibility that the time series may appear to have a unit root but is actually stationary around a changing mean (structural break). The null hypothesis is a unit root; the alternative is stationarity with a one-time structural break in the level or trend.
- **Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test** aims to determine whether a time series is stationary around a deterministic trend. The null hypothesis is that the series is stationary (trend stationary or level stationary).
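Several of these tests are available in the `tseries` package (`adf.test`, `pp.test`, `kpss.test`); a minimal sketch on a simulated random walk:

``` r
# Unit-root tests applied to a simulated random walk.
library(tseries)
set.seed(11)
y <- cumsum(rnorm(300))
adf.test(y)   # H0: unit root   -> should NOT be rejected here
pp.test(y)    # H0: unit root   -> should NOT be rejected here
kpss.test(y)  # H0: stationary  -> SHOULD be rejected here
```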
---
# Example: Time Series in R

Apple Inc. stock prices (note: this is only an illustration of commands; it is not aimed at discussing whether this is a good empirical model).

.pull-left[
``` r
# R code:
library(quantmod)  # to access the data
library(forecast)  # methods and tools for displaying and analysing univariate time series
library(tseries)   # Time Series Analysis and Computational Finance
apple <- getSymbols("AAPL",
                    from=as.Date("2015-01-01"),
                    to  =as.Date("2023-10-12"),
                    auto.assign=F)
df <- data.frame(date= index(apple), apple, row.names = NULL)
precio.sa <- seasadj(stl(
  ts(na.omit(ma(df$AAPL.Adjusted, order=7)), frequency=7),
  s.window="periodic"))
tsdisplay(precio.sa)
```
]
.pull-right[
<img src="econometria_7_files/figure-html/fig_TS_out-1.png" width="100%" height="100%" style="display: block; margin: auto;" />
]

---
# Example: Time Series in R (cont.)
.pull-left[
``` r
# Unit Root:
adf.test(precio.sa)
##
##  Augmented Dickey-Fuller Test
##
## data:  precio.sa
## Dickey-Fuller = -2.2511, Lag order = 13, p-value = 0.472
## alternative hypothesis: stationary

# First Diff
dP <- diff(precio.sa, differences = 1)

# Select the best ARIMA model
auto.arima(dP, seasonal=FALSE) # Then, ARIMA(1,1,1)
## Series: dP
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
##          ar1     ma1    mean
##       0.8116  0.1084  0.0688
## s.e.  0.0147  0.0253  0.0425
##
## sigma^2 = 0.1157:  log likelihood = -749.11
## AIC=1506.21   AICc=1506.23   BIC=1529
```
]
.pull-right[
``` r
# ACF PACF
tsdisplay(dP)
```
<img src="econometria_7_files/figure-html/fig_TS3-1.png" width="90%" height="90%" style="display: block; margin: auto;" />
``` r
# Unit Root
adf.test(dP)$p.value
## [1] 0.01
```
]

---
class: inverse, middle, mline, center

# A short note on ARCH Models

---
# Autoregressive Conditional Heteroscedasticity (ARCH)

Up to now, we modeled `\(Y_t\)` using ARIMA structures, but it is also possible to model the residuals or the variance.

ARCH models are used to model and forecast time-varying volatility in time series data. They are widely used for modeling and forecasting the volatility of financial assets like stocks, currencies, and derivatives.

The ARCH(q) model is defined for a time series `\(Y_t\)` with the following structure for the variance:

`$$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \dots + \alpha_q \varepsilon_{t-q}^2$$`

where `\(\sigma_t^2\)` is the conditional variance and `\(\varepsilon_t = Y_t - \mu_t\)` is the residual at time `\(t\)`.

The parameters of an ARCH model are typically estimated using Maximum Likelihood Estimation (MLE). Choosing the correct order `\(q\)` is often based on statistical tests like the Lagrange Multiplier (LM) test.

---
# Extensions of ARCH Models

Some extensions are aimed at offering more flexibility and accuracy in modeling the complex volatility patterns observed in real-world financial time series data.

- **Generalized ARCH (GARCH)**: An extension of ARCH that includes lagged conditional variances in the model,

`$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2$$`

where `\(p\)` and `\(q\)` are the orders of the GARCH model.

- **Exponential GARCH (EGARCH)**: Models the log of the variance, capturing the asymmetric effects of positive and negative shocks on volatility.
- **Threshold GARCH (TGARCH)**: Allows different responses of volatility to positive and negative shocks, useful in financial markets where volatility tends to increase more with negative shocks.

---
class: inverse, middle, mline, center

# A Short Note on Structural Breaks

---
# Structural Breaks

- **Definition:** A structural break occurs when the underlying data-generating process (DGP) changes at one or more points in time. This implies a change in the parameters of the model that describes the data.
- **Importance:** Ignoring structural breaks can lead to:
  * **Biased parameter estimates:** The estimated coefficients will be a weighted average of the coefficients from different regimes, leading to incorrect inferences.
  * **Unreliable forecasts:** Predictions based on a model with unaccounted-for breaks will be inaccurate, as the model will not capture the true dynamics of the data after the break.
  * **Misleading hypothesis testing:** Standard errors will be incorrect, potentially leading to wrong conclusions about the significance of variables.
- **Examples:**
  * **Policy Changes:** Shifts in monetary policy (e.g., a change in the interest rate targeting regime) or fiscal policy (e.g., tax reforms).
  * **Economic Crises:** Events like the 2008 financial crisis or the COVID-19 pandemic can drastically alter economic relationships.
  * **Regulatory Changes:** New regulations can impact market structures and firm behavior.

---
# Modeling Structural Breaks

**Single Break Model.** Suppose a single break occurs at time `\(T_b\)`. The model can be written as:

`$$y_t = \begin{cases} X_t'\beta_1 + \epsilon_t, & \text{if } t \leq T_b, \\ X_t'\beta_2 + \epsilon_t, & \text{if } t > T_b, \end{cases}$$`

where `\(y_t\)` is the dependent variable; `\(X_t\)` is a vector of independent variables; `\(\beta_1\)` and `\(\beta_2\)` are the coefficient vectors in the two regimes; `\(\epsilon_t\)` is the error term; and `\(T_b\)` is the unknown break point.

**Multiple Breaks Extension.** When there are `\(m\)` multiple breaks at unknown dates, the model becomes:

`$$y_t = X_t'\beta_j + \epsilon_t, \quad \text{for } T_{j-1} < t \leq T_{j}, \, j = 1, 2, ..., m+1.$$`

where `\(T_0 = 0\)` and `\(T_{m+1} = T\)` (the total number of observations); `\(T_1, T_2, ..., T_m\)` are the unknown break points; and `\(\beta_j\)` is the coefficient vector in the `\(j\)`-th regime.

Key challenges involve determining the number of breaks ( `\(m\)` ), estimating the unknown break points ( `\(T_1, T_2, ..., T_m\)` ), and/or estimating the coefficients ( `\(\beta_j\)` ) in each regime.

---
# Detecting Structural Breaks

**Visual Inspection:** Plot the time series data and look for sudden shifts in the mean, variance, or overall pattern. This can provide initial clues about potential break points but is subjective.

**Statistical Tests:**

- **Chow Test (for a *single known* break point):** Tests the null hypothesis that there is no structural break at a *pre-specified* time point. It involves splitting the data into two sub-samples based on the hypothesized break point and comparing the sum of squared residuals from separate regressions on each sub-sample to the sum of squared residuals from a regression on the full sample.
- **Quandt-Andrews Test (for a *single unknown* break point):** Tests the null hypothesis of no structural break against the alternative of a single break at an *unknown* point. The Quandt Likelihood Ratio (QLR) statistic is the largest Chow statistic over all possible break dates.
- **Bai-Perron Test (for *multiple unknown* break points):** A powerful and widely used method for identifying multiple structural breaks at unknown dates. It allows for heteroskedasticity and serial correlation in the error terms, and can test for the presence of breaks and estimate their number and locations.

---
# The Bai-Perron Test for Multiple Structural Breaks

Considers a linear regression model with `\(m\)` potential breaks ( `\(m+1\)` regimes):

`$$y_t = x_t'\beta_j + u_t \quad \text{for} \quad t = T_{j-1}+1, \ldots, T_j, \quad j = 1, 2, ..., m+1$$`

where `\(y_t\)` is the dependent variable; `\(x_t\)` is a vector of independent variables; `\(\beta_j\)` is the coefficient vector in regime `\(j\)`; `\(u_t\)` is the error term; `\(T_1, T_2, \ldots, T_m\)` are the unknown break points; and `\(T_0 = 0\)` and `\(T_{m+1} = T\)` (the total number of observations).

**Estimation:**

1. **Global Minimization of the Sum of Squared Residuals (SSR):**
   - The Bai-Perron method aims to find the break points that minimize the overall sum of squared residuals across all regimes.
     This is done by considering all possible combinations of break points and selecting the combination that yields the lowest global SSR.
2. **Dynamic Programming Algorithm:**
   - Due to the computational burden of evaluating all possible break point combinations, Bai and Perron (1998, 2003) developed a dynamic programming algorithm to efficiently solve the optimization problem.

---
# The Bai-Perron Test for Multiple Structural Breaks

**Statistical Tests:**

- **"sup F-type" Test (or Double Maximum Test).** Tests the null hypothesis of no structural breaks ( `\(m=0\)` ) against the alternative of a fixed number of breaks ( `\(m=k\)`, where `\(k\)` is a constant). The test statistic is the maximum of the F statistics computed over all segments with a minimum length. Critical values are nonstandard and obtained through simulations.
- **Sequential `\(F\)`-tests (or sup `\(F_t(l+1|l)\)` Test):** Test the null hypothesis of `\(l\)` breaks against the alternative of `\(l+1\)` breaks. The procedure begins by estimating a model with one break and then sequentially adds breaks until the null hypothesis is not rejected. This allows the number of breaks to be determined sequentially.
- **Information Criteria:** Model selection criteria like the Bayesian Information Criterion (BIC) or the modified Schwarz criterion (LWZ) can be used to choose the optimal number of breaks. These criteria balance the goodness of fit (SSR) with a penalty for model complexity (number of parameters).

**Implementation in R:** See, for instance, the `strucchange` package.
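A minimal sketch (a simulated mean shift; `strucchange::breakpoints()` implements the Bai-Perron dynamic-programming estimator):

``` r
# Bai-Perron break detection for a simulated series with one mean shift.
library(strucchange)
set.seed(15)
y  <- c(rnorm(100, mean = 0), rnorm(100, mean = 2))
bp <- breakpoints(y ~ 1)  # breaks in the mean (intercept-only regression)
summary(bp)               # RSS/BIC by number of breaks; break near t = 100
```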
---
class: inverse, middle, mline, center

# 4. Vector Autoregressive models (VAR)

---
# Introduction to VAR

Stock and Watson (2001, in *Journal of Economic Perspectives*) outline key roles for macroeconomists, highlighting how Vector Autoregressive models (VARs) are instrumental in fulfilling these roles. These tasks include:

* Describing and summarizing macroeconomic data.
* Making macroeconomic forecasts.
* Inferring the causal structure of the macroeconomy from the data (a.k.a. "structural estimation").
* Informing macroeconomic policy decisions.

**Vector Autoregression (VAR)** models capture the linear interrelationships among multiple time series. They generalize the univariate autoregressive model to multiple time series, allowing for a more dynamic and interactive analysis.

---
# State-Space Representation of a VAR(p) Model

Consider a system of two time series variables, `\(y_t\)` and `\(m_t\)` (e.g., GDP and a monetary policy indicator), each modeled as a function of two lags of both variables:

`$$\begin{align} y_{t} &= \phi_{1} y_{t-1} + \phi_{2} y_{t-2} + \theta_{1} m_{t-1} + \theta_{2} m_{t-2}+ \epsilon_{1t} \\ m_{t} &= \gamma_{1} m_{t-1} + \gamma_{2} m_{t-2} + \alpha_{1} y_{t-1} + \alpha_{2} y_{t-2} +\epsilon_{2t} \end{align}$$`

Where:

* `\(y_{t}\)` and `\(m_{t}\)` are the time series variables.
* `\(\{\phi, \theta, \gamma, \alpha\}\)` are the coefficients to be estimated.
* `\(\epsilon_{1t}\)` and `\(\epsilon_{2t}\)` are white noise error terms.

This is a VAR(2) model because each variable is a function of up to 2 lags of itself and the other variable. In general, a VAR(p) model will include up to `\(p\)` lags of each variable in each equation.

---
# State-Space Notation in Vector Autoregression (VAR)

**State-Space Representation:** VAR models can be compactly written using state-space notation, which is particularly useful for multivariate time series analysis and can be combined with techniques like the Kalman filter for forecasting and dynamic analysis.

We can express our example VAR(2) model in matrix form:

`$$\begin{eqnarray} \left[\begin{array}{c} y_t\\m_t \end{array}\right] &=& \left[\begin{array}{cc} \phi_1&\theta_1\\ \alpha_1&\gamma_1 \end{array}\right] \left[\begin{array}{c} y_{t-1}\\m_{t-1} \end{array}\right] + \left[\begin{array}{cc} \phi_2&\theta_2\\ \alpha_2&\gamma_2 \end{array}\right] \left[\begin{array}{c} y_{t-2}\\m_{t-2} \end{array}\right] + \left[\begin{array}{c} \epsilon_{1t}\\\epsilon_{2t} \end{array}\right] \\ &\,& \\ \boldsymbol{Y}_t&=&\boldsymbol{\Phi}_1\boldsymbol{Y}_{t-1}+\boldsymbol{\Phi}_2\boldsymbol{Y}_{t-2}+\boldsymbol{\epsilon}_t \end{eqnarray}$$`

And, by stacking, the system can be written in companion form, which looks like an AR(1):

`$$\begin{eqnarray} \left[\begin{array}{c} \boldsymbol{Y}_t\\\boldsymbol{Y}_{t-1} \end{array}\right] &=& \left[\begin{array}{cc} \boldsymbol{\Phi}_1&\boldsymbol{\Phi}_2\\ \boldsymbol{I}_2&\boldsymbol{0} \end{array}\right] \left[\begin{array}{c} \boldsymbol{Y}_{t-1}\\\boldsymbol{Y}_{t-2} \end{array}\right] + \left[\begin{array}{c} \boldsymbol{\epsilon}_t\\\boldsymbol{0} \end{array}\right] \\ &\,&\\ \boldsymbol{X}_t &=&\boldsymbol{\Phi}\boldsymbol{X}_{t-1}+\boldsymbol{\xi}_t \end{eqnarray}$$`

where `\(\boldsymbol{\Phi}\)` is a `\(2p\times 2p\)` (or, generally speaking, `\(np\times np\)`) matrix of coefficients to be estimated.

---
# Notes on Stationarity and Estimation - VAR

**Stationarity.**

- Notice that the last equation looks like an AR(1).
  Thus, it is possible to state conditions, similar to those we reviewed for the AR(1), to ensure that the system is stationary.
- The condition is that the eigenvalues of `\(\boldsymbol{\Phi}\)` lie inside the unit circle (the eigenvalues solve `\(|\boldsymbol{I}_2\lambda^2-\boldsymbol{\Phi}_1\lambda-\boldsymbol{\Phi}_2|=0\)`).

**Estimation of VARs.**

- VAR models typically require a large number of observations due to the number of parameters being estimated.
- After determining the appropriate lag length ( `\(p\)` ) and selecting the variables, the coefficients in `\(\boldsymbol{\Phi}\)` are usually estimated using Ordinary Least Squares (OLS) equation by equation.
- The optimal lag length can be chosen using model selection criteria like the AIC or BIC.

---
# Impulse Response Functions (IRFs)

Impulse Response Functions (IRFs) are crucial in VAR analysis. They trace the dynamic response of each variable in the system to a one-time shock in one of the variables, holding other shocks constant.

**Key Question:** How does an exogenous shock (e.g., a sudden change in monetary policy) propagate through the system and affect other variables (e.g., GDP) over time?

**Derivation from the Moving Average (MA) Representation:** Any stationary VAR(p) process can be written in its infinite-order vector moving-average representation,

`$$\boldsymbol{X}_t = \boldsymbol{\mu} + \sum_{i=0}^\infty \boldsymbol{\Psi}_i \boldsymbol{\xi}_{t-i}$$`

where `\(\boldsymbol{\Psi}_i\)` are `\(np \times np\)` matrices of MA coefficients, `\(\boldsymbol{\mu}\)` is a vector of constants, and `\(\boldsymbol{\Psi}_0 = \boldsymbol{I}_{np}\)`. The effect of a shock at time `\(t\)` on the values of `\(\boldsymbol{X}\)` at time `\(t+s\)` (i.e., the impulse response) is given by:

`$$\frac{\partial \boldsymbol{X}_{t+s}}{\partial \boldsymbol{\xi}_t} = \boldsymbol{\Psi}_s$$`

---
# Impulse Response Functions in Vector Autoregression

For our VAR(1) example (derived from a VAR(2)),

`$$\frac{\partial \boldsymbol{X}_{t+s}}{\partial \boldsymbol{\xi}_t}=\boldsymbol{\Psi}_s =\boldsymbol{\Phi}^s$$`

**Assumptions about the Shocks ( `\(\boldsymbol{\xi}_t\)` ):**

* **Zero Mean:** `\(\mathbb{E}[\boldsymbol{\xi}_t] = \boldsymbol{0}\)`
* **Covariance Matrix:** `\(\mathbb{E}[\boldsymbol{\xi}_t\boldsymbol{\xi}_t'] = \boldsymbol{\Omega}\)`
* **No Serial Correlation:** `\(\mathbb{E}[\boldsymbol{\xi}_t\boldsymbol{\xi}_{t-j}'] = \boldsymbol{0}\)` for all `\(j > 0\)`

---
# Impulse Response Functions in Vector Autoregression

**Problem of a Non-Diagonal `\(\boldsymbol{\Omega}\)`:** If `\(\boldsymbol{\Omega}\)` is not diagonal, a shock to one variable's equation will simultaneously be a shock to other variables, making it difficult to isolate the impact of a specific shock.

**Solution: Structural Shocks.** We can use a transformation to work with structural shocks ( `\(\boldsymbol{\eta}_t\)` ) that are orthogonal (uncorrelated).
**The structural VAR (SVAR)** representation is:

`$$\boldsymbol{X}_t = \boldsymbol{\mu} + \sum_{i=0}^\infty \boldsymbol{\Theta}_i \boldsymbol{\eta}_{t-i}$$`

where `\(\boldsymbol{\Theta}_i = \boldsymbol{\Psi}_i \boldsymbol{C}\)`, and the structural shocks satisfy:

* `\(\mathbb{E}[\boldsymbol{\eta}_t] = \boldsymbol{0}\)`
* `\(\mathbb{E}[\boldsymbol{\eta}_t\boldsymbol{\eta}_t'] = \boldsymbol{I}\)`
* `\(\mathbb{E}[\boldsymbol{\eta}_t\boldsymbol{\eta}_{t-j}'] = \boldsymbol{0}\)` for all `\(j > 0\)`

**Identifying Structural Shocks:** To 'identify' the structural shocks, we need a transformation matrix `\(\boldsymbol{C}\)` such that `\(\boldsymbol{\xi}_t = \boldsymbol{C} \boldsymbol{\eta}_t\)` and `\(\boldsymbol{\Omega} = \boldsymbol{C}\boldsymbol{C}'\)`.

---
# Impulse Response Functions (IRFs) in VAR

**Approaches for Shock Identification**

Several methods exist to identify the structural shocks in a VAR, each with its own assumptions and implications:

**Cholesky Decomposition (Recursive Identification)**

- Assumes that `\(\boldsymbol{\Omega} = \boldsymbol{C}\boldsymbol{C}'\)`, where `\(\boldsymbol{C}\)` is a lower triangular matrix.
- This imposes a recursive structure on the contemporaneous relationships among variables. The ordering of the variables in the VAR matters, as it determines the causal ordering assumed by the Cholesky decomposition. Shocks to the first variable can contemporaneously affect all other variables, shocks to the second variable can affect all variables except the first, and so on.

**Generalized Impulse Response Functions (Pesaran and Shin)**

- This approach does not rely on orthogonalization and provides a unique IRF for each variable, regardless of ordering.
- It considers the average effect of a shock to one variable, taking into account the historical correlations among shocks.

---
# Impulse Response Functions (IRFs) in VAR

**Other Structural Restrictions**

- Instead of relying solely on the Cholesky decomposition, we can impose other restrictions based on economic theory to identify the structural shocks. These can include:
  - **Short-run restrictions:** Restrictions on the contemporaneous impact of shocks (elements of `\(\boldsymbol{C}\)`).
  - **Long-run restrictions:** Restrictions on the long-run cumulative impact of shocks (elements of the long-run multiplier matrix).
  - **Sign restrictions:** Restrictions on the signs of certain impulse responses.
  - **Zero restrictions:** Certain variables do not respond to certain shocks, either contemporaneously or with a lag.

**Software Considerations:**

- Different software packages may use different default methods for computing IRFs.
- Check the identification method used by the software, as the choice of method can significantly affect the results.

???
- Useful in cases where economic theory does not provide a clear guide on the ordering of variables, or when the researcher wants to avoid the constraints of a Cholesky decomposition.
- Broader applicability: it is considered more robust in empirical applications, as it provides a more realistic representation of how shocks propagate through different variables in a VAR system.
- **Comparative Analysis**: Researchers often use this method alongside traditional approaches for comparative purposes, highlighting how sensitive results are to different identification strategies.

---
# VAR in Practice using R
Consider a bivariate VAR(2) model with GDP growth ( `\(y_t\)` ) and inflation ( `\(\pi_t\)` ):

`$$\begin{align} y_t &= \alpha_y + \Phi_{yy1} y_{t-1} + \Phi_{y\pi1} \pi_{t-1} + \Phi_{yy2} y_{t-2} + \Phi_{y\pi2} \pi_{t-2} + \epsilon_{yt} \\ \pi_t &= \alpha_\pi + \Phi_{\pi y1} y_{t-1} + \Phi_{\pi\pi1} \pi_{t-1} + \Phi_{\pi y2} y_{t-2} + \Phi_{\pi\pi2} \pi_{t-2} + \epsilon_{\pi t} \end{align}$$`

Simulating the data:

``` r
# Simulate the GDP and inflation data
set.seed(123)
nobs <- 200
Phi1 <- matrix(c(0.6, 0.2, 0.1, 0.5), nrow = 2)
Phi2 <- matrix(c(0.2, 0.1, 0, 0.3), nrow = 2)
error_terms <- matrix(rnorm(nobs * 2), ncol = 2)
data_var <- matrix(0, nrow = nobs, ncol = 2)
colnames(data_var) <- c("GDP", "Inflation")
for (t in 3:nobs) { # Simulate the VAR(2)
  data_var[t, ] <- Phi1 %*% data_var[t-1, ] +
                   Phi2 %*% data_var[t-2, ] + error_terms[t, ]
}
data_var <- ts(data_var, start = c(2010, 1), frequency = 4)
```
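Before estimating, one can verify that the simulated system is stationary by checking the companion-matrix eigenvalues (see the stationarity notes above); a quick sketch:

``` r
# Companion matrix of the simulated VAR(2); stationarity requires all
# eigenvalues to lie inside the unit circle.
Phi <- rbind(cbind(Phi1, Phi2),
             cbind(diag(2), matrix(0, 2, 2)))
Mod(eigen(Phi)$values)  # all moduli < 1 (the largest is just below 1)
```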
---
# Example: Estimation VAR(2) in R

.pull-left[
``` r
# Estimate a VAR with two lags
library(vars)
var_gdp_pi <- VAR(data_var, p=2) # p=2 lag order
var_gdp_pi
##
## VAR Estimation Results:
## =======================
##
## Estimated coefficients for equation GDP:
## ========================================
## Call:
## GDP = GDP.l1 + Inflation.l1 + GDP.l2 + Inflation.l2 + const
##
##       GDP.l1 Inflation.l1       GDP.l2 Inflation.l2        const
##   0.53467502   0.16025020   0.14563481  -0.04062683   0.03844260
##
##
## Estimated coefficients for equation Inflation:
## ==============================================
## Call:
## Inflation = GDP.l1 + Inflation.l1 + GDP.l2 + Inflation.l2 + const
##
##       GDP.l1 Inflation.l1       GDP.l2 Inflation.l2        const
##   0.21880743   0.45582463   0.17474613   0.27510766   0.04199838
```
]
.pull-right[
``` r
# Lag selection:
VARselect(data_var, lag.max = 4, type = "const")
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n)
##      2      2      2      2
##
## $criteria
##                 1           2           3           4
## AIC(n) 0.01106803 -0.07916584 -0.04696552 -0.04433923
## HQ(n)  0.05169470 -0.01145472  0.04783004  0.07754078
## SC(n)  0.11141847  0.08808491  0.18718552  0.25671211
## FPE(n) 1.01113434  0.92390715  0.95417829  0.95675304
```
]

---
# Example: Estimation VAR(2) in R
.pull-left[
``` r
# Impulse Response Function (IRF), non-orthogonalized
IRF_no <- irf(var_gdp_pi,
              impulse = "GDP", response = "Inflation",
              n.ahead = 15, ortho = FALSE,
              cumulative = FALSE, boot = TRUE, ci = 0.95)
plot(IRF_no, main = "Non-Orthogonalized IRF")
```
<img src="econometria_7_files/figure-html/VAR4-1.png" width="90%" height="90%" style="display: block; margin: auto;" />
]
.pull-right[
``` r
# Impulse Response Function (IRF), orthogonalized
IRF_o <- irf(var_gdp_pi,
             impulse = "GDP", response = "Inflation",
             n.ahead = 15, ortho = TRUE,
             cumulative = FALSE, boot = TRUE, ci = 0.95)
plot(IRF_o, main = "Orthogonalized IRF")
```
<img src="econometria_7_files/figure-html/VAR5-1.png" width="90%" height="90%" style="display: block; margin: auto;" />
]

---
# Local Projections

VAR models, while powerful, rely on a specific model structure. Misspecification of this structure can lead to biased IRF estimates.

**Local Projections** are a flexible statistical method used to estimate impulse response functions (IRFs). Introduced by Jordà (2005), they offer an alternative to traditional VAR models, especially when the dynamic relationships are complex or potentially misspecified.

The main idea is to project the future value of a variable onto the current value of a shock and, optionally, other control variables, without imposing a specific dynamic structure. LPs provide a more direct and potentially more robust way to estimate the response of a variable to a shock over different horizons.

**Advantages:**

- **Robustness:** Less sensitive to model misspecification compared to VARs.
- **Flexibility:** Can more easily handle nonlinearities and structural breaks.
- **Direct Interpretation:** Estimated coefficients directly represent the cumulative impulse response at each horizon.

---
# Local Projections vs. VAR

| Feature | Local Projections | VAR Models |
| :------------------ | :----------------------------------------------------- | :-------------------------------------------------------------------------- |
| **Estimation** | Separate regressions for each horizon | Joint estimation of a system of equations |
| **IRF Computation** | Coefficients directly represent IRFs | IRFs derived from estimated VAR coefficients (e.g., using Cholesky) |
| **Structure** | No imposed dynamic structure | Assumes a fixed dynamic structure across all horizons |
| **Misspecification** | More robust | Less robust, sensitive to incorrect lag or variable selection |
| **Nonlinearity** | Easier to incorporate | More difficult to incorporate, often requires transformations or extensions |
| **Efficiency** | Potentially less efficient if VAR is correctly specified | More efficient if correctly specified |

- **Common Applications (LP):**
  1. **Macroeconomic Forecasting:** Estimating the dynamic effects of various shocks (e.g., monetary policy, fiscal policy, oil price shocks) on macroeconomic variables (e.g., GDP growth, inflation, unemployment).
  2. **Policy Evaluation:** Assessing the impact of policy interventions over time.

---
# Local Projections

In general, the **Advantages and Disadvantages of Local Projections** are:

- **Advantages:**
  1. **Robustness:** Less sensitive to misspecification of the underlying dynamic relationships. This is particularly important when the true data generating process is unknown or complex.
  2. **Flexibility:** Can accommodate nonlinear relationships and structural breaks more easily than standard VARs. This can be done by including interaction terms or using non-parametric methods.
  3. **Simplicity and Transparency:** The estimated `\(\beta_h\)` coefficients have a direct interpretation as the cumulative effect of the shock on the outcome variable at horizon `\(h\)`.
  4. **Ease of Implementation:** Estimation is straightforward using OLS for each horizon.

---
# Local Projections

In general, the **Advantages and Disadvantages of Local Projections** are:

- **Disadvantages:**
  1. **Potential Efficiency Loss:** If the true data generating process is a VAR and the VAR is correctly specified, then VAR estimation will generally be more efficient than LPs.
  2. **Decreasing Sample Size:** The effective sample size shrinks as the horizon `\(h\)` increases, which can lead to wider confidence intervals and less precise estimates at longer horizons.
  3. **Potential Multicollinearity:** If the shock variable and control variables are highly correlated, multicollinearity can be a problem, especially when including many lags in `\(Z_t\)`.
  4. **Serial Correlation:** The error terms `\(\varepsilon_{t+h}\)` are typically serially correlated, especially at longer horizons. It is crucial to use HAC standard errors to account for this.

---
# Local Projections

**General Specification.** For a given horizon `\(h\)`, the basic LP model is:

`$$y_{t+h} = \alpha_h + \beta_h x_t + \gamma_h Z_t + \varepsilon_{t+h}$$`

where `\(y_{t+h}\)` is the outcome variable at time `\(t+h\)` (e.g., GDP growth `\(h\)` periods ahead); `\(x_t\)` is the shock or variable of interest at time `\(t\)` (e.g., a monetary policy shock); `\(Z_t\)` is a vector of optional control variables at time `\(t\)` (e.g., lagged values of `\(y_t\)` and `\(x_t\)`, or other relevant predictors); `\(\beta_h\)` is the estimated coefficient, representing the cumulative response of `\(y\)` to `\(x\)` at horizon `\(h\)`; `\(\alpha_h\)` is the intercept for horizon `\(h\)`; `\(\gamma_h\)` is a vector of coefficients for the control variables at horizon `\(h\)`; and `\(\varepsilon_{t+h}\)` is the error term at time `\(t+h\)`.

**Estimation Procedure:**

1. Estimate a separate regression for each horizon `\(h = 0, 1, 2, \dots, H\)`, where `\(H\)` is the maximum horizon of interest.
2. The sequence of estimated coefficients `\(\{\hat{\beta}_h\}_{h=0}^H\)` constitutes the estimated impulse response function (IRF).

---
# Local Projections

**Estimation via Ordinary Least Squares**

- **Estimation Strategy:**
  - For each horizon `\(h\)`, estimate the LP equation using Ordinary Least Squares (OLS):

`$$\hat{\beta}_h = \text{arg min}_{\beta_h} \sum_{t=1}^{T-h} (y_{t+h} - \alpha_h - \beta_h x_t - \gamma_h Z_t)^2$$`

  where `\(T\)` is the total number of observations.

- **Standard Errors:**
  - The error term `\(\varepsilon_{t+h}\)` is likely to be serially correlated, especially at longer horizons.
  - Use heteroskedasticity and autocorrelation consistent (HAC) standard errors (e.g., Newey-West) to obtain valid inference.

???
- **Practical Considerations:**
  - The sample size effectively shrinks as the horizon `\(h\)` increases. This can lead to less precise estimates at longer horizons.
  - While LPs are more robust to misspecification than VARs, they might be less efficient if the true data generating process is a VAR and the VAR is correctly specified.

---
# Local Projections: Extensions

1. **Nonlinear Local Projections:** Allow for nonlinear effects by interacting the shock variable with other variables or by using threshold models.
For example: `$$y_{t+h} = \alpha_h + \beta_{1h} x_t + \beta_{2h} x_t I(z_t > c) + \gamma_h Z_t + \varepsilon_{t+h}$$` where `\(I(\cdot)\)` is an indicator function, `\(z_t\)` is a threshold variable, and `\(c\)` is a threshold value. 2. **Panel Local Projections:** Extend the LP framework to panel data settings, allowing for the estimation of impulse responses in the presence of cross-sectional heterogeneity. Can incorporate fixed effects or other panel data techniques. 3. **Bayesian Local Projections:** Introduce Bayesian methods to estimate LPs, which can be useful for incorporating prior information and obtaining more robust inference. 4. **Factor Local Projections:** Apply local projections to a factor-augmented model, in order to perform structural analysis while considering the information contained in a large number of variables. 5. **Semiparametric Local Projections:** Estimate the relationship between `\(y_{t+h}\)` and `\(x_t\)` nonparametrically, allowing for greater flexibility. --- # Local Projections in
.pull-left[
``` r
# Part 1: setup
library(lmtest)    # for coeftest()
library(sandwich)  # for vcovHAC()
lag_fn <- function(data, max_lag) {
  data_lagged <- data
  for (i in 1:max_lag) {
    lag_temp <- data
    for (j in 1:ncol(data)){
      lag_temp[,j] <- c(rep(NA,i), head(data[,j], nrow(data) - i))
    }
    colnames(lag_temp) <- paste0(colnames(data), "_lag", i)
    data_lagged <- cbind(data_lagged, lag_temp)
  }
  return(data_lagged)
}
p <- 2 # lags for the control variables
data_lp <- lag_fn(as.data.frame(data_var), p) # VAR data
hh <- 10 # horizons for the IRFs
irf_lp_gdp <- numeric(hh)
# To compare later with the VAR (IRFs for a GDP shock on Inflation)
irf_var_gdp <- irf(var_gdp_pi, impulse = "GDP",
                   response = "Inflation", n.ahead = hh - 1,
                   ortho = TRUE, cumulative = FALSE,
                   boot = TRUE, ci = 0.95)$irf$GDP
```
]
.pull-right[
``` r
# Part 2: LP regressions, horizon by horizon
for (h in 0:(hh - 1)) {
  y_future <- c(data_lp$Inflation[(1+h):(nrow(data_lp))], rep(NA, h))
  formula <- paste("y_future ~ GDP",
                   paste(paste0("GDP_lag",1:p), collapse="+"),
                   paste(paste0("Inflation_lag", 1:p), collapse = " + "),
                   sep=" + ")
  fit <- lm(as.formula(formula), data = data_lp)
  coef_test <- coeftest(fit, vcov = vcovHAC(fit))
  irf_lp_gdp[h + 1] <- coef_test["GDP", "Estimate"]
}
irf_lp_inf <- numeric(hh) # IRFs for an inflation shock on GDP
for (h in 0:(hh - 1)) {
  y_future <- c(data_lp$GDP[(1+h):(nrow(data_lp))], rep(NA, h))
  formula <- paste("y_future ~ Inflation",
                   paste(paste0("GDP_lag",1:p), collapse=" + "),
                   paste(paste0("Inflation_lag",1:p), collapse= " + "),
                   sep=" + ")
  fit <- lm(as.formula(formula), data = data_lp)
  coef_test <- coeftest(fit, vcov = vcovHAC(fit))
  irf_lp_inf[h + 1] <- coef_test["Inflation", "Estimate"]
}
```
]

---
# Local Projections in R
<img src="econometria_7_files/figure-html/LP4-1.png" width="62%" height="62%" style="display: block; margin: auto;" /> --- # Cierre </br></br></br> ## <center>¿Preguntas?</center> .center[ ] `$$\,$$` .center[O vía E-mail: [lchanci1@binghamton.edu](mailto:lchanci1@binghamton.edu)]