Asymptotic Theory

Master's in Economics
Econometric Theory

Prof. Luis Chancí

www.luischanci.com

Outline

Modes of Convergence
four ways a random sequence can converge to a limit
Laws of Large Numbers
sample averages converge to population means
Central Limit Theorems
normalized averages are approximately normal
OLS Asymptotics
first, some auxiliary tools
then, consistency + asymptotic normality
finally, robust variance estimation

In general, our goal is to replace exact finite-sample results with approximate large-sample results, without normality assumptions.

1. Modes of Convergence

(four ways a random sequence can converge to a limit)

Random Convergence - Intro.

For each sample size \(n\), \(\hat{\theta}_n\) is a random variable (it takes different values across samples). Ordinary deterministic convergence (e.g., \(a_n \to a\)) is defined for a single sequence of numbers, not for a sequence of random variables, so it cannot be applied to \(\hat{\theta}_n\) directly.

So, we need probabilistic notions of “getting close to \(\theta\)”:

How likely is it that \(\hat{\theta}_n\) is far from \(\theta\)?
Does the probability of a “bad” estimate shrink as \(n\) grows?
Does every realization of the sequence eventually get close?

Convergence in Probability

Definition 2.1 — Convergence in Probability

\[\lim_{n\to\infty} P\bigl(|Z_n - c| > \varepsilon\bigr) = 0 \quad \forall\, \varepsilon > 0\]

Written as: \(Z_n \xrightarrow{\,p\,} c\) or \(\operatorname{plim}_{n\to\infty} Z_n = c\)

Intuition: for any tolerance \(\varepsilon\), the probability that \(Z_n\) lies more than \(\varepsilon\) away from \(c\) goes to zero as \(n\) grows.

This is the main notion behind consistency.

Two Stronger Modes

Definition 2.2 — Almost Sure Convergence

\[P\!\left(\lim_{n\to\infty} Z_n = c\right) = 1\]

Written: \(Z_n \xrightarrow{\,a.s.\,} c\)

Concerns the entire path of the sequence: the set of realizations whose path converges to \(c\) has probability 1.

Definition 2.3 — Mean Square Convergence

\[\lim_{n\to\infty} \mathbb{E}\!\left[(Z_n - c)^2\right] = 0\]

Written: \(Z_n \xrightarrow{\,m.s.\,} c\)

For the sample mean of i.i.d. draws with variance \(\sigma^2<\infty\): \(\mathbb{E}[(\bar{X}_n-\mu)^2] = \sigma^2/n \to 0\).

Convergence in Distribution

Definition 2.4 — Convergence in Distribution

\[\lim_{n\to\infty} F_{Z_n}(z) = F_Z(z) \quad \text{at all continuity points of } F_Z\]

Written: \(Z_n \xrightarrow{\,d\,} Z\)

Weakest of the four modes — only requires the CDFs to converge, not the random variables themselves
Classic CLT example: \(\dfrac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} \xrightarrow{\,d\,} \mathcal{N}(0,1)\)

When \(Z = c\) is a constant: \(Z_n \xrightarrow{\,d\,} c \;\Leftrightarrow\; Z_n \xrightarrow{\,p\,} c\)

Hierarchy of Convergence Modes

a.s.	⟹	in prob.	⟹	in dist.
m.s.	⟹	in prob.

None of the reverse implications hold in general.

\(Z_n \xrightarrow{\,a.s.\,} c \Rightarrow Z_n \xrightarrow{\,p\,} c\)
\(Z_n \xrightarrow{\,m.s.\,} c \Rightarrow Z_n \xrightarrow{\,p\,} c\)
\(Z_n \xrightarrow{\,p\,} c \Rightarrow Z_n \xrightarrow{\,d\,} c\)

Special case worth remembering: \(Z_n \xrightarrow{\,d\,} c \quad \Longleftrightarrow \quad Z_n \xrightarrow{\,p\,} c\) when the limit is a constant.
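
A standard textbook counterexample shows why convergence in probability does not imply mean square convergence. The sequence \(Z_n\) below is introduced only for illustration:

\[P(Z_n = n) = \frac{1}{n}, \qquad P(Z_n = 0) = 1 - \frac{1}{n}\]

For any \(\varepsilon>0\), \(P(|Z_n - 0| > \varepsilon) \leq 1/n \to 0\), so \(Z_n \xrightarrow{\,p\,} 0\). However,

\[\mathbb{E}\!\left[(Z_n - 0)^2\right] = n^2\cdot\frac{1}{n} = n \;\to\; \infty\]

so \(Z_n\) does not converge to \(0\) in mean square.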

Stochastic Order: (Little) \(o_p\) and (Big) \(O_p\)

Definition 2.5 — Stochastic Order Notation

\(Z_n=o_p(1)\) if \(Z_n \xrightarrow{\,p\,} 0\)
\(Z_n=O_p(1)\) if \(\{Z_n\}\) is bounded in probability, that is, for every \(\varepsilon>0\), there exist \(M<\infty\) and \(N\) such that \(P(|Z_n|>M)<\varepsilon\) for all \(n>N\)

In general, \(Z_n = o_p(r_n)\) means \(Z_n/r_n \xrightarrow{\,p\,} 0\); \(\quad Z_n = O_p(r_n)\) means \(Z_n/r_n = O_p(1)\)

Some key arithmetic rules are: \(o_p(1)+O_p(1) = O_p(1)\) and \(o_p(1)\cdot O_p(1)=o_p(1)\).

We will see (OLS rate) that \((\hat{\beta}-\beta) = O_p(n^{-1/2})\), meaning that the estimation error shrinks at rate \(1/\sqrt{n}\).
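
As a worked check of this notation for the sample mean (a sketch assuming i.i.d. draws with mean \(\mu\) and finite variance \(\sigma^2\), and using Chebyshev's inequality, which is stated in the next section):

\[P\!\left(\bigl|\sqrt{n}(\bar{X}_n-\mu)\bigr| > M\right) \;\leq\; \frac{\operatorname{Var}\bigl(\sqrt{n}\,\bar{X}_n\bigr)}{M^2} \;=\; \frac{\sigma^2}{M^2}\]

Given \(\varepsilon>0\), any \(M > \sigma/\sqrt{\varepsilon}\) makes the right-hand side smaller than \(\varepsilon\) for every \(n\). Hence \(\sqrt{n}(\bar{X}_n-\mu) = O_p(1)\), that is, \(\bar{X}_n-\mu = O_p(n^{-1/2})\). Since \(n^{-1/2}\to 0\), it also follows that \(\bar{X}_n-\mu = o_p(1)\).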

2. Laws of Large Numbers

(sample averages converge to population means)

Laws of Large Numbers — Why They Work

When we average \(n\) i.i.d. draws, idiosyncratic noise cancels:

Positive and negative deviations from \(\mu\) tend to offset each other
The variance of the average shrinks: \(\operatorname{Var}(\bar{X}_n) = \sigma^2/n \to 0\)

Chebyshev’s Inequality

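For any random variable \(Z\) with mean \(\mu\) and variance \(\sigma^2<\infty\), and any \(\varepsilon>0\): \[P(|Z - \mu| \geq \varepsilon) \;\leq\; \frac{\sigma^2}{\varepsilon^2}\]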

This bounds the probability of large deviations using only the variance (again, no distributional assumptions required).

Weak LLN (Chebyshev)

Theorem 3.1 — WLLN (Chebyshev)

Let \(\{X_i\}\) be i.i.d. with \(\mathbb{E}[X_i]=\mu\), \(\operatorname{Var}(X_i)=\sigma^2<\infty\). Then: \[\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \;\xrightarrow{\,p\,}\; \mu\]

Proof sketch

Apply Chebyshev to \(\bar{X}_n\): since \(\operatorname{Var}(\bar{X}_n)=\sigma^2/n\), \[P\!\left(|\bar{X}_n - \mu| \geq \varepsilon\right) \;\leq\; \frac{\sigma^2}{n\varepsilon^2} \;\to\; 0\] Equivalently: \(\mathbb{E}[(\bar{X}_n-\mu)^2] = \sigma^2/n \to 0\), so \(\bar{X}_n \xrightarrow{\,m.s.\,} \mu \Rightarrow \bar{X}_n \xrightarrow{\,p\,} \mu\).
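
The bound in the proof sketch can be checked by simulation. The following is a minimal sketch, not part of the original notes: it assumes Exponential(1) draws (so \(\mu=\sigma^2=1\)), a tolerance of 0.2, and 2000 replications, all chosen only for illustration. It compares the simulated frequency of \(|\bar{X}_n-\mu|\geq\varepsilon\) with the Chebyshev bound \(\sigma^2/(n\varepsilon^2)\).

```python
import random

def tail_frequency(n, eps, reps, rng):
    """Share of simulated samples whose mean misses mu by at least eps."""
    mu = 1.0  # mean of an Exponential(1) draw
    misses = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(xbar - mu) >= eps:
            misses += 1
    return misses / reps

rng = random.Random(12345)   # fixed seed so the run is reproducible
eps, sigma2 = 0.2, 1.0       # Exponential(1) has variance 1
results = {}
for n in (10, 100, 1000):
    freq = tail_frequency(n, eps, reps=2000, rng=rng)
    bound = min(1.0, sigma2 / (n * eps**2))  # a probability bound above 1 is uninformative
    results[n] = (freq, bound)
    print(f"n={n:5d}  empirical={freq:.3f}  Chebyshev bound={bound:.3f}")
```

The simulated frequency should fall toward zero as \(n\) grows and stay below the bound, which is typically far from tight.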

WLLN — Khinchine & Multivariate

Theorem 3.2 — Khinchine WLLN

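Let \(\{X_i\}\) be i.i.d. with \(\mathbb{E}[|X_i|]<\infty\) and \(\mathbb{E}[X_i]=\mu\) (finite variance is not required). Then: \[\bar{X}_n \;\xrightarrow{\,p\,}\; \mu\] Applies to heavy-tailed distributions with a finite mean but infinite variance (e.g., Pareto with shape parameter in \((1,2]\)). It does not cover the Cauchy distribution, whose mean does not exist.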

Theorem 3.3 — Multivariate WLLN

Let \(\{X_i\}\) be i.i.d. \(k\times1\) with \(\mathbb{E}[\|X_i\|]<\infty\) and \(\mu=\mathbb{E}[X_i]\). Then: \[\frac{1}{n}\sum_{i=1}^n X_i \;\xrightarrow{\,p\,}\; \mu\] Corollary (Hansen 6.2): \[\frac{1}{n}\sum_{i=1}^n X_iX_i' \xrightarrow{\,p\,} \mathbb{E}[X_iX_i'] = \mathbf{Q}_{XX}\] (requires \(\mathbb{E}[\|X_i\|^2]<\infty\))

Strong LLN (Kolmogorov)

Theorem 3.4 — SLLN (Kolmogorov)

Let \(\{X_i\}\) be i.i.d. with \(\mathbb{E}[|X_i|]<\infty\). Then: \[\bar{X}_n \;\xrightarrow{\,a.s.\,}\; \mu\]

Theorem	Condition	Convergence
WLLN (Khinchine)	\(\mathbb{E}[|X_i|]<\infty\)	in probability
SLLN (Kolmogorov)	\(\mathbb{E}[|X_i|]<\infty\)	almost sure

Same moment condition, but a stronger conclusion.

For most econometric results the WLLN suffices; the stronger conclusion of the SLLN is rarely needed.

3. Central Limit Theorems

(normalized averages are approximately normal)

The Central Limit Theorem — Why It Matters

Even if the \(X_i\) are non-normal, the sample average is approximately normal for large \(n\):

the natural normalization is \(\sqrt{n}\)
variances shrink at rate \(1/n\), so standard errors shrink at rate \(1/\sqrt{n}\)
doubling \(n\) reduces the standard error by a factor of \(\sqrt{2}\), not \(2\)
this is what makes large-sample inference possible without assuming normal regression errors

For OLS, the implication is that we do not need to assume \(u_i \sim \mathcal{N}(\cdot)\) in order to justify inference.

As we will see, asymptotic normality of \(\hat{\beta}\) comes from applying the multivariate CLT to \[ \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i. \]

CLT: Lindeberg-Lévy (i.i.d. case)

Theorem 4.1 — Lindeberg-Lévy CLT

Let \(\{X_i\}\) be i.i.d. with \(\mathbb{E}[X_i]=\mu\), \(\operatorname{Var}(X_i)=\sigma^2\in(0,\infty)\). Then: \[\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \;\xrightarrow{\,d\,}\; \mathcal{N}(0,1)\] or equivalently, \(\sqrt{n}(\bar{X}_n-\mu) \xrightarrow{\,d\,} \mathcal{N}(0,\sigma^2)\)

The \(\sqrt{n}\) rate is important: variances shrink at \(1/n\), so standard deviations at \(1/\sqrt{n}\)
Result holds regardless of the shape of the \(X_i\) distribution (only finite variance needed)
No symmetry, no bounded support, no specific family required
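
The theorem can be illustrated numerically. This is a minimal sketch under assumptions made here for illustration only: Exponential(1) draws (skewed, with \(\mu=\sigma=1\)), \(n=200\), and 5000 replications. It compares the simulated CDF of \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\) with the standard normal CDF at a few points.

```python
import math
import random
from statistics import NormalDist

def standardized_mean(n, rng):
    """sqrt(n) * (xbar - mu) / sigma for n Exponential(1) draws (mu = sigma = 1)."""
    xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return math.sqrt(n) * (xbar - 1.0)

rng = random.Random(2024)  # fixed seed so the run is reproducible
n, reps = 200, 5000
draws = [standardized_mean(n, rng) for _ in range(reps)]

# Compare the simulated CDF with the standard normal CDF at a few points.
normal = NormalDist()
max_gap = 0.0
for z in (-1.645, -1.0, 0.0, 1.0, 1.645):
    empirical = sum(d <= z for d in draws) / reps
    max_gap = max(max_gap, abs(empirical - normal.cdf(z)))
    print(f"z={z:+.3f}  empirical={empirical:.3f}  normal={normal.cdf(z):.3f}")
```

The remaining gap reflects both simulation noise and the skewness of the exponential distribution, which fades as \(n\) grows.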

CLT: Non-Identical Distributions

Theorem 4.2 — Liapunov CLT

Let \(\{X_i\}\) be independent (not necessarily identically distributed) with \(\mathbb{E}[X_i]=\mu_i\) and \(\operatorname{Var}(X_i)=\sigma_i^2<\infty\). Let \(s_n^2 = \sum_{i=1}^n\sigma_i^2\). If for some \(\delta>0\): \[\frac{1}{s_n^{2+\delta}}\sum_{i=1}^n \mathbb{E}[|X_i-\mu_i|^{2+\delta}] \to 0\] then \(\dfrac{1}{s_n}\sum_{i=1}^n(X_i-\mu_i) \xrightarrow{\,d\,} \mathcal{N}(0,1)\).

Theorem 4.3 — Lindeberg-Feller CLT

Same setup. If for every \(\varepsilon>0\): \[\frac{1}{s_n^2}\sum_{i=1}^n \mathbb{E}\!\left[(X_i-\mu_i)^2\cdot\mathbf{1}\{|X_i-\mu_i|>\varepsilon s_n\}\right] \to 0\] then \(\dfrac{1}{s_n}\sum(X_i-\mu_i) \xrightarrow{\,d\,} \mathcal{N}(0,1)\).

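Hierarchy: Lindeberg-Feller is the most general. Liapunov's condition implies Lindeberg's condition, and the i.i.d. finite-variance case (Lindeberg-Lévy) also satisfies it. Liapunov does not nest Lindeberg-Lévy, because it requires \(2+\delta\) moments.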

Multivariate CLT

Theorem 4.4 — Multivariate CLT (Hansen 6.3)

Let \(\{Z_i\}\) be i.i.d. \(k\times1\) with \(\mathbb{E}[Z_i]=\mathbf{0}\) and \(\mathbb{E}[Z_iZ_i']=\Sigma\) (finite, p.d.). Then: \[\frac{1}{\sqrt{n}}\sum_{i=1}^n Z_i \;\xrightarrow{\,d\,}\; \mathcal{N}(\mathbf{0},\,\Sigma)\]

The application to OLS, developed later, is as follows. Set \(Z_i = X_i u_i\) (the score vector), so that \[\Sigma = \mathbb{E}[X_iX_i'u_i^2] \equiv S \quad \text{(the "sandwich meat")}\] The multivariate CLT then gives the limiting distribution of \(\dfrac{1}{\sqrt{n}}\sum_{i=1}^n X_i u_i\).

4. OLS Asymptotics

4.1. First, some Auxiliary Tools

Continuous Mapping Theorem

Theorem 5.1 — Continuous Mapping Theorem (CMT)

Let \(g:\mathbb{R}^k\to\mathbb{R}^m\) be continuous, except possibly on a set \(D\) with \(P(Z\in D)=0\) (for a constant limit \(c\), it suffices that \(g\) is continuous at \(c\)). Then:

\(Z_n \xrightarrow{\,p\,} c\) \(\;\Rightarrow\;\) \(g(Z_n) \xrightarrow{\,p\,} g(c)\)
\(Z_n \xrightarrow{\,d\,} Z\) \(\;\Rightarrow\;\) \(g(Z_n) \xrightarrow{\,d\,} g(Z)\)

Examples:

If \(\hat\sigma^2 \xrightarrow{\,p\,} \sigma^2>0\), then \(1/\hat\sigma^2 \xrightarrow{\,p\,} 1/\sigma^2\)
If \(\dfrac{1}{n}X'X \xrightarrow{\,p\,} \mathbf{Q}_{XX}\) (p.d.), then \(\left(\dfrac{1}{n}X'X\right)^{-1} \xrightarrow{\,p\,} \mathbf{Q}_{XX}^{-1}\). This is the key step for OLS.

Slutsky’s Theorem

Theorem 5.2 — Slutsky’s Theorem

If \(Z_n \xrightarrow{\,d\,} Z\) and \(A_n \xrightarrow{\,p\,} a\) (a constant), then: \[A_n Z_n \;\xrightarrow{\,d\,}\; aZ \qquad \text{and} \qquad A_n + Z_n \;\xrightarrow{\,d\,}\; a + Z\]

This is useful because it lets us replace unknown nuisance parameters with consistent estimators without altering the limiting distribution. For example, \(\dfrac{\sqrt{n}(\bar{X}-\mu)}{\hat\sigma} \xrightarrow{\,d\,} \mathcal{N}(0,1)\) since \(\sigma/\hat\sigma \xrightarrow{\,p\,} 1\).
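
Spelling out the example as a worked step (assuming \(\hat\sigma\) is any consistent estimator of \(\sigma>0\); the labels \(A_n\) and \(Z_n\) match the theorem):

\[\frac{\sqrt{n}(\bar{X}-\mu)}{\hat\sigma} \;=\; \underbrace{\frac{\sigma}{\hat\sigma}}_{A_n \,\xrightarrow{\,p\,}\, 1} \cdot \underbrace{\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}}_{Z_n \,\xrightarrow{\,d\,}\, \mathcal{N}(0,1)} \;\xrightarrow{\,d\,}\; 1\cdot\mathcal{N}(0,1)\]

Here \(A_n \xrightarrow{\,p\,} 1\) follows from the CMT applied to \(g(s)=\sigma/s\), which is continuous at \(s=\sigma>0\), and the limit of \(Z_n\) is the Lindeberg-Lévy CLT.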

The Delta Method

Theorem 5.3 — Delta Method (Hansen 6.8)

Suppose \(\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{\,d\,} \mathcal{N}(\mathbf{0}, V)\). Let \(g:\mathbb{R}^k\to\mathbb{R}^m\) be \(C^1\) at \(\theta_0\), with Jacobian \(G = \left.\dfrac{\partial g}{\partial\theta'}\right|_{\theta_0}\). Then: \[\sqrt{n}\bigl(g(\hat\theta_n) - g(\theta_0)\bigr) \;\xrightarrow{\,d\,}\; \mathcal{N}(\mathbf{0},\; GVG')\]

Proof sketch

By first-order Taylor expansion: \(g(\hat\theta_n) - g(\theta_0) \approx G(\hat\theta_n-\theta_0)\). Multiply by \(\sqrt{n}\) and apply Slutsky.

For example, if \(g(\mu_1,\mu_2) = \ln(\mu_1/\mu_2)\), the Jacobian is \(G = (1/\mu_1,\,-1/\mu_2)\). This provides the asymptotic SE for the log-ratio of two sample means (e.g., the log of the ratio of mean wages between two groups).
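
To make the example concrete, write the asymptotic variance of the two sample means as a generic symmetric matrix \(V\) with entries \(v_{11}, v_{12}, v_{22}\) (symbols introduced here only for illustration). Then

\[GVG' \;=\; \frac{v_{11}}{\mu_1^2} \;-\; \frac{2\,v_{12}}{\mu_1\mu_2} \;+\; \frac{v_{22}}{\mu_2^2}\]

so a plug-in standard error for \(\ln(\bar{X}_1/\bar{X}_2)\) is \(\sqrt{\hat{G}\hat{V}\hat{G}'/n}\) with \(\hat{G} = (1/\bar{X}_1,\,-1/\bar{X}_2)\). If the two means are asymptotically uncorrelated (\(v_{12}=0\)), the variance reduces to \(v_{11}/\mu_1^2 + v_{22}/\mu_2^2\).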

4.2. Consistency + asymptotic normality + feasible robust inference

The Linear Model

\[y_i = X_i'\beta + u_i, \qquad i = 1, \ldots, n\]

\(X_i\) is \(k\times1\), \(\beta\in\mathbb{R}^k\), \(u_i\) is the structural error.

Assumption 6.1 (Hansen 7.1)

\(\{y_i,X_i\}\) are i.i.d., \(\mathbb{E}[X_i u_i]=\mathbf{0}\) (exogeneity), \(\mathbf{Q}_{XX} = \mathbb{E}[X_iX_i']\) is finite and p.d., \(\mathbb{E}[u_i^2]<\infty\).

Compared to Gauss-Markov:

No homoskedasticity required
No normality of \(u_i\) required
Only orthogonality \(\mathbb{E}[X_i u_i]=\mathbf{0}\) is required (weaker than the conditional-mean assumption \(\mathbb{E}[u_i\mid X_i]=0\) used for exact finite-sample results)

OLS as a Function of Sample Moments

Writing \(\hat{\beta}=(X'X)^{-1}(X'Y)\) in terms of sample averages, we have

\[\hat{\beta} = \left(\frac{1}{n}\sum_{i=1}^n X_iX_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n X_iy_i\right)\]

Substituting \(y_i = X_i'\beta + u_i\):

\[\hat{\beta} = \beta + \underbrace{\left(\frac{1}{n}\sum_{i=1}^n X_iX_i'\right)^{-1}}_{\xrightarrow{\,p\,}\;\mathbf{Q}_{XX}^{-1}} \underbrace{\left(\frac{1}{n}\sum_{i=1}^n X_iu_i\right)}_{\xrightarrow{\,p\,}\;\mathbf{0}}\]

\(\hat\beta\) is a continuous function of two sample averages.

Consistency: Three-Step Proof

Theorem 6.1 — Consistency (Hansen 7.1)

Under Assumption 6.1: \(\hat{\beta} \xrightarrow{\,p\,} \beta\)

Proof sketch

(WLLN) By multivariate WLLN: \(\dfrac{1}{n}\sum X_iX_i' \xrightarrow{\,p\,} \mathbf{Q}_{XX}\) and \(\dfrac{1}{n}\sum X_iu_i \xrightarrow{\,p\,} \mathbf{0}\)
(CMT — inverse) Since \(\mathbf{Q}_{XX}\) is p.d., by CMT: \(\left(\dfrac{1}{n}\sum X_iX_i'\right)^{-1} \xrightarrow{\,p\,} \mathbf{Q}_{XX}^{-1}\)
(CMT — product) Since multiplication is continuous: \(\hat\beta - \beta \xrightarrow{\,p\,} \mathbf{Q}_{XX}^{-1}\cdot\mathbf{0} = \mathbf{0}\)

No normality. No homoskedasticity. Just LLN + CMT.
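
Consistency can be seen in a small simulation. This is a minimal sketch, not part of the original notes: the true coefficients, the uniform regressor, and the skewed heteroskedastic error are all assumptions made for illustration. The errors are neither normal nor homoskedastic, yet \(\hat\beta\) approaches \(\beta\) as \(n\) grows.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, obtained by solving (X'X) b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible
beta = np.array([1.0, 2.0])      # illustrative true coefficients

errors = {}
for n in (100, 10_000, 1_000_000):
    x = rng.uniform(0.0, 2.0, size=n)
    X = np.column_stack([np.ones(n), x])
    # Non-normal, heteroskedastic errors: centered exponential scaled by (0.5 + x).
    u = (0.5 + x) * (rng.exponential(1.0, size=n) - 1.0)
    y = X @ beta + u
    errors[n] = float(np.max(np.abs(ols(X, y) - beta)))
    print(f"n={n:>9,d}  max |beta_hat - beta| = {errors[n]:.4f}")
```

The error is constructed so that \(\mathbb{E}[u_i\mid X_i]=0\), which implies the orthogonality condition \(\mathbb{E}[X_iu_i]=\mathbf{0}\) used in the theorem.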

Fourth Moments and the Sandwich Meat

Assumption 6.2 (Hansen 7.2)

\(\mathbb{E}[\|X_i\|^4] < \infty\) and \(\mathbb{E}[u_i^4] < \infty\)

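This ensures that \(S\) is finite: by Cauchy-Schwarz, \(\mathbb{E}[\|X_iu_i\|^2] = \mathbb{E}[\|X_i\|^2u_i^2] \leq \left(\mathbb{E}[\|X_i\|^4]\right)^{1/2}\left(\mathbb{E}[u_i^4]\right)^{1/2} < \infty\), and this scalar equals the trace of \(\mathbb{E}[X_iX_i'u_i^2]\). Define the sandwich meat: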

\[S \;=\; \mathbb{E}\!\left[X_iX_i'u_i^2\right]\]

\(S\) is the variance matrix of the score \(X_iu_i\) (the summand of the estimating equation), so it reflects any heteroskedasticity in \(u_i\).

Under homoskedasticity \(\mathbb{E}[u_i^2|X_i] = \sigma^2\):

\[S = \mathbb{E}[X_iX_i'u_i^2] = \sigma^2\mathbb{E}[X_iX_i'] = \sigma^2\mathbf{Q}_{XX}\]

Otherwise, \(S\) has a more complex structure that depends on \(\mathbb{E}[u_i^2|X_i]\).

Asymptotic Normality

Theorem 6.2 — Asymptotic Normality (Hansen 7.3)

Under Assumptions 6.1 and 6.2: \[\sqrt{n}\,(\hat{\beta} - \beta) \;\xrightarrow{\,d\,}\; \mathcal{N}(\mathbf{0},\; V_\beta)\] where the sandwich variance is: \[\boxed{V_\beta = \mathbf{Q}_{XX}^{-1}\, S\, \mathbf{Q}_{XX}^{-1}, \qquad S = \mathbb{E}[X_iX_i'u_i^2]}\]

The interpretation is that \(\hat\beta\) is approximately \(\mathcal{N}\!\left(\beta,\, n^{-1}V_\beta\right)\) in large samples (without assuming normal errors).

Proof of Asymptotic Normality

Proof sketch

Multiply through by \(\sqrt{n}\): \[\sqrt{n}(\hat\beta - \beta) = \underbrace{\left(\frac{1}{n}\sum X_iX_i'\right)^{-1}}_{\xrightarrow{\,p\,}\;\mathbf{Q}_{XX}^{-1}} \cdot \underbrace{\frac{1}{\sqrt{n}}\sum X_iu_i}_{\xrightarrow{\,d\,}\;\mathcal{N}(\mathbf{0},\,S)}\]

Step 1: First factor \(\xrightarrow{\,p\,} \mathbf{Q}_{XX}^{-1}\) by WLLN + CMT (same as consistency proof).

Step 2: \(\{X_iu_i\}\) are i.i.d., mean zero (by exogeneity), variance \(S\) (finite by Assumption 6.2). By the multivariate CLT: \[\frac{1}{\sqrt{n}}\sum X_iu_i \xrightarrow{\,d\,} \mathcal{N}(\mathbf{0},S)\]

Step 3: By Slutsky: the product converges \(\xrightarrow{\,d\,} \mathbf{Q}_{XX}^{-1}\cdot\mathcal{N}(\mathbf{0},S) = \mathcal{N}(\mathbf{0},\,\mathbf{Q}_{XX}^{-1}S\mathbf{Q}_{XX}^{-1})\).

Homoskedastic (A Special Case)

Under \(\mathbb{E}[u_i^2\mid X_i]=\sigma^2\), we have \[ S = \mathbb{E}[X_iX_i'u_i^2] = \sigma^2\mathbb{E}[X_iX_i'] = \sigma^2\mathbf{Q}_{XX}. \]

Substituting into the sandwich formula, \[ V_\beta^{hom} = \mathbf{Q}_{XX}^{-1} (\sigma^2\mathbf{Q}_{XX}) \mathbf{Q}_{XX}^{-1} = \sigma^2\mathbf{Q}_{XX}^{-1}. \]

Therefore, \[ \sqrt{n}(\hat{\beta}-\beta) \xrightarrow{\,d\,} \mathcal{N}(\mathbf{0},\,\sigma^2\mathbf{Q}_{XX}^{-1}). \]

This is exactly the large-sample counterpart of the classical finite-sample variance formula from the OLS notes.

4.3. Robust variance estimation

Estimating the Sandwich Variance

To use \(\sqrt{n}(\hat\beta-\beta) \xrightarrow{\,d\,} \mathcal{N}(\mathbf{0},V_\beta)\) for inference, we need \(\hat{V}_\beta \xrightarrow{\,p\,} V_\beta\).

Two components to estimate:

\(\hat{\mathbf{Q}}_{XX} = \dfrac{1}{n}X'X \xrightarrow{\,p\,} \mathbf{Q}_{XX}\) (by WLLN — straightforward)
\(S = \mathbb{E}[X_iX_i'u_i^2]\) is harder, as the \(u_i\) are unobserved.

Solution: We first replace \(u_i\) with OLS residuals \(\hat u_i = y_i - X_i'\hat\beta\)

Homoskedastic case: \(\hat{V}^{hom} = \hat\sigma^2(X'X/n)^{-1}\), where \(\hat\sigma^2 = \dfrac{1}{n-k}\sum \hat u_i^2\)
Heteroskedastic case: need a robust (sandwich) estimator

White’s Heteroskedasticity-Robust Estimator

Theorem 7.1 — Robust (White) Variance Estimator (Hansen 7.6)

\[\hat{V}_\beta^{rob} = \hat{\mathbf{Q}}_{XX}^{-1}\,\hat{S}_{rob}\,\hat{\mathbf{Q}}_{XX}^{-1}, \qquad \hat{S}_{rob} = \frac{1}{n}\sum_{i=1}^n X_iX_i'\hat u_i^2\]

Under Assumptions 6.1–6.2: \(\hat{V}_\beta^{rob} \xrightarrow{\,p\,} V_\beta\).

Estimated covariance of \(\hat\beta\) is \(\widehat{\operatorname{Avar}}(\hat\beta) = n^{-1}\hat{V}_\beta^{rob}\)

The \(t\)-statistic is: \(t_j = \hat\beta_j / \widehat{se}(\hat\beta_j) \xrightarrow{\,d\,} \mathcal{N}(0,1)\) under \(H_0:\beta_j=0\), where \(\widehat{se}(\hat\beta_j) = \sqrt{n^{-1}[\hat{V}_\beta^{rob}]_{jj}}\)

This estimator is consistent but may be biased downward in small samples: the squared residuals \(\hat u_i^2\) tend to understate the squared errors \(u_i^2\) (because OLS minimizes the sum of squared residuals).
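
The estimator above translates almost line by line into code. The following is a minimal sketch, not the author's implementation: the function name `ols_with_variances` and the simulated design (standard normal regressor, error variance equal to \(x^2\), true coefficients \((1,2)\)) are assumptions made for illustration. No small-sample correction is applied, which corresponds to the variant usually labeled HC0.

```python
import numpy as np

def ols_with_variances(X, y):
    """Return beta_hat plus classical and White (HC0) estimates of V_beta."""
    n, k = X.shape
    Q_hat = X.T @ X / n
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    Q_inv = np.linalg.inv(Q_hat)
    # Meat: (1/n) * sum_i X_i X_i' u_hat_i^2, built without an n-by-n matrix.
    S_hat = (X * u_hat[:, None] ** 2).T @ X / n
    V_rob = Q_inv @ S_hat @ Q_inv
    V_hom = (u_hat @ u_hat / (n - k)) * Q_inv
    return beta_hat, V_hom, V_rob

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
n = 5_000
beta = np.array([1.0, 2.0])      # illustrative true coefficients
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
u = x * rng.standard_normal(n)   # E[u^2 | x] = x^2: strongly heteroskedastic
y = X @ beta + u

beta_hat, V_hom, V_rob = ols_with_variances(X, y)
se_hom = np.sqrt(np.diag(V_hom) / n)  # standard errors use n^{-1} V_hat
se_rob = np.sqrt(np.diag(V_rob) / n)
print("beta_hat:    ", beta_hat)
print("classical SE:", se_hom)
print("robust SE:   ", se_rob)
```

In this design the population values are \(\mathbf{Q}_{XX}=I\) and \(S=\operatorname{diag}(1,3)\), so the robust slope standard error should be noticeably larger than the classical one, which is invalid here.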

Feasible Inference (Slutsky)

Asymptotic normality becomes usable once we plug in a consistent variance estimator.

If \(\hat V_\beta \xrightarrow{\,p\,} V_\beta\), then Slutsky implies

\[ \frac{\sqrt{n}(\hat\beta_j-\beta_{j,0})}{\sqrt{[\hat V_\beta]_{jj}}} \xrightarrow{\,d\,} \mathcal{N}(0,1) \qquad (H_0:\beta_j=\beta_{j,0}). \]

The estimated asymptotic variance of \(\hat\beta\) itself is \(n^{-1}\hat V_\beta\).

The main takeaway is: robust standard errors are justified by consistency of \(\hat V_\beta\) (not by normality of \(u_i\)).
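
As a worked consequence (a sketch using the 5% level, a choice made here for illustration), inverting the statistic above gives the usual large-sample confidence interval:

\[\hat\beta_j \;\pm\; 1.96\,\sqrt{\frac{[\hat V_\beta]_{jj}}{n}}\]

Its coverage probability for \(\beta_j\) approaches \(0.95\) as \(n\to\infty\), because \(P\bigl(|\mathcal{N}(0,1)|\leq 1.96\bigr)\approx 0.95\).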

Dependence: HAC and Cluster-Robust SE

Heteroskedasticity-robust SE treat observations as (conditionally) uncorrelated. When there is dependence, the "meat" must change.

HAC (time-series dependence): long-run variance of the score \(X_i u_i\)

\[ S = \Gamma_0 + \sum_{\ell=1}^{\infty}(\Gamma_\ell+\Gamma_\ell'), \qquad \Gamma_\ell = \mathbb{E}[X_i u_i\,u_{i-\ell}X_{i-\ell}']. \]

Cluster-robust (within-group dependence): allow arbitrary correlation within cluster \(g\), independence across clusters

\[ \widehat V_{\beta,CR} =(X'X)^{-1}\Bigl(\sum_{g=1}^G X_g'\hat u_g\hat u_g'X_g\Bigr)(X'X)^{-1} \]

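Note that \(\widehat V_{\beta,CR}\) as written estimates the variance of \(\hat\beta\) itself (the analogue of \(n^{-1}\hat V_\beta\)), not that of \(\sqrt{n}(\hat\beta-\beta)\).

Key asymptotic condition: for cluster-robust inference, we typically require the number of clusters \(G\to\infty\).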
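
The cluster-robust formula can be sketched in a few lines. This is an illustration under stated assumptions, not a production estimator: the function name `cluster_robust_cov`, the 200 equal-sized clusters, and the simulated cluster-level components in both regressor and error are hypothetical choices, and no finite-sample correction is applied.

```python
import numpy as np

def cluster_robust_cov(X, u_hat, cluster_ids):
    """(X'X)^{-1} (sum_g X_g' u_g u_g' X_g) (X'X)^{-1}, with no small-sample correction."""
    k = X.shape[1]
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        rows = cluster_ids == g
        score_g = X[rows].T @ u_hat[rows]  # k-vector X_g' u_hat_g
        meat += np.outer(score_g, score_g)
    return bread @ meat @ bread

rng = np.random.default_rng(7)  # fixed seed so the run is reproducible
G, m = 200, 10                  # illustrative: 200 clusters of 10 observations
cluster_ids = np.repeat(np.arange(G), m)
# Regressor and error each share a cluster-level component.
x = rng.standard_normal(G)[cluster_ids] + rng.standard_normal(G * m)
u = rng.standard_normal(G)[cluster_ids] + rng.standard_normal(G * m)
X = np.column_stack([np.ones(G * m), x])
y = X @ np.array([1.0, 2.0]) + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
V_cr = cluster_robust_cov(X, u_hat, cluster_ids)
se_cr = np.sqrt(np.diag(V_cr))  # already on the scale of Var(beta_hat)

# Heteroskedasticity-robust (non-clustered) SE for comparison.
bread = np.linalg.inv(X.T @ X)
V_hc = bread @ ((X * u_hat[:, None] ** 2).T @ X) @ bread
se_hc = np.sqrt(np.diag(V_hc))
print("cluster-robust SE:", se_cr)
print("White SE:         ", se_hc)
```

When every observation is its own cluster, the function reproduces the White estimator, which is a convenient sanity check. With positive within-cluster correlation, as simulated here, the clustered standard errors should be clearly larger.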

Summary

What We Have Proved

Consistency, \(\hat{\beta}\xrightarrow{\,p\,}\beta\), by LLN + CMT, under i.i.d. sampling, exogeneity, and finite second moments.
Asymptotic normality, \(\sqrt{n}(\hat{\beta}-\beta) \xrightarrow{\,d\,} \mathcal{N}(\mathbf{0},V_\beta)\) by CLT + Slutsky, with finite fourth moments.
Sandwich variance, \(V_\beta=\mathbf{Q}_{XX}^{-1}S\mathbf{Q}_{XX}^{-1}\), which remains valid under heteroskedasticity.
Homoskedastic special case, \(V_\beta=\sigma^2\mathbf{Q}_{XX}^{-1}\), so the classical variance formula is recovered asymptotically.

Large-sample theory replaces exact finite-sample normality with an approximation that becomes more accurate as \(n\) grows.

Looking Ahead

GLS & FGLS: exploit \(\mathbb{E}[u_i^2|X_i]=\sigma^2(X_i)\) for efficiency gains — asymptotically efficient estimator under heteroskedasticity
MLE: Cramér-Rao bound; asymptotic efficiency of MLE; connection to OLS under normality
IV & GMM: extend the asymptotic argument to endogenous regressors using instruments

The full workflow:

\[\underbrace{\text{LLN}}_{\text{consistency}} \;\longrightarrow\; \underbrace{\text{CLT}}_{\text{normality}} \;\longrightarrow\; \underbrace{\text{CMT/Slutsky/Delta}}_{\text{inference}} \;\longrightarrow\; \underbrace{\text{Robust SE}}_{\text{feasible inference}}\]

Closing

Questions?

Or via e-mail: luis.chanci@usach.cl