Relationship between variables represented by a Linear Model
For instance, summarize the predictive power of schooling’s effect on wages with the conditional expectation function (CEF):
$E[\log(Salary)_i \mid X_i] = \beta_0 + \beta_1 Edu_i + \beta_2 Exper_i$
|             | Estimate | Std. Error | t value | Pr(>\|t\|) |
|-------------|----------|------------|---------|------------|
| (Intercept) | 5.503    | 0.112      | 49.12   | 8.126e-261 |
| educ        | 0.07778  | 0.006577   | 11.83   | 3.616e-30  |
| exper       | 0.01978  | 0.003303   | 5.988   | 3.022e-09  |
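As a sketch, a regression of this form could be estimated in R as below. The wage1 data set from the wooldridge package is used here only as a stand-in; the table above comes from the course's own data, so the numbers will differ.

library(wooldridge)   # wage1: wages, education, and experience (stand-in data)
library(pander)       # compact coefficient tables, as used elsewhere in these notes
data("wage1")
cef <- lm(lwage ~ educ + exper, data = wage1)   # estimates E[log(wage) | educ, exper]
pander(coefficients(summary(cef)))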
Add some assumptions, and then
After midterm 1, some assumptions may fail, for instance,
One relevant part we reviewed was:
$E(\hat{\beta}_1 \mid X) \equiv \beta_1 + \beta_2 \frac{Cov(X, OmittedPart)}{Var(X)}$
(there are many correlations that are not causal)
How do we know if X causes Y?
E.g., causal relationships
Discussion
Pick one observation: i = Luis
Crazy idea: assume there are two parallel universes (PU):
EVERYTHING is the same, EXCEPT Schooling.
Hence, any difference in earnings between the two universes can be attributed to schooling.
In practice there is one issue: we never observe the counterfactual,
a prediction of what would have happened in the absence of the treatment.
Notions: Treatment, Control, Experiment, Quasi-Experiment, others.
Methods:
Let’s use a dummy variable for whether an observation, i, received a treatment or not:
Treatment: Di=1
Control: Di=0
There is an ACTUAL outcome: Yi
and an UNOBSERVED counterfactual:
$\text{Unobserved outcome} = \begin{cases} Y_{0i} & \text{if } D_i = 1 \\ Y_{1i} & \text{if } D_i = 0 \end{cases}$
We would like to compare two observations that are basically exactly the same except that one has D=0 and one has D=1.
Experiments:
Randomized Controlled Trial, RCT
The econometric specification could be as simple as,
$Y_i = \beta_0 + \beta_1 D_i + u_i$, which is the same as a two-sample t-test.
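As a quick check, a minimal simulated RCT (hypothetical data) shows that the regression coefficient on D equals the difference in group means from a two-sample t-test:

# Simulated RCT: regression on the treatment dummy vs. two-sample t-test
set.seed(123)
D <- rbinom(200, 1, 0.5)              # random assignment
Y <- 2 + 1.5 * D + rnorm(200)         # true treatment effect = 1.5
summary(lm(Y ~ D))                    # beta1_hat = difference in means
t.test(Y ~ D, var.equal = TRUE)       # same estimate and t statistic (sign flipped: group 0 minus group 1)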
On the other hand, there are some challenges to consider. For instance,
Experiments and Quasi-experiments:
In an experiment, participants are randomly assigned to treatment or control.
In a quasi-experiment, observations are not assigned randomly.
Thus, groups may differ in more than just the treatment (we need to statistically control for those differences).
There may be several “rival hypotheses” competing with the experimental manipulation as explanations for observed results.
Idea,
library(foreign)
Panel <- read.dta("http://dss.princeton.edu/training/Panel101.dta") # Source: http://dss.princeton.edu/training, visited Apr 2019.
country | year | y | y_bin | x1 | x2 | x3 | opinion |
---|---|---|---|---|---|---|---|
A | 1990 | 1342787840 | 1 | 0.2779036 | -1.1079559 | 0.2825536 | Str agree |
A | 1991 | -1899660544 | 0 | 0.3206847 | -0.9487200 | 0.4925385 | Disag |
A | 1992 | -11234363 | 0 | 0.3634657 | -0.7894840 | 0.7025234 | Disag |
A | 1993 | 2645775360 | 1 | 0.2461440 | -0.8855330 | -0.0943909 | Disag |
A | 1994 | 3008334848 | 1 | 0.4246230 | -0.7297683 | 0.9461306 | Disag |
A | 1995 | 3229574144 | 1 | 0.4772141 | -0.7232460 | 1.0296804 | Str agree |
A | 1996 | 2756754176 | 1 | 0.4998050 | -0.7815716 | 1.0922881 | Disag |
A | 1997 | 2771810560 | 1 | 0.0516284 | -0.7048455 | 1.4159008 | Str agree |
A | 1998 | 3397338880 | 1 | 0.3664108 | -0.6983712 | 1.5487227 | Disag |
A | 1999 | 39770336 | 1 | 0.3958425 | -0.6431540 | 1.7941980 | Str disag |
B | 1990 | -5934699520 | 0 | -0.0818500 | 1.4251202 | 0.0234281 | Agree |
B | 1991 | -711623744 | 0 | 0.1061600 | 1.6496018 | 0.2603625 | Str agree |
B | 1992 | -1933116160 | 0 | 0.3537852 | 1.5937191 | -0.2343988 | Agree |
B | 1993 | 3072741632 | 1 | 0.7267770 | 1.6917576 | 0.2562243 | Str disag |
B | 1994 | 3768078848 | 1 | 0.7193949 | 1.7414261 | 0.4117495 | Disag |
B | 1995 | 2837581312 | 1 | 0.6715466 | 1.7083139 | 0.5358430 | Str disag |
B | 1996 | 577199360 | 1 | 0.8198573 | 1.5324961 | -0.4996490 | Str agree |
B | 1997 | 1786851584 | 1 | 0.8801672 | 1.5021962 | -0.5762677 | Disag |
B | 1998 | -149072048 | 0 | 0.7045161 | 1.4236463 | -0.4484192 | Agree |
B | 1999 | -1174480128 | 0 | 0.2369673 | 1.4545859 | -0.0493640 | Str disag |
For instance, the model could be like
$y_{it} = \beta_0 + \beta_1 x_{it} + \underbrace{\theta_i}_{FE} + u_{it}$
where $\theta_i$ could represent (unobserved) country-specific characteristic(s).
If we ignore $\theta_i$,
|             | Estimate  | Std. Error | t value | Pr(>\|t\|) |
|-------------|-----------|------------|---------|------------|
| (Intercept) | 1.524e+09 | 621072624  | 2.454   | 0.01668    |
| x1          | 4.95e+08  | 778861261  | 0.6355  | 0.5272     |
(not stat. significant)
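The table above could be reproduced with a pooled OLS regression that simply ignores $\theta_i$ (a sketch, assuming the Panel data frame loaded above and the pander package):

library(pander)
pooled <- lm(y ~ x1, data = Panel)   # pooled OLS, ignoring the country effect theta_i
pander(coefficients(summary(pooled)))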
F.E. Estimation
Dummy variable regression: $y_{it} = \alpha_A D_A + \alpha_B D_B + \dots + \beta_1 x_{it} + u_{it}$
Within estimator: regress $(y_{it} - \bar{y}_i)$ on $(x_{it} - \bar{x}_i)$
Dummy Variable Regression,
|                  | Estimate   | Std. Error | t value | Pr(>\|t\|) |
|------------------|------------|------------|---------|------------|
| x1               | 2.476e+09  | 1.107e+09  | 2.237   | 0.02889    |
| factor(country)A | 880542404  | 961807052  | 0.9155  | 0.3635     |
| factor(country)B | -1.058e+09 | 1.051e+09  | -1.006  | 0.3181     |
| factor(country)C | -1.723e+09 | 1.632e+09  | -1.056  | 0.2951     |
| factor(country)D | 3.163e+09  | 909459150  | 3.478   | 0.0009303  |
| factor(country)E | -602622000 | 1.064e+09  | -0.5662 | 0.5733     |
| factor(country)F | 2.011e+09  | 1.123e+09  | 1.791   | 0.07821    |
| factor(country)G | -984717493 | 1.493e+09  | -0.6597 | 0.5119     |
(stat. significant)
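A sketch of the dummy-variable (LSDV) regression behind a table like the one above, assuming the Panel data loaded earlier; the intercept is dropped so each country gets its own dummy:

lsdv <- lm(y ~ x1 + factor(country) - 1, data = Panel)   # one intercept per country
pander(coefficients(summary(lsdv)))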
Within Estimator,
$(y_{it} - \bar{y}_i) = \beta_1 (x_{it} - \bar{x}_i) + \epsilon_{it}$
library(plm); library(pander)
# Within (fixed effects) estimator on the Panel data
FE <- plm(y ~ x1, data=Panel, index=c("country", "year"), model="within")
pander(coefficients(summary(FE)))
|    | Estimate  | Std. Error | t-value | Pr(>\|t\|) |
|----|-----------|------------|---------|------------|
| x1 | 2.476e+09 | 1.107e+09  | 2.237   | 0.02889    |
(stat. significant)
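Equivalently, the within estimator can be computed by hand (a sketch): demean y and x1 by country and run OLS on the demeaned variables. The slope matches the plm() estimate above; the reported standard errors differ slightly because of the degrees-of-freedom correction.

# Manual within transformation: subtract country means, then OLS without an intercept
Panel$y_dm  <- Panel$y  - ave(Panel$y,  Panel$country)
Panel$x1_dm <- Panel$x1 - ave(Panel$x1, Panel$country)
summary(lm(y_dm ~ x1_dm - 1, data = Panel))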
Discussion (from Wooldridge)
Strict exogeneity in the original model has to be assumed
In the case T = 2, fixed effects and first differencing are identical
For T > 2, fixed effects is more efficient if classical assumptions hold
If T is very large (and N not so large), the panel has a pronounced time series character and problems such as strong dependence arise
First differencing may be better in the case of severe serial correlation in the errors
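For comparison, a sketch of the first-difference estimator with plm (model = "fd") on the same Panel data:

FD <- plm(y ~ x1, data = Panel, index = c("country", "year"), model = "fd")
pander(coefficients(summary(FD)))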
Motivation
Dr. John Snow, London physician, 1813-1858.
Cholera,
John noticed variations
1849:
1852: Lambeth Waterworks moved their intake upriver
1853: London has another cholera outbreak
Do you want to learn more?
Illustration,
Econometric specification,
$y = \beta_0 + \beta_1 D_{treatment} + \beta_2 D_{post} + \beta_3 (D_{treatment} \times D_{post}) + u$
$\beta_3$ is the parameter of interest.
Notice that this is related to FE. For instance, if we have more than two groups (T/C = g) and more than two periods (Pre/Post = t):
$y_{igt} = \beta_0 + \beta_1 D_g + \beta_2 D_t + \beta_3 D_{gt} + \beta_4 X_{igt} + u_{igt}$
Key Assumptions:
Time affected the Treatment and Control groups equally (the "parallel trends" assumption)
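As an illustration, a minimal simulated DiD example (hypothetical data); the coefficient on the interaction term recovers $\beta_3$:

# Simulated difference-in-differences: two groups, two periods
set.seed(1)
treat <- rbinom(400, 1, 0.5)                                   # treatment group indicator
post  <- rbinom(400, 1, 0.5)                                   # post-period indicator
y     <- 1 + 0.5*treat + 0.3*post + 2*treat*post + rnorm(400)  # true beta3 = 2
summary(lm(y ~ treat * post))                                  # interaction term estimates beta3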
Do you want to learn more?
Idea,
For the linear model, $y = \beta_0 + \beta_1 x + u$,
say that we do not have random assignment of x.
But what if we can find another variable, z, with random assignment that causes x?
If this is the case, we could get a natural experiment!
This variable is called an "instrument" or "instrumental variable".
Conditions for an instrumental variable:
In other words, there is an endogeneity issue, $Cov(x,u) \neq 0$,
but what if we can find a variable z such that
$Cov(z,x) \neq 0$ (relevance)
$Cov(z,u) = 0$ (exogeneity)
Before moving on, check
Estimation for k = 2, $y = \alpha + \beta x + u$.
Since $Cov(z,y) = \beta\,Cov(z,x) + Cov(z,u)$, if $Cov(z,u) = 0$ then $Cov(z,y) - \beta\,Cov(z,x) = 0$;
thus,
$\hat{\beta}_{IV} = \frac{\sum (z_i - \bar{z})(y_i - \bar{y})}{\sum (z_i - \bar{z})(x_i - \bar{x})}$
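A small simulation (hypothetical data) in which this sample-covariance formula is computed directly:

# IV estimator from sample covariances (simulated data)
set.seed(42)
z <- rnorm(1000)                    # instrument: relevant and exogenous
u <- rnorm(1000)
x <- 0.8*z + 0.5*u + rnorm(1000)    # x is endogenous: correlated with u
y <- 1 + 2*x + u                    # true beta = 2
cov(z, y) / cov(z, x)               # IV estimate, close to 2
coef(lm(y ~ x))["x"]                # OLS estimate, biased upward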
|             | Estimate | Std. Error | t value | Pr(>\|t\|) |
|-------------|----------|------------|---------|------------|
| (Intercept) | 4.769    | 0.005369   | 888.3   | 0          |
| packs       | -0.08981 | 0.01698    | -5.29   | 1.422e-07  |

|             | Estimate | Std. Error | t value | Pr(>\|t\|) |
|-------------|----------|------------|---------|------------|
| (Intercept) | 4.448    | 0.9082     | 4.898   | 1.082e-06  |
| packs       | 2.989    | 8.699      | 0.3436  | 0.7312     |
Properties of IV with a poor instrumental variable (for k=2),
$\text{plim}\,\hat{\beta}_{OLS} = \beta_1 + Corr(x,u)\frac{\sigma_u}{\sigma_x}$
$\text{plim}\,\hat{\beta}_{IV} = \beta_1 + \frac{Corr(z,u)}{Corr(z,x)}\frac{\sigma_u}{\sigma_x}$
IV is worse than OLS if $\frac{Corr(z,u)}{Corr(z,x)} > Corr(x,u)$
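A small simulation (hypothetical data) illustrating the comparison: z is slightly correlated with u and only weakly correlated with x, so Corr(z,u)/Corr(z,x) > Corr(x,u) and IV ends up more biased than OLS:

# Poor instrument: weak relevance and mild endogeneity of z
set.seed(7)
u <- rnorm(5000)
z <- 0.1*u + rnorm(5000)            # Corr(z, u) != 0 (invalid)
x <- 0.05*z + 0.5*u + rnorm(5000)   # Corr(z, x) small (weak)
y <- 1 + 2*x + u                    # true beta1 = 2
coef(lm(y ~ x))["x"]                # OLS: biased
cov(z, y) / cov(z, x)               # IV: typically even more biased here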
IV estimation in the multiple regression model (k>2),
$y = \beta_0 + \underbrace{\beta_1 x_1}_{Endog} + \underbrace{\beta_2 x_2}_{Exog} + u$
Two Stage Least Squares (2SLS)
Stage one (reduced form):
Estimate $x_1 = \gamma_0 + \gamma_1 z + \gamma_2 x_2 + \epsilon$ to get $\hat{x}_1$
Stage two:
Estimate $y = \beta_0 + \beta_1 \hat{x}_1 + \beta_2 x_2 + u$
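A sketch of the two stages done by hand on simulated data (note: the second-stage standard errors from this manual procedure are not the correct 2SLS standard errors; in practice use ivreg(), as in the next code block):

# Manual 2SLS on simulated data: x1 endogenous, x2 exogenous, z the instrument
set.seed(99)
z  <- rnorm(1000); x2 <- rnorm(1000); u <- rnorm(1000)
x1 <- 0.7*z + 0.3*x2 + 0.5*u + rnorm(1000)   # endogenous regressor
y  <- 1 + 2*x1 + x2 + u                      # true beta1 = 2, beta2 = 1
stage1 <- lm(x1 ~ z + x2)                    # stage one (reduced form)
stage2 <- lm(y ~ fitted(stage1) + x2)        # stage two, using x1_hat
coef(stage2)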
data("card")
reg.ls <-lm (lwage~educ + exper+expersq+black+smsa+south+smsa66+reg662+reg663+reg664+reg665+reg666+reg667+reg668+reg669, data = card)
reg.iv <-ivreg(lwage~educ + exper+expersq+black+smsa+south+smsa66+reg662+reg663+reg664+reg665+reg666+reg667+reg668+reg669|
nearc4 + exper+expersq+black+smsa+south+smsa66+reg662+reg663+reg664+reg665+reg666+reg667+reg668+reg669, data = card)
| Dependent variable: lwage | OLS       | IV        |
|---------------------------|-----------|-----------|
| educ                      | 0.075***  | 0.132**   |
|                           | (0.003)   | (0.055)   |
| exper                     | 0.085***  | 0.108***  |
|                           | (0.007)   | (0.024)   |
| expersq                   | -0.002*** | -0.002*** |
|                           | (0.0003)  | (0.0003)  |
| Observations              | 3,010     | 3,010     |

Note: *p<0.1; **p<0.05; ***p<0.01
Testing for endogeneity of x1 in
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
Reduced form: $x_1 = \gamma_0 + \gamma_1 z + \gamma_2 x_2 + \epsilon$
$x_1$ is exogenous if and only if $\epsilon$ is uncorrelated with $u$
Test equation
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \delta \hat{\epsilon} + e$
H0, exogeneity of x1, is rejected if δ is significantly different from zero.
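A sketch of this test using the card data and the nearc4 instrument from the code above (a reduced control set is used here for readability; the same exogenous variables must appear in both equations):

# Control-function test for endogeneity of educ
stage1    <- lm(educ ~ nearc4 + exper + expersq + black + smsa + south, data = card)
card$vhat <- resid(stage1)                       # epsilon_hat from the reduced form
hausman   <- lm(lwage ~ educ + exper + expersq + black + smsa + south + vhat, data = card)
summary(hausman)$coefficients["vhat", ]          # reject exogeneity if vhat is significant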
Motivating Examples
RDD exploits exogeneity in program characteristics (design aspects).
We could think of it as a randomized experiment at the cutoff: treatment is assigned based on whether a running variable crosses a cutoff.
Intuition: Participants in the program are very different, however, at the margin (cutoff), those just at the threshold are virtually identical.
For instance,
(Animation) Source: Nick C. Huntington-Klein. Twitter @nickchk
library(foreign); library(ggplot2)
# Minimum legal drinking age example: mortality rate by age cell, cutoff at age 21
RDData <- read.dta("http://masteringmetrics.com/wp-content/uploads/2015/01/AEJfigs.dta")
RDData$over21 <- RDData$agecell >= 21
ggplot(RDData, aes(x = agecell, y = all, colour = over21)) + geom_point() + ylim(80, 110) +
  stat_smooth(method = loess) + labs(x = "Age", y = "Mortality Rate (per 100,000)")
Model
$y = \beta_0 + \beta_1 D_{treatment} + \beta_2 Running + u$
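A sketch of estimating this model on the RDData loaded above, with the running variable (age) centered at the 21-year cutoff:

RDData$age_c <- RDData$agecell - 21             # centered running variable
rd <- lm(all ~ over21 + age_c, data = RDData)   # sharp RD: jump at the cutoff
summary(rd)   # the coefficient on over21TRUE estimates the discontinuity in mortality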
Final comments,