Chapter 3 Lecture 05
3.1 Elemental confounds
The four elemental confounds
3.2 The fork
\(X \leftarrow Z \rightarrow Y\)
library(simcausal)
<- 1000
n ## confounding variable
## Fair coin flip
<- rbern(n, 0.5)
Z <- rbern(n, (1-Z)*0.1 + Z*0.9)
X <- rbern(n, (1-Z)*0.1 + Z*0.9) Y
When \(Z\) equals 0, \(X\) and \(Y\) has 0.1 probabilty to be 1; When \(Z\) equals 1, \(X\) and \(Y\) has 0.9 probabilty to be 1;
table(X, Y)
## Y
## X 0 1
## 0 414 95
## 1 89 402
\[Y \not\!\perp\!\!\!\perp X\] Y is not independent on X because there is a common cause.
cor(X[], Y)
## [1] 0.6320058
When we stratify \(Z\), the diagonals are not bigger as proportion as the off diagonals.
\[Y \not\!\perp\!\!\!\perp X \mid Z\]
cor(X[Z==0], Y[Z==0])
## [1] 0.01552259
cor(X[Z==1], Y[Z==1])
## [1] -0.02201733
library(simcausal)
= c(4, 2)
cols = 300
N <- rbern(N, prob = 0.5)
Z <- rnorm(N, 2*Z-1)
X <- rnorm(N, 2*Z-1)
Y
plot(X, Y, col= cols[Z+1], lwd = 3)
abline(lm(Y[Z==1]~X[Z==1]), col=2, lwd=3)
abline(lm(Y[Z==0]~X[Z==0]), col=2, lwd=3)
abline(lm(Y~X), lwd=3)
text(x = -2, y=2, labels = "Z = 0", col = 4)
We have a continouse variable \(X\) and \(Y\). For all \(X\) and \(Y\), there is a positive correlation.
X and Y are not independent on one another.
3.2.1 Fork example
Why do regions of the US with higher rates of marriage also have higher rates of divorce?
library(rethinking)
## Loading required package: rstan
## Loading required package: StanHeaders
## Loading required package: ggplot2
## rstan (Version 2.21.3, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## Loading required package: cmdstanr
## This is cmdstanr version 0.4.0
## - Online documentation and vignettes at mc-stan.org/cmdstanr
## - Use set_cmdstan_path() to set the path to CmdStan
## - Use install_cmdstan() to install CmdStan
## Loading required package: parallel
## rethinking (Version 2.21)
##
## Attaching package: 'rethinking'
## The following object is masked from 'package:rstan':
##
## stan
## The following objects are masked from 'package:simcausal':
##
## rbern, sim
## The following object is masked from 'package:stats':
##
## rstudent
data("WaffleDivorce")
WaffleDivorce
## Location Loc Population MedianAgeMarriage Marriage Marriage.SE
## 1 Alabama AL 4.78 25.3 20.2 1.27
## 2 Alaska AK 0.71 25.2 26.0 2.93
## 3 Arizona AZ 6.33 25.8 20.3 0.98
## 4 Arkansas AR 2.92 24.3 26.4 1.70
## 5 California CA 37.25 26.8 19.1 0.39
## 6 Colorado CO 5.03 25.7 23.5 1.24
## 7 Connecticut CT 3.57 27.6 17.1 1.06
## 8 Delaware DE 0.90 26.6 23.1 2.89
## 9 District of Columbia DC 0.60 29.7 17.7 2.53
## 10 Florida FL 18.80 26.4 17.0 0.58
## 11 Georgia GA 9.69 25.9 22.1 0.81
## 12 Hawaii HI 1.36 26.9 24.9 2.54
## 13 Idaho ID 1.57 23.2 25.8 1.84
## 14 Illinois IL 12.83 27.0 17.9 0.58
## 15 Indiana IN 6.48 25.7 19.8 0.81
## 16 Iowa IA 3.05 25.4 21.5 1.46
## 17 Kansas KS 2.85 25.0 22.1 1.48
## 18 Kentucky KY 4.34 24.8 22.2 1.11
## 19 Louisiana LA 4.53 25.9 20.6 1.19
## 20 Maine ME 1.33 26.4 13.5 1.40
## 21 Maryland MD 5.77 27.3 18.3 1.02
## 22 Massachusetts MA 6.55 28.5 15.8 0.70
## 23 Michigan MI 9.88 26.4 16.5 0.69
## 24 Minnesota MN 5.30 26.3 15.3 0.77
## 25 Mississippi MS 2.97 25.8 19.3 1.54
## 26 Missouri MO 5.99 25.6 18.6 0.81
## 27 Montana MT 0.99 25.7 18.5 2.31
## 28 Nebraska NE 1.83 25.4 19.6 1.44
## 29 New Hampshire NH 1.32 26.8 16.7 1.76
## 30 New Jersey NJ 8.79 27.7 14.8 0.59
## 31 New Mexico NM 2.06 25.8 20.4 1.90
## 32 New York NY 19.38 28.4 16.8 0.47
## 33 North Carolina NC 9.54 25.7 20.4 0.98
## 34 North Dakota ND 0.67 25.3 26.7 2.93
## 35 Ohio OH 11.54 26.3 16.9 0.61
## 36 Oklahoma OK 3.75 24.4 23.8 1.29
## 37 Oregon OR 3.83 26.0 18.9 1.10
## 38 Pennsylvania PA 12.70 27.1 15.5 0.48
## 39 Rhode Island RI 1.05 28.2 15.0 2.11
## 40 South Carolina SC 4.63 26.4 18.1 1.18
## 41 South Dakota SD 0.81 25.6 20.1 2.64
## 42 Tennessee TN 6.35 25.2 19.4 0.85
## 43 Texas TX 25.15 25.2 21.5 0.61
## 44 Utah UT 2.76 23.3 29.6 1.77
## 45 Vermont VT 0.63 26.9 16.4 2.40
## 46 Virginia VA 8.00 26.4 20.5 0.83
## 47 Washington WA 6.72 25.9 21.4 1.00
## 48 West Virginia WV 1.85 25.0 22.2 1.69
## 49 Wisconsin WI 5.69 26.3 17.2 0.79
## 50 Wyoming WY 0.56 24.2 30.7 3.92
## Divorce Divorce.SE WaffleHouses South Slaves1860 Population1860
## 1 12.7 0.79 128 1 435080 964201
## 2 12.5 2.05 0 0 0 0
## 3 10.8 0.74 18 0 0 0
## 4 13.5 1.22 41 1 111115 435450
## 5 8.0 0.24 0 0 0 379994
## 6 11.6 0.94 11 0 0 34277
## 7 6.7 0.77 0 0 0 460147
## 8 8.9 1.39 3 0 1798 112216
## 9 6.3 1.89 0 0 0 75080
## 10 8.5 0.32 133 1 61745 140424
## 11 11.5 0.58 381 1 462198 1057286
## 12 8.3 1.27 0 0 0 0
## 13 7.7 1.05 0 0 0 0
## 14 8.0 0.45 2 0 0 1711951
## 15 11.0 0.63 17 0 0 1350428
## 16 10.2 0.91 0 0 0 674913
## 17 10.6 1.09 6 0 2 107206
## 18 12.6 0.75 64 1 225483 1155684
## 19 11.0 0.89 66 1 331726 708002
## 20 13.0 1.48 0 0 0 628279
## 21 8.8 0.69 11 0 87189 687049
## 22 7.8 0.52 0 0 0 1231066
## 23 9.2 0.53 0 0 0 749113
## 24 7.4 0.60 0 0 0 172023
## 25 11.1 1.01 72 1 436631 791305
## 26 9.5 0.67 39 1 114931 1182012
## 27 9.1 1.71 0 0 0 0
## 28 8.8 0.94 0 0 15 28841
## 29 10.1 1.61 0 0 0 326073
## 30 6.1 0.46 0 0 18 672035
## 31 10.2 1.11 2 0 0 93516
## 32 6.6 0.31 0 0 0 3880735
## 33 9.9 0.48 142 1 331059 992622
## 34 8.0 1.44 0 0 0 0
## 35 9.5 0.45 64 0 0 2339511
## 36 12.8 1.01 16 0 0 0
## 37 10.4 0.80 0 0 0 52465
## 38 7.7 0.43 11 0 0 2906215
## 39 9.4 1.79 0 0 0 174620
## 40 8.1 0.70 144 1 402406 703708
## 41 10.9 2.50 0 0 0 4837
## 42 11.4 0.75 103 1 275719 1109801
## 43 10.0 0.35 99 1 182566 604215
## 44 10.2 0.93 0 0 0 40273
## 45 9.6 1.87 0 0 0 315098
## 46 8.9 0.52 40 1 490865 1219630
## 47 10.0 0.65 0 0 0 11594
## 48 10.9 1.34 4 1 18371 376688
## 49 8.3 0.57 0 0 0 775881
## 50 10.3 1.90 0 0 0 0
## PropSlaves1860
## 1 4.5e-01
## 2 0.0e+00
## 3 0.0e+00
## 4 2.6e-01
## 5 0.0e+00
## 6 0.0e+00
## 7 0.0e+00
## 8 1.6e-02
## 9 0.0e+00
## 10 4.4e-01
## 11 4.4e-01
## 12 0.0e+00
## 13 0.0e+00
## 14 0.0e+00
## 15 0.0e+00
## 16 0.0e+00
## 17 1.9e-05
## 18 0.0e+00
## 19 4.7e-01
## 20 0.0e+00
## 21 1.3e-01
## 22 0.0e+00
## 23 0.0e+00
## 24 0.0e+00
## 25 5.5e-01
## 26 9.7e-02
## 27 0.0e+00
## 28 5.2e-04
## 29 0.0e+00
## 30 2.7e-05
## 31 0.0e+00
## 32 0.0e+00
## 33 3.3e-01
## 34 0.0e+00
## 35 0.0e+00
## 36 0.0e+00
## 37 0.0e+00
## 38 0.0e+00
## 39 0.0e+00
## 40 5.7e-01
## 41 0.0e+00
## 42 2.0e-01
## 43 3.0e-01
## 44 0.0e+00
## 45 0.0e+00
## 46 4.0e-01
## 47 0.0e+00
## 48 4.9e-02
## 49 0.0e+00
## 50 0.0e+00
\(M \leftarrow A \rightarrow D\)
3.2.2 What does it mean to stratify by a continouse variable
How does \(A\) influence \(D ?\)
What is \(D=f(A, M) ?\)
In a linear regression:
\[ \begin{aligned} D_{i} & \sim \operatorname{Normal}\left(\mu_{i}, \sigma\right) \\ \mu_{i} &=\alpha+\beta_{M} M_{i}+\beta_{A} A_{i} \end{aligned} \]
Every value of \(A\) produces of different relationship between \(D\) and \(M\).
From the perspective of marriage rate, marriage is just something that makes the intercept conditional on age.
OFten convenient to standardize variales in linear regression.
Standardize (Z-score): substract mean and divide by standard deviation.
Computation works better.
Easy to choose sensible priors.
3.2.3
\[ \begin{aligned} D_{i} & \sim \operatorname{Normal}\left(\mu_{i}, \sigma\right) \\ \mu_{i} &=\alpha+\beta_{M} M_{i}+\beta_{A} A \\ \alpha & \sim \operatorname{Normal}(0,10) \\ \beta_{M} & \sim \operatorname{Normal}(0,10) \\ \beta_{A} & \sim \operatorname{Normal}(0,10) \\ \sigma & \sim \operatorname{Exponential}(1) \end{aligned} \]
<- 20
n <- rnorm(n, 0, 10)
a <- rnorm(n, 0, 10)
bM <- rnorm(n, 0, 10)
bA plot(NULL, xlim=c(-2, 2), ylim=c(-2, 2),
xlab="Meian age of marriage (standardized)",
ylab="Divoce rate (stardardized)")
<- seq(from=-3, to=3, len=30)
Aseq for(i in 1:n){
<- a[i] + bA[i]*Aseq
mu lines(Aseq, mu, lwd=2, col=2)
}
<- 20
n <- rnorm(n, 0, 0.02)
a <- rnorm(n, 0,0.5)
bM <- rnorm(n, 0, 0.5)
bA plot(NULL, xlim=c(-2, 2), ylim=c(-2, 2),
xlab="Meian age of marriage (standardized)",
ylab="Divoce rate (stardardized)")
<- seq(from=-3, to=3, len=30)
Aseq for(i in 1:n){
<- a[i] + bA[i]*Aseq
mu lines(Aseq, mu, lwd=2, col=2)
}
3.2.3.1 Analyze the data
library(robustHD)
## Loading required package: perry
## Loading required package: robustbase
##
## Attaching package: 'robustHD'
## The following object is masked from 'package:rethinking':
##
## standardize