7 MRG
Sir Francis Galton (1822–1911) was an English statistician. He introduced many statistical concepts that are still in use today, including correlation, quartiles, percentiles, and regression.
7.1 Read the data
Consider the data collected by Francis Galton in the 1880s, stored in a modern format in the galton.csv file. In this file, height is the variable containing the child's height, while the father's and mother's heights are contained in the variables father and mother.
The family variable is a numerical code identifying children in the same family; the number of children in that family is in nkids.
## Data from https://github.com/thomas-haslwanter/statsintro_python/blob/master/ISP/Code_Quantlets/08_TestsMeanValues/anovaOneway/galton.csv
tab<-read.csv("data/galton.csv")
head(tab)
## family father mother sex height nkids
## 1 1 78.5 67.0 M 73.2 4
## 2 1 78.5 67.0 F 69.2 4
## 3 1 78.5 67.0 F 69.0 4
## 4 1 78.5 67.0 F 69.0 4
## 5 2 75.5 66.5 M 73.5 4
## 6 2 75.5 66.5 M 72.5 4
Check the number of rows and columns:
dim(tab)
## [1] 898 6
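Beyond the dimensions, it can help to look at the column types and check for missing values before transforming anything. The following is a minimal sketch, not part of the original analysis: `demo` is a hypothetical stand-in built from three of the rows printed above, so that the example runs without the CSV file.

```r
# Stand-in for a few rows of galton.csv shown above (not the full file)
demo <- data.frame(
  family = c(1, 1, 2),
  father = c(78.5, 78.5, 75.5),
  mother = c(67.0, 67.0, 66.5),
  sex    = c("M", "F", "M"),
  height = c(73.2, 69.2, 73.5),
  nkids  = c(4, 4, 4),
  stringsAsFactors = FALSE
)

# Column types: sex is a character column here, the others are numeric
str(demo)

# Number of missing values per column
na_counts <- colSums(is.na(demo))
print(na_counts)
```

The same two calls, `str(tab)` and `colSums(is.na(tab))`, can be applied to the full table.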
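Convert the sex column into numeric values (F becomes 0, M becomes 1):
# read.csv() returns character columns in R >= 4.0, so convert to a factor first
tab$sex <- as.numeric(factor(tab$sex)) - 1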
head(tab)
## family father mother sex height nkids
## 1 1 78.5 67.0 1 73.2 4
## 2 1 78.5 67.0 0 69.2 4
## 3 1 78.5 67.0 0 69.0 4
## 4 1 78.5 67.0 0 69.0 4
## 5 2 75.5 66.5 1 73.5 4
## 6 2 75.5 66.5 1 72.5 4
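The 0/1 coding relies on R ordering factor levels alphabetically, so F comes before M. As a hedged sketch, the mapping can be made explicit by naming the levels; `demo` is a hypothetical stand-in containing only the six rows printed above.

```r
# Stand-in for the first six rows of galton.csv shown above
demo <- data.frame(
  family = c(1, 1, 1, 1, 2, 2),
  father = c(78.5, 78.5, 78.5, 78.5, 75.5, 75.5),
  mother = c(67.0, 67.0, 67.0, 67.0, 66.5, 66.5),
  sex    = c("M", "F", "F", "F", "M", "M"),
  height = c(73.2, 69.2, 69.0, 69.0, 73.5, 72.5),
  nkids  = c(4, 4, 4, 4, 4, 4),
  stringsAsFactors = FALSE
)

# Explicit levels make the coding independent of alphabetical ordering:
# F -> 0, M -> 1
demo$sex <- as.numeric(factor(demo$sex, levels = c("F", "M"))) - 1
print(demo$sex)
```

An equivalent one-liner for a two-level variable is `as.numeric(demo$sex == "M")`, applied to the original character column.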
Remove the nkids column (the sixth column):
tab <- tab[, -6]
head(tab)
## family father mother sex height
## 1 1 78.5 67.0 1 73.2
## 2 1 78.5 67.0 0 69.2
## 3 1 78.5 67.0 0 69.0
## 4 1 78.5 67.0 0 69.0
## 5 2 75.5 66.5 1 73.5
## 6 2 75.5 66.5 1 72.5
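Dropping a column by position breaks silently if the column order ever changes. A minimal sketch of dropping it by name instead, again on a hypothetical stand-in `demo` built from rows printed above:

```r
# Stand-in for a few rows of the table, after sex has been recoded to 0/1
demo <- data.frame(
  family = c(1, 1, 2),
  father = c(78.5, 78.5, 75.5),
  mother = c(67.0, 67.0, 66.5),
  sex    = c(1, 0, 1),
  height = c(73.2, 69.2, 73.5),
  nkids  = c(4, 4, 4)
)

# Keep every column except nkids, selected by name rather than index
demo <- demo[, setdiff(names(demo), "nkids")]
print(names(demo))
```

Because the selection is by name, the line has no effect if nkids has already been removed.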