7 MRG

Sir Francis Galton (1822–1911) was an English statistician. He introduced many statistical concepts that are still in use today, such as correlation, quartiles, percentiles and regression.

7.1 Read the data

Consider the data collected by Francis Galton in the 1880s, stored in a modern format in the galton.csv file. In this file, height is the variable containing each child's height, while the father's and mother's heights are contained in the variables father and mother.

The family variable is a numerical code identifying children in the same family; the number of children in each family is stored in nkids.

## Data from  https://github.com/thomas-haslwanter/statsintro_python/blob/master/ISP/Code_Quantlets/08_TestsMeanValues/anovaOneway/galton.csv
tab<-read.csv("data/galton.csv")
head(tab)
##   family father mother sex height nkids
## 1      1   78.5   67.0   M   73.2     4
## 2      1   78.5   67.0   F   69.2     4
## 3      1   78.5   67.0   F   69.0     4
## 4      1   78.5   67.0   F   69.0     4
## 5      2   75.5   66.5   M   73.5     4
## 6      2   75.5   66.5   M   72.5     4

Check the number of rows and columns:

dim(tab)
## [1] 898   6
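As a quick sanity check on how family and nkids relate, the rows recorded per family can be compared with the stated number of children. The sketch below uses a hypothetical toy data frame (`toy`) that mimics only the first six rows shown above, not the full file, so the names `toy`, `rows_per_family` and `nkids_per_family` are illustrative assumptions.

```r
# Toy stand-in for the first six rows of tab (family and nkids only)
toy <- data.frame(
  family = c(1, 1, 1, 1, 2, 2),
  nkids  = c(4, 4, 4, 4, 4, 4)
)

# Number of rows (children) actually present for each family code
rows_per_family <- table(toy$family)

# Stated number of children for each family code
nkids_per_family <- tapply(toy$nkids, toy$family, unique)

print(rows_per_family)
print(nkids_per_family)
```

In this toy extract family 2 has only two rows although nkids is 4, simply because head() cut the family off after six rows. The same comparison on the full table would show whether every family is completely recorded.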

Convert the sex column into numeric values (F becomes 0, M becomes 1):

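## factor() works whether sex was read as character or as factor; levels sort as F = 1, M = 2
tab$sex <- as.numeric(factor(tab$sex)) - 1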
head(tab)
##   family father mother sex height nkids
## 1      1   78.5   67.0   1   73.2     4
## 2      1   78.5   67.0   0   69.2     4
## 3      1   78.5   67.0   0   69.0     4
## 4      1   78.5   67.0   0   69.0     4
## 5      2   75.5   66.5   1   73.5     4
## 6      2   75.5   66.5   1   72.5     4
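The 0/1 coding depends on how R orders factor levels, so it may help to see the mapping in isolation. The following is a minimal sketch on a hypothetical vector `sex` holding the six values from the rows above. It compares the factor-based conversion with an explicit comparison that does not rely on level order.

```r
# Toy vector standing in for tab$sex (values from the first six rows)
sex <- c("M", "F", "F", "F", "M", "M")

# Option 1: via a factor; levels are sorted alphabetically, so F = 1, M = 2,
# and subtracting 1 gives F = 0, M = 1
sex_factor <- as.numeric(factor(sex)) - 1

# Option 2: explicit comparison, independent of level order
sex_explicit <- as.numeric(sex == "M")

print(levels(factor(sex)))
print(sex_factor)
print(sex_explicit)
```

Both options give the same coding here. The explicit comparison states the intended meaning (1 = male) directly, which can be easier to read later.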

Remove the nkids column:

tab<-tab[, -c(6)]
head(tab)
##   family father mother sex height
## 1      1   78.5   67.0   1   73.2
## 2      1   78.5   67.0   0   69.2
## 3      1   78.5   67.0   0   69.0
## 4      1   78.5   67.0   0   69.0
## 5      2   75.5   66.5   1   73.5
## 6      2   75.5   66.5   1   72.5
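Dropping a column by position silently removes the wrong one if the column order ever changes. As an alternative sketch, the column can be dropped by name. The data frame `toy` below is a hypothetical three-row stand-in for tab, built from values shown above.

```r
# Toy stand-in for tab with the same column layout
toy <- data.frame(
  family = c(1, 1, 2),
  father = c(78.5, 78.5, 75.5),
  mother = c(67.0, 67.0, 66.5),
  sex    = c(1, 0, 1),
  height = c(73.2, 69.2, 73.5),
  nkids  = c(4, 4, 4)
)

# Keep every column whose name is not "nkids"
toy_by_name <- toy[, names(toy) != "nkids"]

# Equivalent base R form using subset()
toy_by_subset <- subset(toy, select = -nkids)

print(names(toy_by_name))
print(names(toy_by_subset))
```

Both forms leave the columns family, father, mother, sex and height, matching the result of the positional version above.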