13 Data wrangling

13.1

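# Recode every zero in df as missing (NA)
df[df == 0] <- NA

# Convert every column of data to character, then set each non-missing cell to "1"
data <- replace(data.frame(lapply(data, as.character), stringsAsFactors = FALSE), !is.na(data), "1")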
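
To see what the two snippets above do, here is a minimal sketch on a toy data frame. The object names `toy` and `flags` and their values are made up for illustration, and the toy data contains no missing values before the recoding step.

```r
# Hypothetical toy data: two numeric columns that contain some zeros
toy <- data.frame(a = c(0, 2, 5), b = c(1, 0, 3))

# Step 1: recode zeros as NA
toy[toy == 0] <- NA

# Step 2: convert all columns to character, then mark every
# non-missing cell with "1" (missing cells stay NA)
flags <- replace(
  data.frame(lapply(toy, as.character), stringsAsFactors = FALSE),
  !is.na(toy),
  "1"
)

print(toy)
print(flags)
```

After step 2, `flags` works as a presence indicator: a cell holds "1" where `toy` has a value and NA where it does not.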

13.2 Joining Data in R with dplyr

13.2.1 What's Covered

  • Mutating joins
  • Filtering joins and set operations
  • Assembling data
  • Advanced joining
  • Case Study

Keys

  • The primary key must be unique within its table
  • The foreign key in the second table may contain duplicates
  • The second table is matched to the primary table based on the primary key
  • The primary key may consist of one, two, or more columns
#install.packages("dplyr")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:Hmisc':
## 
##     src, summarize
## The following object is masked from 'package:MASS':
## 
##     select
## The following objects are masked from 'package:xts':
## 
##     first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
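
The key rules listed under Keys can be checked directly in R. The sketch below is only an illustration: the tables `customers`, `orders`, and `enrollment` and all their values are hypothetical, not data from this chapter. It assumes `customer_id` is the primary key of `customers` and a foreign key in `orders`.

```r
library(dplyr)

# Hypothetical primary table: one row per customer
customers <- data.frame(
  customer_id = c(1, 2, 3),
  name = c("Ana", "Ben", "Cy"),
  stringsAsFactors = FALSE
)

# Hypothetical second table: customer_id is a foreign key and may repeat
orders <- data.frame(
  order_id = c(101, 102, 103, 104),
  customer_id = c(1, 1, 3, 3)
)

# The primary key must be unique in its table
pk_is_unique <- !any(duplicated(customers$customer_id))

# The foreign key is allowed to contain duplicates
fk_has_duplicates <- any(duplicated(orders$customer_id))

# Match the second table to the primary table on the key column
joined <- left_join(customers, orders, by = "customer_id")

# A key made of two columns: unique only as a combination
enrollment <- data.frame(
  student = c("s1", "s1", "s2"),
  course = c("math", "art", "math"),
  stringsAsFactors = FALSE
)
composite_is_unique <- nrow(distinct(enrollment, student, course)) == nrow(enrollment)

print(joined)
```

In `joined`, a customer with several orders appears once per order, and a customer with no orders is kept with NA in `order_id`. Neither `student` nor `course` is unique on its own in `enrollment`, but each pair is, which is what a multi-column key requires.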