1 Introduction of R

I started to use R to generate figures for the project data. For example, I developed ViewBS to visualize whole genome bisulfite sequencing data. In ViewBS, I use R

Learning objectivesLearning Objectives

✓ Become comfortable with RStudio (a graphical interface for R)

✓ Fluently interact with R using RStudio

✓ Become familiar with R syntax

✓ Understand data structures in R

✓ Inspect and manipulate data structures

✓ Install packages and use functions in R

✓ Visualize data using simple and complex plotting methods

1.1 Setup

1.2 Basic systax

1.2.1 Install packages

1.2.2 Install

1.2.2.1 Getting help on functions and packages

1.3 Data type

1.4 Data structures

R has multiple data structures. If you are familiar with excel you can think of data structures as building blocks of a table and the table itself, and a table is similar to a sheet in excel. Most of the time you will deal with tabular data sets, you will manipulate them, take sub-sections of them. It is essential to know what are the common data structures in R and how they can be used. R deals with named data structures, this means you can give names to data structures and manipulate or operate on them using those names.

1.5 Read from and write into files

1.5.1 Read Random Rows from A Huge File

Given R data frames stored in the memory, sometimes it is beneficial to sample and examine the data in a large-size csv file before importing into the data frame. To the best of my knowledge, there is no off-shelf R function performing such data sampling with a relatively low computing cost. Therefore, I drafted two utility functions serving this particular purpose, one with the LaF library and the other with the reticulate library by leveraging the power of Python. While the first function is more efficient and samples 3 records out of 336,776 in about 100 milliseconds, the second one is more for fun and a showcase of the reticulate package.

#install.packages("LaF")
library(LaF)
sample1 <- function(file, n) {
  lf <- laf_open(detect_dm_csv(file, sep = ",", header = TRUE, factor_fraction = -1))
  return(read_lines(lf, sample(1:nrow(lf), n)))
}

sample1("data/galton.csv", 3)

##   family father mother sex height nkids
## 1     32     72   62.0   F   63.5     5
## 2     42     71   65.5   M   69.0     6
## 3    158     68   59.0   F   61.0    10

1.5.2 Reference

Read Random Rows from A Huge CSV File: https://www.r-bloggers.com/read-random-rows-from-a-huge-csv-file/

1.6 Functions

1.7 Control structure

1.8 Reference

Introduction to R: https://wiki.harvard.edu/confluence/display/hbctraining/Introduction+to+R

R graphics with ggplot2 workshop notes: http://tutorials.iq.harvard.edu/R/Rgraphics/Rgraphics.html