R for Stata users

R has four main storage types

The function typeof() returns the storage type of a given object. The four main types are logical, integer, double, and character.
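A quick check of the four types with typeof(). The values are arbitrary examples. Note that a plain number such as 1.5 is stored as a double, and an integer needs the L suffix:

```r
# typeof() returns the storage type of an object as a character string
types <- c(
  typeof(TRUE),  # logical
  typeof(1L),    # integer (the L suffix makes an integer literal)
  typeof(1.5),   # double (the default for numbers written without L)
  typeof("a")    # character
)
types
```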

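The R command type.convert(df, as.is = TRUE) automatically chooses an appropriate storage type for each column of the data.frame df (as.is = TRUE keeps text columns as character instead of turning them into factors). It plays a role similar to the Stata command compress.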
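A minimal sketch, assuming a hypothetical data.frame raw whose columns were all read in as text. type.convert() is applied to the whole data.frame and returns a data.frame with converted columns:

```r
# Hypothetical dataset whose columns were all read in as character strings
raw <- data.frame(
  x = c("1", "2", "3"),
  y = c("TRUE", "FALSE", "TRUE"),
  z = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
sapply(raw, typeof)    # every column is "character"

# Let R pick a storage type per column; as.is = TRUE keeps text as character
clean <- type.convert(raw, as.is = TRUE)
sapply(clean, typeof)  # x "integer", y "logical", z "character"
```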


Garbage collection

R uses garbage collection: an object is automatically deleted once no name points to it anymore. For instance, when executing the command df <- f(df), a new dataset f(df) is first created and then bound to the name df. The previous version of df is no longer referenced by any name, so its memory is freed at the next garbage collection.
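One way to watch this happen is to attach a finalizer, a function R runs when an object is collected, using reg.finalizer(). The environment e below is a hypothetical stand-in for a dataset; an environment is used because reg.finalizer() only accepts environments and external pointers:

```r
collected <- FALSE
e <- new.env()  # stand-in for a large object
invisible(reg.finalizer(e, function(obj) collected <<- TRUE))

e <- NULL        # no name points to the environment anymore
invisible(gc())  # force a collection now instead of waiting for R to run one
collected        # TRUE: the unreachable environment was deleted
```

In everyday code there is no need to call gc(): R runs the garbage collector on its own whenever it needs memory.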

You can explicitly delete a particular object with the function rm(), and delete everything with rm(list = ls()), which is similar to Stata clear.
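A short sketch of both forms; x and y are arbitrary example objects:

```r
x <- runif(1e6)  # about 8 MB of doubles
y <- letters

rm(x)            # delete one object: the name x no longer exists
exists("x")      # FALSE

rm(list = ls())  # delete every object in the global environment
remaining <- ls()
remaining        # character(0)
```

rm() removes names; the memory those names pointed to is then reclaimed by the garbage collector.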


Lists and vectors


Data.frames

A data.frame is internally a list of vectors (i.e. a list of columns of potentially different types).
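This can be checked directly; df is an arbitrary small example:

```r
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

typeof(df)          # "list": the storage type of a data.frame
length(df)          # 2: one list element per column
sapply(df, typeof)  # id "integer", name "character": each column has its own type
df[["name"]]        # extract one column as a plain character vector
```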

Because subsetting a list does not copy its elements, subsetting a data.frame by columns does not require additional memory (starting from R 3.1.0): the new data.frame points to the same column vectors as the original.

library(pryr)   # object_size()
library(dplyr)  # %>%, select(), mutate()
N <- 1e3
DF <- data.frame(
  id = sample(round(runif(100, max = 100), 4), N, TRUE),
  v  = sample(round(runif(100, max = 100), 4), N, TRUE)
)
DF1 <- DF %>% select(id)
object_size(DF)
#> [1] 16.8 kB
object_size(DF, DF1)  # DF1 shares the id column with DF, so no extra memory
#> [1] 16.8 kB

For the same reason, modifying a column in a data.frame only requires additional memory for that one column:

DF2 <- DF %>% mutate(v = mean(v))
pryr::object_size(DF, DF2)
#> [1] 25.2 kB
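A back-of-the-envelope check of the sizes above, counting raw column data only: each column holds N = 1,000 doubles at 8 bytes each. The measured sizes are somewhat larger because object_size() also counts vector headers, column names, and other attributes.

```r
N <- 1e3
bytes_per_double <- 8                  # each double takes 8 bytes
one_column <- N * bytes_per_double     # 8000 bytes per column
df_data <- 2 * one_column              # DF: columns id and v
df_df2_data <- df_data + one_column    # DF2 shares id with DF; only its new v is extra
c(df_data, df_df2_data)                # 16000 24000, i.e. about 16 kB and 24 kB
```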

Since subsetting a vector by position creates a new vector (a deep copy), subsetting a data.frame by rows creates a deep copy of every column:

DF3 <- DF[1:500, ]  # first 500 rows; the comma means rows, not columns, are selected
object_size(DF, DF3)
#> 25.1 kB

For the same reason, merging or appending two datasets requires memory for the master dataset, the using dataset, and the merged result:

object_size(DF, DF2, merge(DF, DF2))
#> 34.2 kB
object_size(DF, DF2, rbind(DF, DF2))
#> 49.5 kB

Converting a data.frame to a list of vectors does not create deep copies. However, converting a data.frame to a matrix requires a deep copy, since a matrix stores all its values in a single vector.

DF4 <- as.list(DF)
object_size(DF)
#> 16.9 kB
object_size(DF, DF4)
#> 16.9 kB
DF5 <- as.matrix(DF)
object_size(DF, DF5)
#> 34.2 kB