typeof returns the storage type of a given object. There are four main types:
double (8 bytes). Unfortunately, there is no float type in R - this means that datasets systematically use more memory in R than they do in Stata.
integer (4 bytes - long in Stata). While Stata automatically stores a variable as an integer when possible, R requires the suffix "L" to create an integer literal.
typeof(1)
#> [1] "double"
typeof(1L)
#> [1] "integer"
logical (TRUE and FALSE)
typeof(TRUE)
#> [1] "logical"
character (strings in Stata). To convert between characters and numerics, you may use the functions
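The sentence above does not name the conversion functions. As a minimal sketch, assuming base R's as.numeric() and as.character() are the ones meant, a round trip between the two types could look like this:

```r
# Assumption: as.numeric() and as.character() are the conversion functions meant.
s <- c("1.5", "2", "abc")

# Strings that cannot be parsed as numbers become NA (with a warning).
n <- suppressWarnings(as.numeric(s))
typeof(n)     # "double"
n

# Converting back gives a character vector again.
back <- as.character(n)
typeof(back)  # "character"
```

Note that the unparseable string is lost in the round trip: it comes back as a missing value, not as the original text.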
Factor variables look like characters but are actually stored as integers. They correspond exactly to integers with a value label in Stata. You may obtain a factor variable instead of a string when using read.csv (in R versions before 4.0.0, where stringsAsFactors defaults to TRUE; fread does not do this), or when using
You may need factor variables when R must treat a numeric as a categorical variable (for instance in regressions) or when you need to sort strings in something other than alphabetical order (for instance in ggplot2). While base R historically tended to convert all strings to factor variables, data.table avoids doing so.
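To make the "integers with a value label" description concrete, here is a small sketch with a hypothetical variable named size. It shows the integer codes behind a factor and how the level order, not the alphabet, drives sorting:

```r
# Hypothetical example data; the levels argument fixes a custom order.
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"))

typeof(size)      # "integer": the storage type behind the labels
as.integer(size)  # the underlying codes: 1 3 2 1
levels(size)      # the labels attached to codes 1, 2, 3

# Sorting follows the level order, not alphabetical order.
sort(size)
```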
To convert a factor into a character, just use
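The function name is missing from the sentence above. A minimal sketch, assuming base R's as.character() is the intended function, also shows a common pitfall when the labels are numbers:

```r
# Assumption: as.character() is the conversion the text refers to.
f <- factor(c("10", "20", "10"))

as.character(f)              # the labels: "10" "20" "10"

# Pitfall: as.numeric() on a factor returns the integer codes, not the labels.
as.numeric(f)                # 1 2 1

# Going through character first recovers the numeric values.
as.numeric(as.character(f))  # 10 20 10
```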
The R command type.convert(, as.is = TRUE) automatically chooses the appropriate type of a column in a dataset. It corresponds to the Stata command
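As an illustration, the sketch below applies type.convert column by column to a small hypothetical data.frame (the names raw and converted are made up for this example) in which every column was read as text:

```r
# Hypothetical dataset where all columns arrived as character strings.
raw <- data.frame(
  id    = c("1", "2", "3"),
  price = c("1.5", "2", "3.25"),
  name  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Apply type.convert to each column; as.is = TRUE keeps text as character
# instead of turning it into a factor.
converted <- raw
converted[] <- lapply(raw, type.convert, as.is = TRUE)

sapply(raw, typeof)        # all "character"
sapply(converted, typeof)  # "integer", "double", "character"
```

Applying the function per column with lapply is only one way to do this; it was chosen here because it relies solely on the vector form of type.convert.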
R uses garbage collection: an object is automatically freed once no name points to it anymore. For instance, when executing the command df <- f(df), a new dataset f(df) is created first, and then the previous version of df is dropped from memory.
You can explicitly delete a particular object with the function rm(). You can delete everything with rm(list = ls()) - similar to Stata
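A short sketch of the removal step, using a hypothetical object named big_vector. The call to gc() is optional, since R runs the garbage collector on its own, but it can be used to trigger a collection immediately:

```r
# Hypothetical large object (about 8 MB of doubles).
big_vector <- numeric(1e6)
exists("big_vector", inherits = FALSE)  # TRUE

# rm() removes the name; the memory can be reclaimed once nothing else
# refers to the object.
rm(big_vector)
exists("big_vector", inherits = FALSE)  # FALSE

# Optionally trigger a garbage collection right away.
invisible(gc())
```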
One can modify a vector or a list in R in place:
x <- c(1L, 2L)
.Internal(inspect(x))
#> @7f877821fec8
x[1] <- 2L
.Internal(inspect(x))
#> @7f877821fec8
A deep copy is only made when multiple names point to the same original object:
y <- x
x[1] <- 3L
.Internal(inspect(x))
#> @7f877821d188 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 3,2
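The same copy-on-modify behaviour can be observed without reading memory addresses. The sketch below uses base R's tracemem(), which is only available when R was built with memory profiling, hence the capabilities() guard:

```r
x <- c(1L, 2L)
y <- x  # both names refer to the same vector; nothing is copied yet

# tracemem() prints a message when the traced vector is duplicated.
can_trace <- capabilities("profmem")
if (can_trace) invisible(tracemem(x))

x[1] <- 3L  # two names share the vector, so R copies it before modifying

if (can_trace) untracemem(x)

x  # 3 2
y  # 1 2: the original values are untouched
```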
Subsetting an element of a list does not require additional memory: elements are shallow copied. You can check this with the function object_size in the package pryr, which gives the cumulative size occupied by two objects:
library(pryr)
x <- list(1, 2)
y <- x[[1]]
object_size(x)
#> 152 B
object_size(x, y)
#> 152 B
However, subsetting a vector makes a deep copy of the selected elements:
x <- c(1, 2)
y <- x[1]
object_size(x)
#> 56 B
object_size(x, y)
#> 104 B
Finally, converting an object between a list and a vector makes a deep copy:
x <- c(1, 2)
y <- as.list(x)
object_size(x)
#> 56 B
object_size(x, y)
#> 208 B
A data.frame is internally a list of vectors (i.e. a list of columns of potentially different types).
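A quick way to see this list structure, sketched on a small hypothetical data.frame:

```r
# Hypothetical two-column data.frame with columns of different types.
DF <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

typeof(DF)          # "list"
is.list(DF)         # TRUE
length(DF)          # 2: the number of columns, not rows
sapply(DF, typeof)  # each column keeps its own type

# Columns can be extracted with list syntax.
identical(DF[["id"]], DF$id)  # TRUE
```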
Because subsetting lists does not require additional memory, subsetting a data.frame by columns does not require additional memory (starting from R 3.1.0):
library(pryr)
library(dplyr)
N <- 1e3
DF <- data.frame(
  id = sample(round(runif(100, max = 100), 4), N, TRUE),
  v = sample(round(runif(100, max = 100), 4), N, TRUE)
)
DF1 <- DF %>% select(id)
object_size(DF)
#> 16.8 kB
object_size(DF1)
#> 16.8 kB
For the same reason, modifying a column in a data.frame only needs additional memory for one column:
DF2 <- DF %>% mutate(v = mean(v))
pryr::object_size(DF, DF2)
#> 25.2 kB
Since subsetting a vector creates a deep copy, subsetting a data.frame by row creates a deep copy:
DF3 <- DF[1:(5e2), ]
object_size(DF, DF3)
#> 25.1 kB
For the same reason, merging or appending two datasets requires memory for the master, the using, and the merged datasets:
object_size(DF, DF2, merge(DF, DF2))
#> 34.2 kB
object_size(DF, DF2, rbind(DF, DF2))
#> 49.5 kB
Converting a data.frame to a list of vectors does not create deep copies. However, converting a data.frame to a matrix requires a deep copy:
DF4 <- as.list(DF)
object_size(DF)
#> 16.9 kB
object_size(DF, DF4)
#> 16.9 kB
DF5 <- as.matrix(DF)
object_size(DF, DF5)
#> 34.2 kB