Join and reshape | R for Stata Users

Join datasets

The packages dplyr uses the SQL concepts of left, right, full, inner, semi, anti and cross joins. The following table gives the correspondence between the Stata commands merge and joinby with the dplyr functions:

Stata	merge v1, keep(master matched using)
dplyr	full_join(df1, df2, by = "v1")

Stata	merge v1, keep(master matched)
dplyr	left_join(df1, df2, by = "v1")

Stata	merge v1, keep(matched using)
dplyr	right_join(df1, df2, by = "v1")

Stata	merge v1, keep(matched)
dplyr	inner_join(df1, df2, by = "v1")

Stata	merge v1, keep(matched) keepusing(v1)
dplyr	semi_join(df1, df2, by = "v1")

Stata	merge v1, keep(master) keepusing(v1)
dplyr	anti_join(df1, df2)

Stata	crossby
dplyr	cross_join(df1, df2)

Wen there are multiple matches both in the master and using datasets, these functions create all combinations of rows, similarly to Stata joinby (and contrary to Stata merge m:m).

Append datasets

To append one dataset to another:

Stata	append using "using.dta"
dplyr	row_binds(df1, df2)

When string columns are append too numeric columns, Stata returns an error (and with the option force, it replaces the using dataset to missing values). In contrast, in R, columns with different types are coerced to their higher type (logical < strings < int < real < character).

Reshape datasets

The package tidyr includes the functions pivot_longer and pivot_wider that alternate between wide and long datasets.

From wide to long

dfwide	–>	dflong
id stub1 stub2 1 4.1 4.5 2 4.5 3.0	–>	id variable value 1 stub1 4.1 1 stub2 4.5 2 stub1 3.3 2 stub2 3.0

Stata	reshape long stub, i(id) j(variable) string
tidyr	pivot_longer(dfwide, starts_with("stub"), names_to = "variable", values_to = "value")

The names in red specify the name of new variables. While reshape long requires to specify both the variable to reshape and the key variables, picot_longer only requires to specify the variables to reshape - the key variable are inferred to be the remaining ones.

From long to wide

dflong	–>	dfwide
id variable value 1 stub1 4.1 1 stub2 4.5 2 stub1 3.3 2 stub2 3.0	–>	id stub1 stub2 1 4.1 4.5 2 4.5 3.0

Stata	reshape wide value, i(i) j(variable) string
tidyr	pivot_wider(dflong, names_from = variable, values_from = value)

R for Stata users

Join datasets

Append datasets

Reshape datasets