Drop data frame columns by name
I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:
df$x <- NULL
But I was hoping to do this with fewer commands.
Also, I know that I could drop columns using integer indexing like this:
df <- df[ -c(1, 3:6, 12) ]
But I am concerned that the relative position of my variables may change.
Given how powerful R is, I figured there might be a better way than dropping each column one by one.
There's also the subset command, useful if you know which columns you want:
df <- data.frame(a = 1:10, b = 2:11, c = 3:12)
df <- subset(df, select = c(a, c))
UPDATED after comment by @hadley: To drop columns a,c you could do:
df <- subset(df, select = -c(a, c))
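To make the two subset calls above concrete, here is a small sketch that runs both on the same example data and checks what comes back. The names kept and dropped are only illustrative. Note that ?subset describes it as a convenience function intended for interactive use, so inside functions the bracket-based approaches may be the safer choice.

```r
df <- data.frame(a = 1:10, b = 2:11, c = 3:12)

# Keep only columns a and c (unquoted names, evaluated by subset itself).
kept <- subset(df, select = c(a, c))
names(kept)

# Drop columns a and c; only b remains.
dropped <- subset(df, select = -c(a, c))
names(dropped)

# subset keeps the result as a data frame even when one column is left.
is.data.frame(dropped)
```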
You can use a simple character vector of names:
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)
drops <- c("x", "z")
DF[ , !(names(DF) %in% drops)]
Or, alternatively, you can make a vector of those to keep and refer to them by name:
keeps <- c("y", "a")
DF[keeps]
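If the same drop-by-name step is needed in several places, it can be wrapped in a small function. The sketch below is one possible way to do that; drop_cols is a hypothetical helper name, not something from base R, and it uses setdiff instead of %in%, which gives the same columns here. Names in drops that are not present in the data are silently ignored.

```r
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)

# Hypothetical helper: return `data` without the columns named in `drops`.
# setdiff() keeps the original column order and ignores unknown names.
drop_cols <- function(data, drops) {
  data[, setdiff(names(data), drops), drop = FALSE]
}

res <- drop_cols(DF, c("x", "z", "not_a_column"))
names(res)
```

By contrast, DF[keeps] stops with an "undefined columns selected" error if keeps contains a name that is not in DF, which can be useful when a typo should not pass unnoticed.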
For those still not acquainted with the drop argument of the indexing function, if you want to keep one column as a data frame, you do:
keeps <- "y"
DF[ , keeps, drop = FALSE]
drop = TRUE (or not mentioning it) will drop unnecessary dimensions, and hence return a vector with the values of column y.
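The difference is easy to check directly. A short sketch, with as_vector, as_frame and also_frame as illustrative names only:

```r
DF <- data.frame(x = 1:10, y = 10:1, z = rep(5, 10), a = 11:20)
keeps <- "y"

# Matrix-style indexing with the default drop = TRUE collapses to a vector.
as_vector <- DF[, keeps]

# drop = FALSE keeps the one-column data frame.
as_frame <- DF[, keeps, drop = FALSE]

# List-style indexing (no comma) always returns a data frame.
also_frame <- DF[keeps]

is.data.frame(as_vector)
is.data.frame(as_frame)
is.data.frame(also_frame)
```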
You could use %in% like this:
df[, !(colnames(df) %in% c("x","bar","foo"))]
within(df, rm(x)) is probably easiest for a single variable, or for multiple variables:
within(df, rm(x, y))
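Note that within returns a modified copy rather than changing df in place, so the result has to be assigned. A minimal sketch on made-up data (the data frame and the name trimmed are assumptions for illustration):

```r
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)

# rm() runs inside an environment built from df; within() then returns
# a copy of df without the removed variables.
trimmed <- within(df, rm(x, y))

names(trimmed)
names(df)  # the original data frame is unchanged
```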
Or if you're dealing with data.tables (per How do you delete a column by name in data.table?):
dt[, x := NULL]  # Deletes column x by reference instantly.
dt[, !"x"]       # Selects all but x into a new data.table.
or for multiple variables:
dt[, c("x", "y") := NULL]
dt[, !c("x", "y")]