Pre-processing in R: Auto-Generating Formulas for Model Training

While "Class ~ ." is sufficient most of the time for training a classification or regression model, some packages (e.g. neuralnet) require the user to list every variable in the formula explicitly.

The following code builds such a formula from the column names of the data frame dataset, whose response column is named "Class":

```r
# Generate the right-hand side of the formula:
# all column names of `dataset` except the response column "Class"
Cols2 <- names(dataset)
Cols2 <- Cols2[!Cols2 %in% "Class"]

# Put together the formula: Class ~ col1 + col2 + ...
formula <- as.formula(paste("Class ~", paste(Cols2, collapse = " + ")))
```
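As a self-contained sketch, the snippet below runs the same steps on a made-up data frame (toy and its columns a, b and c are hypothetical) and then builds the same formula with base R's reformulate(), which takes a character vector of term labels plus a response name and needs no string pasting:

```r
# Hypothetical toy data: three predictors and a response column "Class"
toy <- data.frame(a = 1:4, b = c(3, 1, 4, 1), c = c(1, 0, 1, 0),
                  Class = c(0, 1, 0, 1))

# Right-hand side: every column name except the response
predictors <- setdiff(names(toy), "Class")

# Option 1: build the formula string with paste() and convert it
f1 <- as.formula(paste("Class ~", paste(predictors, collapse = " + ")))

# Option 2: let reformulate() assemble the same formula
f2 <- reformulate(predictors, response = "Class")

print(f1)  # Class ~ a + b + c
```

setdiff() is used here in place of the %in% filter; both simply drop the response column from the list of names.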

If you want a formula built from the pairwise combinations of a set of attributes, the following code generates those combinations as a character vector, which you can then paste into a formula as described above. This example takes the pairwise product of all attributes, but any other operator can be used instead of multiplication:

```r
# Start with the original N = 30 attribute names: "x1", "x2", ..., "x30"
str1 <- paste0("x", 1:30)

# Generate all possible pairwise combinations of the N = 30 attributes
# (a 2 x 435 character matrix, one pair per column)
comb2 <- combn(str1, 2)

# Join each pair (each column) with the multiplication operator, e.g. "x1*x2"
str2 <- apply(comb2, 2, function(x) paste(x, collapse = "*"))
```
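The vector of pair strings still has to be pasted into a formula. The sketch below does that for an illustrative N = 4 (instead of 30, to keep the output short) and uses the FUN argument of combn() to build the strings in a single call rather than combn() plus apply(). The response name "Class" is carried over from the first snippet. One point worth knowing: in R's formula language "x1*x2" is not a single product column; it expands to x1 + x2 + x1:x2 (both main effects plus their interaction). If only the literal product is wanted, each pair can be wrapped in I():

```r
# Illustrative: N = 4 attributes instead of 30
str1 <- paste0("x", 1:4)

# Pairwise products via combn()'s FUN argument: one string per pair
str2 <- combn(str1, 2, FUN = paste, collapse = "*")

# Paste the pairs into the right-hand side, as in the first snippet
f <- as.formula(paste("Class ~", paste(str2, collapse = " + ")))

# "*" in a formula expands to main effects plus interactions
attr(terms(f), "term.labels")

# Literal products only: I() makes "*" plain arithmetic multiplication
f_prod <- as.formula(paste("Class ~", paste0("I(", str2, ")", collapse = " + ")))
attr(terms(f_prod), "term.labels")
```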

The same combn() approach extends to combinations of any size m out of the N attributes: pass m instead of 2 as its second argument, depending on your needs.
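A small wrapper can make the choice of m and of the operator explicit. combination_terms() below is a hypothetical helper, not part of any package; it just forwards to combn() with paste() applied to each combination:

```r
# Hypothetical helper: all m-way combinations of the given attribute names,
# each joined with the operator `op` (e.g. "*" or ":")
combination_terms <- function(attrs, m, op = "*") {
  combn(attrs, m, FUN = paste, collapse = op)
}

attrs <- paste0("x", 1:5)
three_way <- combination_terms(attrs, 3, op = ":")
length(three_way)   # choose(5, 3) = 10
head(three_way, 3)  # "x1:x2:x3" "x1:x2:x4" "x1:x2:x5"
```

Combinations of several sizes can be collected with, for example, unlist(lapply(2:3, function(k) combination_terms(attrs, k))) and then pasted into a formula as before.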