Category:Basic R tips

1.) How to check whether a variable (column) exists in an R data frame? For example, after loading data for 100 stocks, we may want to find out whether the "BIIB" (Biogen) ticker is present.

There are several ways to do this:
> stock_data <- read.csv("stock-data.csv")
> "BIIB" %in% names(stock_data) # returns TRUE if it exists, else FALSE
> any(names(stock_data) == "BIIB") # returns TRUE if it exists, else FALSE
> attach(stock_data); exists("BIIB") # returns TRUE if BIIB is found anywhere on the search path, else FALSE
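The checks above can be tried without a CSV file. The sketch below builds a small data frame with made-up values in place of stock-data.csv, and also shows how several tickers might be checked at once. The column names and numbers are assumptions for illustration only.

```r
# Toy data frame standing in for the contents of stock-data.csv (made-up values).
stock_data <- data.frame(AAPL = c(1.2, 1.3), BIIB = c(2.1, 2.0), MSFT = c(0.9, 1.0))

has_biib <- "BIIB" %in% names(stock_data)   # column is present
has_tsla <- "TSLA" %in% names(stock_data)   # column is absent

# Check several tickers at once; the result is a named logical vector.
wanted  <- c("BIIB", "TSLA")
present <- setNames(wanted %in% names(stock_data), wanted)
print(present)
```

Using %in% on names() avoids attach(), so it does not depend on what else happens to be on the search path.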

2.) How to find out what exists in what scope? Note that variables can exist in function scope (local) or global scope.

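Use ls(), the list command, to get the list of all variables in a scope. Called inside a function, ls() lists that function's local variables.
> ls() # default: list all variables in the current environment
> ls(globalenv()) # list the variables in the global environment
> ls(envir = globalenv(), pattern = "functionName") # list only the names matching a regular expression (functionName is a placeholder)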
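To make the local/global distinction concrete, here is a minimal sketch. The function show_scope and all variable names are hypothetical. assign() with envir = globalenv() is used only so that the example behaves the same when the code is sourced into some other environment.

```r
# A hypothetical function with two local variables.
show_scope <- function() {
  local_a <- 1
  local_b <- 2
  ls()   # called inside a function, ls() lists that function's local variables
}

# Create a variable explicitly in the global environment.
assign("global_x", 42, envir = globalenv())

local_names  <- show_scope()
global_names <- ls(envir = globalenv(), pattern = "^global_")   # regex on names
```

The local names never appear in the global listing, because they exist only while show_scope() is running.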

3.) How to remove a variable from an environment?

Use rm(), the remove command, to remove one or more variables.
> rm(x, y) # remove the existing variables x and y (placeholder names)
> rm(list = ls()) # remove all the variables in the current environment
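Because rm(list = ls()) wipes the whole workspace, it can be safer to experiment in a separate environment first. The following sketch uses a scratch environment (the name scratch and the variables in it are made up) to show both forms of rm.

```r
# Work in a scratch environment so that nothing in your own workspace is deleted.
scratch <- new.env()
assign("a", 1, envir = scratch)
assign("b", 2, envir = scratch)
assign("c", 3, envir = scratch)

rm("a", envir = scratch)                  # remove a single variable
after_one <- ls(scratch)                  # the names that are left

rm(list = ls(scratch), envir = scratch)   # remove everything that is left
after_all <- ls(scratch)                  # now empty
```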

4.) How to redirect input and output from and to a file? This is very useful for setting up long-running experiments.

INPUT: Use the source function to load a script.
> source("scriptFilename")
 * 1) if you don't specify the path, scriptFilename should exist in the current working directory

OUTPUT: Use the sink function to define the direction of the output.
 * 1) direct the output to a file
> sink("myFile") # there are append and split options too, if required
> sink() # return output to the terminal
EXAMPLES:
> sink("c:/machine-learning/projects/output.txt")
> sink("c:/machine-learning/projects/output.txt", append=TRUE, split=TRUE)
GRAPH:
> jpeg("c:/machine-learning/projects/corr-plot.jpg")
> plot(x)
> dev.off() # don't forget to call dev.off() to close the file and return plotting output to the screen
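The sink workflow can be checked end to end with a temporary file. This is a minimal sketch: tempfile() stands in for a real path such as the ones in the examples, and the printed values are made up.

```r
out_file <- tempfile(fileext = ".txt")   # temporary file; use your own path in practice

sink(out_file)                 # from here on, printed output goes to the file
print("experiment started")
cat("accuracy:", 0.5, "\n")    # made-up value for illustration
sink()                         # restore output to the console

captured <- readLines(out_file)   # read back what was written
```

Every sink(file) call should be paired with a sink() call. Otherwise later output keeps going to the file.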

5.) How to clear R console?

Press Ctrl + L.

6.) How to install a new package in R?

> install.packages("package_name") # install a package and its dependencies automatically

Don't forget to call library(package_name) before use!

7.) Some useful how-to's in R:

> notMissing <- data[data$myAttribute != "?", ] # keep only the rows with non-missing entries (note the comma)
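A small worked example of this row filter follows. The data frame is made up, with "?" as the missing-value marker. The comma inside the brackets matters: without it, the logical vector would be applied to columns rather than rows.

```r
# Made-up data in which "?" marks a missing entry.
data <- data.frame(id = 1:4,
                   myAttribute = c("a", "?", "b", "?"),
                   stringsAsFactors = FALSE)

# The comma selects rows and keeps all columns.
notMissing <- data[data$myAttribute != "?", ]

# Alternative: turn "?" into NA, then use R's built-in missing-value tools.
data$myAttribute[data$myAttribute == "?"] <- NA
complete <- data[!is.na(data$myAttribute), ]
```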

8) Use index operations to extract features/variables from data instead of other methods:

 * Data[,3] or Data[,c(1:4)] - This is the best way
 * Data$column_name - This is not the best way
 * data = Data[c("v1", "v2", "v3")] - This is not the best way
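The three styles listed above do not all return the same kind of object, which can be checked on a small data frame. This sketch uses made-up columns v1 to v3. It only illustrates the return types and does not measure which form is fastest.

```r
# Made-up data frame with three columns.
Data <- data.frame(v1 = 1:3, v2 = c(2.5, 3.5, 4.5), v3 = c("x", "y", "z"),
                   stringsAsFactors = FALSE)

by_position <- Data[, 3]                 # single column by position: a plain vector
by_name     <- Data$v3                   # same column by name: also a plain vector
as_frame    <- Data[, 3, drop = FALSE]   # drop = FALSE keeps a one-column data frame
several     <- Data[, c(1:2)]            # several columns: a data frame
by_names    <- Data[c("v1", "v2")]       # the same columns selected by name
```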

9) How to calculate the execution time of any function?
> system.time(myFunction()) # myFunction is a placeholder for the call you want to time

10) How to write a formula in R?
> names <- names(trainData) # get the names of the attributes using the "names" function
> Formula <- as.formula(paste("TargetAttribute1 + TargetAttribute2 + ... + TargetAttributesN ~", paste(names[!names %in% c("TargetAttribute1","TargetAttribute2",...,"TargetAttributesN")], collapse = " + ")))

11) Correlation between pairs of attributes:
> corMatrix <- cor(data) # same result as cor(data, data)
Plot the correlation as a matrix with correlation values:
> library(corrplot) # corrplot() comes from the corrplot package
> corrplot(corMatrix, method = "number")

12) Split the data into train and test sets (70% train, 30% test):
> size <- floor(nrow(data) * 0.7) # sample size must be a whole number
> sampledData <- sample(nrow(data), size)
> trainData <- data[sampledData,]
> x <- 1:nrow(data)
> remain <- setdiff(x, sampledData)
> testData <- data[remain,]

13) Changing the target class with N values to N targets with 0 and 1 values (especially used to convert one attribute with N values to N attributes with 0 and 1 values in neural nets):
> data <- cbind(data[,1:(dim(data)[2]-1)], ((data[,dim(data)[2]] == "value1")+0), ((data[,dim(data)[2]] == "value2")+0), ..., ((data[,dim(data)[2]] == "valueN")+0)) # Assuming that the target attribute is the last attribute
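The 0/1 conversion of the target attribute can be written without listing every value by hand. The sketch below is one possible way and not the only one. It assumes, as above, that the target is the last column. The data and the names levels_ and indicators are made up.

```r
# Made-up data whose last column is the target class.
data <- data.frame(x1 = c(0.1, 0.4, 0.7),
                   target = c("red", "green", "red"),
                   stringsAsFactors = FALSE)

last    <- ncol(data)
levels_ <- unique(data[, last])   # the distinct class values, in order of appearance

# One 0/1 column per class value: the comparison gives TRUE/FALSE, adding 0 makes it numeric.
indicators <- sapply(levels_, function(v) (data[, last] == v) + 0)

# Keep every column except the target, then append the indicator columns.
encoded <- cbind(data[, 1:(last - 1), drop = FALSE], indicators)
```

Note the parentheses in 1:(last - 1). Without them, R evaluates 1:last first and then subtracts 1 from every element.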

14) Changing a specific value (in this example 1) of a specific attribute to another value (in this example 0.9):
> data$selectedAttribute[data$selectedAttribute == 1] <- 0.9

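15) How to find out how much time each function call in your code takes? (Usage is shown below.)
> Rprof() # start profiling
> x <- runif(1000000)
> for (i in 1:1000000) { a <- 0; b <- 0; a <- cbind(x[i], a); b <- rbind(x[i], b) }
> Rprof(NULL) # stop profiling
> summaryRprof() # summarize the time spent in each function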

16) Vectorized operations are much faster than loops (the example below illustrates this):
> x <- runif(100000000)
> y <- runif(100000000)
> z <- vector(length = 100000000)
> system.time(z <- x + y) # Takes 0.36s
> system.time(for (i in 1:length(x)) z[i] <- x[i] + y[i]) # Takes 211.8s

17) Deterministic sampling (fixed random seed):
> set.seed(1)
> size <- floor(nrow(data) * 0.7)
> sampledData <- sample(nrow(data), size)
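To see that a fixed seed really makes the sample repeatable, the split can be wrapped in a small function and run twice. This is only a sketch: split_data is a hypothetical helper, the data are made up, and the 70% fraction follows the example above.

```r
data <- data.frame(id = 1:10, value = (1:10)^2)   # made-up data

# Hypothetical helper: split into train and test sets with a fixed seed.
split_data <- function(data, fraction = 0.7, seed = 1) {
  set.seed(seed)                          # same seed, same sample
  size  <- floor(nrow(data) * fraction)   # whole number of training rows
  index <- sample(nrow(data), size)
  list(train = data[index, ], test = data[-index, ])
}

first  <- split_data(data)
second <- split_data(data)   # identical to first, because the seed is the same
```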

18) How to write an object into a file?
> dput(object, file = "results.txt",     control = c("keepNA", "keepInteger", "showAttributes"))
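A file written by dput can be read back with dget, which makes a simple round-trip check possible. The sketch below uses a made-up list and a temporary file in place of results.txt.

```r
results <- list(model = "demo", accuracy = c(0.5, 0.75), n = 10L)   # made-up object

out_file <- tempfile(fileext = ".txt")   # temporary file; use your own path in practice
dput(results, file = out_file,
     control = c("keepNA", "keepInteger", "showAttributes"))

restored <- dget(out_file)   # parse the file and rebuild the object
```

The "keepInteger" option is what lets n come back as an integer rather than a double.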

19) How to write in the console? (You can also use %g, %f, etc. for other formats.)
> sprintf("Features: %s", featureNames)

20) Functional programming: Filter
> nonMissing <- data[data$myAttribute != "?", ] # filter by non-missing entries (note the comma)

21) Functional programming: Map (conditionally replace)
> data$attribute[data$attribute == "/"] <- newValue # replace every "/" entry (the missing-value marker here) with newValue

22) Functional programming: Lambda
> (function(x) x + 2)(1) # make an anonymous function that calculates its input plus 2, and apply it to the input 1
[1] 3 # 1 + 2 = 3

23) Running an R script from the command line
> Rscript scriptName.r