Category:Genetic Algorithms

Genetic algorithms are a type of random search heuristic that mirrors the process of natural selection. Possible solutions are encoded as vectors of values known as "genomes". These vectors typically contain boolean values, integers, or floating point numbers, and mimic the DNA of animals in nature. In theory, animals that perform better are more likely to reproduce. Their performance depends heavily on their DNA, which is then passed on to their children in combination with the DNA of their mate.

This process can be simulated in a computer, with hundreds of genomes, and thousands of generations. A few illustrative examples are shown here:

https://www.youtube.com/watch?v=z9ptOeByLA4

https://www.youtube.com/watch?v=HgWQ-gPIvt4

http://vimeo.com/79098420
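The select, recombine, and mutate cycle described above can be sketched in a few lines of base R. This is only a minimal illustration, not the method of any particular package: the toy fitness (counting the 1s in a binary genome), the function name run_simple_ga, and all parameter values are assumptions chosen for the example.

```r
# Minimal genetic algorithm on binary genomes (base R only, illustrative).
# Toy fitness: the number of 1s in the genome, an assumption for this sketch.
set.seed(42)

fitness <- function(genome) sum(genome)

run_simple_ga <- function(n_bits = 20, pop_size = 30, generations = 50,
                          p_mutation = 0.02) {
  # Each row of the matrix is one genome.
  pop <- matrix(sample(0:1, pop_size * n_bits, replace = TRUE), nrow = pop_size)
  best_history <- numeric(generations)
  for (g in seq_len(generations)) {
    scores <- apply(pop, 1, fitness)
    best_history[g] <- max(scores)
    elite <- pop[which.max(scores), ]
    # Fitter genomes are more likely to be picked as parents.
    # The +1 keeps every sampling weight positive.
    parents <- sample(pop_size, 2 * pop_size, replace = TRUE, prob = scores + 1)
    children <- matrix(0L, nrow = pop_size, ncol = n_bits)
    for (i in seq_len(pop_size)) {
      mum <- pop[parents[2 * i - 1], ]
      dad <- pop[parents[2 * i], ]
      cut <- sample(n_bits - 1, 1)               # single-point crossover
      child <- c(mum[1:cut], dad[(cut + 1):n_bits])
      flip <- runif(n_bits) < p_mutation         # per-gene mutation
      child[flip] <- 1L - child[flip]
      children[i, ] <- child
    }
    children[1, ] <- elite                       # keep the best genome unchanged
    pop <- children
  }
  list(best_history = best_history, population = pop)
}

result <- run_simple_ga()
```

Because the best genome is always copied into the next generation, result$best_history never decreases from one generation to the next.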

Adaptive Genetic Algorithms
In standard genetic algorithms, the number of hypotheses replaced by crossover (some algorithms assign each hypothesis a probability instead) and the probability of mutation are constant throughout the entire run of the algorithm. These two values, however, have a substantial impact on the accuracy and the convergence speed of the algorithm. With standard genetic algorithms, one way to tune them is to vary both values over many experiments.

There exists a variant of the standard genetic algorithm called the Adaptive Genetic Algorithm. In this type of genetic algorithm, the probabilities of crossover and mutation are modified after each generation. The modification is based on metrics that describe the fitness of each current hypothesis. Hypotheses with high fitness are protected (the most fit hypothesis is often carried over untouched), while hypotheses with sub-average fitness are disrupted. In other words, as the fitness of a hypothesis increases, the chance of that hypothesis being affected by randomness is reduced by decreasing its probability of crossover and/or mutation. Varying the probabilities based on each hypothesis’ fitness not only improves the convergence rate of the GA, but also helps prevent the GA from getting stuck at a local optimum, because sub-average solutions are disrupted, resulting in many new hypotheses being created each generation. Some adaptive algorithms completely disrupt sub-average hypotheses by setting the chance that they are used in a crossover and/or mutated to 100 percent.
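One simple way to express the idea above is a probability that shrinks linearly as fitness approaches the population maximum. The base R sketch below is an assumption made for illustration (the function name adaptive_prob, the linear form, and the constant k_max are all hypothetical), not the exact rule used in any of the cited studies.

```r
# Per-individual adaptive probability (illustrative sketch, base R only).
# k_max is the probability applied to sub-average individuals.
adaptive_prob <- function(fit, k_max = 1.0) {
  f_max <- max(fit)
  f_avg <- mean(fit)
  # If every individual has the same fitness, avoid dividing by zero.
  # Returning k_max for all of them is an arbitrary choice for this sketch.
  if (f_max == f_avg) return(rep(k_max, length(fit)))
  # Probability falls linearly from k_max (at average fitness) to 0 (at the best).
  p <- k_max * (f_max - fit) / (f_max - f_avg)
  # Sub-average individuals are always given the full k_max.
  p[fit < f_avg] <- k_max
  p
}

fit <- c(2, 4, 6, 8, 10)
probs <- adaptive_prob(fit, k_max = 0.5)
print(probs)  # 0.50 0.50 0.50 0.25 0.00
```

The best individual receives probability 0 and is therefore protected, while setting k_max = 1 reproduces the case where sub-average individuals are always disrupted.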

Other adaptive algorithms use metrics that look at properties of the whole population in addition to individual hypotheses. One common form of this kind of metric involves clustering the population and then varying the crossover and mutation rates based on the size of the clusters. This helps keep the hypotheses in the population diverse and therefore helps avoid local optima in the fitness function.
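A rough sketch of a cluster-based metric is shown below, using kmeans from base R's stats package. The rule itself is an assumption made for illustration: the function name, the thresholds, and the decision to raise the mutation rate when one cluster dominates are hypothetical and are not taken from the cited studies.

```r
# Population-level adaptation via clustering (illustrative sketch, base R only).
set.seed(1)

mutation_rate_from_clusters <- function(pop, k = 3,
                                        base_rate = 0.01, max_rate = 0.2) {
  clusters <- kmeans(pop, centers = k)
  # Share of the population in the largest cluster (between 1/k and 1).
  largest_share <- max(clusters$size) / nrow(pop)
  # The more the population crowds into one cluster, the higher the rate.
  base_rate + (max_rate - base_rate) * largest_share
}

pop <- matrix(rnorm(40 * 2), ncol = 2)   # 40 genomes with 2 real-valued genes
rate <- mutation_rate_from_clusters(pop)
```

A population that has collapsed into a single cluster gets a rate close to max_rate, which pushes it back toward diversity.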

In studies, adaptive algorithms have been shown to improve accuracy and convergence speed over standard genetic algorithms. Some studies also suggest that dynamically varying the population size might have a beneficial effect on a genetic algorithm's performance. However, none of the studies I looked at examined that specific aspect.
 * http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=286385
 * http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4220690
 * http://www.ijcai.org/papers07/Papers/IJCAI07-375.pdf

Package: GA
Description

The package runs genetic algorithms (GA) with any user-supplied fitness function and allows several parameters of the GA to be set.

Package PDF Link

http://cran.r-project.org/web/packages/GA/GA.pdf

Primary Function

ga(type = c("binary", "real-valued", "permutation"),
   fitness, ...,
   min, max, nBits,
   population = gaControl(type)$population,
   selection = gaControl(type)$selection,
   crossover = gaControl(type)$crossover,
   mutation = gaControl(type)$mutation,
   popSize = 50,
   pcrossover = 0.8,
   pmutation = 0.1,
   elitism = base::max(1, round(popSize*0.05)),
   maxiter = 100,
   run = maxiter,
   maxfitness = Inf,
   names = NULL,
   suggestions = NULL,
   keepBest = FALSE,
   parallel = FALSE,
   monitor = gaMonitor,
   seed = NULL)

General warning: experimenting with any of the parameters takes considerable time, so start testing early! (Some runs took more than 10 hours.)

The functions that can be supplied for a parameter are not explained in detail in the documentation. To see how one works, type its name without parentheses at the R command line and press Enter; R will print the function's source. Example: "ga_nlrSelection"

 Explanation of Common Parameters 

type: the type of genetic algorithm to run, depending on the nature of the decision variables:

"binary" for binary representations of decision variables;

"real-valued" for optimization problems where the decision variables are floating-point representations of real numbers;

"permutation" for problems that involve reordering of a list.

fitness: the fitness function, any allowable R function which takes as input an individual string representing a potential solution, and returns a numerical value describing its fitness.

min & max: vectors with one entry per decision variable, giving the minimum and maximum of the search space.

population: an R function for randomly generating an initial population.

"gabin_Population"

selection: an R function performing selection which generates a new population of individuals from the current population probabilistically according to individual fitness.

"ga_nlrSelection"

crossover: an R function performing crossover which forms offspring by combining part of the genetic information from their parents.

"gabin_spCrossover"

mutation: an R function performing mutation which randomly alters the values of some genes in a parent chromosome.

"gareal_raMutation"

popSize: the population size.

pcrossover: the probability of crossover between pairs of chromosomes. Typically this is a large value; by default it is set to 0.8.

pmutation: the probability of mutation in a parent chromosome. Mutation typically occurs with a small probability; by default it is set to 0.1.

elitism: the number of best-fitness individuals that survive at each generation. By default the top 5% of individuals survive at each iteration.

maxiter: the maximum number of iterations to run before the GA search is halted.

run: the number of consecutive generations without any improvement in the best fitness value before the GA is stopped.

maxfitness: the upper bound on the fitness function which, if reached, will halt the search.
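The parameters above can be combined as in the following sketch, which maximises the number of 1s in a 20-bit string. The fitness function onemax and the chosen values are assumptions for illustration. The call is skipped when the GA package is not installed, and a "binary" problem is used so that no search-space bounds are needed.

```r
# Toy problem with the GA package (sketch; skipped if GA is not installed).
onemax <- function(bits) sum(bits)   # fitness: number of 1s, to be maximised

if (requireNamespace("GA", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GA))
  result <- ga(type = "binary", fitness = onemax, nBits = 20,
               popSize = 50, pcrossover = 0.8, pmutation = 0.1,
               maxiter = 100, run = 30,   # stop after 30 generations without improvement
               monitor = FALSE, seed = 1)
  print(result@fitnessValue)   # best fitness found
  print(result@solution)       # genome(s) achieving it
}
```

Setting run below maxiter is a cheap way to shorten long experiments, since the search stops as soon as the best fitness has stalled for that many generations.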

Package: Genalg
Genalg contains two primary genetic algorithm functions, one for binary genomes (rbga.bin), and one for floating point. The floating point variant, rbga, is shown here:

rbga(stringMin=c(), stringMax=c(), suggestions=NULL, popSize=200, iters=100, mutationChance=NA, elitism=NA, monitorFunc=NULL, evalFunc=NULL, showSettings=FALSE, verbose=FALSE)

stringMin- vector with minimum values for each gene.

stringMax- vector with maximum values for each gene.

suggestions- optional list of suggested chromosomes

popSize- the population size.

iters- the number of iterations.

mutationChance- the chance that a gene in the chromosome mutates. By default 1/(size+1). It affects the convergence rate and the probing of search space: a low chance results in quicker convergence, while a high chance increases the span of the search space.

elitism- the number of chromosomes that are carried over into the next generation. By default it is about 20% of the population size.

monitorFunc- method run after each generation to allow monitoring of the optimization.

evalFunc- user-supplied method to calculate the evaluation function for the given chromosome.

showSettings- if true the settings will be printed to screen. By default False.

verbose- if true the algorithm will be more verbose. By default False.
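As a rough usage sketch, the parameters above fit together as follows. The evaluation function sphere and all values are assumptions for illustration; note that rbga treats lower evaluation values as better, so the function is written as something to minimise. The call is skipped when genalg is not installed.

```r
# Minimise a simple quadratic with genalg (sketch; skipped if not installed).
# rbga minimises evalFunc, so lower values are better.
sphere <- function(chromosome) sum((chromosome - 1)^2)   # minimum 0 at (1, 1)

if (requireNamespace("genalg", quietly = TRUE)) {
  library(genalg)
  set.seed(1)
  result <- rbga(stringMin = c(-5, -5), stringMax = c(5, 5),
                 popSize = 50, iters = 100,
                 mutationChance = 0.05, elitism = 5,
                 evalFunc = sphere)
  best_index <- which.min(result$evaluations)
  print(result$population[best_index, ])   # best chromosome in the final population
  print(min(result$best, na.rm = TRUE))    # best evaluation seen over all iterations
}
```

For a problem where a larger score is better, one option is to return the negated score from evalFunc.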

The monitor function is useful for gathering information about the GA as it runs. Here is an example function which plots the best evaluation value over time, after each iteration:

    x <- 0
    monitor <- function(obj) {
        # close the RStudio plot device (if open), then plot the best values so far
        if ("RStudioGD" %in% names(dev.list())) dev.off()
        if (x >= 1) plot(obj$best)
        x <<- x + 1
    }

The 4th line, containing the "dev.off", is critical for those using RStudio. It clears the other plots before redrawing this plot. In a simulation with thousands of generations, having thousands of plots in memory while running would slow the machine down significantly.

Package: ANN
See the Neural Networks page.