Evolving Symbolic Regression

We have:

Our goal is to find an "approximation function" of the "target function" in symbolic form. We measure the quality of the approximation by the sum of the absolute differences between the "target function" y values and the "approximation function" y values at the sample x points. This number is called the "fitness" of an approximating function: the lower it is, the better the approximation, and 0 means a perfect fit.
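A minimal sketch of this fitness measure, assuming the approximation is available as a Python callable and the sample points as a list of (x, y) pairs. The names `raw_fitness` and `cases` are illustrative, not taken from the program:

```python
from typing import Callable, List, Tuple

def raw_fitness(approx: Callable[[float], float],
                cases: List[Tuple[float, float]]) -> float:
    """Sum of |target y - approximation y| over all sample points (lower is better)."""
    return sum(abs(y - approx(x)) for x, y in cases)

# Sample points of the target y = x^2 at x = 0, 1, 2, 3.
cases = [(x, x * x) for x in range(4)]
print(raw_fitness(lambda x: 2 * x, cases))  # 4  (errors 0, 1, 0, 3)
```

A candidate that reproduces the target exactly, such as `lambda x: x * x`, scores 0.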

Approximation
Histogram
Status
Results
Actions
Function Settings
Genetic Settings
Other Settings

Approximation

The Approximation window shows a Cartesian diagram with several traces:
  • Red trace (target function): connects the "fitness cases" (xi, yi). This is the ideal trace we want to reach. You can change these points in the Function Settings dialog.
  • Black trace (currently evaluated): after the genetic program has started, a black trace shows the function of the individual that is currently being evaluated.
  • Blue trace (best approximation): shows the best approximation found so far.

Histogram

When a complete generation has been evaluated, the Adjusted Fitnesses window displays a histogram of the adjusted fitness values of all individuals, where adjusted fitness = 1/(standardized fitness + 1).
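The formula maps a standardized fitness of 0 (a perfect fit) to an adjusted fitness of 1 and pushes larger errors toward 0. The sketch below assumes that the standardized fitness is the sum of absolute errors described at the top of this page, and it bins the adjusted values into ten equal-width bins over [0, 1]. The bin count and the function names are illustrative assumptions, not the program's own settings:

```python
from typing import List

def adjusted_fitness(standardized: float) -> float:
    """Map a standardized fitness (>= 0, lower is better) into (0, 1]."""
    return 1.0 / (standardized + 1.0)

def histogram(values: List[float], bins: int = 10) -> List[int]:
    """Count values in equal-width bins over [0, 1]; 1.0 goes into the last bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts

# Standardized fitnesses of a small hypothetical population.
population = [0.0, 1.0, 3.0, 99.0]
adjusted = [adjusted_fitness(s) for s in population]
print(adjusted)             # [1.0, 0.5, 0.25, 0.01]
print(histogram(adjusted))  # [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
```

Because adjusted fitness is bounded by 1, the best individuals always land in the rightmost bins, which keeps the histogram readable even when raw errors differ by orders of magnitude.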

Status

The Status window shows:

Results

The Results window displays:

Actions

Function Settings for the Program

You can generate new fitness cases for the program in the Function Settings dialog. Open it with the Function Settings button. Settings cannot be changed while the genetic program is running, so the Function Settings button is enabled only when the program is stopped.

Brief explanation of function (fitness) settings
Min X: Minimum of the interval for which the fitness cases are generated.
Max X: Maximum of the interval for which the fitness cases are generated.
Number of points: The number of points (fitness cases) to generate. Because the number of points is one more than the number of intervals between them, it is recommended to enter one extra point to get "nice" x values (e.g. 11 points instead of 10 for the interval [0, 10], giving x = 0, 1, ..., 10).
Enter function: Enter the function used to generate the points.
    Supports:
  • one variable (x) and numbers
  • the following operators and functions: +, -, *, /, ^, %, cos, sin, tan, acos, asin, atan, sqrt, sqr, log, min, max, ceil, floor, abs, neg, rndr.
    Notes:
  • the parser has no operator precedence: operators are evaluated from left to right, so 3*x^2 is evaluated as (3*x)^2 and not as the expected 3*(x^2). At x = 2, for example, this gives 36 instead of 12. Use parentheses to force the intended order.
  • example use of the max function: (x^2)max(x+5)
Function parser: Math Expression Evaluator by The-Son LAI, Lts@writeme.com
Parse function and generate points: After you have entered the interval, the number of points and the function, click the "Parse function and generate points" button. If a numeric field is invalid, an error message appears and the offending field receives focus. When the input is valid, the program generates the points (fitness cases).
    For each x, one of the following results is possible:
  • y is a number (OK)
  • y is null (NOT OK: the function is invalid, e.g. 4x instead of 4*x)
  • y is NaN (OK, but this point is skipped, e.g. sqrt(x) for x < 0)
  • y is +/-Infinity (OK, but this point is skipped, e.g. 1/x for x = 0)
Fitness cases: These are the samples of the target function that we are trying to approximate. The fitness cases are a series of x-y value pairs (x is the independent variable, y is the function value).
To enter your own fitness cases, either type in the values or paste them from the clipboard.
You must enter at least 10 x-y pairs.
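A sketch of how evenly spaced fitness cases can be generated from Min X, Max X and Number of points, with points whose y value is NaN or +/-Infinity left out. The function name `generate_cases` and the use of a Python callable in place of the parsed expression are illustrative assumptions; the program's own parser is not reproduced here:

```python
import math
from typing import Callable, List, Tuple

def generate_cases(f: Callable[[float], float], min_x: float, max_x: float,
                   n_points: int) -> List[Tuple[float, float]]:
    """Sample f at n_points evenly spaced x values in [min_x, max_x].

    n_points points span n_points - 1 intervals, so 11 points over [0, 10]
    give a step of exactly 1. Points whose y is NaN or infinite are skipped.
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    step = (max_x - min_x) / (n_points - 1)
    cases = []
    for i in range(n_points):
        x = min_x + i * step
        try:
            y = f(x)
        except (ValueError, ZeroDivisionError, OverflowError):
            # Python raises here instead of returning NaN/Infinity;
            # treat both the same way: skip the point.
            y = math.nan
        if math.isfinite(y):
            cases.append((x, y))
    return cases

cases = generate_cases(lambda x: 1 / x, 0.0, 10.0, 11)
print(len(cases))  # 10: the point x = 0 is skipped
print(cases[:2])   # [(1.0, 1.0), (2.0, 0.5)]
```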

Genetic Settings for the Program

You can change some parameters of the genetic program in the Genetic Settings dialog. Open it with the Genetic Settings button. Settings cannot be changed while the genetic program is running, so the Genetic Settings button is enabled only when the program is stopped.

Brief explanation of genetic settings
Number of generations: The number of generations to evolve. Each generation consists of one population of individuals.
Population size: The number of individuals (i.e. synthesized programs) in the population. Every generation has the same number of individuals.
Max. depth for new individuals: The maximum depth of the program trees of newly created individuals.
Crossover fraction, Reproduction fraction, Mutation fraction: When producing a new generation, new individuals are generated from the old population by three methods:
  • crossover of two parents, producing two offspring
  • reproduction of a single parent, producing one offspring
  • mutation of a single parent, producing one offspring

The ratios between the three fraction values determine how often each method is chosen.
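One way to read "the ratios determine how often each method is chosen" is as relative weights: each breeding step picks a method with probability proportional to its fraction, so 90/8/2 and 0.9/0.08/0.02 behave identically. This is a sketch under that assumption, and `choose_method` is a hypothetical name:

```python
import random
from typing import Optional

def choose_method(crossover: float, reproduction: float, mutation: float,
                  rng: Optional[random.Random] = None) -> str:
    """Pick a breeding method with probability proportional to its fraction."""
    rng = rng or random.Random()
    return rng.choices(["crossover", "reproduction", "mutation"],
                       weights=[crossover, reproduction, mutation], k=1)[0]

rng = random.Random(42)
picks = [choose_method(90, 8, 2, rng) for _ in range(10_000)]
print({m: picks.count(m) for m in ("crossover", "reproduction", "mutation")})
```

With these weights the counts come out near 9000, 800 and 200; a fraction of 0 disables that method entirely.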

Max. depth for individuals after crossover: The maximum tree depth for programs produced by crossover. If an offspring exceeds this limit, one of its parents is used instead.
Max. depth for new subtrees in mutants: The maximum depth of the subtrees that are spliced in during mutation.
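A sketch of subtree crossover with the depth limit described above. It assumes a hypothetical encoding (a terminal is `"x"` or a number, a function node is a tuple `(name, child, ...)`), counts a lone terminal as depth 0, and, where an offspring is too deep, returns the parent that the offspring was built from. All of these choices are assumptions, not the program's documented behavior:

```python
import random
from typing import List, Tuple, Union

# Hypothetical encoding: a terminal is "x" or a float; a function node is a
# tuple (name, child1, child2, ...).
Tree = Union[str, float, tuple]
Path = Tuple[int, ...]

def depth(t: Tree) -> int:
    """Tree depth, counting a lone terminal as depth 0 (an assumed convention)."""
    if isinstance(t, tuple):
        return 1 + max(depth(c) for c in t[1:])
    return 0

def paths(t: Tree, prefix: Path = ()) -> List[Path]:
    """All subtree positions, each given as child indices from the root."""
    result = [prefix]
    if isinstance(t, tuple):
        for i, child in enumerate(t[1:], start=1):
            result.extend(paths(child, prefix + (i,)))
    return result

def get(t: Tree, path: Path) -> Tree:
    """Return the subtree at `path`."""
    for i in path:
        t = t[i]
    return t

def replace(t: Tree, path: Path, new: Tree) -> Tree:
    """Return a copy of t with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return t[:i] + (replace(t[i], path[1:], new),) + t[i + 1:]

def crossover(p1: Tree, p2: Tree, max_depth: int,
              rng: random.Random) -> Tuple[Tree, Tree]:
    """Swap one random subtree between the parents. An offspring deeper than
    max_depth is replaced by the parent it was built from (assumed choice)."""
    a, b = rng.choice(paths(p1)), rng.choice(paths(p2))
    c1 = replace(p1, a, get(p2, b))
    c2 = replace(p2, b, get(p1, a))
    return (c1 if depth(c1) <= max_depth else p1,
            c2 if depth(c2) <= max_depth else p2)

rng = random.Random(1)
p1 = ("+", "x", ("*", "x", "x"))  # x + x*x, depth 2
p2 = ("-", 1.0, "x")              # 1 - x, depth 1
print(crossover(p1, p2, max_depth=2, rng=rng))
```

Trees are immutable tuples, so returning a parent is equivalent to returning a copy of it.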
Method of generation: Determines the shape of the program trees in the initial population:
  • Grow produces randomly bushy trees not exceeding a maximum depth. The distance from the root to the terminal nodes varies randomly.
  • Full produces fully balanced trees. The distance from the root to any terminal node is the same.
  • Ramped half and half is a mixture of the Grow and Full methods. Half of the trees are fully balanced and the other half are grown bushy. The maximum depth varies between a minimum (typically 2) and the maximum value set with Max. depth for new individuals above.
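A sketch of the three generation methods, assuming a hypothetical set of two-argument functions and terminals, the tuple encoding `(name, child, child)` and a lone terminal counted as depth 0. How the program itself spreads depths across the population is not documented here; this version gives each depth limit alternating Full and Grow trees:

```python
import random
from typing import List, Union

# Hypothetical primitive sets; every function here takes two arguments.
FUNCTIONS = ["+", "-", "*", "/"]
TERMINALS = ["x", 1.0]
Tree = Union[str, float, tuple]

def depth(t: Tree) -> int:
    """Tree depth, counting a lone terminal as depth 0 (an assumed convention)."""
    return 1 + max(depth(c) for c in t[1:]) if isinstance(t, tuple) else 0

def generate(method: str, max_depth: int, rng: random.Random) -> Tree:
    """Build a random tree with the "full" or "grow" method."""
    n_total = len(FUNCTIONS) + len(TERMINALS)
    # Full only places terminals at max_depth; grow may stop any branch early,
    # picking a terminal with probability proportional to the number of terminals.
    if max_depth == 0 or (method == "grow"
                          and rng.random() < len(TERMINALS) / n_total):
        return rng.choice(TERMINALS)
    op = rng.choice(FUNCTIONS)
    return (op, generate(method, max_depth - 1, rng),
            generate(method, max_depth - 1, rng))

def ramped_half_and_half(size: int, min_depth: int, max_depth: int,
                         rng: random.Random) -> List[Tree]:
    """Cycle the depth limit over min_depth..max_depth; each limit gets
    alternating full and grow trees."""
    limits = list(range(min_depth, max_depth + 1))
    population = []
    for i in range(size):
        limit = limits[(i // 2) % len(limits)]
        method = "full" if i % 2 == 0 else "grow"
        population.append(generate(method, limit, rng))
    return population

rng = random.Random(0)
pop = ramped_half_and_half(size=6, min_depth=2, max_depth=4, rng=rng)
print([depth(t) for t in pop])
```

The Full trees (even positions) have depths exactly 2, 3 and 4; the Grow trees never exceed their limit but may be shallower.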
Method of selection: Determines how parents are selected when breeding a new generation:
  • Fitness proportionate prefers fitter individuals as parents: the fitter an individual, the more likely it is chosen as a parent.
  • Tournament selects the fittest parent(s) out of a pool of randomly chosen individuals.
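A sketch of both selection methods. It assumes fitness proportionate selection weights individuals by their adjusted fitness (higher is better) and that a tournament draws its pool without replacement and keeps the individual with the lowest standardized fitness. The pool size and function names are illustrative assumptions:

```python
import random
from typing import Sequence

def fitness_proportionate(adjusted: Sequence[float], rng: random.Random) -> int:
    """Roulette wheel: return index i with probability adjusted[i] / sum(adjusted)."""
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]

def tournament(standardized: Sequence[float], size: int,
               rng: random.Random) -> int:
    """Return the index of the fittest (lowest standardized fitness)
    among `size` individuals drawn at random without replacement."""
    pool = rng.sample(range(len(standardized)), size)
    return min(pool, key=lambda i: standardized[i])

rng = random.Random(0)
standardized = [0.0, 1.0, 3.0, 99.0]
adjusted = [1 / (s + 1) for s in standardized]
print(fitness_proportionate(adjusted, rng), tournament(standardized, 2, rng))
```

Tournament selection depends only on the ranking inside the pool, so it is insensitive to how large the fitness differences are; the roulette wheel, by contrast, favors the best individual in proportion to its adjusted fitness.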
Function set: The list shows all the functions available. Functions are the internal nodes of the program trees.
The algorithm uses only those functions that you have selected in the list.
You must select at least one item.
Terminal set: The list shows all the terminal types available. Terminals are the leaves of the program trees.
The algorithm uses only those terminals that you have selected in the list.
You must select at least one item.

Other Settings for the Program

You can change some additional settings that are not covered by the Function Settings and Genetic Settings dialogs.

Brief explanation of other settings
Number of black traces: The number of black traces (current tries) displayed in the Approximation window.