Genetic Programming


Using Simulated Natural Selection to Automatically Write Programs

Genetic Programming

- John Koza, Stanford University
  - Principal proponent of GP
  - Has obtained human-competitive results in a number of problem domains
    - Reproduced existing patents
    - Created new patentable designs
  - Has written extensively on GP
    - Four-volume set on Genetic Programming
    - Numerous papers on GP

Genetic Programming

- Basic Algorithm
  - Create a population of programs.
  - Each program attempts to solve a set of problems in a “training set.”
  - Program fitness is determined by success in solving the training set.
  - More fit members have a better chance to produce offspring in the next generation.
  - Offspring are produced using some form of crossover.

Tree Structure of Genetic Programs

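- Various structures are used to represent genetic programs, but tree structures are the best known.
- Nonterminal nodes are functions that take their children as parameters.

(Figure: a tree whose root is the function + and whose two children are the terminals 1 and 2, representing 1 + 2.)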

Tree Structure

- Terminal nodes, the nodes that make up the leaves of a program tree, provide data to the program.
  - Constants
  - Parameterless functions
  - Inputs

Genetic Program Components

- Terminal set
  - Serves as the set of primitive data types
  - Constants
  - Parameterless functions
  - Input values
- Function set
  - Set of available functions
  - Often tailored specifically to the needs of the problem domain
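The tree structure and the two component sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nested-tuple encoding, the `FUNCTIONS` table, and the `evaluate` helper are hypothetical names chosen here, not part of any particular GP system.

```python
import operator

# Hypothetical function set: name -> (callable, number of arguments).
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
}

def evaluate(node, inputs):
    """Recursively evaluate a program tree.

    A tree is either a terminal (a numeric constant or the name of an
    input) or a tuple (function_name, child, child, ...).
    """
    if isinstance(node, tuple):
        func, arity = FUNCTIONS[node[0]]
        args = [evaluate(child, inputs) for child in node[1:]]
        if len(args) != arity:
            raise ValueError("wrong number of children for " + node[0])
        return func(*args)
    if isinstance(node, str):
        return inputs[node]   # input terminal: look up its value
    return node               # constant terminal

# The tree with root + and children 1 and 2.
simple = evaluate(("+", 1, 2), {})

# A deeper tree that also reads an input named "x": (x + 1) * 2.
nested = evaluate(("*", ("+", "x", 1), 2), {"x": 3})
```

Nonterminal nodes appear as tuples and terminals as bare values, so the distinction between the function set and the terminal set is visible directly in the data.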

Initializing the Population

- The following two parameters are specified:
  - Maximum depth of a program tree
  - Maximum number of nodes in a program tree
- Three methods in common use (Koza)
  - Full
    - Nonterminals are used to build a complete tree up to the leaf nodes, which are then completely populated with terminals. Every branch reaches the maximum depth, so each tree has the largest number of nodes that depth allows.

Initializing the Population (continued)

- Three methods in common use (Koza)
  - Grow
    - The root node is chosen from the function set.
    - All other nodes not at maximum depth are chosen randomly from the functions and terminals together.
    - Growth for a branch ends when a terminal is chosen.
    - Trees can have irregular shapes.
    - Nodes at the maximum depth are chosen from the terminal set only.

Initializing the Population (continued)

- Three methods in common use (Koza)
  - Ramped Half and Half
    - M is the maximum depth of the deepest partition in the population.
    - The population is separated into M partitions.
    - The ith partition (i ranges from 0 to M-1) has a maximum depth of M - i.
    - Half of each partition is populated with grow, the other half with full.
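The three initialization methods can be sketched as follows. This is a minimal illustration, assuming a nested-tuple tree encoding, a small made-up primitive set, and depth measured as the number of edges from the root to the deepest leaf; the maximum-node-count parameter is left out for brevity. All names here are hypothetical.

```python
import random

# Assumed primitives for illustration; real sets depend on the problem domain.
FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # function name -> number of arguments
TERMINALS = ["x", 1, 2]                 # one input and two constants

def full(depth, rng):
    """Full: only functions above the depth limit, only terminals at it."""
    if depth == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(full(depth - 1, rng) for _ in range(FUNCTIONS[name]))

def grow(depth, rng, is_root=True):
    """Grow: a function at the root, then any primitive until the depth limit."""
    if depth == 0:
        return rng.choice(TERMINALS)
    names = sorted(FUNCTIONS)
    pick = rng.choice(names if is_root else names + TERMINALS)
    if pick not in FUNCTIONS:
        return pick   # choosing a terminal ends growth of this branch
    return (pick,) + tuple(grow(depth - 1, rng, False) for _ in range(FUNCTIONS[pick]))

def depth_of(tree):
    """Number of edges from the root to the deepest leaf."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(child) for child in tree[1:])

def ramped_half_and_half(pop_size, max_depth, rng):
    """Partition i (0..M-1) gets depth limit M - i; methods alternate in blocks."""
    population = []
    for k in range(pop_size):
        limit = max_depth - (k % max_depth)              # M - i
        method = full if (k // max_depth) % 2 == 0 else grow
        population.append(method(limit, rng))
    return population
```

When `pop_size` is a multiple of `2 * max_depth`, each depth partition receives exactly as many full trees as grow trees; for other sizes the split is only approximate.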

Genetic Operators: Crossover

- Crossover
  - Randomly select a node in the mother.
  - Randomly select a node in the father.
  - Swap the two nodes along with their subtrees.

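Crossover Example

(Figure: two parent trees, Parent 1 and Parent 2, exchange randomly selected subtrees to produce Child 1 and Child 2. The trees are built from the functions +, *, /, power, and abs and from numeric constants.)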
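Subtree crossover as described above can be sketched like this. The nested-tuple encoding, the path-based addressing of nodes, and the helper names are assumptions made for illustration; the example trees are made up and do not reproduce the figure.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following `path` from the root."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))

def crossover(mother, father, rng):
    """Pick one random node in each parent and swap the two subtrees."""
    m_path = rng.choice(list(paths(mother)))
    f_path = rng.choice(list(paths(father)))
    child1 = replace(mother, m_path, get(father, f_path))
    child2 = replace(father, f_path, get(mother, m_path))
    return child1, child2

# Hypothetical parents: 2 * (1 + 13) and power(2, 2) / abs(-7).
mother = ("*", 2, ("+", 1, 13))
father = ("/", ("power", 2, 2), ("abs", -7))
child1, child2 = crossover(mother, father, random.Random(0))
```

Because the two subtrees are exchanged rather than copied, the two children together always contain exactly as many nodes as the two parents.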

Genetic Operations: Mutation

- Mutation
  - Randomly select a node in the program tree.
  - Remove that node and its subtree.
  - Replace the node with a new subtree, using the same method used to initially instantiate the population.
  - Typically, mutation is applied to a small number of offspring after crossover.

Mutation Example

(Figure: the left subtree of a program tree rooted at + is randomly selected for mutation, and the entire subtree is replaced with a newly generated one.)
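A sketch of subtree mutation follows, assuming nested-tuple trees and a simplified grow-style generator standing in for whichever initialization method built the population. The primitive set, the 0.3 stopping probability, and the helper names are hypothetical.

```python
import random

FUNCTIONS = {"+": 2, "*": 2}      # assumed function set: name -> arity
TERMINALS = [1, 2, 3, 4, 7]       # assumed constants

def random_tree(depth, rng):
    """Grow-style generator, standing in for the initialization method."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(random_tree(depth - 1, rng) for _ in range(FUNCTIONS[name]))

def paths(tree, prefix=()):
    """Yield the path (a tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def replace(tree, path, subtree):
    """Return a copy of `tree` with the node at `path` replaced by `subtree`."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def mutate(tree, rng, max_depth=2):
    """Replace a randomly selected node and its subtree with a fresh subtree."""
    path = rng.choice(list(paths(tree)))
    return replace(tree, path, random_tree(max_depth, rng))

parent = ("+", ("*", 3, 4), ("*", 2, 2))
mutant = mutate(parent, random.Random(0))

# Replacing the left subtree explicitly, as in a hand-worked example.
left_replaced = replace(parent, (1,), ("*", 7, 4))
```

Tuples are immutable, so `mutate` returns a new tree and leaves the parent untouched, which is convenient when the parent stays in the population.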

Fitness-based Selection

- Gives “graded and continuous feedback about how well a program performs on the training set” (Banzhaf et al.)
- Standardized Fitness
  - Fitness scores are transformed so that 0 is the fitness of the most fit member.
- Normalized Fitness
  - Fitness is transformed to values that are always between 0 and 1.
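One possible way to compute these two transformations is sketched below. It assumes raw scores where higher is better, standardizes by subtracting each score from the best one, and normalizes with the adjusted-fitness convention 1 / (1 + s) divided by the population total. Other conventions exist, and the function names are hypothetical.

```python
def standardized(raw_scores):
    """Shift scores so the most fit member (highest raw score) gets 0."""
    best = max(raw_scores)
    return [best - r for r in raw_scores]

def normalized(std_scores):
    """Map standardized scores into [0, 1] so that they sum to 1.

    Uses adjusted fitness 1 / (1 + s), then divides by the total.
    Here a larger normalized value means a fitter individual.
    """
    adjusted = [1.0 / (1.0 + s) for s in std_scores]
    total = sum(adjusted)
    return [a / total for a in adjusted]

raw = [10, 7, 4]            # made-up raw scores, higher is better
std = standardized(raw)     # the best member maps to 0
norm = normalized(std)
```

Normalized values that sum to 1 can be used directly as selection probabilities in fitness-proportional selection.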

Different Selection Algorithms

- GA Scenario
  - Same as that used in Genetic Algorithms
  - Create a gene pool by selecting parents based on fitness.
  - The next generation completely replaces the current generation.
- ES Scenario
  - Same as that used in Evolution Strategies
  - Generate children first.
  - Apply the fitness function to parents and children.
  - Select the next generation from the children (and possibly the parents too).
  - Selection pressure can be tuned by adjusting the ratio of the number of offspring to the number of parents.

Selection Pressure

- Ratio of the best individual’s selection probability to the average selection probability:
  MostFitSelectionProbability / AverageFitSelectionProbability
- The larger this ratio, the greater the selection pressure.
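As a small worked illustration of this ratio, the sketch below assumes that the best individual is the one with the highest selection probability. The function name and the sample probabilities are made up.

```python
def selection_pressure(probabilities):
    """Best individual's selection probability divided by the average."""
    average = sum(probabilities) / len(probabilities)
    return max(probabilities) / average

# Uniform selection: every individual is equally likely, so no pressure.
uniform = selection_pressure([0.25, 0.25, 0.25, 0.25])

# Skewed selection: the best individual is chosen 70% of the time.
skewed = selection_pressure([0.7, 0.1, 0.1, 0.1])
```

A ratio of 1 means selection is indifferent to fitness; the skewed case gives a ratio of about 2.8.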

Sample Fitness Measures

- Error Fitness
  - The sum of the absolute values of the differences between the computed result and the desired result.

$$ f_p = \sum_{i=1}^{n} \lvert p_i - o_i \rvert $$

Where:
- $f_p$ is the fitness of the $p$th individual in the population
- $n$ is the number of examples in the training set
- $o_i$ is the desired output for the $i$th example in the training set
- $p_i$ is the output from the $p$th individual on the $i$th example in the training set

* Squaring the expression $(p_i - o_i)$ can provide larger penalties for errors.
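The error fitness above translates directly into code. In this sketch a program is assumed to be any Python callable, and the training set is a list of (input, desired output) pairs; both are illustrative choices, and the sample data is made up.

```python
def error_fitness(program, training_set, squared=False):
    """Sum of |p_i - o_i| over the training set (or squared differences)."""
    total = 0.0
    for inputs, desired in training_set:
        diff = program(inputs) - desired
        total += diff * diff if squared else abs(diff)
    return total

# Made-up training set sampled from 2x + 1.
training_set = [(0, 1), (1, 3), (2, 5)]

def off_by_one(x):
    return 2 * x        # misses every example by exactly 1

def identity(x):
    return x            # misses by 1, 2, and 3

absolute_small = error_fitness(off_by_one, training_set)
absolute_large = error_fitness(identity, training_set)
squared_large = error_fitness(identity, training_set, squared=True)
```

With squaring, the identity program's largest miss (3) contributes 9 rather than 3, which shows how squaring penalizes big errors more heavily.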

Fitness Measures can be as Varied as the Applications

- Examples
  - Number of correct solutions
  - Number of wins competing against other members of the population
  - Number of errors navigating a maze
  - Time required to solve a puzzle

Truncation or (µ, λ) Selection

- A number of parents (µ) are allowed to breed and produce λ children. The µ best children are used to produce the next generation.
- A variation, (µ + λ) selection, includes the parents among those considered for selection into the next generation.
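Both variants reduce to sorting a candidate pool and keeping the µ best. The sketch below assumes a fitness function where lower is better (an error measure); the function names and the toy individuals are hypothetical.

```python
def comma_selection(children, fitness, mu):
    """(mu, lambda): keep the mu best children; the parents are discarded."""
    return sorted(children, key=fitness)[:mu]

def plus_selection(parents, children, fitness, mu):
    """(mu + lambda): parents compete with their children for the mu slots."""
    return sorted(parents + children, key=fitness)[:mu]

# Toy individuals are numbers and the error is the distance from 0.
parents = [1, -5]
children = [3, -2, 4, 6]

next_comma = comma_selection(children, abs, mu=2)
next_plus = plus_selection(parents, children, abs, mu=2)
```

In the toy data the parent 1 is better than every child, so it survives under (µ + λ) selection but is lost under (µ, λ) selection.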

Ranking Selection

- Selection Based on Fitness Order
  - The members of the population are ranked from best to worst.
  - The selection probability is assigned based on the rank.
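A sketch of ranking selection follows. The slides do not say how rank maps to probability, so a simple linear scheme is assumed here: with N members, the best gets weight N and the worst gets weight 1. Fitness is assumed to be an error measure where lower is better, and the names are hypothetical.

```python
import random

def rank_probabilities(n):
    """Linear ranking: rank 0 (best) gets weight n, rank n-1 gets weight 1."""
    weights = [n - rank for rank in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def ranking_select(population, fitness, rng):
    """Pick one member with a probability that depends only on its rank."""
    ranked = sorted(population, key=fitness)   # best (lowest error) first
    probs = rank_probabilities(len(ranked))
    return rng.choices(ranked, weights=probs, k=1)[0]

population = [5, -1, 3, 8]    # toy individuals; error is distance from 0
chosen = ranking_select(population, abs, random.Random(0))
```

Because only the ordering matters, one individual with an extremely good raw score cannot dominate selection the way it can under fitness-proportional schemes.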

Tournament Selection

- Randomly select a subset of the population; the size of the subset is the tournament size.
- More fit (winning) individuals are used to generate replacements for less fit (losing) individuals.
- Accelerates processing time (compared with full competition)
- Facilitates parallel processing
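The selection step of a tournament can be sketched in a few lines. Fitness is assumed to be an error measure where lower is better, and the names are hypothetical.

```python
import random

def tournament_select(population, fitness, tournament_size, rng):
    """Draw `tournament_size` distinct members at random; return the fittest."""
    competitors = rng.sample(population, tournament_size)
    return min(competitors, key=fitness)

population = [5, -1, 3, 8]    # toy individuals; error is distance from 0

# A tournament over the whole population always returns the global best.
overall_best = tournament_select(population, abs, len(population), random.Random(0))

# A smaller tournament returns the best of a random pair.
pair_winner = tournament_select(population, abs, 2, random.Random(0))
```

Only the sampled competitors need to be compared, which is why tournaments are cheaper than ranking the whole population and easy to run in parallel. A larger tournament size raises the selection pressure.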

The Basic GP Algorithm (from Banzhaf et al.)

- Define the terminal set.
- Define the function set.
- Define the fitness function.
- Define parameters such as population size, maximum individual size, crossover probability, selection method, and termination criterion.

Generational GP

- As in genetic algorithms, each new generation completely replaces the previous generation.
- Initialize the population.
- Evaluate the individual programs.
- Until a new population is fully populated, repeat:
  - Select an individual or individuals in the population using the selection algorithm.
  - Perform genetic operations on the selected individual or individuals.
  - Insert the result of the genetic operations into the new population.
- Replace the old population with the new one and repeat from the evaluation step until the termination criterion is met.
- The best individual is the resulting program.
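The generational loop can be sketched end to end on a toy symbolic regression task. Everything specific here is an assumption made for illustration: the target function x^2 + 1, the primitive sets, tournament selection of size 3, a 10% mutation rate, a 50-node size limit, and a fixed generation count as the termination criterion.

```python
import random

FUNCTIONS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMINALS = ["x", 1, 2]
TRAINING_SET = [(x, x * x + 1) for x in range(-3, 4)]   # assumed target: x^2 + 1
MAX_NODES = 50                                           # assumed size limit

def random_tree(depth, rng):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(sorted(FUNCTIONS)), random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    if isinstance(tree, tuple):
        return FUNCTIONS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return x if tree == "x" else tree

def fitness(tree):
    """Error fitness over the training set; 0 is a perfect score."""
    return sum(abs(evaluate(tree, x) - y) for x, y in TRAINING_SET)

def paths(tree, prefix=()):
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def tournament(population, scores, rng, size=3):
    contenders = rng.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: scores[i])]

def breed(a, b, rng):
    """Crossover, occasional mutation, then fall back to `a` if too large."""
    child = replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))
    if rng.random() < 0.1:
        child = replace(child, rng.choice(list(paths(child))), random_tree(2, rng))
    return child if sum(1 for _ in paths(child)) <= MAX_NODES else a

def run(pop_size=60, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_tree(3, rng) for _ in range(pop_size)]   # initialize
    best, history = None, []
    for generation in range(generations):
        scores = [fitness(t) for t in population]                 # evaluate
        i = min(range(pop_size), key=lambda k: scores[k])
        if best is None or scores[i] < fitness(best):
            best = population[i]                                  # best so far
        history.append(fitness(best))
        if generation == generations - 1:
            break                                                 # termination
        new_population = []
        while len(new_population) < pop_size:                     # fill new population
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            new_population.append(breed(a, b, rng))
        population = new_population                               # full replacement
    return best, history
```

Calling `run()` returns the best program seen in any generation together with the best-so-far error per generation. Because the new population fully replaces the old one, a good individual can be lost between generations, which is why the sketch tracks the best program separately.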

Steady State GP

 1.

2.

3.

4.

5.

There are no generations Initialize the population Randomly choose a subset of the population to take part in the tournament Evaluate the fitness value of each competitor in the tournament.

Select the winner or winners from the competitors in the tournament using the selection algorithm.

Apply genetic operators to the winner or winners of the tournament

Steady State GP (continued)

6. Replace the losers in the tournament with the results of the application of the genetic operators to the winners of the tournament.
7. Repeat steps 2-6 until the termination criterion is met.
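The steady-state steps can be sketched as a loop that is independent of how programs are represented. To keep the example short, the "programs" below are plain numbers and the genetic operator is a toy stand-in, so this shows only the control flow. The `breed(a, b, rng)` signature, the two-winner and two-loser tournament, and the fixed step count used as the termination criterion are all assumptions.

```python
import random

def steady_state(population, fitness, breed, rng, tournament_size=4, steps=200):
    """Steady-state loop; `breed(a, b, rng)` must return two children.

    `tournament_size` must be at least 4 so that the two winners and the
    two losers are different individuals.
    """
    population = list(population)                    # step 1 is done by the caller
    for _ in range(steps):                           # step 7: repeat steps 2-6
        entrants = rng.sample(range(len(population)), tournament_size)  # step 2
        entrants.sort(key=lambda i: fitness(population[i]))             # step 3
        winners, losers = entrants[:2], entrants[-2:]                   # step 4
        children = breed(population[winners[0]], population[winners[1]], rng)  # step 5
        for slot, child in zip(losers, children):                       # step 6
            population[slot] = child
    return population

# Toy stand-ins: individuals are numbers and the error is the distance from 42.
def toy_fitness(x):
    return abs(x - 42.0)

def toy_breed(a, b, rng):
    midpoint = (a + b) / 2.0
    return midpoint + rng.gauss(0, 1), midpoint + rng.gauss(0, 1)

rng = random.Random(0)
start = [rng.uniform(0, 100) for _ in range(20)]
end = steady_state(start, toy_fitness, toy_breed, rng)
```

Only tournament losers are ever overwritten, so the best individual in the population can never be replaced, and the best fitness never gets worse from one step to the next.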

Introns

- Code sections (functions) that provide no real value for the problem at hand
- Introns do not directly affect the fitness of the individual.
  - e.g., j = j + 0 or j = j * 1
- Early and middle sections of GP runs might include 40-60% introns.
- Later in the run, introns begin to dominate the code.
- Intron growth is exponential!

Why GP Introns Emerge

- Children tend to be less fit than parents.
- Crossover and mutation can be extremely destructive.
- Introns reduce the destructive effects of genetic operators.
- Parents generate introns when it is easier to protect what they can already do, by creating introns, than to improve on what they are currently doing.

Effective Fitness

- Function of at least two factors
  - The fitness of the parent
  - The likelihood that genetic operators will affect the fitness of the parent’s children

Effects of Introns

- Introns may have differing effects before and after exponential growth of introns begins.
- Different systems may generate different types of introns with different probabilities.
- The extent to which genetic operators are destructive in their effect is likely to be a very important initial condition in intron growth.
- Mutation and crossover may affect different types of introns differently.

Problems Caused by Introns

- Run stagnation (no progress)
- Poor results (do-nothing code)
- Drain on memory and CPU time (storing and executing unnecessary code)

Possible Beneficial Effects of Introns

- Introns might serve to isolate useful code blocks.
  - This facilitates the building block model by protecting useful building blocks from disruption.

Methods of Handling Introns

- Reduce the destructiveness of genetic operators.
  - Reducing destructive crossover to 0 results in hill climbing.
- Attach a fitness penalty to the length of the program.
- Change the fitness function.
  - Provides the GP with a way to improve that is better than just insulating the current best solution.
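The length penalty mentioned above can be sketched as follows, assuming an error-style fitness where lower is better and nested-tuple trees. The penalty weight of 0.01 per node is an arbitrary illustrative value, and the names are hypothetical.

```python
def size(tree):
    """Number of nodes in a nested-tuple program tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])

def penalized_fitness(error, tree, penalty_per_node=0.01):
    """Error plus a small cost per node, so shorter programs win ties."""
    return error + penalty_per_node * size(tree)

compact = ("*", "x", 2)                        # x * 2
bloated = ("*", ("+", "x", 0), ("*", 2, 1))    # (x + 0) * (2 * 1): same behavior

# Both programs compute the same values, so they share the same error.
compact_score = penalized_fitness(1.0, compact)
bloated_score = penalized_fitness(1.0, bloated)
```

The penalty weight has to stay small relative to typical error values; otherwise the search favors tiny programs over accurate ones.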

References

- Wolfgang Banzhaf, Peter Nordin, Robert E. Keller, Frank D. Francone. Genetic Programming: An Introduction.
- John Koza. Genetic Programming Tutorial. GECCO 2005.
- John Koza. Genetic Programming: The Movie.