Transcript Document

Genetic Programming
and
Genetic Algorithms
General Introduction
7/21/2015
Introduction
This series of lectures tries to cover the area of search from a different
perspective.
We first observe that every program is a function, from a domain to a
range: a program takes an input from an “acceptable set of inputs” and
generates an output (side-effects could be part of the output).
Sometimes the program “is aware” of the exact set of acceptable
inputs - and reacts appropriately to inputs outside that set; most of the
time such awareness is limited and so the function the program
corresponds to may produce unpredictable (or undesirable) input-output
pairs outside a small part of the possible set of inputs.
If the function can be represented in terms of already known functions
either in terms of algebraic formulae or in terms of exact (domain,
range) pairing rules, we can describe the function explicitly and we
can end with a program for computing the input-output relation.
If the function cannot be so represented, we have a problem…
If the function CAN be so represented, we may still (and, with high
probability, do) have a problem (NP-Complete, anyone?)…
Can we solve the problem(s)?
The answer will be, by and large, NO, BUT…
If we do not have explicit rules to take us from an input to an output,
what can we expect to have?
a) A finite collection of valid input-output pairs;
b) A way of evaluating whether a collection of input-output pairs
produced is more or less desirable than some other collection of pairs
with the same input components;
c) A way of evaluating when our process of function construction can
stop, either because we are not generating “better functions” or
because some other cost (time or space) is becoming unacceptable.
In some instances, we have a function and we are trying to find an
input-output with some desired characteristics - e.g., a maximum, a
minimum or a saddle-point of the function. If the function is given
with a simple analytical form, Calculus techniques may be adequate.
If the function is given in a very complex form (or, at least, not
amenable to simple analytic techniques), an “intelligent search” (using
known properties of the function to reduce the search space) may be
the only available strategy.
One of the early results was obtained by McCulloch and Pitts (starting
in the 1940’s) via their study of networks of simple model neurons,
which led to Rosenblatt’s perceptrons in the late 1950’s: input-output
networks with an input layer of nodes connected to an output layer of
nodes.
By the late 1960’s this setup had been shown to be inadequate to
represent some useful functions (e.g., XOR) (Minsky and Papert,
Perceptrons, MIT Press, 1969). Later on, other people showed that
the introduction of a third (intermediate) layer was adequate for the
approximation of any desired “well-behaved” function: the level of
approximation was tied to the number of nodes in this intermediate layer.
More specifically, we have:
The Universal Approximation Theorem. Let $\varphi(\cdot)$ be a non-constant,
bounded and monotone-increasing continuous function on $\mathbb{R}$. Let $I_p$
denote the p-dimensional closed unit hypercube $[0, 1]^p$. Let $C(I_p)$
denote the space of continuous functions $I_p \to \mathbb{R}$. Then, given any
function $f \in C(I_p)$ and $\varepsilon > 0$, there exist an integer M and sets
of real constants $\alpha_i$, $\theta_i$ and $w_{ij}$, where i = 1, …, M and
j = 1, …, p, such that we may define
$$F(x_1, \ldots, x_p) = \sum_{i=1}^{M} \alpha_i \, \varphi\Big( \sum_{j=1}^{p} w_{ij} x_j - \theta_i \Big)$$
as an approximate realization of the function f(•); that is,
$$|F(x_1, \ldots, x_p) - f(x_1, \ldots, x_p)| < \varepsilon$$
for all $(x_1, \ldots, x_p) \in I_p$.
Proof. See the references in: Simon Haykin, Neural Networks, a
Comprehensive Foundation, Macmillan, 1994, pp. 181-182. (There is a
second edition out…)
We can observe:
• the logistic function 1/[1 + exp(-v)] used as the nonlinearity in a
neuron model satisfies the conditions on $\varphi(\cdot)$ above;
• the network can be thought of as having p input nodes and a single
hidden layer with M neurons; the inputs are $x_1, x_2, \ldots, x_p$;
• the hidden neuron i has synaptic weights $w_{i1}, w_{i2}, \ldots, w_{ip}$ and
threshold $\theta_i$;
• the network output is a linear combination of the outputs of the
hidden neurons, with $\alpha_1, \ldots, \alpha_M$ defining the coefficients of this
combination.
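The network described by the theorem can be written down directly. The sketch below is only an illustration of the form of F, assuming the logistic function as the nonlinearity; the function names and the particular constants (three hidden neurons, two inputs) are made up for the example and are not taken from Haykin or from these slides.

```python
import math

def logistic(v):
    """Logistic nonlinearity 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def network_output(x, alpha, w, theta):
    """F(x) = sum_i alpha[i] * logistic(sum_j w[i][j] * x[j] - theta[i])."""
    total = 0.0
    for a_i, w_i, th_i in zip(alpha, w, theta):
        # Weighted sum of the inputs, shifted by the neuron's threshold.
        v = sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) - th_i
        total += a_i * logistic(v)
    return total

# Hypothetical constants: p = 2 inputs, M = 3 hidden neurons.
alpha = [1.0, -2.0, 0.5]
w = [[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0]]
theta = [0.0, 0.5, 1.0]
print(network_output([0.25, 0.75], alpha, w, theta))
```

The theorem only says that suitable constants exist; nothing in this sketch says how to find them.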
The theorem itself is a simple existence theorem - a generalization of
Fourier’s result going back to the 1820’s. It tells us that a single
hidden layer suffices, but it does not tell us anything about efficiency of
representation, minimality, ease of “training”, size of the required set
of hidden nodes, etc.
It allows for further results relating the accuracy of estimation of a
function f by the approximation F to the accuracy of the empirical fit -
i.e., a way of relating M, the number of hidden nodes, to N, the size of
the training set. One can also show that there exists a choice of M so
that the rate of convergence of training is $O((1/N)^{1/2})$ times a
logarithmic factor.
One of the problems with single hidden layer perceptrons is that all
intermediate information is “global”: it becomes impossible to
interpret anything as a “local feature”.
Introducing two hidden layers allows one to interpret the information
at the layer closest to the input as “local” while the information at the
layer closest to the output is “global”.
Other types of problems lead in a natural way to two-layer networks.
The point to be made is that, although mathematical theory supports
the approximability of most reasonable classes of functions from an
n-dimensional Euclidean space to an m-dimensional one, coming up with
a usable approximation in a reasonable amount of time (= use of
computational resources) given a “small” amount of information
remains (for now and forever) a completely non-trivial problem.
The problem is, essentially, one of search: among all possible
approximations find one that
a) fits the known input/output relation “best” (e.g., with minimum
least square error);
b) allows us to interpolate at points where we do not have data;
c) has acceptable cost in both construction and evaluation.
What general method can we devise?
• Random Search: start with a pair, add another, …, and so on,
rejecting all those that do not meet acceptable (input, output) criteria.
You may use continuity - which, essentially, says that nearby input
values should lead to nearby output values. After a while, you may
have enough of a function to perform (linear) interpolation with some
feeling that you won’t have too many surprises.
• Non-Random Search: find a clever general purpose algorithm that
will allow us to build a “good” function with acceptable cost.
For the second one, recall that, for just about all the NP-Complete
problems we know, the known exact algorithms amount (in the worst
case) to a complete enumeration of all possible configurations to
guarantee finding a solution… so…
More formally:
The No Free Lunch Theorems: No search algorithm is superior to any
other algorithm on average across all possible problems.
Consequence: algorithms that perform better than random search on
one class of problems will perform worse than random search on
another.
Consequence: all algorithms we devise will have to be tailored to the
search domain - it is the use of specific domain-dependent information
that gives us algorithms better than random search. Outside of a given
domain, our results will be (much) worse.
We are going to concentrate on some aspects of search and of function
construction through search - the techniques to be introduced attempt
to find computational analogues to methods that (to the best of our
current knowledge) appear to be used in biological evolution.
Since biological evolution appears to be based on modification of the DNA
exchanged from parent(s) to offspring, we are going to have to find
ways to:
a) encode all desired characteristics of a problem in a data structure
that can support “DNA-type” modifications (whatever they will be,
but they must include some analogues of chromosomal recombination
and mutation applied to strings over some alphabet) and still remain
meaningful in our context;
b) construct an evaluation function (equivalent to evolutionary fitness
determination) on either the underlying data structure or on a structure
derivable from the original one (i.e., the genotype);
c) devise a strategy for “differential reproduction”, so that “fitter
individuals” produce more offspring with a high chance of possessing
“desirable characteristics”, while guarding against “genetic
overspecialization”.
A lot of the terms are in quotes, since it is not obvious how the
analogy with biological systems will be implemented: as they say, the
devil is in the details…
The Problem of Representation. How do we represent the
information to be searched for?
In DNA-based search we deal with an alphabet over 4 letters (A, C,
G, T), and finite strings over that alphabet. The length of the strings is
(more or less) fixed and parent-child information exchange seems to
consist (at least at a first approximation - and for two-parent species)
of the joining together of two single parental chromosomal strands
into a double descendant strand with dominant and recessive alleles
for nearly all the genes so transmitted. Two other known mechanisms
involve movement of sections of the string from one location to
another and single-locus changes (point mutations). Besides some
other known mechanisms, there may well be many unknown ones.
Some early representations used fixed-length strings of binary digits:
each position had a 0 or a 1 (a bit-gene). A function scored the string.
Evolution of a solution involved:
1. Determining which individuals would reproduce.
2. Selecting the pairs of individuals contributing to the offspring.
3. Determining how they would so contribute.
4. Determining the role and frequency of point-mutations.
5. Defining the new population.
6. Repeating the process from 1. above.
7. Determining when termination was reached.
8. Extraction of the “best” individual as the solution.
There are several slightly different ways of looking at the evolution of
a population via genetic-analogue methods:
1) each generation corresponds to another subset of possible
individuals. Convergence will correspond to a family of subsets of
decreasing diameter (under some metric).
2) Given a population of M individuals with N binary genes each, the
number of states such a population can be in is given by the formula:
$$\binom{2^N + M - 1}{2^N - 1}$$
Finding a solution means, essentially, finding a limiting state
for the evolving family of populations. A “best” individual of the limit
population is our desired solution.
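To get a feeling for how quickly this count grows, one can evaluate the binomial coefficient directly. The sketch below assumes the population is treated as an unordered collection with repetitions allowed (which is what the formula counts); the function name is made up for the example.

```python
import math

def population_states(n_genes, pop_size):
    """Number of unordered populations of pop_size individuals,
    repetitions allowed, drawn from the 2**n_genes possible genotypes."""
    genotypes = 2 ** n_genes
    return math.comb(genotypes + pop_size - 1, genotypes - 1)

# Tiny case: N = 2 binary genes (4 genotypes), M = 3 individuals.
print(population_states(2, 3))
# A still modest case: N = 10 genes, M = 20 individuals.
print(population_states(10, 20))
```

Even for short chromosomes and small populations the state space is far too large to enumerate, which is why the interest is in limiting behaviour rather than in exhaustive exploration.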

3) We can interpret each possible individual as a point in some space.
The evaluation function defines a function from this space to R. We
look for the maximum of this function over the space. The graph of
this function provides us with what is called a “fitness landscape”.
The evolution mechanism creates sets of different points in the domain
from one generation to the next. We stop when several successive
generations have not produced a “substantially better individual”, or
when a given amount of computational resources has been expended.
We need to consider how the offspring will receive the information
from the parents, and the mechanisms that correspond to point-mutation
and, possibly, to larger scale mutation (e.g., repositioning of
substrings within a chromosome).
Start from binary strings: each parent is represented by a binary string
of length N. A “reasonable analog” to the contribution of single
chromosomal strands from each parent, with dominant and recessive
alleles may be to just take a prefix substring of (random) length 0 ≤ n
≤ N from the first parent, a suffix substring of length N - n from the
second parent and concatenate them in the same prefix-suffix order.
This provides a new chromosome of length N. Mutation can be
simulated by “walking down” the new string and randomly resetting
the bits.
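The prefix-suffix recombination and the bit-by-bit mutation just described can be sketched as follows. This is a minimal illustration, assuming chromosomes are Python lists of 0/1 values and that "resetting" a bit means flipping it with a small probability p_mut; the function names and the p_mut parameter are introduced here for the example.

```python
import random

def crossover(parent1, parent2, rng=random):
    """Prefix of random length n (0 <= n <= N) from parent1,
    suffix of length N - n from parent2, in the same order."""
    assert len(parent1) == len(parent2)
    n = rng.randint(0, len(parent1))
    return parent1[:n] + parent2[n:]

def mutate(chromosome, p_mut, rng=random):
    """Walk down the string; flip each bit independently with probability p_mut."""
    return [1 - b if rng.random() < p_mut else b for b in chromosome]

# Usage: one offspring from two parents of length N = 8.
child = mutate(crossover([0] * 8, [1] * 8), p_mut=0.05)
print(child)
```

Note that n = 0 and n = N simply copy one of the parents, so this operator includes "no crossover" as a special case.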
At the simplest level, we now have a data structure which is just a
fixed length string of bits, an evaluation function to provide relative
ranks of different individuals, a mechanism for enforcing differential
reproduction, and a mechanism to provide the next generation. A
variant may include allowing some of the “best” individuals of one
generation to have exact copies appear in the next.
A question that begs to be asked is: given a current generation, can we
say anything about the evolution of genetic patterns from one
generation to the next? This is crucial to our being able to believe that
the process set in motion has some convergence properties, rather than
just leading us to a completely random population.
Other Representations. Some problems have “natural
representations” in terms of larger alphabets (actual DNA comes to
mind - 4 letters) or in terms of continuous quantities (requiring
floating point numbers over various ranges). If the cardinality of the
alphabet is a power of 2, we can still use the same bit-oriented
mechanisms, and the “chromosome” at the next generation will remain
meaningful.
If the “alphabet” is made up of continuous ranges, the problem of
representation becomes more complex. A possible solution involves
using a contiguous range of bits to represent each more structured
entity, with the caveat that mutation and recombination must be
constrained not to exit the appropriate ranges. Another solution
involves accepting floating point values (rather than bits) as “genes”.
This may simplify the interpretation (and implementation) of
recombination and mutation, but we are still left with the problem of
guaranteeing meaning for the results of such actions.
Another representational problem arises in what is properly called
“genetic programming”: where the object being “evolved” is a
program that attempts to compute a specific (only partially known)
function. “Natural” representation of programs may involve trees,
where the nodes are functions (interior nodes) or parameter values
(leaves - if the program does not require iteration or recursion), or
graphs. Although it is always possible to reduce everything to bits,
any intuition is likely to be so removed from the bit-representation as
to be useless.
A difference in approach between genetic algorithms and genetic
programming can be exemplified in the two diagrams below: in
genetic algorithms, once the problem is codified into a data structure,
we just apply the genetic algorithm; in genetic programming the
interaction with the original problem remains more direct and ongoing.
More specifically, the genetic algorithm approach results in defining a
binary string and then modifying and evaluating binary strings,
constructing successive generations using (at least) crossover and
mutation operators:
The genetic programming approach leads to a cycle:
The “string” is replaced by a tree, and the tree can be modified in ways
that are more complex than those supported by a string (the apparent
cycle is not crucial - the methods are all cyclical in nature). More
crucial is the observation that we cannot limit a program to a fixed
number of “tree nodes”: the search would be much too limited. This
implies that the usual methods (which we will study in more detail
later) for evaluating “convergence” will have to be modified - if they
are applicable at all.
Why should anything “converge”? By allowing a “best element” of
the population P to survive from one generation to the next, we can
ensure that the “derived evaluation function” $F(t) = \max_{x \in P_t} f(x)$ is
monotone non-decreasing in t, but this does not mean that we should
expect improvement or convergence. Another approach, based on the
idea of a schema, provides a probabilistic approach, still with
substantial drawbacks.
Some Examples: 1 - Function Optimization. You are given the
function f(x) = x•sin(10π x) + 1.0 over the closed interval [-1.0, 2.0],
and you are expected to find the value of x in that range that
maximizes it [Z. Michalewicz, p.18].
An analytic approach would first compute the zeros of the first
derivative (the function possesses a first derivative at every point in
(-1.0, 2.0) so any interior maxima or minima will appear only at zeros
of the derivative). There are finitely many such values over the given
interval. We can now evaluate the function at all such values, plus
-1.0 and 2.0 (solving tan(10πx) + 10πx = 0 will require some numerics
- nothing too hard). A value of x for which we attain the maximum of
this finite set, plus endpoints, provides us with a correct (input, output)
pair, and a solution to our problem.
In the absence of analytic techniques, what can we do? We can choose
a finite random set of points in [-1.0, 2.0], evaluate the function at
those points, choose a point where the function achieves a largest
value and stop.
A modification would entail choosing a small random set, finding the
x-value where the function has the largest value; choosing a second
random set “near” this value, and repeating the process with smaller
and smaller sets near better and better values. Stop when you don’t
improve from one “generation” to the next, or when you run out of
computational cycles.
The second technique, just like the first, is likely to leave us stuck near
a “relative maximum” which is not optimal… Can we do better?
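The two random-search procedures described above can be sketched in a few lines. This is only an illustration: the number of points per set, the rule that halves the sampling neighbourhood at each round, and the fixed number of rounds (instead of a "no improvement" test) are assumptions made for the example.

```python
import math
import random

def f(x):
    """The function to maximize over [-1.0, 2.0]."""
    return x * math.sin(10 * math.pi * x) + 1.0

def random_search(n_points, lo=-1.0, hi=2.0, rng=random):
    """Evaluate f at n_points random points and return the best x found."""
    points = [rng.uniform(lo, hi) for _ in range(n_points)]
    return max(points, key=f)

def shrinking_search(rounds=20, n_points=20, lo=-1.0, hi=2.0, rng=random):
    """Repeat random search in smaller and smaller sets near the best x so far."""
    best = random_search(n_points, lo, hi, rng)
    radius = (hi - lo) / 2
    for _ in range(rounds):
        radius /= 2  # assumed shrinking rule: halve the neighbourhood
        a, b = max(lo, best - radius), min(hi, best + radius)
        candidates = [rng.uniform(a, b) for _ in range(n_points)] + [best]
        best = max(candidates, key=f)
    return best

x = shrinking_search()
print(x, f(x))
```

Because each round only samples near the current best point, a poor first sample can confine the whole search to the neighbourhood of a relative maximum, which is exactly the weakness noted in the text.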
How do we devise a genetic algorithm? Essentially, we want to add
some “intelligence” to this random search: try to avoid getting stuck
on local maxima, and direct the search so that it is - hopefully - more
efficient than strictly random. How?
Representation: what precision do we want? Let’s choose six places
after the decimal point (there has to be a point beyond which we don’t
care). Six decimal places = 3•1000000 intervals over [-1.0, 2.0].
Notice that 2097152 = 2^21 < 3000000 < 2^22 = 4194304, so we will use
a string of 22 bits to represent numbers in the desired range.
We now have binary strings b21b20…b1b0, which we can convert to
decimal in [-1.0, 2.0]:
$$(b_{21} b_{20} \ldots b_0)_2 = \Big( \sum_{i=0}^{21} b_i 2^i \Big)_{10} = x', \qquad x = -1.0 + x' \cdot \frac{3}{2^{22} - 1}.$$
The two “chromosomes” 00…0 and 11…1 correspond to the
endpoints -1.0 and 2.0, respectively. All others correspond to interior
points. The evaluation function simply takes a binary chromosome,
say v, transforms it into a decimal number, say dec(v), and evaluates f at
that number: eval(v) = f(dec(v)).
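The decoding and evaluation of a chromosome can be sketched as below, assuming a chromosome is stored as a list of 22 bits with b21 first; the names dec and eval_chromosome mirror dec(v) and eval(v) in the text, and the remaining names are introduced for the example.

```python
import math

N_BITS = 22
LO, HI = -1.0, 2.0

def f(x):
    return x * math.sin(10 * math.pi * x) + 1.0

def dec(bits):
    """Map a 22-bit chromosome (list of 0/1, b21 first) to x in [-1.0, 2.0]."""
    x_prime = int("".join(str(b) for b in bits), 2)  # binary -> integer x'
    return LO + x_prime * (HI - LO) / (2 ** N_BITS - 1)

def eval_chromosome(bits):
    """eval(v) = f(dec(v))."""
    return f(dec(bits))

print(dec([0] * N_BITS), dec([1] * N_BITS))
```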
Initialization. Create a random initial population of “chromosomes”.
Ranking. Rank the chromosomes according to the evaluation
function.
Reproduction. Normalize the rankings so that each rank corresponds
to an appropriate subinterval of [0, 1]. Run the random number
generator twice, using the subintervals to determine probability of
choice, to obtain two parents.
When the parents are found, we create the offspring. Two operators
are used: crossover and mutation. Select, randomly, the gene after
which the crossover takes place. The first parent contributes the part
of its chromosome ending at that gene (inclusive); the second parent
contributes the final part of its chromosome to the offspring. If
mutation is to be included, one must successively use the random
number generator to determine if each gene of the offspring is to be
changed. Repeat the process until a number of offspring equal to the
desired population is obtained.
Next Generation. We now have a new generation, for the process to
be repeated.
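The steps above (initialization, ranking, reproduction, next generation) can be put together in a short sketch. It is not the program that produced the run reported in these slides. The linear rank weights, the choice to produce one offspring per pair of parents, the way p_cross is applied (with probability p_cross the offspring is a crossover, otherwise a copy of the first parent) and the bookkeeping of the best individual seen so far are all assumptions made for the example; the default parameter values are the ones the text quotes for its own run.

```python
import math
import random

N_BITS, LO, HI = 22, -1.0, 2.0

def f(x):
    return x * math.sin(10 * math.pi * x) + 1.0

def dec(bits):
    x_prime = int("".join(map(str, bits)), 2)
    return LO + x_prime * (HI - LO) / (2 ** N_BITS - 1)

def evolve(pop_size=50, generations=150, p_cross=0.25, p_mut=0.01, seed=None):
    rng = random.Random(seed)
    # Initialization: random 22-bit chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(pop_size)]
    best = max(pop, key=lambda v: f(dec(v)))
    weights = list(range(1, pop_size + 1))  # assumed: weight = rank, worst = 1
    for _ in range(generations):
        # Ranking: sort worst first so that position i carries weight i + 1.
        ranked = sorted(pop, key=lambda v: f(dec(v)))
        new_pop = []
        while len(new_pop) < pop_size:
            # Reproduction: two rank-weighted random draws give the parents.
            p1, p2 = rng.choices(ranked, weights=weights, k=2)
            if rng.random() < p_cross:
                cut = rng.randint(1, N_BITS - 1)  # genes [0, cut) come from p1
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: each gene is flipped with probability p_mut.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop + [best], key=lambda v: f(dec(v)))
    return dec(best), f(dec(best))

if __name__ == "__main__":
    print(evolve(seed=1))
```

Different seeds give different answers; comparing several runs is a simple way to see how sensitive the method is to its parameters.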
Several issues must be resolved:
1) the size of the initial population (50, in this case);
2) the number of generations the process is allowed to continue (150,
in this case);
3) the probability of crossover (the probability that a chromosome
undergoes crossover: pc = 0.25);
4) the probability of mutation (the probability that a gene is “flipped”:
pm = 0.01).
None of them can be “optimally determined”…
A run provides the following results:
The best individuals discovered within 150 generations are given in
the table below.
Some Examples: 2 - The Prisoner’s Dilemma. This is discussed in
Michalewicz’s book, in Mitchell’s, and, quite extensively, in D. B.
Fogel’s.
The Problem: two individuals are held prisoner and are under pressure
to confess to some “undesirable activity” implicating the other. The
options (and rewards/punishments) are summarized in the table below.
Player1    Player2    P1  P2  Comment
Defect     Defect     1   1   Both punished (relatively mild)
Defect     Cooperate  5   0   Defector rewarded, holdout punished
Cooperate  Defect     0   5   Holdout punished, defector rewarded
Cooperate  Cooperate  3   3   Both rewarded - maybe not free??
The aim of the game is to find a strategy that, over the long run, will
maximize one’s gains. At any one point, the maximizing strategy
would have the winner defect, and the loser holding out, so there is
always a temptation to defect. In fact, defection is always the “safest
individual choice” at each point. On the other hand, a sequence of
mutual defections has a combined payoff much smaller than that of a
sequence of mutual cooperations. We can compute:
1. An infinite sequence of random choices (each configuration has
probability 0.25) has an expected return (for each of the players) of
2.25, with an expected cumulative return of 4.5;
2. An infinite sequence of defections, with the other choosing
randomly has an expected return (for the defector) of 3 and of 0.5 (for
the other), so the expected cumulative return is only 3.5;
3. Ex.: compute the expected returns, individual and cumulative, for at
least two other sequences of actions.
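The expected returns in items 1 and 2 can be checked mechanically, and the same helper can be used for the exercise. The sketch below assumes each player cooperates independently with a fixed probability at every round; the payoffs are those of the table, while the function and variable names are made up for the example.

```python
from fractions import Fraction

# PAYOFF[(move1, move2)] = (payoff to player 1, payoff to player 2)
PAYOFF = {
    ("D", "D"): (1, 1),
    ("D", "C"): (5, 0),
    ("C", "D"): (0, 5),
    ("C", "C"): (3, 3),
}

def expected_returns(p1_coop, p2_coop):
    """Expected per-round payoffs when player k cooperates with
    probability pk_coop, independently of the other player."""
    e1 = e2 = Fraction(0)
    for (m1, m2), (r1, r2) in PAYOFF.items():
        prob1 = p1_coop if m1 == "C" else 1 - p1_coop
        prob2 = p2_coop if m2 == "C" else 1 - p2_coop
        e1 += prob1 * prob2 * r1
        e2 += prob1 * prob2 * r2
    return e1, e2

half = Fraction(1, 2)
for label, p1, p2 in [("both random", half, half),
                      ("player 1 always defects, player 2 random", Fraction(0), half)]:
    e1, e2 = expected_returns(p1, p2)
    print(label, float(e1), float(e2), float(e1 + e2))
```

The two cases printed reproduce the figures quoted above (2.25 each, and 3 versus 0.5); other probabilities can be passed in to explore further cases.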
Representing a solution. Ideally, we should have a complete memory
of the past to determine the next decision. Since this is not possible, a
decision has to be made on the basis of a “finite memory”.
Michalewicz’s text uses the previous three moves. In fact (Mitchell’s
and Fogel’s books), tournaments (human and computer) were
organized by R. Axelrod in the ’70’s and ’80’s to try to determine
“best strategies”, and he decided to use the “three-move-memory” we
will use here. If a “chromosome” contains information about the three
previous moves, and since there are 4 possible outcomes for each
move, we have to keep track of 64 (= 4•4•4) possible games - 64
different histories.
If we order the histories in some canonical order, a 64-bit array allows
us to associate a response to each history. We must also “prime the
system” with three initial games, with two bits required for each
(index into histories). Total: 70 bits for a chromosome.
Choose an initial population. Create a number, N, of random 70-bit
strings.
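One possible layout for the 70-bit chromosome is sketched below. The layout is an assumption made for the example: the first six bits hold the three priming games (two bits each, own move then the other player's move), the remaining 64 bits hold the response to each history, C is coded as 0 and D as 1, and histories are ordered by reading the three games as a base-4 number. None of these choices is prescribed by the text.

```python
import random

MOVE_BIT = {"C": 0, "D": 1}
BIT_MOVE = {0: "C", 1: "D"}

def history_index(history):
    """history: three (own_move, other_move) pairs, oldest first -> 0..63."""
    index = 0
    for own, other in history:
        index = index * 4 + 2 * MOVE_BIT[own] + MOVE_BIT[other]
    return index

def initial_history(chromosome):
    """Decode the six priming bits into three (own, other) games."""
    bits = chromosome[:6]
    return [(BIT_MOVE[bits[i]], BIT_MOVE[bits[i + 1]]) for i in (0, 2, 4)]

def next_move(chromosome, history):
    """Look up the response associated with the last three games."""
    return BIT_MOVE[chromosome[6 + history_index(history)]]

def random_chromosome(rng=random):
    return [rng.randint(0, 1) for _ in range(70)]

# Initial population of N random strategies (N = 20 is an arbitrary choice).
population = [random_chromosome() for _ in range(20)]
first = population[0]
print(initial_history(first), next_move(first, initial_history(first)))
```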
Test each player to determine effectiveness. Use the strategy
encoded in the chromosome to play games against all other players.
The “fitness” is the average score over all the games played. The
original study by Axelrod had each member of the population play
against a set of given strategies - culled from one of the human and
computer tournaments. The initial game sequence provides an index
into the bit-string, so one could use the first six bits to determine the
initial strategy for each player (player no. 1 uses its initial conditions
to determine the game sequence to be “continued” by both).
Select the players to breed. A player with an average score (within a
standard deviation of average) is given one mating; a player with a
score one or more standard deviations above average is given two
matings; one with a score one or more standard deviations below
average is given no matings (some minor adjustment may be necessary
to keep the population of constant size).
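The mating rule can be sketched as follows. The use of the population standard deviation and the treatment of a population with identical scores (everybody gets one mating) are assumptions made for the example; the "minor adjustment" to keep the population size constant is not implemented.

```python
import statistics

def matings(scores):
    """Number of matings per player: 2 if the score is at least one standard
    deviation above the mean, 0 if at least one below, 1 otherwise."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    result = []
    for s in scores:
        if sd > 0 and s >= mean + sd:
            result.append(2)
        elif sd > 0 and s <= mean - sd:
            result.append(0)
        else:
            result.append(1)
    return result

print(matings([1.0, 2.0, 3.0, 4.0, 5.0]))
```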
Breed. Randomly pick pairs, and create two descendants per mating
using both crossover and mutation.
Results. Experiments lead to a number of strategies being
“discovered”.
1. Continue cooperation after three initial cooperations:
(CC)(CC)(CC) leads to C.
2. Defect when the other player defects: (CC)(CC)(CD) leads to D.
3. Continue to cooperate after cooperation has been restored:
(CD)(CD)(CC) leads to C.
4. Cooperate when cooperation has been restored after your own
defection: (DC)(CC)(CC) leads to C.
5. Defect after three mutual defections: (DD)(DD)(DD) leads to D.
6. If the payoff for successful defection is increased to 6, strategies can
develop with expected payoffs > 3.0 (Fogel).
The evolved strategies can be represented as finite-state machines,
which can also mean that the final best strategy can be interpreted as a
formal program - the best evolved.
This was an example of co-evolution, where the individuals in a
population were pitted against one another and “caused” the
population to evolve through mutual interactions.
For much more detail, see the papers by Axelrod or the book by D. B.
Fogel: Evolutionary Computation.
Some Examples: 3 - The Traveling Salesman Problem. This is a
well-known NP-Complete problem, which has some “reasonable”
approximation schemes - at least in the case in which the distances
between nodes satisfy a “triangle inequality” (see Cormen & al., Ch.
35). No known algorithm guarantees an exact solution (find a tour of
minimum length) in anything but exponential time, but it may be
possible to “beat” the quality of the known deterministic
approximation algorithms without giving up too much time…
What is a chromosome? A “reasonable” interpretation for a
chromosome is that it represents a tour. Since the complete graph
consists of N nodes, is the chromosome a binary string (array) or an
integer array?
This is an important decision because of the genetic operators: do we
split at a bit or at a full node index (integer)?
1. If we use a bit representation, our chromosome will need
$N \lceil \log_2 N \rceil$ bits. Using crossover or mutation at the bit level
cannot guarantee that the new chromosome represents a new tour, or,
possibly, even that we have a path in the graph (if
$2^{\lceil \log_2 N \rceil} > N$, some bit patterns name non-existent nodes).
2. If we use integer representation (N integers), and we use our genetic
operators at integer boundaries, we can at least guarantee that we still
have a path, although maybe not a tour.
The choice in this case is of an integer representation for the
chromosome, essentially because:
a. it avoids one set of problems - the possible introduction of
non-existent nodes, and
b. it permits some simpler “cross-over repair algorithms” to make sure
that no “stillborn” descendants are allowed into the population. The
repair algorithms can be incorporated into the genetic operators.
c. Mutation can be handled in a similar way.
Initialization. For a population of size M, pick M random
permutations on N items. Another possibility is to use a greedy
algorithm to construct M approximate solutions, and start from those.
Evaluation and Ranking. Straightforward for each individual - just
compute the value of the tour it represents.
Breeding. Both crossover and mutation must be implemented
carefully to preserve a tour and to maintain a relationship with the
parents. We will look at the details at some later date.
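Pending those details, the sketch below shows one way the pieces could fit together with an integer representation. The repair built into the crossover (keep a prefix of the first parent, then append the missing cities in the order they appear in the second parent) and the swap mutation are common choices picked for illustration, not necessarily the operators used by Michalewicz; city coordinates in the plane with Euclidean distances are also an assumption.

```python
import math
import random

def tour_length(tour, coords):
    """Length of the closed tour visiting coords[tour[0]], coords[tour[1]], ..."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def random_tour(n, rng=random):
    """Initialization: a random permutation of the n city indices."""
    tour = list(range(n))
    rng.shuffle(tour)
    return tour

def crossover_with_repair(p1, p2, rng=random):
    """Prefix of p1, then the cities still missing, in the order they occur
    in p2. The result is always a permutation (a valid tour). Needs n >= 2."""
    cut = rng.randint(1, len(p1) - 1)
    head = p1[:cut]
    used = set(head)
    return head + [city for city in p2 if city not in used]

def swap_mutation(tour, rng=random):
    """Exchange two randomly chosen cities; the result is still a tour."""
    i, j = rng.sample(range(len(tour)), 2)
    child = tour[:]
    child[i], child[j] = child[j], child[i]
    return child

# Usage on a hypothetical 6-city instance.
coords = [(0, 0), (1, 0), (2, 1), (1, 2), (0, 2), (-1, 1)]
a, b = random_tour(6), random_tour(6)
child = swap_mutation(crossover_with_repair(a, b))
print(child, tour_length(child, coords))
```

A naive cut-and-paste of two permutations would usually repeat some cities and omit others, which is the "stillborn descendant" problem the repair step is meant to avoid.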
Results. The algorithm, as described, appears to be better than
random search, but is not very efficient: a 100 city tour, after 20,000
generations, gives a value for the best tour found about 9.4% above
optimum (Michalewicz, p. 26).
Some Examples: 4 - Tree-Based Genetic Programming. Assume
we are given a function, either explicitly or, more likely, as a set of
(input, output) pairs. Assume we have a finite set of pairs derived
from the function y = x^2. We want to construct the “best interpolating
function” we can, obviously starting only from the set of (input,
output) pairs.
A program can be viewed as a tree structure, where the leaves are
terminals (= parameter values), while the interior nodes are function
calls whose children are values provided from farther down the tree.
One would start with an initial population of trees, evaluate the
individuals on the set of (input, output) pairs assigning each a value
dependent on how “good” the match is between the output values
computed by the program and those given. One would then
apply some tree-compatible genetic algorithms to generate the
new population.
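The tree view of a program and its evaluation against the (input, output) pairs can be sketched as follows. The representation (nested tuples), the function set, the "protected" division that returns 1.0 on a zero divisor, the error measure and the random tree generator are all assumptions made for the example; they are not taken from simple-gp.c or from the figures in these slides.

```python
import operator
import random

def protected_div(a, b):
    """Division that avoids zero-division errors (assumed convention: return 1.0)."""
    return a / b if b != 0 else 1.0

FUNCTIONS = {"add": operator.add, "sub": operator.sub,
             "mul": operator.mul, "div": protected_div}
TERMINALS = ["x", 1.0]

def evaluate(tree, x):
    """A tree is 'x', a numeric constant, or (function_name, left, right)."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def error(tree, pairs):
    """Sum of absolute errors over the (input, output) pairs; 0 is an exact fit."""
    return sum(abs(evaluate(tree, x) - y) for x, y in pairs)

def random_tree(depth, rng=random):
    """Grow a random tree with at most `depth` levels of function nodes."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(depth - 1, rng), random_tree(depth - 1, rng))

pairs = [(x, x * x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
population = [random_tree(3) for _ in range(10)]
best = min(population, key=lambda t: error(t, pairs))
print(best, error(best, pairs))
print(error(("mul", "x", "x"), pairs))
```

Crossover and mutation would then exchange or replace subtrees rather than substrings; those operators are omitted here.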
Another example of “next generation”…
Final result: we have a true match, although all we really know is that
we have achieved an “exact interpolation” of the given (input, output)
set.
The program simple-gp.c evolves such a solution.
It develops a formula (with some understanding of the need to take
care of zero-divisions) which, complex as it appears, can, in fact, be
reduced to an actual function… unfortunately it does not look much
like the actual function on a first reading. This is not an unusual
problem, since the rules for “canonical representation” of rational
functions are fairly hard to implement… see some of the early
programs for symbolic computation.
This approach raises a number of representational questions: how
do we represent a program? What is a chromosome? What is
mutation? etc.
Part of the problem is that a constant length chromosome might
correspond to a rather limited family of trees, making the whole
evolutionary process moot. On the other hand, introducing variable
length chromosomes with very large alphabets (= primitive functions
and parameter values) may grow our chromosomes to enormous size
(although our own DNA may be trying to warn us). Furthermore, we
are looking for some theoretical models to at least give plausibility to
our methods: such theoretical models are likely to be far too
complicated
Some other ideas on Genetic Programming. Other questions would
arise on the meaning of a “program”: as we indicated in the Prisoner’s
Dilemma, one can evolve finite-state machines that are quite efficient.
Those are, undeniably, programs. The next question might be on how
to represent (and how to define and apply genetic operators to) stack
machines (supporting recursion), assembly-language machines,
graph-reduction machines (which are used in the compilation and
optimization of functional language programs), etc.
And all of this requires some kind of supporting theoretical
framework.