Hot questions for using neural networks in NEAT


I have looked up what NEAT is on youtube and the internet, but I can only find projects using NEAT, but apart from the wikipedia entry (which only says what it is in introduction, and is very confusing), I still have no idea what it is, is it a library, is it a type of neural network, is it a method of training neural networks? Sorry if this is an obvious question.


NEAT, or Neuro-Evolution of Augmenting Topologies, is a population-based evolutionary algorithm introduced by Kenneth O. Stanley [1].

The algorithm is based on several key features:


Starting from minimal structure

The networks in the initial population are the simplest possible (up to the extreme of no connections at all, leaving the input and output neurons unconnected), and the algorithm only adds new structural elements (neurons, connections). As a result, the evolved networks tend to be very small.

Avoiding competing conventions via historical markings

In ordinary evolutionary algorithms it can easily happen that two individuals encode the same (or very similar) behaviour with very different genotypes. This is called the competing conventions problem. When such individuals are subjected to crossover, their children are likely to be worse than either parent. NEAT solves this by keeping historical markings of new structural elements. When a new structural element is created (via structural mutation), it is assigned an innovation number (and all mutations that produce the same element, even in different individuals, are assigned this same number). Then, when two individuals are crossed over, their genotypes are aligned so that corresponding innovation numbers match, and only the differing elements are exchanged.
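As a rough illustration (a sketch, not the paper's exact procedure), the alignment can be expressed in Python with a genome reduced to a dict from innovation number to connection weight:

```python
import random

def crossover(fitter, other, rng):
    """Align two genomes by innovation number and recombine them."""
    child = {}
    for innov, weight in fitter.items():
        if innov in other:
            # matching gene: inherited randomly from either parent
            child[innov] = rng.choice([weight, other[innov]])
        else:
            # disjoint/excess gene: inherited from the fitter parent
            child[innov] = weight
    return child

p1 = {1: 0.5, 2: -0.3, 4: 0.9}  # assumed to be the fitter parent
p2 = {1: 0.1, 2: 0.2, 3: -0.7}
child = crossover(p1, p2, random.Random(0))
print(sorted(child))  # [1, 2, 4]
```

Because genes are matched by innovation number rather than by position, two structurally different parents can still be recombined meaningfully.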

Speciation and fitness sharing

NEAT works with the concept of species: a subdivision of the population into several groups of similar individuals. The subdivision is based on a dissimilarity measure computed from the same genotype alignment that is used for crossover. The probability of crossing over individuals from different species is then much smaller than that of crossover within a species. By promoting the mating of more similar parents, the children are less likely to be much worse than the parents, because the parents were compatible.

Also, within each species, fitness is shared among the individuals. This serves two purposes. (1) It protects individuals from structural mutations: right after a mutation, fitness would normally be low, but because of fitness sharing the individual has time to optimize itself (its weights) and adapt to the new structure. (2) It promotes diversity: the bigger the species, the more the fitness is shared and the less fit its members become.
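A minimal sketch of this sharing (each member's fitness divided by the species size) shows how it penalizes large species:

```python
def adjusted_fitness(raw_fitnesses):
    # explicit fitness sharing: divide each member's fitness by species size
    n = len(raw_fitnesses)
    return [f / n for f in raw_fitnesses]

# two species whose members are equally fit in raw terms;
# the larger species ends up with less fit members after sharing
print(adjusted_fitness([10, 10, 10, 10]))  # [2.5, 2.5, 2.5, 2.5]
print(adjusted_fitness([10, 10]))          # [5.0, 5.0]
```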

I strongly recommend reading the original paper [1]. The algorithm is described very well. Also there is a NEAT Users Page that has more links to more papers and also implementations and uses of NEAT.

[1] Kenneth O. Stanley and Risto Miikkulainen. Evolving Neural Networks Through Augmenting Topologies. Evolutionary Computation, 10(2):99-127, 2002.


I have been reading up on NeuroEvolution of Augmenting Topologies and there's this little thing that's been bothering me. While reading Kenneth Stanley's paper on NEAT I came across this figure here:

The innovation numbers go from 1,2,3,4,5,6 to 1,2,3,4,5,6,7 on the first mutation.

On the second one it goes from 1,2,3,4,5,6 to 1,2,3,4,5,6,8,9.

My question is: why does it skip number 7 and go straight to 8 instead? I didn't find anything about deleting innovation numbers.

Same thing on this second figure, how did Parent 1 lose 6,7 and where did the 8th gene go to in Parent 2?


An innovation number (I'll use IN for short) is a kind of label for a specific piece of structure. So a connection from neuron number 1 to neuron number 2 would have an IN, e.g. 1, and all networks with that connection would have it labeled as IN 1. When a new connection is created (either by an add-connection mutation or by an add-node mutation), it is first checked whether such a connection exists in a "database" of INs. If there already is such a connection, its IN is used. If there is not, an IN counter is increased, the new connection gets this new IN, and it is stored in the database.

In the first figure, in the upper part, a new connection is added from neuron 3 to neuron 5. If this is the first such connection ever to appear in the whole population, you increase the IN counter and use the new IN for that connection. If it has already happened elsewhere, you use its existing IN instead of creating a new one. It could be either of those cases; we don't know what the rest of the population looks like. It just happens that connection 3->5 has IN 7.

Now, in the bottom part of the first figure, you add a neuron 6 between neurons 3 and 4, which means you add connections 3->6 and 6->4. Again, you first ask: "Is there an IN for connection 3->6 in the database?" If there is, you use that IN; if not, you increase the counter. The same goes for the other connection. In that figure, you can imagine that all these connections were brand new, so in the upper part the IN counter was at 6 and, since the new connection had not been encountered yet, the counter was increased and 7 was assigned to it. Then the bottom part happened, with two brand-new connections, so the IN counter was increased to 8 and 9. The bottom network has no connection 3->5 (which has IN 7), and that is why IN 7 does not appear there.
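The "database" lookup described above might be sketched like this (a hypothetical helper, not code from the paper):

```python
class InnovationDB:
    """Hypothetical helper: the same structural element always gets the same IN."""
    def __init__(self):
        self.counter = 0
        self.known = {}  # (in_node, out_node) -> innovation number

    def get(self, in_node, out_node):
        key = (in_node, out_node)
        if key not in self.known:   # new structure: increase the counter
            self.counter += 1
            self.known[key] = self.counter
        return self.known[key]      # known structure: reuse its IN

db = InnovationDB()
first = db.get(3, 5)   # first ever 3->5 connection anywhere in the population
again = db.get(3, 5)   # same mutation in another individual: same IN
print(first, again)    # 1 1
```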

Regarding the second figure, it's just an example to show how crossover works when there are disjoint INs. The parents simply are this way, for the sake of the example. However, they could have gotten into this state quite easily. Imagine that parent 1, at some earlier point in the evolution, had just the INs 1 to 5, and 5 was the newest IN so far. Then, somewhere else in the population, a new neuron (No. 6) was added between neurons 5 and 4, i.e. new connections 5->6 and 6->4 were created. Since such connections had not been encountered yet, the IN counter was increased to 6 and 7. After this, parent 1 was mutated by adding a new connection, which, being new as well, was assigned the new IN 8.


I read through the NEAT paper and I understand the algorithm now.

But one thing is still unclear to me. When does mutation occur, and how does it take place? How is it chosen whether it is an add-node or add-connection mutation? Furthermore, how is it chosen where in the network the mutation takes place (between which connections)?


First of all: the NEAT paper is very broad. It does not give a lot of concrete implementation details. But in most GA implementations, mutation occurs after new genomes have been created from parent genomes.

Before you get into NEAT, make sure to study how genetic algorithms work in the first place.

In most cases, the chance that a certain mutation occurs is evenly distributed among the mutation methods. For example, if you have 3 mutation methods (add_node, add_conn and mod_weight) and mutationRate = 0.3, then each of those methods has 0.1 chance of being executed on the genome.

Sometimes adding a connection is not possible, so adding a node might have a higher chance of being executed.

The location of the mutation in the network is also random. Mutation is random overall.
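A sketch of that dispatch, with placeholder mutation operators (the names follow the example above, but none of this is prescribed by the paper):

```python
import random

# placeholder mutation operators, just for illustration
def add_node(genome):   genome.append("node")
def add_conn(genome):   genome.append("conn")
def mod_weight(genome): genome.append("weight")

def mutate(genome, rng, mutation_rate=0.3):
    # spread the overall mutation rate evenly over the available methods,
    # so with 3 methods each one has a 0.1 chance of firing
    methods = [add_node, add_conn, mod_weight]
    per_method = mutation_rate / len(methods)
    for method in methods:
        if rng.random() < per_method:
            method(genome)
    return genome

genome = mutate([], random.Random(42))
```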

But you are asking very broad questions. The paper you referenced only states how you could implement an effective neuroevolution of network topology. It forms a base for your own project. How you implement it is all up to you (and whatever works best for your network!).


I'm learning about NEAT from the following paper:

I'm having trouble understanding how adjusted fitness penalizes large species and prevents them from dominating the population. I'll demonstrate my current understanding through an example, and hopefully someone will correct it.

Let's say we have two species, A and B. Species A did really well last generation and was given more children; this generation it has 4 children with fitnesses [8, 10, 10, 12], while B has 2 with fitnesses [9, 9]. Their adjusted fitnesses are now A: [2, 2.5, 2.5, 3] and B: [4.5, 4.5].

Now onto distributing children, the paper states: "Every species is assigned a potentially different number of offspring in proportion to the sum of adjusted fitnesses f'_i of its member organisms"

So the sum of adjusted fitnesses is 10 for A and 9 for B; thus A gets more children and keeps growing. How, then, does this process penalize large species and prevent them from dominating the population?


Great question! I completely agree that this paper (specifically the part you quoted) says that the offspring are assigned based on the sum of adjusted fitnesses within a species. Since adjusted fitness is calculated by dividing fitness by the number of members of a species, this would be mathematically equivalent to assigning offspring based on the average fitness of each species (as in your example). As you say, that, in and of itself, should not have the effect of curtailing the growth of large species.
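That equivalence is easy to check: dividing each member's fitness by the species size and summing gives exactly the species' average raw fitness. A quick sketch using the question's numbers:

```python
def sum_adjusted(fits):
    # each member's fitness divided by species size, then summed
    n = len(fits)
    return sum(f / n for f in fits)

species_a = [8, 10, 10, 12]
species_b = [9, 9]

# the sum of adjusted fitnesses equals the plain average of the raw fitnesses
print(sum_adjusted(species_a), sum(species_a) / len(species_a))  # 10.0 10.0
print(sum_adjusted(species_b), sum(species_b) / len(species_b))  # 9.0 9.0
```

Under this reading, species A (10.0) still out-competes B (9.0), exactly as the question observes.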

Unless I'm missing something, there is not enough information in the paper to determine whether A) There are additional implementation details not mentioned in the paper that cause this selection scheme to have the stated effect, B) This is a mistake in the writing of the paper, or C) This is how the algorithm was actually implemented and speciation wasn't helpful for the reasons the authors thought it was.

Regarding option A: Immediately after the line you quoted, the paper says "Species then reproduce by first eliminating the lowest performing members from the population. The entire population is then replaced by the offspring of the remaining organisms in each species." This could be implemented such that each species primarily replaces its own weakest organisms, which would make competition primarily occur within species. This is a technique called crowding (introduced in the Mahfoud, 1995 paper that this paper cites) and it can have similar effects to fitness sharing, especially if it were combined with certain other implementation decisions. However, it would be super weird for them to have done this, not mentioned it, and then said they were using fitness sharing rather than crowding. So I think this explanation is unlikely.

Regarding option B: Most computer science journal papers, like this one, are based off of groups of conference papers where the work was originally presented. The conference paper where most of the speciation research on NEAT was presented is here: In the explanation of fitness sharing, that paper says: "Species then grow or shrink depending on whether their average adjusted fitness is above or below the population average" (emphasis mine). This is different than the sum of adjusted fitness referred to in the paper you linked to. If they were actually using the average (and mistakenly said sum), they'd effectively be dividing by the number of members of each species twice, which would make all of the other claims accurate, and make the data make sense.

Regarding option C: This one seems unlikely, since Figure 7 makes it look like there's definitely stable coexistence for longer than you'd expect without some sort of negative frequency dependence. Also, they clearly put a lot of effort into dissecting the effect of speciation, so I wouldn't expect them to miss something like that. Especially in such an impactful paper that so many people have built on.

So, on the whole, I'd say my money is on explanation B - that this is a one-word mistake that changes the meaning substantially. But it's hard to know for sure.


I am trying to implement simple NEAT. I read in various sources that there are 4 types of "nodes": input neurons, hidden neurons, output neurons, and so-called bias neurons. I don't see which process may create the bias neurons that are depicted in this paper on page 16.

I understand that new neurons may be created while mutating, but that requires an existing connection between two neurons, which is split by the new neuron (based on the paper already mentioned, page 10). However, a bias neuron has no "input" connection, so it clearly can't be created in that way. So how, in detail, does NEAT create bias neurons?


A bias neuron (node) in the NEAT context is simply a special input neuron that is always active. It is always included by construction since it seems to help evolution in many cases.

So, in short, you do not create bias neurons, just as you won't create new input or output nodes; these are defined by your problem.

You are correct in that the standard NEAT implementation introduces new hidden nodes by splitting existing connections. Hidden nodes are the only neurons you will create or destroy in NEAT (and, to my knowledge, in neuroevolution in general).
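To make this concrete, here is a hedged sketch of a single neuron's activation where the bias is modelled as an extra input clamped to 1 (the function and parameter names are illustrative, not from any particular NEAT implementation):

```python
import math

def neuron_output(inputs, weights, bias_weight):
    # the bias "neuron" is just an input clamped to 1; its connection
    # weight acts as the neuron's bias term
    total = sum(x * w for x, w in zip(inputs, weights)) + 1.0 * bias_weight
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation

out = neuron_output([0.0, 0.0], [0.7, -0.2], bias_weight=0.5)
print(out)  # ~0.622: a non-zero output even with all-zero inputs
```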


I've recently read the original paper about NeuroEvolution of Augmenting Topologies by Kenneth O. Stanley and am now trying to prototype it myself in JavaScript. I stumbled across a few questions I can't answer.

My questions:
  1. What is the definition of "structural innovation", and how do I store these so I can check if an innovation has already happened before?

    However, by keeping a list of the innovations that occurred in the current generation, it is possible to ensure that when the same structure arises more than once through independent mutations in the same generation, each identical mutation is assigned the same innovation number

  2. Is there a reason for storing the type of a node (input, hidden, output)?

  3. In the original paper, only connections have an innovation number, but in other sources, nodes do as well. Is this necessary for crossover? (This has already been asked here.)

  4. How could I limit the mutation functions to not add recurrent connections?

I think that's it for now. All help is appreciated.

The relevant parts of my code:
// Note: assumes a p5.js-style helper random(max) / random(min, max).
let innovation = 0; // global innovation counter

class Genome {
    constructor(inputs, outputs) {
        this.inputs = inputs;
        this.outputs = outputs;
        this.nodes = [];
        this.connections = [];
        for (let i = 0; i < inputs + outputs; i++) {
            this.nodes.push(new Node());
        }
        // fully connect inputs to outputs
        for (let i = 0; i < inputs; i++) {
            for (let o = 0; o < outputs; o++) {
                this.connections.push(new Connection(
                    this.nodes[i], this.nodes[inputs + o], outputs * i + o));
            }
        }
        innovation = inputs * outputs;
    }

    weightMutatePerturb() {
        // nudge the weight of a random connection
        let c = this.connections[Math.floor(random(this.connections.length))];
        c.weight += random(-0.5, 0.5);
    }

    weightMutateCreate() {
        // assign a fresh random weight to a random connection
        this.connections[Math.floor(random(this.connections.length))].weight = random(-2, 2);
    }

    connectionMutate() {
        // connect a random node to a random non-input node
        let i = this.nodes[Math.floor(random(this.nodes.length))];
        let o = this.nodes[Math.floor(random(this.inputs, this.nodes.length))];
        let c = Connection.exists(this.connections, i, o);
        if (c) {
            c.enabled = true;
        } else {
            this.connections.push(new Connection(i, o, innovation++));
        }
    }

    nodeMutate() {
        // split a random connection with a new node
        let oldCon = this.connections[Math.floor(random(this.connections.length))];
        oldCon.enabled = false;
        let newNode = new Node();
        this.nodes.push(newNode);
        this.connections.push(new Connection(oldCon.input, newNode, innovation++, 1));
        this.connections.push(new Connection(newNode, oldCon.output, innovation++, oldCon.weight));
    }
}

class Node {
    constructor() {
        this.value = 0;
        this.previousValue = 0;
    }
}

class Connection {
    constructor(input, output, innov, weight) {
        this.input = input;
        this.output = output;
        this.innov = innov;
        this.weight = weight !== undefined ? weight : random(-2, 2);
        this.enabled = true;
    }

    static exists(connections, i, o) {
        for (let c = 0; c < connections.length; c++) {
            if (connections[c].input === i && connections[c].output === o) {
                return connections[c];
            }
        }
        return false;
    }
}

All answers and sources are welcome. (You are an awesome person!)


First, I would very strongly advise against implementing NEAT yourself. If you take a look at the (many) available implementations, you will see it is quite a large project!

  1. A structural innovation is any new node or connection that is added to a genome and has not been seen before. Imagine you have input nodes 1, 2, 3 and output nodes 4, 5. If only connection 2-4 exists, introducing connection 3-4 would be a structural innovation. To check for novelty you need to store all structures seen so far (i.e., a list of all connections and nodes), with a unique ID for each (this is the core idea behind NEAT, actually!). In our example, connection 2-4 may take ID=1, and connection 3-4 would take ID=2. You can see the connection is new because no other connection in the list links nodes 3 and 4. Nodes are normally introduced by creating "a stop" in a connection and simply take the next available ID. For example, connection 2-4 would be deleted and you would get connections 2-6 and 6-4, where node ID=6 is created in the process (as well as two new connections). Note that the IDs for nodes and connections may be independent (that is, if you use IDs for connections at all).
  2. I'm struggling to think of a hard requirement for this. In principle you could simply store nodes in a fixed order (inputs first, outputs next, then hidden) and infer their type from the index, which is how you normally do it anyway for performance reasons (imagine trying to remove a node: you only want to select a hidden node, so you would restrict the search to those indices). Some tasks may be more efficient with that info, though, for example checking for recurrent connections (see 4).
  3. IDs are useful in crossover, as they let you quickly determine which elements are common between two genomes. Whether to have IDs for nodes as well as connections is an open implementation decision. Having no IDs for connections makes for simpler code (connections are identified by the IDs of the nodes they connect), but you lose the ability to tell apart two connections that connect the same nodes. There is an argument that a connection between two given nodes does not necessarily mean the same thing at different times in evolution (see how your quote mentions "in the same generation"). This is probably not a relevant factor, though! As I said, the convenience of IDs for both nodes and connections is still debated in the NEAT community.
  4. In many cases you do not want to allow recurrent connections. The standard way to prevent them is to check for recurrence every time you try to add a connection. This is a costly step, yes!
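One common way to implement that check (a sketch, assuming connections are stored as (in, out) pairs) is a depth-first search from the proposed target node:

```python
def creates_cycle(connections, new_in, new_out):
    # connections: list of (in_node, out_node) pairs already in the genome.
    # If new_in is reachable from new_out, adding new_in -> new_out closes a loop.
    if new_in == new_out:
        return True
    stack, seen = [new_out], set()
    while stack:
        node = stack.pop()
        if node == new_in:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for src, dst in connections if src == node)
    return False

conns = [(1, 3), (2, 3), (3, 4)]
print(creates_cycle(conns, 4, 1))  # True: 1 -> 3 -> 4 plus 4 -> 1 is a loop
print(creates_cycle(conns, 1, 4))  # False: 1 -> 4 keeps the network feedforward
```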

If you have more doubts, I recommend you take a look at this implementation by Colin Green for reference. If he is not the person who knows the most about implementing NEAT, he comes close.


EDIT: I managed to put together a couple of simple examples.

Could anyone help me with a simple NEAT example teaching the net how to solve XOR, or some other similar problem? But using the NEAT technique, so that I would not have to specify a training data set.

Using javascript: or

1. Initialize network
2. Generate generation
3. Go through each genome in generation and evaluate its fitness (how good it is)
4. Take 2 best genomes from generation
5. Merge genomes 50 / 50 at random
6. Mutate final genome
7. Generate second generation
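The steps above can be sketched as a toy generational loop (this is plain neuroevolution of a fixed-length genome, not full NEAT: no speciation or innovation tracking, and all function names are illustrative):

```python
import random

def evolve(fitness, new_genome, mutate, merge, pop_size=50, generations=100):
    population = [new_genome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)   # evaluate fitness
        best, second = population[0], population[1]  # take the 2 best genomes
        population = [mutate(merge(best, second))    # merge 50/50, then mutate
                      for _ in range(pop_size)]      # next generation
    return max(population, key=fitness)

# toy problem: evolve a vector of 5 numbers toward all-ones
rng = random.Random(0)
winner = evolve(
    fitness=lambda g: -sum((v - 1) ** 2 for v in g),
    new_genome=lambda: [rng.uniform(-1, 1) for _ in range(5)],
    mutate=lambda g: [v + rng.gauss(0, 0.1) for v in g],
    merge=lambda a, b: [rng.choice(pair) for pair in zip(a, b)],
)
```

Real NEAT replaces the fixed-length genome with a variable topology and adds speciation, but the outer loop has the same shape.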

This would be extremely helpful. The same technique is being used here:

I went through the source code but there is WAY too much stuff going on. I understand the general idea; however, I have no idea how to implement the solution.

Thank you :)


There is an example on Neataptic's

// this network learns the XOR gate (through neuro-evolution)
var network = new Network(2,1);

var trainingSet = [
  { input: [0,0], output: [0] },
  { input: [0,1], output: [1] },
  { input: [1,0], output: [1] },
  { input: [1,1], output: [0] }
];

await network.evolve(trainingSet, {
  equal: true,
  error: 0.03
});

Neataptic has it all built-in so all you have to provide is a data set. If you need more info on how this has been set up, read this article.

For problems with dynamic solutions, a custom loop and fitness function has to be implemented.


I am using NEAT-Python to mimic the course of a regular sine function based on the curve's absolute difference from 0. The configuration file has almost entirely been adopted from the basic XOR example, with the exception of the number of inputs being set to 1. The direction of the offset is inferred from the original data right after the actual prediction step, so this is really all about predicting offsets in the range from [0, 1].

The fitness function and most of the remaining code have also been adopted from the help pages, which is why I am fairly confident that the code is consistent from a technical perspective. As seen from the visualization of observed vs. predicted offsets included below, the model creates quite good results in most cases. However, it fails to capture the lower and upper end of the range of values.

Any help on how to improve the algorithm's performance, particularly at the lower/upper edge, would be highly appreciated. Or are there any methodical limitations that I haven't taken into consideration so far?

config-feedforward located in current working directory:

#--- parameters for the XOR-2 experiment ---#

[NEAT]
fitness_criterion     = max
fitness_threshold     = 3.9
pop_size              = 150
reset_on_extinction   = False

[DefaultGenome]
# node activation options
activation_default      = sigmoid
activation_mutate_rate  = 0.0
activation_options      = sigmoid

# node aggregation options
aggregation_default     = sum
aggregation_mutate_rate = 0.0
aggregation_options     = sum

# node bias options
bias_init_mean          = 0.0
bias_init_stdev         = 1.0
bias_max_value          = 30.0
bias_min_value          = -30.0
bias_mutate_power       = 0.5
bias_mutate_rate        = 0.7
bias_replace_rate       = 0.1

# genome compatibility options
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient   = 0.5

# connection add/remove rates
conn_add_prob           = 0.5
conn_delete_prob        = 0.5

# connection enable options
enabled_default         = True
enabled_mutate_rate     = 0.01

feed_forward            = True
initial_connection      = full

# node add/remove rates
node_add_prob           = 0.2
node_delete_prob        = 0.2

# network parameters
num_hidden              = 0
num_inputs              = 1
num_outputs             = 1

# node response options
response_init_mean      = 1.0
response_init_stdev     = 0.0
response_max_value      = 30.0
response_min_value      = -30.0
response_mutate_power   = 0.0
response_mutate_rate    = 0.0
response_replace_rate   = 0.0

# connection weight options
weight_init_mean        = 0.0
weight_init_stdev       = 1.0
weight_max_value        = 30
weight_min_value        = -30
weight_mutate_power     = 0.5
weight_mutate_rate      = 0.8
weight_replace_rate     = 0.1

[DefaultSpeciesSet]
compatibility_threshold = 3.0

[DefaultStagnation]
species_fitness_func = max
max_stagnation       = 20
species_elitism      = 2

[DefaultReproduction]
elitism            = 2
survival_threshold = 0.2

NEAT functions:

# . fitness function ----

def eval_genomes(genomes, config):
  for genome_id, genome in genomes: = 4.0
    net = neat.nn.FeedForwardNetwork.create(genome, config)
    for xi in zip(abs(x)):
      output = net.activate(xi) -= abs(output[0] - xi[0]) ** 2

# . neat run ----

def run(config_file, n = None):
  # load configuration
  config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                       neat.DefaultSpeciesSet, neat.DefaultStagnation,
                       config_file)
  # create the population, which is the top-level object for a NEAT run
  p = neat.Population(config)
  # add a stdout reporter to show progress in the terminal
  p.add_reporter(neat.StdOutReporter(True))
  stats = neat.StatisticsReporter()
  p.add_reporter(stats)
  # run for up to n generations
  winner =, n)
  return winner



### . packages ----

import os
import neat

import numpy as np
import matplotlib.pyplot as plt
import random

### . sample data ----

x = np.sin(np.arange(.01, 4000 * .01, .01))


### . model evolution ----

winner = run('config-feedforward', n = 25)

### . prediction ----

## extract winning model
config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     'config-feedforward')

winner_net = neat.nn.FeedForwardNetwork.create(winner, config)

## make predictions
y = []
for xi in zip(abs(x)):
  y.append(winner_net.activate(xi))

## if required, adjust signs
for i in range(len(y)):
  if (x[i] < 0):
    y[i] = [x * -1 for x in y[i]]

## display sample vs. predicted data
plt.scatter(range(len(x)), x, color='#3c8dbc', label = 'observed') # blue
plt.scatter(range(len(x)), y, color='#f39c12', label = 'predicted') # orange
plt.hlines(0, xmin = 0, xmax = len(x), colors = 'grey', linestyles = 'dashed')
plt.legend(bbox_to_anchor = (0., 1.02, 1., .102), loc = 10,
           ncol = 2, mode = None, borderaxespad = 0.)


There exist different implementations of NEAT, so the details may vary.

Normally NEAT handles biases by including a special input neuron that is always active (post-activation 1). I suspect bias_max_value and bias_min_value determine the maximum allowed strength of connections between this bias neuron and hidden neurons. In the NEAT code that I used these two parameters did not exist, and bias-to-hidden connections were treated as normal (with their own allowed range, in our case -5 to 5).

If you are working with sigmoid functions, your output neurons will work on a 0-to-1 range (consider faster activations such as ReLUs for hidden neurons).

If you are trying to predict values close to 0 or 1 this is a problem, since you really need to push your neurons to the limits of their range, and Sigmoids approach those extremes asymptotically (slowly!):

Fortunately, there is a very simple way to see if this is the problem: simply re-scale your output! Something like

out = raw_out * 1.2 - 0.1

This will allow your theoretical outputs to be in a range that goes beyond your expected outputs (-0.1 to 1.1 in my example), and reaching 0 and 1 will be easier (and actually possible strictly speaking).
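A sketch of that rescaling (the range endpoints are the example's choice, not a fixed rule):

```python
def rescale(raw_out, lo=-0.1, hi=1.1):
    # map a sigmoid output from [0, 1] onto the slightly wider [lo, hi]
    return raw_out * (hi - lo) + lo

print(rescale(0.0))  # -0.1
print(rescale(1.0))  # 1.1 (up to float rounding)
# a target of 1.0 now needs a raw sigmoid output of only ~0.92, not exactly 1
```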



I just started playing with neataptic. I wanted to make the neural network learn how to count using the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9.

I normalized my inputs to 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9.

Then I wrote a really simple training program that would teach the net how to add 1 + 2 (0.1 + 0.2).

const architect = require('neataptic').architect

const myTrainingSet = [
  { input: [0.1, 0.2], output: [0.3] },
  { input: [0.2, 0.1], output: [0.3] }
];

myNetwork = architect.Perceptron(2, 3, 1);

myNetwork.train(myTrainingSet, {
  log: 1,
  error: 0.01,
  iterations: 1000,
  rate: 0.3
});


The problem is that this logs:

[ 0.3717501873608793 ]
[ 0.3695919770977549 ]
[ 0.37142744367869446 ]

It basically logs 0.3 for every input. Can someone please explain what I have done wrong? :)


The dataset is too small for the neural network to learn patterns from. You have only provided it with samples that have 0.3 as the output, so the network minimizes its error by always outputting 0.3; that is exactly what it was trained to do. I have created an example with 1000 (dynamically generated) samples, which seems to work:

const architect = neataptic.architect;

const trainingSet = [];

for (var i = 0; i < 1000; i++) {
  let integer1 = Math.floor(Math.random() * 10);
  let integer2 = Math.round(Math.random() * (10 - integer1));

  let output = (integer1 + integer2) / 10;

  trainingSet.push({ input: [integer1 / 10, integer2 / 10], output: [output] });
}

myNetwork = architect.Perceptron(2, 3, 1);

myNetwork.train(trainingSet, {
  log: 50,
  error: 0.0001,
  iterations: 1000,
  rate: 0.3,
  momentum: 0.9
});

console.log(myNetwork.activate([0.5, 0.5]));



I don’t understand how the NEAT algorithm takes inputs and then outputs numbers based on the connection genes. I am familiar with using matrices in fixed-topology neural networks to feed inputs forward. However, as each node in NEAT has its own number of connections and isn’t necessarily connected to every other node, I don’t understand (and after much searching I can't find an answer on) how NEAT produces outputs based on the inputs.

Could someone explain how it works?


That was also a question I struggled with while implementing my own version of the algorithm.

You can find the answer in the NEAT Users Page: where the author says:

How are networks with arbitrary topologies activated?

The activation function, bool Network::activate(), gives the specifics. The implementation is of course considerably different than for a simple layered feedforward network. Each node adds up the activation from all incoming nodes from the previous timestep. (The function also handles a special "time delayed" connection, but that is not used by the current version of NEAT in any experiments that we have published.) Another way to understand it is to realize that activation does not travel all the way from the input layer to the output layer in a single timestep. In a single timestep, activation only travels from one neuron to the next. So it takes several timesteps for activation to get from the inputs to the outputs. If you think about it, this is the way it works in a real brain, where it takes time for a signal hitting your eyes to get to the cortex because it travels over several neural connections.

So, if one of the evolved networks is not feedforward, the outputs of the network will change over successive timesteps. This is particularly useful in continuous control problems, where the environment is not static, but problematic in classification problems. The author also answers:

How do I ensure that a network stabilizes before taking its output(s) for a classification problem?

The cheap and dirty way to do this is just to activate n times in a row where n>1, and hope there are not too many loops or long pathways of hidden nodes.

The proper (and quite nice) way to do it is to check every hidden node and output node from one timestep to the next, and see if nothing has changed, or at least not changed within some delta. Once this criterion is met, the output must be stable.

Note that output may not always stabilize in some cases. Also, for continuous control problems, do not check for stabilization as the network never "settles" but rather continuously reacts to a changing environment. Generally, stabilization is used in classification problems, or in board games.
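The "proper" stabilization check described above can be sketched as follows; `net.step()` is a hypothetical one-timestep activation, illustrated here with a toy recurrent node rather than a real NEAT network:

```python
def activate_until_stable(net, inputs, max_steps=50, delta=1e-4):
    # propagate one timestep at a time until no node value changes by more
    # than delta (or give up after max_steps)
    prev = net.step(inputs)
    for _ in range(max_steps):
        cur = net.step(inputs)
        if all(abs(a - b) <= delta for a, b in zip(cur, prev)):
            return cur  # stabilized
        prev = cur
    return prev  # may not have stabilized (e.g. a network with loops)

class ToyNet:
    # stand-in for a recurrent network: converges geometrically to its input
    def __init__(self):
        self.value = 0.0
    def step(self, inputs):
        self.value = 0.5 * self.value + 0.5 * inputs[0]
        return [self.value]

out = activate_until_stable(ToyNet(), [1.0])
print(out)  # close to [1.0]
```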


In the paper written about the NEAT algorithm, the author states:

A possible problem is that the same structural innovation will receive different innovation numbers in the same generation if it occurs by chance more than once. However, by keeping a list of the innovations that occurred in the current generation, it is possible to ensure that when the same structure arises more than once through independent mutations in the same generation, each identical mutation is assigned the same innovation number.

This makes sense because you don't want identical genes to end up with different innovation numbers. If they did, there would be problems when crossing over two genomes with identical genes but different innovation numbers because you would end up with an offspring with a copy of each gene from each parent, creating the same connection twice.

What doesn't make sense to me though is what happens if a mutation occurs between two genes, and then in the next generation the same mutation occurs? In the paper, it's very clear that only a list of mutations in the current generation is kept to avoid an "explosion of innovation numbers" but doesn't specify anything about what happens if the same mutation occurs across different generations.

Do you keep a global list of gene pairs and their corresponding innovation numbers to prevent this problem? Is there a reason why the paper only specifies what happens if the same mutation happens in the same generation and doesn't consider the case of cross-generational mutations?


No, you don't keep a global list of gene pairs. You can, if you want to keep the same mutation from getting a new number, but I want to point out that it doesn't really matter. The only effect of the same mutation happening twice is that you do some unnecessary computation and your global innovation number grows faster.

For future genomes, however, there is no chance that they will carry two genes that represent the same innovation.

Matching genes are inherited randomly, whereas disjoint genes (those that do not match in the middle) and excess genes (those that do not match in the end) are inherited from the more fit parent

So when two identical innovations occur, they will be either disjoint or excess genes. These are inherited from the more fit parent, and only one parent can be the fitter one, so a single offspring will never end up with the same innovation twice.


I have used the Neataptic library to evolve a network structure for XOR, following the instructions on their GitHub page.

The actual results seem right; however, when I visualize the nodes there is (most of the time) no hidden layer, only 2 input nodes and 1 output, so I assume it should not work. Any idea what's wrong?

var network = new neataptic.Network(2,1);

var trainingSet = [
  { input: [0,0], output: [0] },
  { input: [0,1], output: [1] },
  { input: [1,0], output: [1] },
  { input: [1,1], output: [0] }
];

async function abc() {
  await network.evolve(trainingSet, {
    equal: true,
    error: 0.03
  });
  document.getElementById("app").innerHTML = "0 ⊕ 0 = " +
    network.activate([0,0]) + "<br />0 ⊕ 1 = " +
    network.activate([0,1]) + "<br />1 ⊕ 0 = " +
    network.activate([1,0]) + "<br />1 ⊕ 1 = " +
    network.activate([1,1]);
  drawGraph(network.graph(500, 400), '#draw');
}

abc();

* {
  box-sizing: border-box;
}

body {
  margin: 0;
  padding: 0;
  height: 100vh;
}

#app {
  vertical-align: top;
  display: inline-block;
  padding: 15px;
  width: 20%;
  height: 100%;
}

#draw {
  vertical-align: top;
  display: inline-block;
  width: 80%;
  height: 100%;
}
<link rel="stylesheet" type="text/css" href="">
<script src=""></script>
<script src=""></script>
<script src=""></script>
<script src=""></script>

<div id="app">
</div><svg id="draw" width="600px" height="450px" />


I have contacted the creator of Neataptic library and this is the answer to my question:

... recall that mutations can also change the squashing function of the final node. Yes, the default logistic squashing function can't handle XOR, but a sinusoidal function can!