Simulation of Learning on Small Graphs  Detailled Project Description

 

The project studies the emergence of modularity in “data processing systems”, like the brain, during an evolutionary process. Primarily, it consists of computer simulations of small networks for which the dynamics mimics “evolutionary learning”.

 

In neural networks, the information is stored in the connectivity (and the synaptical weights attached to these connections) among the components (vertices) of the graph, and the process of “learning” in neural networks is defined to be the process which leads to a configuration which is optimal with respect to the desired tasks of the system. By “evolutionary learning” we mean the random offering of new configurations for the systems and the selection of one of the options according to a measure of “higher fitness”. In several aspects, the features of the investigated networks differ from standard approaches: they include graphs which are highly recurrent, we use non-monotonic transition functions, and, for the first two projects, the “synaptical” weights attached to the lines of the graph cannot be tuned: the line is either there (weight 1) or absent (weight 0).

The graphs consist of a fixed number of input vertices for the presentation of the stimulus (these vertices do not take part in the dynamics), a fixed number of output vertices (where the reaction of the system is read off) and a fixed number of internal vertices. To the vertices of the graph is associated an activity function which for the application to neural systems may be interpreted as an action potential or a “firing-rate”. This activity is updated according to a simple algorithm: the value of the transition function for the sum of the activities of the neighbored vertices of a given vertex determines the value of the activity for this vertex for the next time-step.    

Already these simple systems exhibit a complex learning behavior, like, e.g., the phenomenon of punctuated equilibrium in the evolutionary process or a set of attractors which allows the definition of non-monotonic measures for the information processing during the process of learning. These findings should be of interest for both, the general understanding of evolutionary dynamics and, more specifically, the understanding of the role of recurrence in combination with non-monotonic response patterns for learning processes. The overall interest, however, lies in the investigation of modularity – functional (related to the performance of the system) and “anatomical” (related to subgraphs of the network) – in the learning process as well as in the dynamics of those networks which optimally react to the tasks.

The three current subprojects correspond to an increasing ability of the data processing systems to adapt and react properly to external influences: simple pattern recognition as a “reflex” of the system to a particular predefined set of stimuli, simple categorization as the ability to react to classes of stimuli with predefined properties, and the evolution of individual learning in order to achieve a faster and more efficient adaptation to changes in the environment.

Future projects along this line of increasing capacities will include (1) indirect encoding of connectivities (i.e., the evolutionary process refers to the algorithms which generate connectivity structures and not to the connectivities themselves), (2) the emergence of a “short term memory” in order to enable individual learning, even if the “reward” or “penalty” does not temporally coincide with the system’s reaction, and (3) the emergence of “binary mating” as an efficient means to explore the genetic option space.

 
 

1. Evolutionary Learning of Pattern Cognition

This model simulates the evolutionary emergence of simple reflexes of organisms (or, more general, data processing systems) to a finite and predefined set of input patterns. In our simulations, the input consists of 11 patterns which are presented to the system in a fixed sequential order, each for a certain number of time steps in order to allow the system to respond dynamically. To each of these input patterns corresponds one out of three predefined reaction patterns. The connectivity of the graph is optimized in such a way that a few time steps after the presentation of an image the reaction of the graph at the output vertices corresponds to the predefined action for this image. The learning process stops when the system performs optimally or when there is no improvement for a certain number of random changes.

The simulations have been performed with directed and undirected graphs, where even for the directed graphs a certain percentage (5-20%, depending on the size of the system) of lines remain undirected. In about 1-3% of all runs (again depending on the size of the system) we found “optimal learners”, i.e., the program stops with a graph which reacts optimally with respect to the sequential presentation of the input patterns.

 

Example for the occurence of punctuated equilibrium during the simulation of a learning process. The graphs consisted of 57 internal vertices. The horizontal axis represents the number of generations, the vertical axis the “error function” (an inverse measure for “fitness). After about 17000 generations the sequence produced an optimal learner.

 

The most spectacular finding in this project is the occurrence of “punctuated equilibrium”: the fitness function does not increase gradually but rather in steps, where long periods of almost no improvement are intermitted by periods of rapid changes. Already in this process we observe an elementary form of modularity: the improvement of the systems during the periods of rapid changes is, in general, due to a prominent improvement of the reaction with respect to only a few input patterns and not due to a slight improvement with respect to all input patterns.

 
 

2. Evolutionary Learning of Categorization

A first generalization of the previous model is the simulation of evolutionary emergence of simple “categorization” or “abstraction” abilities, where “simple” refers to the categorization of predefined properties (like the number of activated vertices for the input patterns or the particular shape of an object) and not to the emergence of the general ability of categorization.

In the case of number recognition the procedure is as follows: The input patterns consist of an equal number of images with two and with three activated input vertices. There are only two possible optimal reactions. We compare two runs, where in one case the optimal reaction of the system correlates with the number of activated input vertices and in the other case the reaction is randomized with respect to the number of activated input vertices. For the optimal learners we measure the reactions to previously “unknown” input patterns with two and three vertices. It is expected, that for those systems which learned to recognize the number of vertices there should be a significant correlation between this number and the optimal reaction of the system. However, this should only occur in those systems, for which such a correlation was already part of the learning process.

This project is work in progress and there exist only preliminary results.

 
 

3. Evolutionary Learning of Individual Learning

The step from genetically imprinted reflexes to the ability of individual learning was surely one of the more prominent advances in evolutionary history. Individual learning requires a certain number of prerequisites:

For the organism

  • the possibility to change the synaptical weights within an individuum according to some specified rule, like, e.g., Hebb’s rule, and
  • a “penalty” system, i.e., some mechanism which signals the system that a specific reaction has negative consequences. Depending on this signal the system learns to repeat the reaction or to avoid the reaction the next time the same input pattern is presented. This implies that the “negative” consequences are not strictly “lethal” to the system but can become lethal after a certain number of repetitions.

For the environment

  • a certain number of deterministic rules, i.e., it is advantageous for the organism to learn a specific behavior,
  • this number of deterministic rules supported by the environment should effectively change from generation to generation such that evolutionary learning is not appropriate. “Effectively” could also imply that there is a huge number of fixed deterministic rules but for each generation only a limited number of these becomes relevant.

One might object that at least one further ability of the organism has to be present: a short term memory. In general, the “penalty” or “reward” does not follow the reaction immediately and, therefore, the system has to “remember” the stimulus as well as the reaction until the penalty or reward follows. Only then it can change its connectivities in such a way that the next time the reaction is repeated. For the simplicity of the program, however, we assume that this “penalty” or “reward” follows immediately to the reaction. At a later stage we intend to incorporate the options for an evolutionary learning of a “short term memory”-mechanism. 

The requirements for the environment are implemented by choosing a limited set of input patterns and a limited set of optimal reactions, but the assignments of optimal reactions to input patterns is changed for each new generation. The first requirement for the organism is implemented in such a way that the synaptical weights can change according to a local rule: the change of the synaptical weight of a line depends only on the activities of the incident two vertices. We offer four different rules of this type: the synaptical weight is increased if (1) there is no activity at the incident vertices, (2) there is only activity at the initial incident vertex but none on the final incident vertex, (3) there is only activity at the final incident vertex but none on the initial incident vertex, and (4) there is activity at both incident vertices (this corresponds to Hebb’s rule). Furthermore, the application of this rule should depend on the signal of the “penalty system”: if this signal is “on” the rule should not be applied, otherwise it should be applied.

The evolutionary learning of individual learning consists of two steps: (1) by an evolutionary learning process (similar to the one described in the first two subprojects) the systems learn that “signal on” is negative, and (2), again by an evolutionary process, the systems learn to apply that rule of synaptical change which leads to an improvement of the reaction, however, only if the penalty-signal is “off”.

Again this is work in progress.