Saturday 30 April 2011

An idea to combat bloat in genetic programming

Introduction

Genetic programming (Banzhaf et al., 1998) uses an evolutionary algorithm (Eiben, 2003) to search for programs that solve a well-defined problem that can be evaluated using one number, the fitness of the program. Evolutionary algorithms are inspired by the evolution of species by natural selection. The main difference of this search paradigm, compared to methods that are more traditional, is that is works with a group of candidate solution (a population), instead of just one solution. This makes the method more robust, i.e. less likely to get trapped in a local minimum. These algorithms code their solution as a gene. New generations of solutions are formed using sexual (and a-sexual) reproduction. Sexual reproduction (crossover of genes) is emphasized as it allows for the combination of multiple partial solutions into one, thus using the parallel computations made by entire population, making the use of a population less inefficient as it would be otherwise.

Bloat and evolution

One of the fundamental problems of genetic programming (GP) is bloat. After some generations (typically below one hundred), the search for better programs halts as the programs become too large. The main cause of bloat is generally thought to be the proliferations of introns. Introns are parts of the program that do not contribute to the calculation that is made, e.g. a large calculation that is multiplied by zero in the end, or a section that start with: if false then. Additionally, bloat can be caused by inefficient code, e.g. two times x=x+1, instead of x=x+2.

In genetic programming, the linear genes are typically converted into trees or lists with lines of code. Some other coding methods are also used, see Banzhaf et al. (1998). Performing a simple crossover on the genes would almost certainly result in an illegal program. Therefore, crossover is performed on a higher level, e.g. two sub-trees are exchanged between the sets of genes, or sections of program lines are exchanged. This way the child-program is legal code, but still, it is often bad code, and its results are not very similar to its parents. As a consequence, evolution favors bloat at the moment that it becomes hard to find better solutions. In this case, it is better for the parents to get offspring that is at least just as fit as they are, as getting less fit children. The probability to get equally fit offspring is better in large programs (with a small coding part) than in small ones (most of which is useful code), as in large programs it is more likely that the crossover operator does not destroy anything. The problem of bloat is thus intimately linked to the destructive power of crossover.

Advection of the disturbance fields in stochastic parameterisations

A relatively new method to estimate the uncertainties of the weather prediction is the use of ensembles. In this case, the numerical weather prediction (NWP) model is run multiple times with slightly varying settings. A popular choice is to vary the initial conditions (prognostic variables) of the model, within the range of their uncertainty. It seems like this method does not bring enough variability: the ensemble members are still relatively similar to each other and the observation is often still not in the ensemble.

As a consequence, meteorologists are starting to look at uncertainties in the model as an additional source of variability for the ensemble. This can be accomplished by utilising multiple models to create the ensemble (multi-model ensemble), or multiple parameterisation schemes or by varying the parameterisations of one model (stochastic parameterisations). This latter case is discussed in this essay.

Stochastic parameterisations

In parameterisations the effect of subscale processes are estimated based on the resolved prognostic fields. For example, the cloud fraction is estimated based on the relative humidity at the model resolution. As the humidity also varies at scales below the model resolution (sub grid scale variability), it is possible to have clouds even though the average relative humidity is well below saturation. Such functional relations can be estimated by a large set of representative measurements or by modelling with a more detailed model. In deterministic parameterisations the best estimate of, for example, the cloud fraction is used. However, there is normally a considerable spread around this mean cloud fraction for a certain relative humidity. It is thus physically reasonable to consider the parameterised quantity as a stochastic parameter, with a certain probability density function (PDF).

Ideally, one would like to use stochastic parameterisations that were specially developed for this application. Such a parameterisation could also take into account the relation between the PDF and the prognostic model fields. Developing parameterisations is a major task and normally performed by specialists of a certain process. Thus, to get a first idea of the importance of stochastic parameterisations, NWP-modellers started with more ad-hoc approaches. One can, for example, disturb the tendencies calculated by the parameterisations by introducing noise.

Online generation of temporal and spatial fractal red noise

It is relatively easy to generate fields with fractal noise using Fourier, wavelet or cascade algorithms (see also my page on cloud generators). However, these noise fields have to be calculated fully before their first use. This can be problematic in case a 2D or 3D spatial field is needed as input to a dynamical model that should also have a temporal fractal structure. Often the number of temporal time steps is so large, that the noise field become impractically large. Therefore, this essay introduces an online method to generate fractal red noise fields, where the new field is calculated from the previous field without the need to know all previous fields; only the current and the new field have to be stored in memory.
The algorithm involves two steps. The spatially correlated red noise is calculated from a white noise field using Fourier filtering. The white noise field evolves temporally correlated by addition of Gaussian noise. In other words, every pixel of the white noise field represents a fractal Brownian noise time series. Fortunately, the spatially correlated noise field retains the fractal nature of the temporal structure of the white noise field.

On cloud structure

I tried to keep this text understandable for a broad scientific audience. Thus who already know something about fractals, may find the first section on fractals trivial and better start with the second section on why clouds are fractal.

Clouds are fractal

Fractal measures provide an elegant mathematical description of cloud structure. Fractals have the same structure at all scales. That may sound exotic, but fractals are actually very common in nature. An instructive example is a photo of a rock, where you cannot see if the rock is 10 cm large or 10 m, without someone or some tool next to it. Other examples of fractals are commodity prices, the branch structure of plants, mountains, coast lines, your lungs and arteries, and of course rain and clouds.

The fractal structure of a measurement (time series) of Liquid Water Content (LWC) can be seen by zooming in on the time series. If you zoom in by a factor x, the total variance of this smaller part of the time series will be reduced by a factor y. Each time you again zoom in by factor x, you will find a variance reduction by a factor y, at least on average. This fractal behaviour leads to a power law; the total variance is proportional to the scale (total length of the time series) to the power of a constant; to be precise, this constant is log(x)/log(y). Such power laws can also be seen in the measurements of cloud top height, column integrated cloud liquid water (Liquid Water Path, LWP), the sizes of cumulus clouds, the perimeter of cumulus clouds or showers, and in satellite images of clouds and in other radiative cloud properties.

If you plot such a power law in a graph with logarithmic axis, the power laws looks like a line. Thus, a paper on fractals typically shows a lot of so called log-log-plots and linear fits. To identify scaling you need at least 3 orders of magnitude, thus you need large data sets with little noise.