Conventional applications of neural networks usually predict a single
value as a function of given inputs. In forecasting, for example, a
standard objective is to predict the future value of some entity of
interest on the basis of a time series of past measurements or
observations. Typical training schemes aim to minimise the sum of
squared deviations between predicted and actual values (the 'targets'),
whereby, ideally, the network learns the conditional mean of the target
given the input. If the underlying conditional distribution is Gaussian,
or at least unimodal, this may be a satisfactory approach. However,
for a multimodal distribution, the conditional mean does not capture the
relevant features of the system, and the prediction performance will, in
general, be very poor. This calls for a more powerful model, one that
learns the whole conditional probability distribution.
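
To make the first point precise, the standard decomposition behind it
(a well-known result, not spelled out above) is

\[
  E\bigl[(t - y(x))^2 \,\big|\, x\bigr]
  \;=\; \bigl(y(x) - E[t \mid x]\bigr)^2 \;+\; \mathrm{Var}[t \mid x],
\]

so the expected squared error is minimised by the conditional mean,
y(x) = E[t|x], with the conditional variance left as an irreducible
residual. For a bimodal p(t|x), however, E[t|x] typically falls between
the two modes, in a region of low probability density, so the 'optimal'
point prediction is a value the system hardly ever produces.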
Chapter 1 demonstrates that even for a deterministic system with
'benign' Gaussian observational noise, the distribution of a future
observation, conditioned on a set of past observations, can become
strongly skewed and multimodal.
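
The text does not specify the system used in Chapter 1; as an
illustrative stand-in, the following sketch estimates such a conditional
distribution empirically for the logistic map observed through Gaussian
noise (the map, noise level, and conditioning value are all assumptions
of this example):

import numpy as np

rng = np.random.default_rng(0)

# Deterministic dynamics: logistic map x_{t+1} = 4 x_t (1 - x_t),
# a chaotic toy system chosen here purely for illustration.
N, sigma = 100_000, 0.05
x = np.empty(N)
x[0] = 0.3
for t in range(N - 1):
    x[t + 1] = 4.0 * x[t] * (1.0 - x[t])

# 'Benign' Gaussian observational noise on every measurement.
y = x + rng.normal(0.0, sigma, size=N)

# Empirical conditional density of the next observation, given that
# the current observation lies in a narrow bin around v.
v, half = 0.9, 0.02
nxt = y[1:][np.abs(y[:-1] - v) < half]

# Crude ASCII histogram; for suitable v and sigma the shape departs
# visibly from a Gaussian: strongly skewed, possibly with several modes.
h, edges = np.histogram(nxt, bins=30, density=True)
for d, lo in zip(h, edges[:-1]):
    print(f"{lo:6.2f} | " + "#" * int(40 * d / h.max()))

Varying the conditioning value v and the noise level sigma makes the
departure from a unimodal, symmetric shape easy to explore.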
In Chapter 2, a general neural network structure for modelling
conditional probability densities is derived, and it is shown that a
universal approximator for this extended task requires at least two
hidden layers. A training scheme is developed from a maximum likelihood
approach in Chapter 3, and the performance of this method is
demonstrated on three stochastic time series in Chapters 4 and 5.
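
The overview above does not fix a concrete output representation or loss
implementation; one standard instantiation of maximum-likelihood
training for a conditional density network is a Gaussian-mixture output
layer (a mixture density network). The sketch below computes the
negative log-likelihood such a scheme minimises, using two hidden layers
in line with the result cited from Chapter 2; all layer sizes and the
toy data are illustrative assumptions:

import numpy as np

def logsumexp(a, axis, keepdims=False):
    m = a.max(axis=axis, keepdims=True)
    s = m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))
    return s if keepdims else np.squeeze(s, axis=axis)

def mdn_nll(params, x, t):
    """Mean negative log-likelihood of targets t under a network mapping
    each input x to a K-component Gaussian mixture p(t | x)."""
    W1, b1, W2, b2, W3, b3 = params
    h1 = np.tanh(x @ W1 + b1)             # first hidden layer
    h2 = np.tanh(h1 @ W2 + b2)            # second hidden layer (cf. Chapter 2)
    out = h2 @ W3 + b3                    # 3K raw outputs per input
    logits, mu, log_sig = np.split(out, 3, axis=1)
    log_pi = logits - logsumexp(logits, axis=1, keepdims=True)  # log-softmax
    # log N(t | mu_k, sigma_k^2) for every mixture component k
    z = (t[:, None] - mu) / np.exp(log_sig)
    log_norm = -0.5 * np.log(2 * np.pi) - log_sig - 0.5 * z**2
    # log p(t|x) = log sum_k pi_k N_k; minimising its negative mean over
    # the sample is exactly maximum-likelihood training
    return -logsumexp(log_pi + log_norm, axis=1).mean()

# Toy usage with random weights (training would adjust params by gradient
# descent on this objective, e.g. via an autodiff framework).
rng = np.random.default_rng(1)
D, H, K = 1, 16, 3
shapes = [(D, H), (H,), (H, H), (H,), (H, 3 * K), (3 * K,)]
params = [0.1 * rng.standard_normal(s) for s in shapes]
x = rng.uniform(-1.0, 1.0, size=(500, D))
t = np.sin(3.0 * x[:, 0]) + 0.1 * rng.standard_normal(500)
print("mean NLL:", mdn_nll(params, x, t))

Because the mixture weights, means, and widths all depend on the input,
such a network can place probability mass on several distinct outcomes
at once, which is precisely what a single point predictor cannot do.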