Dynamic load distribution for parallel applications is one of the most
important topics of research in the area of parallel and distributed
systems. The German Science Foundation has been supporting research in
this area by funding SFB 342 "Methods and Tools for the Efficient Use of
Parallel Systems" and GK "Cooperation and Resource-Management in
Distributed Systems" since 1990 and 1995, respectively. This manuscript
is the result of research in both projects done jointly in the
department for informatics and electrical engineering of Technische
Universität München as well as by the industrial partner Corporate
Research of Siemens AG, Munich. First, surveys and classifications of
existing models and implementations are given. Based on this
comprehensive framework, several systems for checkpointing, a
distributed thread kernel, load distribution in large workstation
networks, and adaptive load distribution systems have been implemented
and tested. The structure of these systems and the results obtained with
them are also covered. Finally, a middleware-based architecture and three
application areas are used to test the systems described earlier.
Understanding the needs of large distributed applications and the
efficient use of heterogeneous networked resources is most important to
make good use of the distributed computing power that is available
worldwide on the Internet and, on a private basis, in intranets. This
manuscript will help the reader understand these questions and can
serve as a basis for future research and development as well as for
training in distributed applications.