Multiple processor systems are an important class of parallel systems.
Over the years, several architectures have been proposed to build such
systems to satisfy the requirements of high-performance computing. These
architectures span a wide variety of system types. At the low end of the
spectrum, we can build a small, shared-memory parallel system with tens
of processors. These systems typically use a bus to interconnect the
processors and memory. Such systems, for example, are becoming
commonplace in high-performance graphics workstations. These systems
are called uniform memory access (UMA) multiprocessors because they
provide all processors with uniform access to memory. These systems
provide a single address space, which is preferred by programmers. This
architecture, however, cannot be extended even to medium-sized systems with
hundreds of processors because the shared bus has limited bandwidth. To scale
systems to the medium range, that is, to hundreds of processors, non-bus
interconnection networks have been proposed. These systems, for example,
use a multistage dynamic interconnection network. Such systems also
provide global, shared memory like the UMA systems. However, they
introduce local and remote memories, which results in a non-uniform memory
access (NUMA) architecture. Distributed-memory architecture is used for
systems with thousands of processors. These systems differ from the
shared-memory architectures in that there is no globally accessible
shared memory. Instead, they use message passing to facilitate
communication among the processors. As a result, they do not provide
a single address space.
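
The difference between a single address space and message passing can be illustrated with a small sketch. The Python code below is only an analogy that runs on one machine, not a model of real multiprocessor hardware: threads stand in for processors, a shared list stands in for global shared memory, and a queue stands in for the interconnection network. The function names shared_memory_sum and message_passing_sum, and the workers parameter, are hypothetical names chosen for this illustration.

```python
import queue
import threading


def shared_memory_sum(data, workers=4):
    """Shared-memory style: every worker reads and writes one common structure."""
    partial = [0] * workers  # visible to all workers (single address space)

    def work(rank):
        # Each worker reads the shared input directly and writes its own slot.
        partial[rank] = sum(data[rank::workers])

    threads = [threading.Thread(target=work, args=(r,)) for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum(data, workers=4):
    """Message-passing style: workers own private data and only exchange messages."""
    inbox = queue.Queue()  # stands in for the interconnect (assumption of this sketch)

    def work(local_data):
        # The worker sees only its private copy and must send its result explicitly.
        inbox.put(sum(local_data))

    threads = [
        threading.Thread(target=work, args=(list(data[r::workers]),))
        for r in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator performs one explicit receive per worker.
    return sum(inbox.get() for _ in range(workers))
```

In the first function the workers communicate implicitly through a common data structure, which is the convenience programmers get from a single address space. In the second, each worker holds its own copy of its data and every exchange is an explicit send or receive, as in a distributed-memory system.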