Parallel & Distributed Software Architectures
A variety of hardware and software architectures are used for distributed
computing. At a lower level, it is necessary to interconnect multiple CPUs with
some sort of network, whether that network is printed onto a circuit board or
made up of several loosely coupled devices and cables. At a higher level, it is
necessary to interconnect the processes running on those CPUs with some sort of
communication system and execution environment. We look at some of the main
approaches to the design of parallel and distributed software architectures.
Client-Server Computing
Client-server computing is a model of software construction and process
interaction which allows for the separation, decomposition, and potential
distribution of system and application functionality. A server is an
autonomous process which executes and fulfils requests generated and sent to
it by client processes.
A
process fills the role of a server in a system if it behaves as follows:
1. Provides services through an interface. An interface is a
well-defined protocol describing how service functions may be discovered and
invoked, together with a specification of the expected parameter types and
returned results.
2. Responds to requests by returning expected results to the
client.
3. Hides implementation and complexity of service functions.
A
process fills the role of client in a system if it behaves as follows:
1. Presents a standard interface to the
server and the user.
2. Forms queries by translating user
input to server operations.
3. Initiates communication with the
server.
4. Performs data analysis on results and
displays them to the user.
With this architecture a traditional large centralised application can be
downsized by dividing its functionality into separate client and server
processes which can be located on different hosts. Server processes can be
assigned to high-performance hosts with large non-volatile storage and can
offer concurrent request processing to networked clients. Client processes
can run at the user's location, making use of local display hardware to
present results and reducing server load by handling user interactivity and
input devices.
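The server and client roles above can be sketched as a minimal echo-style service. The standard library socket module is used; the address, port, and the upper-casing "service function" are arbitrary choices for illustration.

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 50007  # arbitrary local endpoint for this sketch
ready = threading.Event()

def server():
    # Server role: provides a service through a known interface, fulfils
    # one request, and hides how the result is produced from the client.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen()
        ready.set()                        # interface is now available
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)      # receive the client's request
            conn.sendall(request.upper())  # return the expected result

def client(text):
    # Client role: initiates communication, translates "user input" into a
    # server operation, and returns the result for presentation.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(text.encode())
        return cli.recv(1024).decode()

t = threading.Thread(target=server)
t.start()
ready.wait()                # wait until the server's interface is up
reply = client("hello server")
t.join()
print(reply)  # HELLO SERVER
```

Note that the client knows only the interface (address, port, message format), not how the server computes its reply, which is precisely the hiding of implementation listed above.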
Multi-tier Architectures
A generic Client/Server architecture has two types
of nodes on the network: clients and servers. As a result, these generic
architectures are sometimes referred to as "two-tier" architectures.
Some networks consist of three different kinds of nodes: clients,
application servers which process data for the clients, and database servers
which store data for the application servers. This is called a three-tier
architecture.
In general, an n-tier or multi-tier architecture
may deploy any number of distinct services, including transitive relations
between application servers implementing different functions of business logic,
each of which may or may not employ a distinct or shared database system.
The advantage of an n-tier architecture over a two-tier architecture (or of a
three-tier over a two-tier) is that it separates out the processing so as to
better balance the load across the different servers; it is more scalable. The
disadvantages are that it puts a greater load on the network, and that
software is much more difficult to program and test than in a two-tier
architecture, because more devices have to communicate to complete a user's
transaction.
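The three tiers can be sketched in a few lines, using an in-memory SQLite database as a stand-in for the data tier; the table, account names, and function names are invented for illustration.

```python
import sqlite3

# Data tier: a database server storing data (in-memory SQLite stands in here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")

# Application tier: business logic; the only tier that talks to the database.
def get_balance(name):
    row = db.execute(
        "SELECT balance FROM accounts WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

# Client tier: translates user input into an application-server operation
# and formats the result for display.
def show_balance(name):
    return f"{name}: {get_balance(name)}"

print(show_balance("alice"))  # alice: 100
```

In a real deployment each tier would run as a separate process on a separate host, with network calls where this sketch has function calls.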
Cluster
A computer cluster is a group of loosely coupled computers that work together
closely, so that in many respects they can be viewed as a single computer.
Clusters are commonly, but not
always, connected through fast local area networks. Clusters are usually
deployed to improve speed and/or reliability over that provided by a single
computer, while typically being much more cost-effective than single computers
of comparable speed or reliability.
High-availability (HA) clusters
High-availability clusters are implemented
primarily for the purpose of improving the availability of services which the
cluster provides. They operate by having redundant nodes, which are then used
to provide service when system components fail. The most common size for an HA
cluster is two nodes, which is the minimum required to provide redundancy. HA
cluster implementations attempt to manage the redundancy inherent in a cluster
to eliminate single points of failure. Commercial implementations of
high-availability clusters exist for many operating systems; the Linux-HA
project is a commonly used free software HA package for Linux.
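A toy sketch of the failover idea, with invented node names and a boolean standing in for a real health check: requests go to the primary node while it is healthy, and the redundant standby takes over when it fails, so no single node is a point of failure.

```python
# name -> healthy?  (a real HA cluster uses heartbeats, not a dict)
nodes = {"node1": True, "node2": True}

def handle(request):
    # Try the primary first, then the redundant standby.
    for name in ("node1", "node2"):
        if nodes[name]:
            return f"{name} handled {request}"
    raise RuntimeError("total cluster failure")

served_by = handle("req-1")   # primary is healthy, so it serves
nodes["node1"] = False        # primary fails...
failover = handle("req-2")    # ...standby takes over transparently
print(served_by, "/", failover)
```

Real HA software must also handle the harder parts this sketch ignores: detecting failure reliably, migrating service state, and avoiding "split-brain" situations where both nodes believe they are primary.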
Load balancing clusters
Load balancing clusters operate by having all
workload come through one or more load-balancing front ends, which then
distribute it to a collection of back end servers. Although they are
implemented primarily for improved performance, they commonly include
high-availability features as well. Such a cluster of computers is sometimes
referred to as a server farm. Linux Virtual Server (LVS) is an advanced load
balancing solution for Linux systems. There are also many commercial
solutions available, including Platform LSF HPC and the Moab Cluster Suite.
High-performance (HPC) clusters
High-performance clusters are implemented
primarily to provide increased performance by splitting a computational task
across many different nodes in the cluster, and are most commonly used in
scientific computing. One of the more popular HPC implementations is a cluster
with nodes running Linux as the OS and free software to implement the parallelism.
This configuration is often referred to as a Beowulf cluster. Such clusters
commonly run custom programs which have been designed to exploit the
parallelism available on HPC clusters. Many such programs use libraries such as
MPI which are specially designed for writing scientific applications for HPC
computers.
HPC clusters are optimized for workloads where
jobs or processes running on the cluster nodes communicate actively during the
computation.
Grid Architecture
Grid computing, or grid clusters, is a technology
closely related to cluster computing. The key differences between grids and
traditional clusters are that grids connect collections of computers which do
not fully trust each other, and hence operate more like a computing utility
than like a single computer. In addition, grids typically support more
heterogeneous collections of nodes than are commonly supported in clusters.
Grid computing is optimized for workloads which
consist of many independent jobs or packets of work, which do not have to share
data between the jobs during the computation process. Grids serve to manage the
allocation of jobs to computers which will perform the work independently of
the rest of the grid cluster. Resources such as storage may be shared by all
the nodes, but intermediate results of one job do not affect other jobs in
progress on other nodes of the grid.
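The contrast with an HPC cluster can be sketched with the standard concurrent.futures module: each job is an independent packet of work, and no data is shared between jobs while they run. The job names and payloads are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job):
    # An independent packet of work: it shares no data with other jobs,
    # and its intermediate results affect no other job in progress.
    name, payload = job
    return name, sum(payload)

jobs = [("job-a", [1, 2, 3]), ("job-b", [10, 20]), ("job-c", [7])]

# The "grid" allocates each job to whichever worker is free; workers never
# coordinate with one another during the computation.
with ThreadPoolExecutor(max_workers=3) as grid:
    results = dict(grid.map(run_job, jobs))

print(results)  # {'job-a': 6, 'job-b': 30, 'job-c': 7}
```

Because the jobs never communicate, the workers could equally be untrusting machines scattered across the Internet, which is exactly the situation a grid is designed for.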
Peer-to-Peer Architectures
A peer-to-peer computer network is a
network that relies on the computing power and bandwidth of the participants in
the network rather than concentrating it in a relatively low number of servers.
Such networks are useful for many purposes. Sharing content files containing
audio, video, data or anything in digital format is very common, as is the
exchange of realtime data such as telephony traffic. A pure peer-to-peer
network does not have the
notion of clients or servers, but only equal peer nodes that
simultaneously function as both "clients" and "servers" to
the other nodes on the network. This model of network arrangement differs from
the client-server model where communication is usually to and from a central
server.
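A toy in-memory sketch of the idea, with hypothetical peers and resource names: every node exposes the same serve operation (server role) and searches its neighbours with request (client role), and no central server is involved.

```python
class Peer:
    """Each node functions simultaneously as both "client" and "server"."""

    def __init__(self, name, resources):
        self.name = name
        self.resources = resources   # what this peer can serve to others
        self.neighbours = []

    def serve(self, key):
        # Server role: answer a neighbour's request from local resources.
        return self.resources.get(key)

    def request(self, key):
        # Client role: ask each neighbour in turn; no central server exists.
        for peer in self.neighbours:
            found = peer.serve(key)
            if found is not None:
                return found
        return None

a = Peer("a", {"song.mp3": b"audio-bytes"})
b = Peer("b", {"clip.avi": b"video-bytes"})
a.neighbours.append(b)
b.neighbours.append(a)

print(a.request("clip.avi"))  # a, acting as client, fetches from b
print(b.request("song.mp3"))  # b, acting as client, fetches from a
```

Real peer-to-peer systems add the hard parts this omits: peer discovery, routing queries through many hops, and coping with peers joining and leaving constantly.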
Mobile Code
Mobile code is software obtained from a remote system, transferred across a
network, and executed on a local system without explicit installation or
invocation by the recipient. Examples of mobile code include
scripts (JavaScript, VBScript), Java applets, ActiveX controls, and macros
embedded within Office documents.
The term mobile agent refers to a process
that can transport its state from one environment to another, with its data
intact, and still be able to perform appropriately in the new environment. Each
node must provide a mobile agent with a standard interface or operating
environment through which it can carry out its work. An agent can be moved to
where data is located rather than shipping the data to a remote location.
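The agent's "transport its state" step can be sketched with the standard pickle module; the agent class, its counting task, and the in-process "network" are all invented for illustration.

```python
import pickle

class CountAgent:
    """A hypothetical mobile agent: its state (keyword and running count)
    travels with it, and it resumes work wherever it is unpickled."""

    def __init__(self, keyword):
        self.keyword = keyword
        self.count = 0

    def work(self, records):
        # Executed on the node holding the data, instead of shipping the
        # data to the agent's home node.
        self.count += sum(1 for r in records if self.keyword in r)

# "Transport": serialise the agent's state into bytes, as if sent over a network.
agent = CountAgent("error")
wire = pickle.dumps(agent)

# On the remote node: revive the agent and let it work on the local data.
remote_data = ["error: disk full", "ok", "error: timeout"]
remote_agent = pickle.loads(wire)
remote_agent.work(remote_data)

# Ship the agent home again, now carrying its result, with its data intact.
returned = pickle.loads(pickle.dumps(remote_agent))
print(returned.count)  # 2
```

The standard interface mentioned above corresponds here to both nodes sharing the CountAgent class definition; a real agent platform must also provide a sandboxed execution environment, since running code that arrives over the network is a serious security concern.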