Title of project: | Communications and mapping in parallel computers |
Major field of science: | Technical Science |
Research area: | Information and Communication Technologies |
Duration of project: | 3 years (1996-98) |
Total work load in hours: | 11.313 |
Principal investigator: | Dr. Roman Trobec |
Project summary: | In the project, different types of communications among co-operating processors, and processes mapping on parallel computers will be investigated; in a general case, some faulty processors will also be considered. Our results are intended to be implemented in software for the future parallel operating systems as well as in custom designed VLSI circuits. To confirm research results, the theory will be verified with some important computationally intensive applications. New methods will be contributed to parallel integration in molecular dynamics, solutions of difference equations, and parallel parsing. |
Recent advances in technology have enabled the construction of computers whose architectures no longer conform to the traditional von Neumann model of computation. Parallelism is the dominating feature these machines have in common. Parallel computing is one of the most rapidly growing research fields in computer science and engineering. The earliest parallel machines were SIMD computers and vector computers which generally relied on internal parallelism and pipelining to improve data throughput. In the next generation, several independently operating processors were connected in order to co-operate on the solution of a common problem (MIMD). Those multiprocessors generally provided the programmer with shared memory, enabling code to be written in a similar style as for conventional von Neumann computers. However, architectural complexity imposes a limit on the size of computers being built on a pure shared memory basis, and more attention was paid to the design and construction of multiprocessor systems with memory distributed among independent processors (Distributed Memory Multiprocessors - DMMP). Among the first commercial systems that were built as DMMP were the Intel iPSC hypercube series, the Floating Point Systems T-Series and a number of Inmos transputer based systems. Recently, well known supercomputer companies such as Cray Research and Convex have been testing the parallel machines CRAY T3D and CONVEX Exemplar. Besides, workstation networks have become a suitable platform for parallel processing. The concept of such systems offers virtually unlimited design flexibility and scalability. We hope to be able to build computers with orders of magnitude more computing power than we have now. Moreover, those computers would provide high performance at a comparatively low cost. One of the most fundamental challenges computer science is facing with today is need for developing algorithms, programming tools, and applications to harness the vast computing power of parallel computers; moreover, a theoretical basis for this new technology has to be improved. Since it is expected that parallel computers will gain an ever increasing share of the market, these developments will have important commercial consequences. The basic project goals are the following. First, to widen the knowledge of parallel computing area, particularly communications and mapping, and systems with faulty nodes. Second, some important computationally intensive applications will be implemented in order to test our research results. Third, the project will establish minimal scientific potential capable to make researches into future computer systems.
Parallel computer systems represent a promising tool for increasing computational power of computers, hence the theoretical basis for parallel computation has to be matured. The proposed research project is concentrated on the following topics:
Mapping, Communications and Software Tools in Parallel Systems. Assigning processes to processors, data distribution, and communication among the processors is fundamental for all applications running on parallel computers. Efficient execution of parallel programs on an existing parallel computer requires proper mapping of programs onto the underlying machine architecture which can either be general-purpose (Bokhari, 1981; Robic et al., 1992) or special-purpose (Ozimek and Tasic, 1995). A practical model of a parallel computer has at each of its nodes a fixed size buffer allowing only limited link contention during message routing. Broadcasting (one-to-all), multicasting (one-to-more) and gossiping (all-to-all) are communication problems arising in distributed memory multicomputers (Hedetniemi et al., 1986). The need for selective broadcasting arises in many important applications, yet there is still no appropriate mechanism for such broadcasting currently available in parallel computers. Study of fault tolerant communication algorithms is also a very important topic of parallel computing (Chen and Shing, 1990; Robic and Silc, 1994). The main reason for delayed practical usage of parallel computers is the lack of software for developing parallel programs. Functional programming languages are among the most promising platforms for parallelisation, because they are not inherently sequential as conventional imperative language are (Peyton Jones, 1987). A tool for designing platform-independent parallel programs is, for example, PCL - parallel computation library (Jerebic et al., 1992). It enables writing portable programs for mesh and torus networks of processors, and developing programs on sequential computers.
Failures in Parallel Systems. A popular approach for handling faults in parallel systems is to restore regularity of a system, deteriorated by failures. Many authors have discussed hardware redundancy with spare units and switches combined with internal coding of computational data. These methods can be used for improving reliability, since they also cover intermittent faults. However, they cannot be used for dealing with fault localisation. Some authors have investigated multi-level diagnosis and configuration in the presence of faults. In papers dealing with distributed diagnosability, the assumption is made that each fault-free unit is capable of determining the diagnostic state of all other units in the system (Somani and Agarwal, 1992). In the case of massively parallel systems, this approach is not feasible due to a large number of components in the system and the amount of communication required. Also, if the number of faults is greater than some upper bound that depends on connectivity, a false diagnosis can be obtained. These facts represent a serious drawback for reusability of earlier developed methods in the area of parallel system-level diagnosability. In several papers, it has been shown that a local procedure could be used in massively parallel systems. Some recent works (e.g. Blough and Pelc, 1993) on the diagnosis of regular constant-degree systems have been done using probability models. Promising results for the strategy that uses only local information for system diagnosis have been reported. Our work (Trobec and Jerebic, 1994) is based on the assumption that random production and run-time failures are present. The run-time diagnostic is local and organised as a parallel procedure.
Parallel Numerical and Non-numerical Applications. Parallel Numerical Linear Algebra is an important tool for solving computationally intensive problems arising in various scientific areas. Talking about the development of parallel algorithms, we have in mind the design of scaleable routines that can be included into existing libraries and run on various High Performance Computing platforms. Recently, parallel implementations have been directed by the development of scaleable and platform-independent libraries based on Basic Linear Algebra Subroutines (BLAS) (Choi et al., 1994) and introduction of message-passing standard. This enables the design of parallel applications not only on supercomputers but also on clusters of workstations. Our main research interest has been in the area of parallel algorithms for solving large sparse systems of linear equations with the emphasis on the system structure (Evans and Caf, 1994) and use of updating techniques (Blaznik et al., 1995). Parallel Molecular Dynamics Integration. Demands for the extension of computer simulation methods to molecular systems of increasing size and complexity cannot be met solely by developing better hardware and consuming more computer time; since the needs for conceptual improvements of existing algorithms are becoming more and more obvious. Enlargement of the scope of existing methods of molecular dynamics of complex molecular systems and their parallel implementations on the ring topology for distributed memory computers is presented in (Trobec and Janezic, 1995). Parallel Parsing: The membership problem for formal languages is one of the basic problems in the theory of computation. It is well solved for sequential computers, but no significant progress has been made in the field of parallel computers yet. Syntax analysis of context-free languages represents an important subproblem because syntax analysers or parsers form the skeleton of the majority of compilers and other program processing tools (Jain and Chuadhari, 1994). The syntax analysis should be split to a number of independent parsing processes, which should be carried out by embedded LR(k) parsers. An algorithm for splitting LR(k) parsing tables and generating tables for embedded LR(k) parsers is still to be found. Parallel Heat Transfer Simulation: Computer simulation and other scientific applications that have until recently been beyond the capability of conventional computers are becoming reality on parallel computers. These applications are an important tool in searching for new phenomena in nature, worst-case predictions, investigations of dangerous situations, etc. In a parallel computer, a physical domain is evenly partitioned among $p$ nodes. The same program in each node solves its own subdomain in parallel, some data from domain boundaries are exchanged and the solution is ideally reached $p$ times faster than on a single processor (Fox et al., 1988). The above description follows the nature that is in foundation parallel. We have developed a simple parallel computer simulation method that is reliable enough for practical use in medicine (Trobec and Slivnik, 1994). We have analysed a heat distribution in a human heart during a cardiac surgery. The human heart is an irregularly shaped three dimensional object. Any real measurements and experiments are dangerous to be performed on humans, therefore, the computer simulation results will be of crucial importance in investigating and improving practical methods in medicine.
P.Blaznik, D.J.Evans, J.Tasic.
Parallel updating techniques in image restoration.
Parallel Algorith. Appl. 9 (1995)
D.M.Blough, A.Pelc.
Diagnosis and repair in multiprocessor systems.
IEEE Trans. Comput. 42 (1993)
S.H.Bokhari.
On the mapping problem.
IEEE Trans. Comput. 30 (1981)
M.Chen, K.Shing.
Depth-first search approach for fault-tolerant routing in
hypercube multicomputers.
IEEE Trans. Parallel Distributed Syst. 1 (1990)
J.Choi, J.J.Dongarra, D.W.Walker.
PB-BLAS: A set of parallel block basic linear algebra subprograms.
Proc. Scalable High-Performance Comput. Conf.
(IEEE, Los Alamitos, CA, 1994)
D.J.Evans, D.Caf.
A sparse preconditioner for symmetric positive definite banded
circulant and Toeplitz linear systems.
Inter. J. Computer Math. 54 (1994)
G.C.Fox et al.
Solving Problems on Concurrent Processors.
(Prentice-Hall, 1988)
S.M.Hedetniemi, T.Hedetniemi, A.L.Liestman.
A survey of gossiping and broadcasting in communication networks.
Networks 18 (1986)
A.Jain, N.S.Chaudhari.
Efficient parallel recognition of contex-free languages.
Parallel Comput. 20 (1994)
I.Jerebic, B.Slivnik, R.Trobec.
Library for programming on torus-connected multiprocessors.
M.Valero et al. (Eds.) PACTA 92 (IOS Press, 1992)
I.Ozimek, J.Tasic.
New approach to parallel systolic algorithm implementations -
RLSL example.
Signal Processing 42 (1995)
S.L.Peyton Jones.
The Implementation of Functional Programming Languages.
(Prentice-Hall, 1987)
B.Robic, P.Kolbezen, and J.Silc.
Area optimization of dataflow-graph mappings.
Parallel Comput. 18 (1992)
B.Robic, J.Silc.
Fault tolerant mapping onto VLSI/WSI processor arrays.
Proc. EUROMICRO 94 (IEEE Computer Society Press, 1994)
A.K.Somani, V.K.Agarwal.
Distributed diagnosis algorithm for regular interconnected structures.
IEEE Trans. Comput. 41 (1992)
R.Trobec, D.Janezic.
Comparison of parallel Verlet and implicit Runge-Kutta methods for
molecular dynamics integration.
J. Chem. Inf. Comput. Sc. 35 (1995)
R.Trobec, I.Jerebic.
Optimization of diagnostic examination.
Lect. Notes Comput. Sc. 854 (1994)
R.Trobec, B.Slivnik.
Parallel heat transfer computation on generally shaped bodies.
Proc. Int'l Workshop Parallel Numerics '94 (1994)
One of the most fundamental challenges
computer science is facing with today is need for developing algorithms, programming
tools, and applications to harness the vast computing power of parallel
computers; moreover, a theoretical basis for this new technology has to be
improved. Since it is expected that parallel computers will gain an ever
increasing share of the market, these developments will have important
commercial consequences.
Research in the field of concurrent computer systems is a great challenge,
especially for the following reasons:
1.) There is constant and strong human desire to posses even stronger computer
systems. This can be met only by parallel architectures, where the set of
subtasks is carried out by independent processors communicating by message
passing.
2.) Computer systems consist of enormous number of electronic components and
malfunction of only one of them may result in a total system crash.
Since electronic components will never be totally reliable, the possibility
of failures must always be considered. Taking this fact into account,
a computer system should be developed, capable of self-diagnosis and
fault-tolerance.
As claimed by many computer scientists dealing with distributed computation,
interconnection of more and more processors will be effective only if
the implemented mapping and message routing algorithms possess fault
tolerance properties. Having this in mind, the problems of process-to-processor
mapping, efficient selective broadcasting methods, and message routing with
limited link contention are of great theoretical and practical significance.
To implement a high speed custom design application in the area
of digital communications,
a parallel algorithm must be implemented in silicon as a VLSI circuit.
To simplify the development of such special machines,
it is important to use software tools for automatic
mapping of mathematical (computing) algorithms to the structure.
For a given algorithm (described as a set of equations), the proposed
tool will produce a description of its parallel fine grain
implementation. This description is suitable for further processing
with integrated circuit development packages.
We expect that the project will contribute to the theoretical foundation
of parallel computing especially in the areas of communication and
algorithm mapping, even in the presence of faulty units.
We will promptly verify the theory with several
computationally intensive applications (molecular dynamics, difference equations,
and parallel parsing) in order to confirm our results.
Our project will contribute new results to theoretical foundations of parallel computing. Research will be focused on communications and algorithm mapping. New theoretical results are also expected in handling faults and increasing reliability of parallel systems. Theory will be verified with some computationally intensive applications. With new approaches to parallel computing, our recent results in molecular dynamics, solutions of partial differential equations and parallel syntax analysis will be improved. In the framework of the project, we expect a research group capable of studying and using future computer systems will be formed.
In practice, we expect faster implementations of computationally intensive applications and simulation of natural phenomena that could not be performed experimentally. By increasing the number of processors, the computation time will decrease. In molecular dynamics, only short time intervals of order several femtoseconds can be simulated today. The time interval could be widen by using parallel computers, thus researchers would be able to study matter in more detail. Another example is simulation of heart cooling process during the surgery. Often, to perform in-vivo experiments is dangerous or impossible, while simulation can give us the insight into physical behaviour of the cooling process without any harm.