Parallel Computing
Description:
Exploration of parallelism is far from being a question of purely scientific significance. Even the building of the ancient pyramids involved concurrent cooperation, workload balancing, pipelining and resource scheduling, all of which fall squarely within the realm of parallelism. This is yet more evidence that the computer, dutifully carrying out its computation tasks one by one, has fallen behind human intelligence in this respect. However proud we may be of that, computers are catching up, pushed especially by a growing number of scientific scholars who are increasingly discontented with long program execution times and with distributed, data-intensive computations.
Parallel computing is the savior for those anxious scientists: it provides:
- enormous computing power,
- support for very large-scale distributed data warehouses,
- high performance and
- efficiency.
It does so through multiple threads of control and through the sharing of both heterogeneous and homogeneous resources. There are mainly three aspects of parallel computing:
- algorithms and application;
- programming methods, languages and environments;
- parallel machines and architectures.
Enablers:
- High-end graphics hardware is very expensive; standard (inexpensive) components can be used instead, the same idea as cluster computing but applied to graphics
- Many applications need much faster machines
- Sequential machines are reaching their speed limits
- Use multiple processors to solve large problems fast
- Microprocessors are getting cheaper and cheaper
- Analyzing video images
- Aircraft modeling
- Ozone layer modeling
- Climate modeling
- Ocean circulation
- Quantum chemistry
- General: computational science
- Computer chess
- Protein folding
- Sequence alignment
- Grid computing
Inhibitors:
- Interprocess communication bottleneck, especially with low bandwidth and high latency
- Lack of application level tools, including programming models and execution environments
- The hierarchical nature of program structure limits its application over common flat platforms
- Productivity gap between currently available hardware platforms and programming paradigms
- The need for high-performance and data-intensive computation comes less from home PCs than from large institutions, which need to share data and computation resources and to cooperate.
- Lack of efficient scheduling algorithms and automatic parallelism.
Paradigms:
The future of parallel computing looks very prosperous, especially after the emergence of Grid technology, which provides a middleware layer on top of which parallel programs can run and communicate with each other directly and transparently. It will not be long before everyone can rely solely on his own and his friends' i386 machines (if any are still around) to do all kinds of complex computation within seconds, without knowing that a whole world is working, cooperating and communicating behind the scenes.
Here are some paradigms concerning future parallel computing:
- Parallel machine organization:
- Processor array
- Shared memory multiprocessors
- Distributed memory multiprocessors
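As a rough, non-authoritative illustration of the shared-memory organization above, the following sketch (in Python, chosen here only as an example language, using the standard multiprocessing module) lets several worker processes increment one counter held in a shared memory segment; under a distributed-memory organization each process would instead keep its own data and exchange explicit messages, as sketched under the Grid programming models further below.
    # Minimal shared-memory sketch: four worker processes update a single
    # counter that lives in a shared memory segment, guarded by a lock.
    from multiprocessing import Process, Value, Lock

    def worker(counter, lock, iterations):
        for _ in range(iterations):
            with lock:                    # serialize access to the shared counter
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)           # one integer placed in shared memory
        lock = Lock()
        workers = [Process(target=worker, args=(counter, lock, 10000))
                   for _ in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        print(counter.value)              # 40000: all updates landed in the same shared segment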
- Flynn's taxonomy:
- SISD (Single Instruction, Single Data): traditional uniprocessors
- SIMD (Single Instruction, Multiple Data): processor arrays
- MISD (Multiple Instruction, Single Data): nonexistent?
- MIMD (Multiple Instruction, Multiple Data): multiprocessors and multicomputers
- General phases in designing and building parallel programs:
- Partitioning
- Communication
- Agglomeration
- Mapping
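A minimal sketch of these four phases, assuming Python and its standard multiprocessing module purely for illustration: to sum the squares of a large array, the elements are partitioned into fine-grained tasks, agglomerated into a handful of coarse chunks, mapped onto a small pool of worker processes, and the partial results are communicated back to be combined.
    # Illustrative only: partitioning / communication / agglomeration / mapping
    # applied to a toy problem (sum of squares).
    from multiprocessing import Pool

    def partial_sum(chunk):
        # Each coarse task sums the squares of its own chunk.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))

        # Partitioning: conceptually, every element is an independent fine-grained task.
        # Agglomeration: grouping the elements into 8 coarse chunks keeps overhead low.
        n_chunks = 8
        chunks = [data[i::n_chunks] for i in range(n_chunks)]

        # Mapping: the pool assigns the coarse tasks to 4 worker processes.
        # Communication: each worker returns a partial sum to the parent,
        # which combines them into the final result.
        with Pool(processes=4) as pool:
            partials = pool.map(partial_sum, chunks)
        print(sum(partials))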
- New distributed applications that use data or instruments across multiple administrative domains and that need much CPU power:
- Computer-enhanced instruments
- Collaborative engineering
- Browsing of remote datasets
- Use of remote software
- Data-intensive computing
- Very large-scale simulation
- Large-scale parameter studies
- Grid programming models:
- RPC (Remote Procedure Call)
- Task parallelism
- Message passing
- Java programming
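As a small, hedged illustration of the message-passing model listed above, the following single-machine sketch uses Python's standard multiprocessing pipes as a stand-in for a Grid-level message-passing library such as MPI: two processes that share no memory cooperate purely by exchanging explicit messages.
    # Minimal message-passing sketch: parent and worker exchange explicit messages.
    from multiprocessing import Process, Pipe

    def worker(conn):
        task = conn.recv()                # receive a message describing the work
        conn.send(sum(task))              # send the result back as another message
        conn.close()

    if __name__ == "__main__":
        parent_end, child_end = Pipe()
        p = Process(target=worker, args=(child_end,))
        p.start()
        parent_end.send(list(range(100))) # message out: the task data
        print(parent_end.recv())          # message in: the result (4950)
        p.join()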
- Grid application execution environments:
- Parameter sweeps
- Workflow
- Portals
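Parameter sweeps map naturally onto parallel resources because every run is independent. A minimal sketch, again assuming Python's standard multiprocessing module and a hypothetical simulate function standing in for a real model, farms one run per parameter combination out to a pool of workers:
    # Minimal parameter-sweep sketch: one independent run per parameter combination.
    from itertools import product
    from multiprocessing import Pool

    def simulate(params):
        # Hypothetical stand-in for a real simulation code.
        temperature, pressure = params
        return (temperature, pressure, temperature * pressure)

    if __name__ == "__main__":
        temperatures = [280, 290, 300, 310]
        pressures = [1.0, 1.5, 2.0]
        sweep = list(product(temperatures, pressures))   # every combination
        with Pool(processes=4) as pool:
            for result in pool.map(simulate, sweep):
                print(result)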
Experts:
Professor Henri Bal, Vrije Universiteit Amsterdam, http://www.cs.vu.nl/~bal
Dr. Thilo Kielmann, Vrije Universiteit Amsterdam, http://www.cs.vu.nl/~kielmann/
Professor Ian Foster, University of Chicago, http://www-fp.mcs.anl.gov/~foster/
Dutch grid, http://www.dutchgrid.nl
Gridlab, http://www.gridlab.org
Globus, http://www.globus.org
Global Grid Forum, http://www.ggf.org
UCSB, CMU, UC Berkeley, Monash University, Cambridge University, Stanford University
Timing:
1956, IBM starts Stretch project with the goal of producing a machine with 100 times the performance of the IBM 704, initiated by the Atomic Energy Commission at Los Alamos.
1959, IBM delivers first Stretch computer; less than 10 are ever built.
1964, Control Data Corporation produces the CDC 6600, the world's first commercial supercomputer. The Atomic Energy Commission urges manufacturers to look at "radical machine structures"; this leads to the CDC Star-100, TI ASC, and Illiac-IV.
1966, Bernstein introduces Bernstein's Condition for statement independence, which is the foundation of subsequent work on data dependency analysis. Flynn publishes paper describing architectural taxonomy.
1967, Amdahl publishes paper questioning feasibility of parallel processing; his argument is later called "Amdahl's Law".
1968, Cyberplus Group formed at Control Data to study computing needs for image processing; this leads to AFP and Cyberplus designs.
1969, CDC produces CDC 7600 pipelined supercomputer.
1970, Floating Point Systems Inc. founded by C. N. Winningstad and other former Tektronix employees to manufacture floating-point co-processors for minicomputers.
Asymmetric multiprocessor jointly developed by MIT and DEC
1971, CDC delivers hardwired Cyberplus parallel radar image processing system to Rome Air Development Center, where it produces 250 times the performance of the CDC 6600.
1971, Edsger Dijkstra poses the dining philosophers problem, which is often used to test the expressivity of new parallel languages.
1972, Paper studies of massive bit-level parallelism done by Stewart Reddaway at ICL. These later lead to development of the ICL DAP.
Asymmetric multiprocessor operating system TOPS-10 developed by DEC for PDP-10 minicomputers.
1976, Control Data delivers Flexible Processor, a programmable signal processing unit. Floating Point Systems Inc. delivers 38-bit AP-120B array processor that issues multiple pipelined instructions every cycle.
1977, C.mmp hardware completed at Carnegie-Mellon University: a crossbar connecting minicomputers to memories.
Massively Parallel Processor project first discussed at NASA for fast image processing.
1979, ICL DAP delivered to Queen Mary College, London, the world's first commercial massively parallel computer.
Parviz Kermani and Leonard Kleinrock describe the virtual cut-through technique for message routing.
1980, First generation DAP computers delivered by ICL.
1981, Floating Point Systems Inc. delivers 64-bit FPS-164 array processor that issues multiple pipelined instructions every cycle; start of the mini-supercomputer market.
First BBN Butterfly delivered: 68000s connected through a multistage network to disjoint memories, giving the appearance of shared memory.
1983, DARPA starts Strategic Computing Initiative, which helps fund such machines as the Thinking Machines Connection Machine.
Massively Parallel Processor delivered by Goodyear Aerospace to NASA Goddard
1985, David Jefferson describes how virtual time and time warping can be used as a basis for speculative distributed simulations
1986, Sequent produces first shared-memory Balance multiprocessors, using NS32032 microprocessors and proprietary DYNIX symmetric operating system.
Active Memory Technology spun off from ICL to develop DAP products.
1988, Sequent produces 80386-based Symmetry bus-based multiprocessor.
ParaSoft releases first commercial version of Express MPI; first version of DIME (Distributed Irregular Mesh Environment).
Tera Computer Co. founded by Burton Smith and James Rottsolk to develop and market a new multi-threaded parallel computer.
1990, the first Japanese parallel vector supercomputer is delivered, with up to 4 processors, each with up to 4 pipeline sets, a 2.9 ns clock, and up to 4 Gbyte of memory.
Applied Parallel Research (APR) spun off from Pacific-Sierra Research (PSR) to develop FORGE and MIMDizer parallelization tools, and upgrade them to handle Fortran 90.
1991, CRI produces first Y/MP C90.
1991, Kendall Square Research starts to deliver 32-processor KSR-1 computer systems.
1991, Abhiram Ranade describes how message combining, butterfly networks, and a complicated routing algorithm can emulate PRAMs in near-optimal time.
1991, Thinking Machines Corporation produces CM-200 Connection Machine, an upgraded CM-2. MIMD CM-5 announced.
1992, Chandy and Taylor describe PCN, a parallel programming system similar to Strand 88, based on dataflow and logic programming.
1992, Thinking Machines Corporation produces first CM-5, containing up to 1024 Sparc microprocessors connected in a fat tree topology, each with up to 4 vector units manufactured by Texas Instruments. A RAID system for the CM-5 is also announced.
1993, 512-node J-Machines, message-driven multicomputers, operational at MIT and Caltech.
1993, NEC produces Cenju-3, containing up to 256 VR4400SC (MIPS R4000 runalike) processors connected by an Omega network.