Difference between revisions of "Parallel Computing"

From ScenarioThinking
 
(7 intermediate revisions by the same user not shown)
==Description:==
Exploration of parallelism is far more than a question of scientific significance. Even the building of the ancient pyramids involved concurrent cooperation, workload balancing, pipelining and resource scheduling, all of which fall under large-scale parallelism. This is yet more evidence that the computer lags behind human intelligence when it devotedly carries out computation tasks one by one. However proud of this we may be, computers are catching up, pushed especially by the many scientific scholars who are increasingly discontented with long program execution times and distributed, data-intensive computations.
''Parallel computing'' is the savior for those anxious scientists: it provides:<br>
*enormous computing power,
*support for very large scale distributed data warehouses,
*high performance and
*efficiency. <br>
It does these through multiple threads of control and through sharing of both heterogeneous and homogeneous resources. There are mainly '''three aspects''' of parallel computing:
#algorithms and applications;
#programming methods, languages and environments;
#parallel machines and architectures. <br>


==Enablers:==


==Inhibitors:==
*Interprocess communication bottleneck, especially with low bandwidth and high latency
*Lack of application-level tools, including programming models and execution environments
*The hierarchical nature of program structure limits its application on common flat platforms
*Productivity gap between currently available hardware platforms and programming paradigms
*The need for high-performance, data-intensive computation may have little to do with home PCs; it mainly concerns large institutions that need to share data and computation resources and to cooperate.
*Lack of efficient scheduling algorithms and automatic parallelism.
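
The communication bottleneck in the first item above is often quantified with a simple latency/bandwidth cost model. The sketch below is illustrative only; the function names and the latency and bandwidth constants are assumptions, not part of this page:

```python
# Latency/bandwidth cost model for interprocess communication: a
# back-of-the-envelope way to see why low bandwidth and high latency
# inhibit parallel speedup. All names and numbers are illustrative.

def comm_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to send one message: startup latency plus transfer time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

def parallel_time(work_s, procs, messages, message_bytes,
                  latency_s=50e-6, bandwidth_bytes_per_s=100e6):
    """Ideal compute time plus communication overhead per processor."""
    compute = work_s / procs
    comm = messages * comm_time(message_bytes, latency_s,
                                bandwidth_bytes_per_s)
    return compute + comm

# Adding processors shrinks compute time but not communication time,
# so the total time flattens once communication dominates.
t8 = parallel_time(work_s=1.0, procs=8, messages=100, message_bytes=1_000_000)
t64 = parallel_time(work_s=1.0, procs=64, messages=100, message_bytes=1_000_000)
```

With these illustrative numbers, going from 8 to 64 processors barely helps, because the fixed communication cost dominates the total time.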


==Paradigms:==
The future of parallel computing looks very prosperous, especially after the emergence of Grid technology, which provides a middleware layer on top of which parallel programs can run and communicate with each other directly and transparently. It won't be long before everyone can rely solely on his and his friends' i386 machines (if any remain) to do all kinds of complex computation within seconds, without knowing that a whole world is working, cooperating and communicating behind the scenes.<br>
Here are some paradigms concerning future parallel computing:
*Parallel machine organization: <br>
**Processor array <br>


1971, Edsger Dijkstra poses the dining philosophers problem, which is often used to test the expressivity of new parallel languages.<br>
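
Dijkstra's problem can be sketched in a few lines. This is a minimal illustrative version (in Python threads, not a 1971 language) using the classic resource-ordering fix to avoid deadlock; the counts are arbitrary:

```python
# Minimal sketch of the dining philosophers problem. Each philosopher
# needs two forks (locks); acquiring forks in a single global order
# rules out circular wait, hence deadlock. Counts are illustrative.
import threading

N = 5
forks = [threading.Lock() for _ in range(N)]
meals = [0] * N

def philosopher(i, rounds=10):
    left, right = i, (i + 1) % N
    # Always pick up the lower-numbered fork first.
    first, second = (left, right) if left < right else (right, left)
    for _ in range(rounds):
        with forks[first]:
            with forks[second]:
                meals[i] += 1  # "eating"

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
# Every philosopher finishes all rounds; no deadlock occurs.
```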


1972, Paper studies of massive bit-level parallelism done by Stewart Reddaway at ICL; these later lead to the development of the ICL DAP. <br>
Asymmetric multiprocessor operating system TOPS-10 developed by DEC for PDP-10 minicomputers. <br>


1976, Control Data delivers the Flexible Processor, a programmable signal processing unit.
1981, Floating Point Systems Inc. delivers the 64-bit FPS-164 array processor, which issues multiple pipelined instructions every cycle; start of the mini-supercomputer market. <br>
First BBN Butterfly delivered: 68000s connected through a multistage network to disjoint memories, giving the appearance of shared memory.<br>


1983, DARPA starts the Strategic Computing Initiative, which helps fund such machines as the Thinking Machines Connection Machine. <br>
Massively Parallel Processor delivered by Goodyear Aerospace to NASA Goddard. <br>


1985, David Jefferson describes how virtual time and time warping can be used as a basis for speculative distributed simulations.<br>
Pfister and Norton analyse the effect of hot spots in multistage networks, and describe how message combining can ameliorate their effect.<br>
TMC demonstrates the first CM-1 Connection Machine to DARPA.<br>
Dally and Seitz develop a model of wormhole routing, invent virtual channels, and show how to perform deadlock-free routing using virtual channels.<br>

1986, Sequent produces first shared-memory Balance multiprocessors, using NS32032 microprocessors and the proprietary DYNIX symmetric operating system.<br>
Scientific Computer Systems delivers first SCS-40, a Cray-compatible minisupercomputer.<br>
Active Memory Technology spun off from ICL to develop DAP products.<br>
BBN forms Advanced Computers Inc. subsidiary (BBN ACI) to develop and market Butterfly machines.<br>
Floating Point Systems introduces T-series hypercube (Weitek floating-point units coupled to transputers), and ships a 128-processor system to Los Alamos.<br>
Thinking Machines Corp. ships first Connection Machine CM-1 (up to 65536 single-bit processors connected in a hypercube).<br>
Encore ships first bus-based Multimax computer (NS32032 processors coupled with Weitek floating-point accelerators).<br>
CrOS III, Cubix (file-system handler) and Plotix (graphics handler) running on Caltech hypercubes.<br>
Culler Scientific Systems Corp. (founded by Glenn Culler) describes the Culler 7 minisupercomputer.<br>
Symmetric multiprocessing on the DEC VAX supported by VMS.<br>
Henry Burkhardt, former Data General and Encore founder, forms Kendall Square Research Corporation (KSR) to build a custom multiprocessor.<br>
Dally shows that low-dimensional k-ary n-cubes are more wire-efficient than hypercubes for typical values of network bisection, message length, and module pinout, and demonstrates the torus routing chip, the first low-dimensional wormhole routing component.<br>

1987, ETA produces first air- and liquid-nitrogen-cooled versions of the ETA-10 multiprocessor supercomputer.<br>
Myrias produces prototype (68000-based) SPS-1.<br>
Caltech Mark III hypercube completed (68020 with wormhole routing).<br>
TMC introduces CM-2 Connection Machine (64k single-bit processors connected in a hypercube, plus 2048 Weitek floating-point units).<br>
Multiflow delivers first Trace/200 VLIW machines (256 to 1024 bits per instruction).<br>
ParaSoft spun off from the hypercube group at Caltech to produce a commercial version of the CrOS-like MPI.<br>
Steve Chen leaves Cray Research to found Supercomputer Systems, Inc.; SSI is later funded by IBM to build a large-scale parallel supercomputer.<br>
Seitz, working at Ametek, builds the Ametek-2010, the first parallel computer using a 2-D mesh interconnect with wormhole routing.<br>

1988, CRI produces first Y/MP multiprocessor vector supercomputer.<br>
AMT delivers first re-engineered DAP (1024 single-bit processors connected in a torus).<br>
Intel produces iPSC/2 hypercube (80386/7 chip-set with wormhole routing).<br>
MasPar Computer Corp. founded by former DEC executive Jeff Kalb to develop bit-serial massively-parallel machines.<br>
Inmos produces first T800 floating-point transputer; Parsys and Telmat set up to exploit the results of the ESPRIT Supernode project, and begin marketing T800-based machines.<br>
Sequent produces 80386-based Symmetry bus-based multiprocessor.<br>
ParaSoft releases first commercial version of Express MPI; first version of DIME (Distributed Irregular Mesh Environment) up and running at Caltech.<br>
Floating Point Systems Inc. changes name to FPS Computing, buys Celerity Computing assets, and produces the Model 500 (Celerity 6000) mini-supercomputer with multiple scalar and vector processors.<br>
Ardent and Stellar begin delivering single-user graphics engineering workstations.<br>
Hitachi ships first S-820 vector supercomputer.<br>
John Gustafson and others demonstrate that Amdahl's Law can be broken by scaling up problem size.<br>
Tera Computer Co. founded by Burton Smith and James Rottsolk to develop and market a new multi-threaded parallel computer, similar to the Denelcor HEP.<br>

1989, Kai Li describes a system for emulated shared virtual memory.<br>
Multiflow produces second-generation Trace/300 machines.<br>
Control Data shuts down ETA Systems in April; the National Science Foundation subsequently shuts down the John von Neumann Supercomputer Center at Princeton, which was operating an ETA-10.<br>
Seymour Cray leaves Cray Research to found Cray Computer Corporation.<br>
BBN ACI delivers first 88000-based TC2000 distributed shared-memory multiprocessor.<br>
Myrias sells first 68020-based SPS-2 shared-memory multiprocessor.<br>
Meiko begins using SPARC and Intel i860 processors to supplement T800s in their Computing Surface machines.<br>
NCube produces second-generation NCube/2 hypercubes, again using custom processors.<br>
Concurrent Computer Corp. and General Microelectronics form a 50-50 venture named Supercomputer Solutions, Inc., to develop the Navier-Stokes computer developed at Princeton Univ.<br>
Fujitsu begins production of single-processor VP2000 vector supercomputers.<br>
Scientific Computer Systems stops selling its SCS-40 Cray-compatible computer system; SCS continues to sell its high-speed token ring network.<br>
Evans and Sutherland announce the ES-1 parallel computer; two systems are delivered, to the University of Colorado and Caltech. The division is later shut down during the Supercomputing '89 conference, even while they had a booth in the convention center in Reno.<br>
Stellar and Ardent announce they will merge, forming Stardent Computers.<br>
Supertek Computers, Inc., delivers its S-1 Cray-compatible minisupercomputer; eventually 10 of these are sold.<br>
Valiant argues that random routing and latency hiding can allow physically-realizable machines to emulate PRAMs in optimal time.<br>

1990, Multiflow closes doors in April after several deals with other companies fall through.<br>
Alliant delivers first FX/2800 i860-based multiprocessors.<br>
In a project led by MITI, Fujitsu, Hitachi, and NEC build a testbed parallel vector supercomputer containing four Fujitsu VP2600s, NEC's shared memory, and Hitachi software.<br>
Intel produces i860-based hypercubes.<br>
National Energy Research Supercomputer Center (NERSC) places $42 million order with Cray Computer Corporation for a Cray-3 supercomputer; the order includes a unique 8-processor Cray-2 computer system that is installed in April.<br>
The two ETA-10 systems at the closed John von Neumann Supercomputer Center are destroyed with sledge hammers, in order to render them useless, after no buyers are found.<br>
First MasPar MP-1 delivered (up to 16k 4-bit processors connected in an 8-way mesh).<br>
Cray Research, Inc., purchases Supertek Computers Inc., makers of the S-1, a minisupercomputer compatible with the Cray X-MP.<br>
NEC ships SX-3, the first Japanese parallel vector supercomputer (up to 4 processors, each with up to 4 pipeline sets, a 2.9 ns clock, and up to 4 Gbyte of memory).<br>
Fujitsu ships first VP-2600 vector supercomputer.<br>
University of Tsukuba completes the 432-processor QCDPAX machine in collaboration with Anritsu Corporation.<br>
Applied Parallel Research (APR) spun off from Pacific-Sierra Research (PSR) to develop FORGE and MIMDizer parallelization tools, and upgrade them to handle Fortran 90.<br>
TMC and AMT sign a co-operative agreement to standardize data-parallel languages.<br>
MIT J-Machine demonstrates a message-driven network interface that reduces the overhead of message handling.<br>




1991, CRI produces first Y/MP C90.<br>
HAL Computer Systems started by former IBM RS/6000 workers to build a high-speed SPARC microprocessor.<br>
Myrias closes doors.<br>
BBN shuts down its Advanced Computers, Inc. (ACI) subsidiary, though it continues to sell TC-2000 computers.<br>
Kendall Square Research starts to deliver 32-processor KSR-1 computer systems.<br>
Stardent, formed by merger in 1989, announces it will sell off its business and shut its doors; the graphics computer line (the former Ardent architecture) is eventually taken over by Kubota Pacific Computer Corp.<br>
David Bailey publishes a complaint about the abuse of benchmarks, particularly by parallel computer vendors.<br>
Abhiram Ranade describes how message combining, butterfly networks, and a complicated routing algorithm can emulate PRAMs in near-optimal time.<br>
Thinking Machines Corporation produces CM-200 Connection Machine, an upgraded CM-2; MIMD CM-5 announced.<br>




1992, Garth Gibson's thesis on redundant arrays of inexpensive disks (RAID) published.<br>
AMT bankrupt; revived as AMT Cambridge Ltd.<br>
FPS Computing bankrupt; selected assets bought by CRI, and Cray Research Superservers (CRS) subsidiary formed.<br>
National Energy Research Supercomputer Center (NERSC) at Lawrence Livermore National Laboratory cancels contract to buy Cray-3 from Cray Computer Corp.<br>
Kendall Square Research announces KSR-1 after testing a system with 128 processors and a second-level ring interconnect.<br>
MasPar Computer starts delivering its second-generation machine, the MP-2.<br>
Chandy and Taylor describe PCN, a parallel programming system similar to Strand 88, based on dataflow and logic programming.<br>
Thinking Machines Corporation produces first CM-5, containing up to 1024 SPARC microprocessors connected in a fat-tree topology, each with up to 4 vector units manufactured by Texas Instruments; a RAID system for the CM-5 is also announced.<br>




1993, 512-node J-Machines (message-driven multicomputers) operational at MIT and Caltech.<br>
IBM stops funding of Supercomputer Systems Inc.; company shuts down.<br>
Cray Computer Corp. announces availability of Cray-3.<br>
Lawrence Livermore National Laboratory (LLNL) announces intention to purchase a CS-2 Computing Surface from Meiko, the first major purchase by a U.S. national laboratory from a vendor with roots outside the U.S.<br>
Fujitsu installs 140-processor NWT (Numerical Wind Tunnel), a prototype of a highly-parallel vector supercomputer to be called VPP-500.<br>
NEC produces Cenju-3, containing up to 256 VR4400SC (MIPS R4000 runalike) processors connected by an Omega network.<br>
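
The 1988 entry on Gustafson above, and the Amdahl's Law argument it answers, can be made concrete with two short formulas. A minimal sketch; the 5% serial fraction and the 1024-processor count are illustrative assumptions:

```python
# Amdahl's fixed-size speedup vs. Gustafson's scaled speedup,
# for serial fraction (1 - p) and n processors.

def amdahl_speedup(p, n):
    """Fixed problem size: the serial fraction (1 - p) limits speedup."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    """Scaled problem size: the parallel part grows with n,
    so speedup grows almost linearly in n."""
    return (1.0 - p) + p * n

# With a 5% serial fraction and 1024 processors, Amdahl caps speedup
# just below 20, while Gustafson's scaled speedup is roughly 973.
a = amdahl_speedup(0.95, 1024)
g = gustafson_speedup(0.95, 1024)
```

This is why scaling up the problem size "breaks" Amdahl's Law: the serial fraction of the total work shrinks as the parallel part grows.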


==Web Resources:==

Latest revision as of 11:43, 17 March 2005


Enablers:

  • High-end graphics is very expensive: use standard (inexpensive) components, the same idea as cluster computing but applied to graphics
  • Many applications need much faster machines
  • Sequential machines are reaching their speed limits
  • Use multiple processors to solve large problems fast
  • Microprocessors are getting cheaper and cheaper
  • Analyzing video images
  • Aircraft modeling
  • Ozone layer modeling
  • Climate modeling
  • Ocean circulation
  • Quantum chemistry
  • General: computational science
  • Computer chess
  • Protein folding
  • Sequence alignment
  • Grid computing


Paradigms:

The future of parallel computing looks very prosperous, especially after the emergence of Grid technology, which provides a middleware layer on top of which parallel programs can run and communicate with each other directly and transparently. It won't be long before everyone can rely solely on his and his friends' i386 machines (if any remain) to do all kinds of complex computation within seconds, without knowing that a whole world is working, cooperating and communicating behind the scenes.

Here are some paradigms concerning future parallel computing:

  • Parallel machine organization:
    • Processor array
    • Shared memory multiprocessors
    • Distributed memory multiprocessors
  • Flynn's taxonomy:
    • SISD: Single Instruction, Single Data (traditional uniprocessors)
    • SIMD: Single Instruction, Multiple Data (processor arrays)
    • MISD: Multiple Instruction, Single Data (nonexistent?)
    • MIMD: Multiple Instruction, Multiple Data (multiprocessors and multicomputers)
  • General phases in designing and building parallel programs:
    • Partitioning
    • Communication
    • Agglomeration
    • Mapping
  • New distributed applications that use data or instruments across multiple administrative domains and that need much CPU power:
    • Computer-enhanced instruments
    • Collaborative engineering
    • Browsing of remote datasets
    • Use of remote software
    • Data-intensive computing
    • Very large-scale simulation
    • Large-scale parameter studies
  • Grid programming models:
    • RPC (Remote Procedure Call)
    • Task parallelism
    • Message passing
    • Java programming
  • Grid application execution environments:
    • Parameter sweeps
    • Workflow
    • Portals
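
The four design phases listed above (partitioning, communication, agglomeration, mapping) can be sketched with a small message-passing example using Python's standard multiprocessing module. The task decomposition and process count here are illustrative assumptions, not a prescribed method:

```python
# Illustrative sketch of the four design phases, applied to a parallel
# sum with message passing (Python's standard multiprocessing module).
from multiprocessing import Process, Queue

def worker(chunk, out):
    # Each task computes a partial sum and communicates it back.
    out.put(sum(chunk))

def parallel_sum(data, procs=4):
    # Partitioning: split the data into independent tasks.
    # Agglomeration: group elements into `procs` coarse chunks to cut
    # communication overhead (one message per process, not per element).
    chunks = [data[i::procs] for i in range(procs)]
    out = Queue()  # Communication: a shared message queue.
    # Mapping: assign one chunk to each process.
    ps = [Process(target=worker, args=(c, out)) for c in chunks]
    for p in ps:
        p.start()
    total = sum(out.get() for _ in ps)
    for p in ps:
        p.join()
    return total

if __name__ == "__main__":
    assert parallel_sum(list(range(1000))) == 499500
```

The same partition/communicate/agglomerate/map structure carries over to message-passing libraries on real parallel machines; only the transport changes.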

Experts:

Professor Henri Bal, Vrije Universiteit Amsterdam, http://www.cs.vu.nl/~bal
Dr. Thilo Kielmann, Vrije Universiteit Amsterdam, http://www.cs.vu.nl/~kielmann/
Professor Ian Foster, University of Chicago, http://www-fp.mcs.anl.gov/~foster/
Dutch grid, http://www.dutchgrid.nl
Gridlab, http://www.gridlab.org
Globus, http://www.globus.org
Global Grid Forum, http://www.ggf.org
UCSB, CMU, UC Berkeley, Monash University, Cambridge University, Stanford University

Timing:

1956, IBM starts the Stretch project, with the goal of producing a machine with 100 times the performance of the IBM 704; initiated by the Atomic Energy Commission at Los Alamos.

1959, IBM delivers the first Stretch computer; fewer than 10 are ever built.

1964, Control Data Corporation produces the CDC 6600, the world's first commercial supercomputer. The Atomic Energy Commission urges manufacturers to look at "radical machine structures"; this leads to the CDC Star-100, TI ASC, and Illiac-IV.

1966, Bernstein introduces Bernstein's Condition for statement independence, which is the foundation of subsequent work on data dependency analysis. Flynn publishes a paper describing an architectural taxonomy.

1967, Amdahl publishes a paper questioning the feasibility of parallel processing; his argument is later called "Amdahl's Law".

1968, Cyberplus Group formed at Control Data to study computing needs for image processing; this leads to the AFP and Cyberplus designs.

1969, CDC produces the CDC 7600 pipelined supercomputer.

1970, Floating Point Systems Inc. founded by former C. N. Winningstad and Tektronix employees to manufacture floating-point co-processors for minicomputers.
Asymmetric multiprocessor jointly developed by MIT and DEC.

1971, CDC delivers hardwired Cyberplus parallel radar image processing system to Rome Air Development Center, where it produces 250 times the performance of the CDC 6600.


1976, Control Data delivers the Flexible Processor, a programmable signal processing unit. Floating Point Systems Inc. delivers the 38-bit AP-120B array processor, which issues multiple pipelined instructions every cycle.

1977, C.mmp hardware completed at Carnegie-Mellon University (a crossbar connecting minicomputers to memories).
Massively Parallel Processor project first discussed at NASA for fast image processing.

1979, ICL DAP delivered to Queen Mary College, London: the world's first commercial massively parallel computer.
Parviz Kermani and Leonard Kleinrock describe the virtual cut-through technique for message routing.

1980, First-generation DAP computers delivered by ICL.


Web Resources:
