Parallel computing
Classical computing paradigm: serial computing
Calculations are performed in sequence: one task at a time.
The workload consists of serial tasks, executed one after another by a single processor.

What if we want to make things faster?
We can drop some tasks, reduce the task sizes, or employ a faster processor.
But ultimately the total computation time is the sum of the times consumed by the individual tasks.
Now take a closer look at the workload:

We force the calculations to be serialised,
even though the dependencies among them are not purely sequential.
An example:
int doSmth(int a, int b) {
    int c = a + b;
    int d = a * b;
    int e = c + d;
    int f = c * d;
    return e / f;
}
What can we do? Parallelise!
Independent tasks can be performed concurrently.
int doSmth(int a, int b) {   /* pseudocode: "&&" means "in parallel with" */
    int c = a + b;  &&  int d = a * b;
    int e = c + d;  &&  int f = c * d;
    return e / f;
}
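The "&&" above is only pseudocode; a minimal runnable sketch of the same idea, using POSIX threads to compute the independent pairs concurrently (the names `doSmthParallel`, `add_task`, and `mul_task` are illustrative, and thread creation costs far more than a single addition, so this is purely for demonstration):

```c
#include <pthread.h>

typedef struct { int a, b, sum, prod; } Args;

static void *add_task(void *p) { Args *x = p; x->sum  = x->a + x->b; return NULL; }
static void *mul_task(void *p) { Args *x = p; x->prod = x->a * x->b; return NULL; }

int doSmthParallel(int a, int b) {
    Args args = { a, b, 0, 0 };
    pthread_t t1, t2;
    /* c = a + b and d = a * b are independent: run them concurrently */
    pthread_create(&t1, NULL, add_task, &args);
    pthread_create(&t2, NULL, mul_task, &args);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    int c = args.sum, d = args.prod;
    /* e and f both depend on c and d, so they come after the join */
    int e = c + d;
    int f = c * d;
    return e / f;
}
```

Note that `e` and `f` could also be computed in parallel, but after the join that adds more synchronisation overhead than it saves.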
A paradigm shift: parallel computing
Calculations are carried out simultaneously
Large problems are divided into small pieces, each of which is solved concurrently

The universe works in parallel, not serially.
Many complex, interrelated events happen at the same time, yet within a sequence:
planetary movements
vehicle assembly line
rush hour traffic
queue to go to a concert
tectonic plate drift
ordering a burger at the drive thru
Historically, parallel computing has been used in High Performance Computing to model many scientific and engineering problems:
atmosphere, weather, earth, environment
physical processes (high pressure, particle movement, fusion)
geology, seismology
chemistry and molecular dynamics
biosciences and genetics
circuitry design, microelectronics
mechanical engineering
Nowadays parallel computing is also used to process large amounts of data:
data management, data mining
oil exploration
medical imaging and diagnosis
web search engines
financial and economic modeling
pharmaceutical design
advanced graphics and virtual reality
collaborative environments
Why use parallel computing? In theory, more resources mean less time to complete:
save time and money (in time-constrained settings)
solve larger problems
solve more problems
achieve better solutions
more resources in parallel: provide concurrency, may use non-local resources
many commodity processors in parallel instead of one expensive fast processor
Parallel computing:
parallelism of memory: different pieces of data being dealt with (served as input to calculations)
parallelism of execution path: different pieces of code being executed concurrently
Being parallel does not mean there is no communication: the computation can be concurrent, but the partial results still have to be combined in the end!
Flynn's taxonomy
One of the most widely used classifications of computer architectures, proposed by Michael Flynn in 1966. It crosses the instruction stream with the data stream:

                      Single instruction    Multiple instruction
Single data           SISD                  MISD
Multiple data         SIMD                  MIMD
The taxonomy in hardware: bit-level parallelism
Roughly: the wider a processor's word, the fewer instructions it needs to perform an operation on large operands
Example: adding two 64-bit integers in a 32-bit word processor vs. 64-bit word processor
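A sketch of why word width matters: on a 32-bit word, a 64-bit addition must be expressed as two 32-bit additions with explicit carry propagation (done automatically by the compiler; the function below, with an assumed name `add64_on_32bit`, spells it out), while a 64-bit word handles it in a single ADD instruction:

```c
#include <stdint.h>

/* Add two 64-bit integers given as (high word, low word) pairs,
   using only 32-bit operations, as a 32-bit processor would. */
uint64_t add64_on_32bit(uint32_t a_hi, uint32_t a_lo,
                        uint32_t b_hi, uint32_t b_lo) {
    uint32_t lo = a_lo + b_lo;            /* first 32-bit add            */
    uint32_t carry = (lo < a_lo) ? 1 : 0; /* detect unsigned wrap-around */
    uint32_t hi = a_hi + b_hi + carry;    /* second add, plus the carry  */
    return ((uint64_t)hi << 32) | lo;
}
```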
The taxonomy in hardware: instruction-level parallelism
Instructions can be reordered and grouped
They are allowed to execute concurrently or partially overlapped; yet, this does not change the output.
The compiler identifies which instructions depend on each other, so that independent ones can be scheduled in parallel.
Instruction pipelining: multiple instructions are partially overlapped

Superscalar execution:
more than one instruction is executed during a single clock cycle
use redundant functional units inside the processor (e.g.: ALU, bit shifter, multiplier)
Out-of-order execution:
sequence of executed instructions is driven also by the availability of input data
if an instruction stalls waiting for input (e.g. read from memory) another instruction is issued
results are queued until the stalled instruction ends
Register renaming:
some instructions may depend on one another only because they use the same registers
when the stored value itself is not reused, such instructions can be made independent by renaming the registers
Speculative execution: if resources are idle, run instructions that may not be needed
memory prefetching: load data that is going to be used in the near future
branch prediction: try to guess which branch will be taken before knowing for sure
optimistic concurrency control: execute many instructions in parallel and commit only those whose data was not modified by other instructions
eager execution: execute both sides of a conditional branch and commit only the results of the side selected once the condition is evaluated
The taxonomy in software: data-level parallelism
Divide the input into independent pieces that can be processed simultaneously. Example: two gardeners, each weeding and watering half of the garden.
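A minimal sketch of data-level parallelism with POSIX threads (the names `square_slice` and `square_all` are illustrative): the same operation is applied to independent halves of an array, one thread per half, like the two gardeners each tending half the garden:

```c
#include <pthread.h>

typedef struct { int *data; int start, end; } Slice;

/* The single operation every "gardener" performs on its slice. */
static void *square_slice(void *p) {
    Slice *s = p;
    for (int i = s->start; i < s->end; i++)
        s->data[i] *= s->data[i];
    return NULL;
}

void square_all(int *data, int n) {
    pthread_t t1, t2;
    Slice first  = { data, 0,     n / 2 };  /* gardener 1: first half  */
    Slice second = { data, n / 2, n     };  /* gardener 2: second half */
    pthread_create(&t1, NULL, square_slice, &first);
    pthread_create(&t2, NULL, square_slice, &second);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

The slices are disjoint, so the two threads never touch the same element and no locking is needed.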
The taxonomy in software: task-level parallelism
Divide the work into independent sub-tasks that can be performed simultaneously. Example: two gardeners, one weeding and one watering the whole garden.
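By contrast, a sketch of task-level parallelism under the same assumptions (illustrative names `sum_task`, `max_task`, `sum_and_max`): two different operations run concurrently over the whole data set, like one gardener weeding and the other watering the entire garden. Both threads only read the shared array and write to separate results, so this is safe without locks:

```c
#include <pthread.h>

typedef struct { const int *data; int n; int result; } Job;

/* Task 1: sum all elements. */
static void *sum_task(void *p) {
    Job *j = p;
    j->result = 0;
    for (int i = 0; i < j->n; i++) j->result += j->data[i];
    return NULL;
}

/* Task 2: find the maximum element. */
static void *max_task(void *p) {
    Job *j = p;
    j->result = j->data[0];
    for (int i = 1; i < j->n; i++)
        if (j->data[i] > j->result) j->result = j->data[i];
    return NULL;
}

void sum_and_max(const int *data, int n, int *sum, int *max) {
    Job js = { data, n, 0 }, jm = { data, n, 0 };
    pthread_t ts, tm;
    pthread_create(&ts, NULL, sum_task, &js);  /* gardener 1: weeding  */
    pthread_create(&tm, NULL, max_task, &jm);  /* gardener 2: watering */
    pthread_join(ts, NULL);
    pthread_join(tm, NULL);
    *sum = js.result;
    *max = jm.result;
}
```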