chapter 3 parallel computing - plymouth state...
Chapter 3
Parallel Computing
As we discussed in the Processor module, computer
speed has made great progress over the past few
decades: indeed, a twenty-million-fold increase
over a fifty-year period.
This is mainly due to the fact that more and
more transistors have been integrated onto a
silicon chip: from a few to tens (SSI), to
hundreds (MSI), to thousands (LSI), and on to
billions (VLSI).
Moore’s law
This phenomenon is nicely summarized by
Moore’s law: the number of transistors placed
on a chip doubles every eighteen months.
For example, the Intel 8086, a processor chip made
by Intel in 1978, contained 29,000 transistors
and ran at 5 MHz; the Intel Core 2 Duo,
introduced in 2006, contained 291 million
transistors and ran at 2.93 GHz.
Thus, during those 28 years, the number of
transistors went up by about 10,034 times,
doubling roughly once every two years.
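The arithmetic behind these figures can be checked directly; the transistor counts and dates are the ones quoted above:

```python
import math

transistors_8086 = 29_000        # Intel 8086, 1978
transistors_core2 = 291_000_000  # Intel Core 2 Duo, 2006
years = 2006 - 1978              # 28 years

growth = transistors_core2 / transistors_8086  # about 10,034x
doublings = math.log2(growth)                  # about 13.3 doublings
period = years / doublings                     # roughly one doubling every 2.1 years
print(round(growth), round(period, 1))
```

So the observed pace, about one doubling every two years, is a bit slower than the eighteen months the law predicts, but still exponential.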
A picture is worth how many words?
More importantly, this increase in transistors
directly leads to an increase in computer
speed: in this case, the clock speed went up
by 586 times during the same period.
The following chart shows the increase in
computer speed corresponding to that of the
integration count.
Not just the speed...
Moreover, besides processing speed, some of
the other capabilities of many digital electronic
devices are also strongly tied to Moore’s
law: memory capacity, sensors, and even the
number and size of pixels in digital cameras.
As a result, all of these technologies have
also been improving at this stunning
exponential rate.
Since Moore’s law accurately describes a driving
force of technological and social change over the
past thirty or so years, it has been used to
guide long-term planning and to set targets
for research and development.
A dead end?
Unfortunately, this era of steady and rapid growth
of single-processor performance over 30 years
is essentially over, because

• By “doubling every eighteen months,” we
have to make the wires a factor of √2 thinner
every eighteen months. This has to come to an
end at some point, since we cannot make the
wires infinitely thin.

• Although every transistor produces only a
tiny bit of heat, when you put billions of
them into a tiny space, the amounts do add
up, approaching heat densities comparable to
that at the surface of the Sun.

• We have also essentially exhausted the
benefits of a complicated single-processor
architecture.
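The √2 figure follows from simple geometry: doubling the transistor count in a fixed chip area halves the area available to each transistor, so its linear dimensions, wire widths included, must shrink by a factor of √2. As a sketch:

```latex
% Fixed chip area A, transistor count N doubling to 2N:
\frac{A}{2N} \;=\; \frac{1}{2}\cdot\frac{A}{N}
\qquad\Longrightarrow\qquad
\ell' \;=\; \sqrt{\frac{A}{2N}} \;=\; \frac{\ell}{\sqrt{2}},
\quad\text{where } \ell = \sqrt{A/N} \text{ is the linear feature size.}
```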
What to do?
Fortunately, Moore’s law is not completely out
the window yet: it is predicted to continue
for another five years or so.
These additional transistors, however, will no
longer be used to build up a single processor,
but to increase the number of independent
processors on a single chip. We will then try
to speed up the whole computation by letting
those independent processors work on the data
in parallel.
An analogy could be that, in the old days, we
could cook only one thing at a time on an
old-fashioned stove.
Nowadays, with a contemporary stove, we can
cook many different dishes in parallel, i.e., at
the same time, which certainly saves time.
Similarly, we could cut up a big problem into
many smaller ones, and run them in parallel
on multiple processors. Could we?
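In computing, “cutting up a big problem” can be as simple as splitting a range into independent chunks. A minimal sketch using Python’s standard multiprocessing module (the choice of four chunks is arbitrary):

```python
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum one independent chunk of the range; no chunk needs another's data."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n = 1_000_000
    # Cut the big problem [0, n) into four smaller, independent ones.
    chunks = [(i * n // 4, (i + 1) * n // 4) for i in range(4)]
    with Pool(processes=4) as pool:
        parts = pool.map(partial_sum, chunks)  # the chunks run in parallel
    total = sum(parts)                         # combine the partial results
    print(total == sum(range(n)))              # same answer as the sequential sum
```

The combining step at the end is the sequential part of this job: it cannot start until every chunk is done.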
They are happening everywhere...
Indeed, we can find many examples of “parallel
computing” in our work and life: multiple
galaxies moving through the Universe, multiple
lanes on I-93, multiple gas pumps at most gas
stations, etc.
It is difficult....
It all sounds good, but it is not that easy.
In the cooking example, a good chef knows
that she will not always cook everything at the
same time. To cook a dish of, e.g., pepper,
onions, and pork, she has to fry the pepper
and the pork first, which can be done at the
same time; then fry the onion, which is mixed
with the partially fried pepper and pork.
In the multiple lane case, although the cars in
different lanes can go forward in parallel, the
cars in the same lane have to go forward in
turn.
It is the same when we compute in parallel:
you have to figure out which parts can be
done in parallel, and which parts have to be
done in sequence.
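The chef’s schedule maps directly onto code. In this sketch (the fry function and dish strings are made up for illustration), Python’s standard ThreadPoolExecutor runs the two independent steps in parallel, while the dependent step waits on their results:

```python
from concurrent.futures import ThreadPoolExecutor

def fry(ingredient):
    # Stands in for a unit of work with no dependency on its siblings.
    return f"fried {ingredient}"

with ThreadPoolExecutor() as pool:
    # Independent steps: submitted together, they may run at the same time.
    pepper = pool.submit(fry, "pepper")
    pork = pool.submit(fry, "pork")
    # Dependent step: .result() blocks until each prerequisite is done,
    # so the onion is mixed in only after both parallel steps finish.
    dish = f"{fry('onion')} mixed with {pepper.result()} and {pork.result()}"

print(dish)
```

The `.result()` calls are where the coordination lives: they express “this part has to wait” in code.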
An example
We have been using computers to do course
registration for quite a few years now. When
adding somebody to a class, a program has
to make sure, among other things, that the
total number of students added to the class is
no more than the cap of that class, 25 for ours.
If we run course adds sequentially, i.e., one by
one, this is what the program will do to add
another student to this class:
if the current number < 25
then add this student
Thus, before we add in another student, we
always check the cap.
The parallel case
Since the above add consists of two steps, one
check and one add, when we try to process
multiple requests at the same time, we might
get into trouble, since we do not know in what
order the steps will be interleaved.
For example, suppose there are 24 students
signed up for this course, and two more
students come to add it.
What is going to happen?
11
This will.
If we do the adds in parallel, and the two steps
of the two requests happen to interleave like
the following:
Request 1           time    Request 2
Check the number     t1
(still 24)
                     t2     Check the number
                            (still 24)
Add in student       t3
(now 25)
                     t4     Add in student
                            (now 26)
Thus, as the chart above shows, we will add
more students than the cap allows.
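One standard fix is to make the check and the add a single atomic step. A minimal sketch in Python, assuming a hypothetical Course class for the 25-seat example above (threading.Lock is the real standard-library primitive):

```python
import threading

CAP = 25  # class cap from the example

class Course:
    """Hypothetical roster for the registration example above."""
    def __init__(self, enrolled):
        self.enrolled = enrolled
        self._lock = threading.Lock()

    def try_add(self):
        # The lock makes "check then add" one atomic step, so two
        # parallel requests can never both see "still 24".
        with self._lock:
            if self.enrolled < CAP:
                self.enrolled += 1
                return True
            return False

course = Course(enrolled=24)
# Two add requests arrive "at the same time".
requests = [threading.Thread(target=course.try_add) for _ in range(2)]
for t in requests:
    t.start()
for t in requests:
    t.join()
print(course.enrolled)  # 25: exactly one of the two requests succeeded
```

Whatever order the threads run in, only one of them can hold the lock at a time, so the interleaving shown in the chart becomes impossible.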
Software is really hard
Although we have been working with parallel
computer hardware for a long time, since the
late 1960s, programming it is really difficult,
as we have to take care of the communication
and coordination issues between the multiple
processors; just as in a conference call, we
want to make sure that only one person speaks
at a time.
In other words, the difficulty lies in the
software part, even though we can come up
with lots of cheap hardware.
How fast could it be?
The natural expectation for the speed-up from
parallelization is linear: if you put in a
two-lane highway, then two cars can go through
the toll booth at the same time, and if you put
in four lanes, then four cars can pay tolls in
parallel.
That is why we often put in multiple toll booths,
e.g., at Exit 11 on I-93. This, however, does
not happen in parallel computing: very few
parallel algorithms achieve linear speed-up.
Most of them show a near-linear speed-up for
small numbers of processing elements, but
degrade to a constant value for large numbers
of processing elements.
Here is the limit
The potential speed-up of a parallel algorithm
on a parallel computer is given by Amdahl’s
law, established in the 1960s by Gene Amdahl.
When a big problem is cut into a bunch of
smaller ones, some of them can run in parallel,
while the others have to run in sequence;
it is the latter that decide the overall
speed-up available from parallelization.
This relationship is given by the equation

S = 1 / (1 − P),

where S is the maximum speed-up of the program,
as a factor of its original sequential runtime,
and P is the fraction of the program that can
be run in parallel.
An example
If we cut the problem into ten pieces, and nine
of them can run in parallel while one cannot,
then the parallel fraction is

P = 0.9, i.e., 90%.

Amdahl’s law then tells us that

S = 1 / (1 − 0.9) = 1 / 0.1 = 10.

In other words, we can speed the program up
at most 10 times, no matter how many
processors we throw in.
This result thus puts an upper limit on the use-
fulness of adding more parallel execution units.
One way to put it: “The bearing of a child
takes nine months, no matter how many women
are assigned.”
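These numbers are easy to explore in code. The function below is the general n-processor form of Amdahl’s law, of which the S = 1/(1 − P) above is the limit as n grows without bound:

```python
def amdahl_speedup(p, n):
    """Speed-up on n processors when a fraction p of the work parallelizes.
    As n grows, this approaches the 1 / (1 - p) ceiling from the text."""
    return 1.0 / ((1.0 - p) + p / n)

# P = 0.9: nine of the ten pieces run in parallel, one cannot.
for n in (2, 10, 100, 10_000):
    print(n, round(amdahl_speedup(0.9, n), 2))
# The speed-up creeps toward 10 but never reaches it,
# no matter how many processors we throw in.
```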
Discussion topics
• Do some further research on Amdahl’s law,
and share your findings with us in layman’s
terms.
• What are some successful applications of
this multi-processing idea in parallel
computing? Give some details: What is it?
Why do we do it in parallel? What are the
benefits, compared with sequential computing?
• In your life, study, and/or work, have you
ever applied the multi-processing strategy,
i.e., done multiple things at one time? If
yes, give us some examples: What is the
problem? How do you cut it into smaller
problems? Can all of these smaller ones run
in parallel? If not, how do you coordinate
them?