Let us assume that you are designing a multi-core processor to be fabricated on a fixed silicon die area budget of A. As an architect, you are to partition this total area A into one large core and many small cores. The large core will have an area of S, while the small cores will each have an area of 1 (note that there will be A - S small cores). Assume that the single-threaded performance of a core scales with the square root of its area. On this multi-core processor, we will execute a workload in which a fraction P of the work is infinitely parallelizable and the remaining 1 - P is serial.
We have two configurations of the multi-core processor:
Configuration X: A = 24; S = 8 (i.e., one large core and 16 small cores).
Configuration Y: A = 24; S = 20 (i.e., one large core and 4 small cores).
Assume that the serial portion of the workload executes only on the large core and the parallel portion of the workload executes only on the small cores.
Answer the following three questions:
A. What is the speedup of the workload on configuration X? (compared to the execution time on a single-core processor of area 1).
B. What is the speedup of the workload on configuration Y? (compared to the execution time on a single-core processor of area 1).
C. For workloads that have limited parallelism (that is, P part of the workload is very small), which configuration between X and Y would you recommend and why?
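As a starting point, the stated model can be sketched in a few lines of Python (this is an illustrative sketch of the question's assumptions, not a given solution): normalize the whole workload to 1 time unit on an area-1 core, run the serial fraction on the large core at speed sqrt(S), and divide the parallel fraction evenly across the A - S small cores.

```python
import math

def speedup(p, a, s):
    """Speedup over a single core of area 1, under the question's model:
    the serial fraction (1 - p) runs on the large core at performance
    sqrt(s); the parallel fraction p is spread over the a - s small
    cores, each with performance 1."""
    serial_time = (1 - p) / math.sqrt(s)   # serial part on the large core
    parallel_time = p / (a - s)            # parallel part across small cores
    return 1.0 / (serial_time + parallel_time)

# Configuration X: A = 24, S = 8 (16 small cores)
# Configuration Y: A = 24, S = 20 (4 small cores)
for p in (0.1, 0.5, 0.9):
    print(p, speedup(p, 24, 8), speedup(p, 24, 20))
```

Note the two limiting cases: with p = 0 the speedup is just sqrt(S) (the large core alone), and with p = 1 it is A - S (all small cores busy), which is what drives the answer to part C.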
Parallel speedup is defined as the ratio of the time needed to compute some function using a
single processor (T1) to the time needed to compute it using P processors (TP). That
is: speedup = T1/TP. For instance, if it takes 10 seconds to run a program sequentially and
2 seconds to run it in parallel on some number of processors P, then the speedup is 10/2 = 5
times.
Parallel efficiency measures how much use of the parallel processors we are making. For P
processors, it is defined as: efficiency = 1/P x speedup = 1/P x T1/TP. For
example, continuing the same example, if P is 10 processors and the speedup is 5 times,
then the parallel efficiency is 5/10 = 0.5. In other words, only 50% of the processors
were used to gain the speedup and the other half were idle.
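These two definitions are simple enough to check directly in Python (a small sketch; the numbers are the running example from the text):

```python
def speedup(t1, tp):
    """Parallel speedup: sequential time divided by parallel time."""
    return t1 / tp

def efficiency(n_procs, t1, tp):
    """Parallel efficiency: speedup divided by the processor count."""
    return speedup(t1, tp) / n_procs

# The running example: 10 s sequentially, 2 s on 10 processors.
print(speedup(10, 2))         # 5.0
print(efficiency(10, 10, 2))  # 0.5
```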
Amdahl's law states that the maximum speedup possible in parallelizing an algorithm is
limited by the sequential portion of the code. Given an algorithm which is P% parallel,
Amdahl's law states that: MaximumSpeedup = 1/(1 - (P/100)). For instance, if 80% of
a program is parallel, then the maximum speedup is 1/(1 - 0.8) = 1/0.2 = 5 times. If the program in
question took 10 seconds to run sequentially, the best we could hope for in a parallel execution
would be for it to take 2 seconds (10/5 = 2). This is because the sequential 20% of the program
cannot be sped up: it takes 0.2 x 10 seconds = 2 seconds even if the rest of the code is run
perfectly in parallel on an unlimited number of processors so that it takes 0 seconds to execute.
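The bound is one line of arithmetic, sketched here with the 80%-parallel example from the text:

```python
def amdahl_max_speedup(parallel_fraction):
    """Amdahl upper bound on speedup: only parallel_fraction of the
    work parallelizes; the rest stays sequential forever."""
    return 1.0 / (1.0 - parallel_fraction)

print(amdahl_max_speedup(0.8))       # approximately 5 (floating point)
print(10 / amdahl_max_speedup(0.8))  # best-case runtime for the 10 s program
```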
The Gustafson-Barsis law states that speedup tends to increase with problem size (since the
fraction of time spent executing sequential code goes down). Gustafson-Barsis' law is therefore a
measure of what is known as "scaled speedup" (scaled by the number of processors used on a
problem), and it can be stated as: MaximumScaledSpeedup = p + (1 - p)s, where p is the
number of processors and s is the fraction of total execution time spent in sequential code. This
law tells us that achievable speedup is often related to problem size, not just the number of
processors used. Essentially, Amdahl's law assumes that the percentage of sequential code is
independent of problem size. This is not necessarily true.
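The scaled-speedup formula can likewise be sketched directly (the 10-processor, 5%-serial numbers below are an illustrative assumption, not from the text):

```python
def gustafson_scaled_speedup(n_procs, serial_fraction):
    """Gustafson-Barsis scaled speedup: p + (1 - p) * s, where p is the
    processor count and s is the fraction of the (scaled) execution
    time spent in sequential code."""
    return n_procs + (1 - n_procs) * serial_fraction

# e.g. 10 processors with 5% of the scaled run spent in serial code
print(gustafson_scaled_speedup(10, 0.05))
```

Note the contrast with Amdahl's bound: as the serial fraction of the scaled run shrinks, the scaled speedup approaches the processor count p rather than a fixed ceiling.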