Question

Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 x p (where p is the number of processors) but the number of branch instructions per processor remains the same.

Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.

Solutions

Expert Solution

We know, clock cycles = Number of instructions * CPI

Because there are several instruction types, the total number of clock cycles is the sum of the cycles for each type.

So, the number of clock cycles for one processor:

Clock cycles = (2.56 * 10^9 * 1) + (1.28 * 10^9 * 12) + (256 * 10^6 * 5) = 1.92 * 10^10

Execution time = Clock cycles / Clock speed.

Clock speed = 2 GHz = 2 * 10^9 Hz [As we know, 1 GHz = 10^9 Hz]

So, execution time for one processor = (1.92 * 10^10) / (2 * 10^9) = 9.6 seconds
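
As a quick check of the arithmetic above, the single-processor figures can be reproduced in a few lines of Python. This is only a sketch: the instruction counts, CPIs, and clock rate come from the question, while the names `COUNTS`, `CPI`, `CLOCK_HZ`, and `total_cycles` are made up here for illustration.

```python
# Instruction counts and CPIs taken from the problem statement.
COUNTS = {"arithmetic": 2.56e9, "load_store": 1.28e9, "branch": 256e6}
CPI = {"arithmetic": 1, "load_store": 12, "branch": 5}
CLOCK_HZ = 2e9  # 2 GHz


def total_cycles(counts, cpi):
    """Sum of (instruction count * CPI) over all instruction classes."""
    return sum(counts[kind] * cpi[kind] for kind in counts)


cycles_1 = total_cycles(COUNTS, CPI)
time_1 = cycles_1 / CLOCK_HZ
print(f"cycles = {cycles_1:.3e}, time = {time_1:.2f} s")
```

Running it should reproduce the 1.92 * 10^10 cycles and 9.6 seconds obtained by hand.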

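The clock cycles for p processors:

According to the question, the arithmetic and load/store instruction counts per processor are divided by 0.7 * p, while the branch count is unchanged.

So, Clock cycles = (2.56 * 10^9 * 1) / (0.7 * p) + (1.28 * 10^9 * 12) / (0.7 * p) + 256 * 10^6 * 5

Combining the first two terms gives (2.56 * 10^9 + 15.36 * 10^9) / (0.7 * p) = (17.92 * 10^9) / (0.7 * p) = (2.56 * 10^10) / p,

or, Clock cycles = (2.56 * 10^10) / p + 1.28 * 10^9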

The execution time for p processors:

Execution time = Clock cycles / Clock speed.

= ((2.56 * 10^10) / p + 1.28 * 10^9) / (2 * 10^9)

= 12.8 / p + 0.64 seconds
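
The simplification can be checked against a direct evaluation of the unsimplified expression. The sketch below assumes the scaling rule exactly as stated in the question. The function name `exec_time_parallel` is hypothetical.

```python
CLOCK_HZ = 2e9  # 2 GHz, from the problem statement


def exec_time_parallel(p):
    """Execution time in seconds on p processors under the 0.7 * p scaling."""
    arithmetic_cycles = 2.56e9 * 1 / (0.7 * p)
    load_store_cycles = 1.28e9 * 12 / (0.7 * p)
    branch_cycles = 256e6 * 5  # branch count is not reduced by parallelization
    return (arithmetic_cycles + load_store_cycles + branch_cycles) / CLOCK_HZ


for p in (2, 4, 8):
    # Print the direct result next to the simplified form 12.8 / p + 0.64.
    print(p, round(exec_time_parallel(p), 4), round(12.8 / p + 0.64, 4))
```

Note that the formula is intended for the parallelized case. Plugging in p = 1 would give 13.44 seconds, which is why the single-processor time is taken from the original instruction counts (9.6 seconds) instead.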

Let's find execution time for p = 2.

Execution time = 12.8 / 2 + 0.64 = 7.04 seconds

Speedup (compared to single processor) = 9.6 / 7.04 ≈ 1.36

Let's find execution time for p = 4.

Execution time = 12.8 / 4 + 0.64 = 3.84 seconds

Speedup (compared to single processor) = 9.6 / 3.84 = 2.5

Let's find execution time for p = 8.

Execution time = 12.8 / 8 + 0.64 = 2.24 seconds

Speedup (compared to single processor) = 9.6 / 2.24 ≈ 4.29
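
The three cases can also be tabulated in one loop. This is a minimal sketch that takes the single-processor time (9.6 seconds) and the formula 12.8 / p + 0.64 from the solution as given. `T1`, `time_p`, and `max_speedup` are illustrative names.

```python
T1 = 9.6  # single-processor execution time in seconds, from the solution


def time_p(p):
    """Execution time in seconds on p processors, using the derived formula."""
    return 12.8 / p + 0.64


for p in (2, 4, 8):
    t = time_p(p)
    print(f"p = {p}: time = {t:.2f} s, speedup = {T1 / t:.2f}")

# The branch term (0.64 s) does not shrink with p, so it bounds the speedup.
max_speedup = T1 / 0.64
print(f"limit as p grows: {max_speedup:.1f}")
```

Because the 0.64-second branch term stays fixed, the speedup under this model can never exceed 9.6 / 0.64 = 15, no matter how many processors are added.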


