Assume you have a superscalar CPU with in-order issue and in-order instructions that uses 8 registers (R0-R7). The usual rules include: up to two instructions can be issued in one cycle; instructions have to complete in the order they are issued; an instruction attempting to write to a register that is being read by any incomplete instruction cannot be issued until the incomplete instruction completes; any instruction attempting to read a register that is being written to by any incomplete instruction cannot be issued until the incomplete instruction retires; no new instructions can be issued in a cycle when instructions are retiring; multiplication/division instructions take 3 cycles to complete, while addition/subtraction instructions take only 2 cycles. Suppose you need to run the following four instructions:
1. R3 = R1 * R2
2. R1 = R4 + R5
3. R5 = R2 + R8
4. R6 = R1 * R4
Cycle | Instruction # | Decoded | Issued | Retired |
1 | 1 | R3 = R1 * R2 | 1 | |
How many cycles will the CPU take if out-of-order instruction issue and execution are allowed? Use a scoreboard as shown above to illustrate your answer.
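The two register rules in the question can be checked mechanically before building any scoreboard. The sketch below is only an illustration: the names `PROGRAM`, `parse`, and `hazards` are hypothetical, and the labels WAR (write-after-read) and RAW (read-after-write) are the conventional names for the two stated rules, not terms used in the question. It lists which later instruction writes a register an earlier one reads, and which later instruction reads a register an earlier one writes.

```python
import re

# The four instructions exactly as given in the question.
PROGRAM = [
    "R3 = R1 * R2",
    "R1 = R4 + R5",
    "R5 = R2 + R8",
    "R6 = R1 * R4",
]

def parse(instr):
    """Split 'Rd = Ra op Rb' into (destination, set of source registers)."""
    dest, rhs = instr.split("=")
    return dest.strip(), set(re.findall(r"R\d+", rhs))

def hazards(program):
    """Return (earlier, later, kind) triples, numbering instructions from 1."""
    found = []
    for j, later in enumerate(program):
        d_later, s_later = parse(later)
        for i in range(j):
            d_early, s_early = parse(program[i])
            if d_later in s_early:   # later writes what earlier reads
                found.append((i + 1, j + 1, "WAR"))
            if d_early in s_later:   # later reads what earlier writes
                found.append((i + 1, j + 1, "RAW"))
    return found

print(hazards(PROGRAM))
# [(1, 2, 'WAR'), (2, 3, 'WAR'), (2, 4, 'RAW')]
```

Under these assumptions, instruction 2 conflicts with instruction 1, instruction 3 with instruction 2, and instruction 4 with instruction 2; the sketch only reports the pairs and does not model how long each stall lasts.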
Let us make the table for the above order of instructions:
Cycle | Instruction # | Decoded | Issued | Retired |
1 | 1 | R3=R1*R2 | 1 | 3 cycles |
4 | 2 | R1=R4+R5 | 1 | 2 cycles |
6 | 3 | R5=R2+R8 | 1 | 2 cycles |
8 | 4 | R6=R1*R4 | 1 | 3 cycles |
Therefore, the last instruction is issued in cycle 8 and takes 3 cycles to retire, so the instructions in their usual order take 8 + 3 = 11 cycles.
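The tally in this table can be reproduced with a few lines of code. This is a minimal sketch of the counting convention used here (each instruction is issued only when the previous one has finished, and the total is the last issue cycle plus the last latency); the names `LATENCY` and `serial_schedule` are hypothetical, and the sketch does not model dual issue or the register rules.

```python
# Latencies from the question: multiply/divide 3 cycles, add/subtract 2 cycles.
LATENCY = {"*": 3, "/": 3, "+": 2, "-": 2}

def serial_schedule(ops):
    """Return (issue_cycles, total) when ops run strictly one after another."""
    cycle = 1
    issue_cycles = []
    for op in ops:
        issue_cycles.append(cycle)   # this instruction is issued now
        cycle += LATENCY[op]         # the next one waits for it to finish
    return issue_cycles, cycle

# Operators of instructions 1-4: *, +, +, *
issue_cycles, total = serial_schedule(["*", "+", "+", "*"])
print(issue_cycles, total)
# [1, 4, 6, 8] 11
```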
If an out-of-order sequence is allowed, there are two possibilities:
a) the instructions cannot repeat
b) the instructions can repeat
In case a), the same operations are executed with the same number of cycles each, only in a different order. Since the same instructions are needed, the total sequence again takes 11 cycles.
In case b), we get a whole new sequence, and we cannot tell exactly which instruction will be replaced by a repetition of which other instruction, or how many times. However, we can still work out the possible cycle counts. If we inspect one of the additions and one of the multiplications, the same reasoning applies to the other addition and the other multiplication, which reduces the number of tables we need to build. Which instruction is replaced also matters, because additions and multiplications take different numbers of cycles.
Cycle | Instruction # | Decoded | Issued | Retired |
1 | 1 | R3=R1*R2 | 1 | 3 cycles |
4 | 1 | R3=R1*R2 | 2 | 3 cycles |
7 | 3 | R5=R2+R8 | 1 | 2 cycles |
9 | 4 | R6=R1*R4 | 1 | 3 cycles |
Hence, for 3 multiplications and 1 addition, it takes 9 + 3 = 12 cycles.
Cycle | Instruction # | Decoded | Issued | Retired |
1 | 1 | R3=R1*R2 | 1 | 3 cycles |
4 | 1 | R3=R1*R2 | 2 | 3 cycles |
7 | 1 | R3=R1*R2 | 3 | 3 cycles |
10 | 4 | R6=R1*R4 | 1 | 3 cycles |
Hence, for 4 multiplications, it takes 10 + 3 = 13 cycles.
This leaves two more possibilities: 3 additions + 1 multiplication, and 4 additions.
Cycle | Instruction # | Decoded | Issued | Retired |
1 | 2 | R1=R4+R5 | 1 | 2 cycles |
3 | 2 | R1=R4+R5 | 2 | 2 cycles |
5 | 3 | R5=R2+R8 | 1 | 2 cycles |
7 | 4 | R6=R1*R4 | 1 | 3 cycles |
Thus, for 3 additions and 1 multiplication, we have 7 + 3 = 10 cycles of operation.
Cycle | Instruction # | Decoded | Issued | Retired |
1 | 2 | R1=R4+R5 | 1 | 2 cycles |
3 | 2 | R1=R4+R5 | 2 | 2 cycles |
5 | 3 | R5=R2+R8 | 1 | 2 cycles |
7 | 2 | R1=R4+R5 | 3 | 2 cycles |
Thus, for 4 additions, we have 7 + 2 = 9 cycles of operation.
So an out-of-order sequence with repeated instructions can take 9, 10, 12 or 13 cycles, depending on which instruction sequence is passed to the decoder of the superscalar CPU, while a reordering without repetition takes 11 cycles as before.
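The four totals above follow one pattern: under the counting convention used in these tables, the total equals 1 plus the sum of the latencies. The sketch below checks that for each mix; `MUL`, `ADD`, `total_cycles`, and `cases` are hypothetical names, and the model assumes strictly serial issue as in the tables.

```python
MUL, ADD = 3, 2  # latencies in cycles, from the question

def total_cycles(latencies):
    """Last issue cycle + last latency, which equals 1 + sum of all latencies."""
    return 1 + sum(latencies)

# One entry per table in the answer, in row order.
cases = {
    "original order (mul, add, add, mul)": [MUL, ADD, ADD, MUL],
    "3 multiplications + 1 addition": [MUL, MUL, ADD, MUL],
    "4 multiplications": [MUL, MUL, MUL, MUL],
    "3 additions + 1 multiplication": [ADD, ADD, ADD, MUL],
    "4 additions": [ADD, ADD, ADD, ADD],
}

for name, latencies in cases.items():
    print(f"{name}: {total_cycles(latencies)} cycles")
# original order (mul, add, add, mul): 11 cycles
# 3 multiplications + 1 addition: 12 cycles
# 4 multiplications: 13 cycles
# 3 additions + 1 multiplication: 10 cycles
# 4 additions: 9 cycles
```

Because the total depends only on the sum, reordering the same four instructions cannot change it in this model, which is why case a) stays at 11 cycles.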