How long will each iteration of the loop take in steady state (i.e., ignoring startup latency) on our 5-stage pipeline? Assume the use of data forwarding and hardware interlocking (bubbles and stalls), and that the branch is not predicted (i.e., stalls/bubbles are required for a branch).
irmovl $5, %edx
irmovl $80, %ebx
Loop:
mrmovl array_base(%ebx), %eax
addl %edx, %eax
rmmovl %eax, array_base(%ebx)
addl $-4, %ebx
jne Loop
If the branch delay slots were exposed to the compiler/user, could you move any instructions into those delay slots? Remember that overall program dependences must be obeyed.
- True
- False
The main goal of a pipelined architecture is to complete one instruction every clock cycle. To maintain that rate, the pipeline must be kept full of instructions at all times.
The branch delay slot is a side effect of pipelined architectures and is caused by the branch hazard: a branch is not resolved until the branch instruction has worked its way through the pipeline, so the instructions that follow it are fetched before its outcome is known.
The ideal number of branch delay slots in a particular pipeline implementation is dictated by the number of pipeline stages, the presence of register forwarding, the stage of the pipeline in which the branch condition is computed, whether or not a branch target buffer is used, and many other factors.
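To make the link between the branch-resolution stage and the loop timing concrete, here is a minimal Python sketch of a steady-state cycle count for the loop in the question. It is an illustration under stated assumptions, not a definitive answer for any particular pipeline: the function name and parameters are hypothetical, the branch is assumed to be resolved in stage 3 (execute) of the 5 stages, the next fetch is assumed to start in the cycle after the branch resolves, and the only data stall assumed is the one-cycle load-use bubble between mrmovl and the addl that reads %eax (forwarding is assumed to cover every other dependence).

```python
# Hypothetical steady-state cycle model for the loop in the question.
# Assumptions (not stated in the original): one issue slot per cycle,
# forwarding removes all data stalls except a load followed immediately
# by a consumer, and fetch resumes the cycle after the branch resolves.

LOOP_BODY = ["mrmovl", "addl", "rmmovl", "addl", "jne"]


def cycles_per_iteration(load_use_stalls=1, branch_resolve_stage=3):
    """Return the assumed number of cycles for one loop iteration.

    load_use_stalls: bubbles inserted between mrmovl and the dependent addl.
    branch_resolve_stage: 1-based pipeline stage in which jne is resolved
        (1 = fetch, 2 = decode, 3 = execute, ...).
    """
    # With no prediction, fetch waits until the branch leaves its
    # resolving stage, which costs (stage - 1) bubbles.
    branch_bubbles = branch_resolve_stage - 1
    return len(LOOP_BODY) + load_use_stalls + branch_bubbles


if __name__ == "__main__":
    for stage in (2, 3, 4):
        print(stage, cycles_per_iteration(branch_resolve_stage=stage))
```

Under these assumptions the branch bubbles are also the number of delay slots that would be exposed to the compiler, so resolving the branch one stage earlier saves one cycle per iteration.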
Software compatibility requirements dictate that an architecture may not change the number of delay slots from one generation to the next.