In: Computer Science
7. Amdahl’s law provides a very useful metric to measure speedup when we make applications parallel using more resources. However, Amdahl’s law’s basic form makes an assumption about the application that may not hold in real-world scenarios. State the assumption made by Amdahl’s law and describe conditions that may break the assumption.
Let's suppose you have two people search through a telephone book for a particular telephone number, one starting at the front and the second in the middle. Simple reasoning will guess that this is a perfectly parallelizable problem, so should give a speedup of two. The average speedup will be two, given enough trials, but on any one search, the speedup may range from one to almost any number. If the number in question is in the first half of the phone book, then having the second person look in the second half of the book gives no benefit. If the number is the first number in the second half, then the second person will find it immediately, giving an almost infinite speedup for that one search. Amdahl’s Law fails to predict this because it assumes that adding processors won’t reduce the total amount of work that needs to be done, which is reasonable in most cases, but not for search.
The second case is where cache memories come into play. The Law computes speedup as affected by adding more compute resources, that is, more processors. However, processors typically come with other resources that can affect performance, specifically cache memory. In particular, in a strong scaling regime, adding more processors means each processor does less total work. Less total work often means less data to process, and less data means a smaller working set size at each step. A smaller working set will be more likely to fit in the processor cache memory. Fitting a working set into the cache memory is a step function: it fits, or it doesn’t. If it doesn’t, then the data will be fetched from main memory at memory speeds. When you add enough processors that the working set fits into cache, then data will be fetched from cache at cache speeds, which can be ten times faster than main memory.
The Law assumes that the performance improvement from adding processors will be a linear function of the number of processors. However, building larger caches may improve performance for some problems more than by adding more processors, allowing you to get a speedup greater than allowed by The Law. A better performance limit to address this is to change the resource unit you measure. Instead of computing performance or speedup per processor, measure the performance or speedup per transistor or gate.
Conclusion:
The Law has achieved truly prominent importance, capturing in one inequality a strict limit on the achievable speedup from parallelism, when its assumptions are valid. Because of this, it is often quoted and misquoted, used and abused to justify or attack various design decisions. Its original motivation was to highlight that speedup is not necessarily directly related to processor count, so working on faster uniprocessors is still important.
But the assumptions in the Law are no longer valid. Some of the assumptions are that the problem size is fixed, there are no discontinuities in the processor performance curve, the only resource that is varied is the number of processors, parallel overhead is minimal, and the processors are homogeneous. It simply does not capture the performance effects of systems that break these assumptions.