1.A. Dr. Smith purchased a network-attached storage (NAS) server containing eleven 2TB hard drives. Unbeknownst to him, the supplier used hard drives from a bad manufacturing batch, so four of his drives are underperforming. (Underperforming drives have a much shorter lifetime.) The supplier later discovered the problem, and since there is no way to determine which drives are faulty ahead of time (i.e., before they die), they gave Dr. Smith a full refund and told him to keep the drives. He wanted to throw the drives away, but one of his students asked for a few of them for a project. The student plans to use the drives in a RAID-1 configuration. (From Wikipedia: “A RAID 1 creates an exact copy (or mirror) of a set of data on two or more disks. This is useful when read performance or reliability is more important than data storage capacity.”) What is the probability that he will not lose any data prematurely if Dr. Smith gives him two, three, or four of the drives (give the probability for each setup)?
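One way to work 1.A, assuming the student receives a uniformly random subset of the eleven drives: the number of bad drives he gets is then hypergeometric. A RAID-1 mirror keeps the data as long as at least one member survives, so it fails prematurely only if all k drives are bad, giving P(no premature loss) = 1 - C(4,k)/C(11,k). The sketch below computes this exactly with fractions; the helper name raid1_survival is illustrative, not part of the problem.

```python
from fractions import Fraction
from math import comb

TOTAL_DRIVES = 11  # drives in Dr. Smith's NAS
BAD_DRIVES = 4     # underperforming drives from the bad batch


def raid1_survival(k: int) -> Fraction:
    """Probability that a k-drive RAID-1 mirror contains at least one good drive.

    Assumption: the k drives are a uniformly random subset of the 11, and the
    mirror loses data prematurely only if every one of its drives is bad.
    """
    # Hypergeometric probability that all k chosen drives come from the bad 4.
    all_bad = Fraction(comb(BAD_DRIVES, k), comb(TOTAL_DRIVES, k))
    return 1 - all_bad


for k in (2, 3, 4):
    p = raid1_survival(k)
    print(f"{k} drives: {p} ~ {float(p):.4f}")
```

This gives 49/55 (about 0.891) for two drives, 161/165 (about 0.976) for three, and 329/330 (about 0.997) for four: each extra mirror makes an all-bad draw much less likely.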
1.B. Same situation as in 1.A, but the student is more adventurous and wants to use RAID-5. (RAID 5 is a redundant array of independent disks configuration that uses disk striping with parity. Because data and parity are striped evenly across all of the disks, no single disk is a bottleneck. The parity also allows users to reconstruct data in case of a single disk failure. RAID 5 evenly balances reads and writes and is currently one of the most commonly used RAID methods. It has more usable storage than RAID 1 and RAID 10 configurations and provides read performance comparable to RAID 0. RAID 5 groups have a minimum of three hard disk drives (HDDs) and no maximum. Because the parity data is spread across all drives, a RAID 5 array can survive the failure of any one drive.) How large a virtual drive will he obtain, and what is the probability that he will not lose any data prematurely if Dr. Smith gives him three or four of the drives?
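A possible way to work 1.B, under two assumptions: the drives are again a random draw from the eleven, and a RAID-5 array loses data prematurely once two bad drives have died (it tolerates exactly one failure). Usable capacity is (n - 1) × 2TB, since one drive's worth of space holds parity. The array survives if the draw contains zero or one bad drive, so P = [C(7,n) + C(4,1)·C(7,n-1)] / C(11,n). The helper names below are illustrative.

```python
from fractions import Fraction
from math import comb

TOTAL_DRIVES = 11
BAD_DRIVES = 4
GOOD_DRIVES = TOTAL_DRIVES - BAD_DRIVES
DRIVE_TB = 2  # capacity of each drive in TB


def raid5_capacity_tb(n: int) -> int:
    """Usable capacity of an n-drive RAID-5 array: one drive's worth goes to parity."""
    if n < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (n - 1) * DRIVE_TB


def raid5_survival(n: int) -> Fraction:
    """Probability that at most one of n randomly chosen drives is bad.

    Assumption: RAID 5 survives exactly one drive failure, so data is lost
    prematurely only if the draw contains two or more bad drives.
    """
    no_bad = comb(GOOD_DRIVES, n)
    one_bad = comb(BAD_DRIVES, 1) * comb(GOOD_DRIVES, n - 1)
    return Fraction(no_bad + one_bad, comb(TOTAL_DRIVES, n))


for n in (3, 4):
    p = raid5_survival(n)
    print(f"{n} drives: {raid5_capacity_tb(n)} TB usable, P = {p} ~ {float(p):.4f}")
```

With three drives the array offers 4TB and survives with probability 119/165 (about 0.721); with four drives it offers 6TB, but the probability drops to 35/66 (about 0.530), because a larger draw is more likely to include two bad drives.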
1.C. We are doing real-time processing of data from 40 data sources. At the beginning of each time slot, each data source may or may not generate a data-set to be processed; the probability that any individual source generates a data-set is 0.002, and the sources are independent. In each time slot we can process up to two data-sets. Because the processing is real-time, the processed results are only useful if they are completed within the time slot. What is the probability that we can process all incoming data-sets in any particular time slot?
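For 1.C, the number of data-sets X arriving in a slot is Binomial(n = 40, p = 0.002), and every data-set gets processed exactly when X ≤ 2. A minimal sketch of that binomial CDF (constant and function names are illustrative):

```python
from math import comb

N_SOURCES = 40      # independent data sources
P_GENERATE = 0.002  # probability a source emits a data-set in a slot
CAPACITY = 2        # data-sets we can process per slot


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial CDF: P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


p_ok = prob_at_most(CAPACITY, N_SOURCES, P_GENERATE)
print(f"P(all data-sets processed) = {p_ok:.6f}")
```

The exact value is about 0.999925. Because n is large and p is small, a Poisson approximation with λ = 40 × 0.002 = 0.08 gives nearly the same answer: e^(-0.08)(1 + 0.08 + 0.08²/2) ≈ 0.99992.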
1. RAID 1 consists of data mirroring, without parity or striping. Data is written identically to two or more drives, thereby producing a "mirrored set" of drives. Thus, any read request can be serviced by any drive in the set.
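RAID 5 uses parity instead of mirroring for data redundancy. For each stripe, the system computes a parity block as the XOR of the data blocks and stores it on one drive, rotating the parity location from stripe to stripe so that parity is spread across all drives. In a three-drive array, any two drives together hold enough information to reconstruct the contents of the third, so the data stays safe if any single drive fails.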
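To illustrate the parity idea, here is a toy sketch of a single stripe on a hypothetical three-drive array. It assumes byte-wise XOR parity and ignores parity rotation, block sizes, and everything else a real controller does; xor_blocks is an illustrative helper, not a real RAID API.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID 5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical 3-drive array: two data blocks plus their parity.
d0 = b"HELLO"
d1 = b"WORLD"
parity = xor_blocks(d0, d1)

# Simulate losing the drive that held d1: XOR the two survivors to rebuild it.
rebuilt = xor_blocks(d0, parity)
assert rebuilt == d1
print(rebuilt)  # b'WORLD'
```

Because XOR is its own inverse, the same operation that creates the parity also recovers any one missing block, which is why RAID 5 tolerates a single drive failure but not two.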