Find the 3-bit mantissa floating point representation of the following numbers, both by chopping and rounding, and then calculate the associated respective absolute error and relative error:
(a) 11/16
(b) 2.75
Prerequisite Information
Let us first go through the elements of the question that we need to calculate, using the fraction 2/3 as a running example -
1. Converting a fraction or decimal to floating point - Let fl(x) denote the n-digit floating-point number nearest to x, where n is the number of decimal digits kept in the mantissa.
If number of digits n = 4, fl(1/3) = .3333 x 10^0 in 4 decimal digit floating point representation.
If number of digits n = 6, fl(1/5) = .200000 x 10^0 in 6 decimal digit floating point representation (the same value written with an unnormalized mantissa is .000002 x 10^5).
If number of digits n = 5, fl(4/3) = .13333 x 10^1 in 5 decimal digit floating point representation.
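The conversion in step 1 can be sketched in Python. This is only an illustration under stated assumptions: the mantissa is normalized (its first digit is nonzero), x is positive, and ties are rounded half up. The names fl_round and show are hypothetical helpers, not part of the original question.

```python
from fractions import Fraction
import math

def fl_round(x, n):
    """Return (digits, exponent) so that x is approximately
    0.<digits> x 10^exponent with an n-digit normalized mantissa.
    Assumption: x > 0 and ties round half up."""
    x = Fraction(x)
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    e = 0
    while x >= Fraction(10) ** e:        # grow the exponent until x < 10^e
        e += 1
    while x < Fraction(10) ** (e - 1):   # shrink it until 10^(e-1) <= x
        e -= 1
    scaled = x / Fraction(10) ** e * 10 ** n    # mantissa shifted n digits left
    digits = math.floor(scaled + Fraction(1, 2))
    if digits == 10 ** n:                # rounding carried over, e.g. .9996 -> 1.000
        digits, e = 10 ** (n - 1), e + 1
    return digits, e

def show(x, n):
    """Format the result in the same style as the examples in the text."""
    digits, e = fl_round(x, n)
    return f".{digits:0{n}d} x 10^{e}"

print(show(Fraction(1, 3), 4))
print(show(Fraction(1, 5), 6))
print(show(Fraction(4, 3), 5))
```

Using Fraction keeps the input exact, so the only rounding that happens is the one being demonstrated.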
2. Let us take examples and convert decimal/fraction to floating point number with rounding and chopping -
Let fraction number x = 2/3, then the 3 decimal digit floating point representation of fl(x) will be as follows -
Rounding: fl(2/3) = (.667) x 10^0 rounded
Chopping: fl(2/3) = (.666) x 10^0 chopped
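Let decimal number x = 5.7, and keep the exponent fixed at 3, so that only one significant digit survives in the 3 digit mantissa -
Rounding: fl(5.7) = (.006) x 10^3 rounded
Chopping: fl(5.7) = (.005) x 10^3 chopped
(With a normalized mantissa, 5.7 = (.570) x 10^1 is exact, so rounding and chopping would agree.)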
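The difference between rounding and chopping can be checked with Python's standard decimal module. This is a minimal sketch: fl_divide is a hypothetical helper name, rounding is assumed to mean round half up, and chopping is mapped to ROUND_DOWN (truncation toward zero).

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl_divide(num, den, n, mode):
    """Compute num/den to n significant decimal digits under the given mode."""
    ctx = Context(prec=n, rounding=mode)
    return ctx.divide(Decimal(num), Decimal(den))

# 2/3 with a 3-digit mantissa
rounded = fl_divide(2, 3, 3, ROUND_HALF_UP)
chopped = fl_divide(2, 3, 3, ROUND_DOWN)
print(rounded, chopped)

# 5.7 with a single significant digit
one_digit_rounded = Context(prec=1, rounding=ROUND_HALF_UP).create_decimal("5.7")
one_digit_chopped = Context(prec=1, rounding=ROUND_DOWN).create_decimal("5.7")
print(one_digit_rounded, one_digit_chopped)
```

The context precision counts significant digits, which corresponds to the mantissa length n when the mantissa is normalized.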
3. Let us find the relative error and absolute error for fl(2/3) -
Let p be the original number and p* be the floating point value after rounding/chopping -
Rounding: fl(2/3) = (.667) x 10^0 rounded
Absolute error = |p - p*| = | (2/3) - 0.667 | = 3.3333 x 10^-4
Relative error = |p - p*| / |p| = | (2/3) - 0.667 | / | (2/3) | = 5.0 x 10^-4
Similarly, we can find absolute and relative error for chopping.
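For completeness, both error measures for 2/3 can be computed exactly, including the chopped case. This is a small sketch using Python's fractions module. The function name errors is a hypothetical helper, and p and p_star mirror the p and p* defined in step 3.

```python
from fractions import Fraction

def errors(p, p_star):
    """Return (absolute error, relative error) of p_star as an approximation of p."""
    p, p_star = Fraction(p), Fraction(p_star)
    abs_err = abs(p - p_star)
    rel_err = abs_err / abs(p)
    return abs_err, rel_err

p = Fraction(2, 3)
abs_r, rel_r = errors(p, Fraction(667, 1000))   # rounded: .667 x 10^0
abs_c, rel_c = errors(p, Fraction(666, 1000))   # chopped: .666 x 10^0

print("rounded:", abs_r, rel_r)   # exact fractions
print("chopped:", abs_c, rel_c)
```

For chopping this gives an absolute error of 1/1500 (about 6.6667 x 10^-4) and a relative error of 1/1000 (1.0 x 10^-3), twice the corresponding values for rounding.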
Solution
(a) x = 11/16 = 0.6875, n = 3
Rounding
fl(11/16) = (0.688) x 10^0 rounded
Absolute error = | (11/16) - 0.688 | = 5.0 x 10^-4
Relative error = | (11/16) - 0.688 | / | (11/16) | = 7.2727 x 10^-4
Chopping
fl(11/16) = (0.687) x 10^0 chopped
Absolute error = | (11/16) - 0.687 | = 5.0 x 10^-4
Relative error = | (11/16) - 0.687 | / | (11/16) | = 7.2727 x 10^-4
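(b) x = 2.75, n = 3. The representation below keeps the exponent at 2, so the 3 digit mantissa has a leading zero and only two significant digits survive. (With a normalized mantissa, 2.75 = (0.275) x 10^1 is exact and both errors would be zero.)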
Rounding
fl(2.75) = (0.028) x 10^2 rounded
Absolute error = | 2.75 - 2.8 | = 5.0 x 10^-2
Relative error = | 2.75 - 2.8 | / | 2.75 | = 1.8182 x 10^-2
Chopping
fl(2.75) = (0.027) x 10^2 chopped
Absolute error = | 2.75 - 2.7 | = 5.0 x 10^-2
Relative error = | 2.75 - 2.7 | / | 2.75 | = 1.8182 x 10^-2
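The numbers in both parts can be cross-checked with a short script. This sketch assumes the convention used in the solution: (a) keeps three decimal places (step 0.001) and (b) keeps one decimal place (step 0.1). approx_and_errors is a hypothetical helper name, and rounding is taken as round half up.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def approx_and_errors(x, step, mode):
    """Quantize x to the given step and return (p*, absolute error, relative error)."""
    x = Decimal(x)
    p_star = x.quantize(Decimal(step), rounding=mode)
    abs_err = abs(x - p_star)
    rel_err = abs_err / abs(x)
    return p_star, abs_err, rel_err

cases = [("(a) 11/16", "0.6875", "0.001"), ("(b) 2.75", "2.75", "0.1")]
modes = [("rounded", ROUND_HALF_UP), ("chopped", ROUND_DOWN)]

for label, x, step in cases:
    for name, mode in modes:
        p_star, abs_err, rel_err = approx_and_errors(x, step, mode)
        print(f"{label} {name}: p* = {p_star}, "
              f"abs = {float(abs_err):.4e}, rel = {float(rel_err):.4e}")
```

In both parts the original value lies exactly halfway between the rounded and chopped results, which is why rounding and chopping give the same errors here.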
Please let me know in case you face any issues in the above mentioned calculations.