Question

These questions concern the following 16-bit floating point representation: The first bit is the sign of the number (0 = +, 1 = -), the next nine bits are the mantissa, the next bit is the sign of the exponent, and the last five bits are the magnitude of the exponent. All numbers are normalized, i.e. the first bit of the mantissa is one, except for zero which is all zeros.
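The layout above can be made concrete with a small decoder. This is a minimal sketch that assumes the nine mantissa bits are read as the binary fraction 0.b1b2...b9 (so a normalized mantissa lies in [1/2, 1)) and the exponent is a sign-magnitude integer from -31 to +31. The function name `decode` is hypothetical, and bit patterns are written with spaces between the four fields.

```python
from fractions import Fraction

def decode(bits: str) -> Fraction:
    """Decode sign(1) | mantissa(9) | exponent sign(1) | exponent magnitude(5).

    Assumption: the mantissa bits are the binary fraction 0.b1...b9.
    """
    bits = bits.replace(" ", "")
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected 16 binary digits")
    if bits == "0" * 16:
        return Fraction(0)  # zero is the all-zeros pattern
    sign = -1 if bits[0] == "1" else 1
    mantissa_bits = bits[1:10]
    if mantissa_bits[0] != "1":
        raise ValueError("mantissa is not normalized")
    mantissa = Fraction(int(mantissa_bits, 2), 2 ** 9)  # 0.b1...b9
    exp_sign = -1 if bits[10] == "1" else 1
    exponent = exp_sign * int(bits[11:], 2)
    return sign * mantissa * Fraction(2) ** exponent

print(decode("0 100000000 0 00001"))  # 1
print(decode("1 110000000 0 00010"))  # -3
```

Using `Fraction` keeps every result exact, so no rounding from Python's own floats creeps into the answers.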

a) What is the largest number (in both 16-bit binary floating point and explicit decimal representations)? The smallest number?

b) What non-zero number is closest to zero? (in both binary and decimal)

c) What's the smallest difference between two consecutive or adjacent numbers?

d) How many significant binary digits do numbers in this representation have? How many significant decimal digits does that correspond to?

e) What's the smallest difference between two consecutive or adjacent numbers? The largest?

f) What's the largest difference between two consecutive or adjacent numbers? (Give both numbers in both binary and decimal representations)

g) Translate .1 decimal into our 16-bit binary floating point representation and translate the result back into decimal. What happened? Why?

h) Give a simple rule for determining when a fraction (i.e. a ratio of two integers) can be represented exactly by a terminating decimal expression (i.e. non-repeating).

i) Give a simple rule for determining when a fraction can be represented exactly by a terminating binary expression (i.e. non-repeating).

Solutions

Expert Solution

a).

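Reading the nine mantissa bits as the binary fraction 0.1xxxxxxxx (so every mantissa lies between 1/2 and 1 - 2^-9) and the six exponent bits as a sign-magnitude integer from -31 to +31, each value is ±mantissa x 2^exponent. The largest number takes the largest mantissa and the largest exponent:

Largest number (Binary) : 0 111111111 0 11111

Largest number (Decimal) : (1 - 2^-9) x 2^31 = 2^31 - 2^22 = 2143289344

The smallest (most negative) number is the same bit pattern with the sign bit set:

Smallest number (Binary) : 1 111111111 0 11111

Smallest number (Decimal) : -(1 - 2^-9) x 2^31 = -2143289344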
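The extremes can be checked with exact rational arithmetic. A minimal sketch, assuming the mantissa is read as 0.b1...b9 and the exponent magnitude has five bits (so at most 31); the constant names are illustrative:

```python
from fractions import Fraction

# Assumed reading: mantissa = 0.b1...b9 (b1 = 1), exponent in -31..+31.
MANTISSA_BITS = 9
MAX_EXP = 2 ** 5 - 1  # five magnitude bits -> 31

max_mantissa = Fraction(2 ** MANTISSA_BITS - 1, 2 ** MANTISSA_BITS)  # 0.111111111 binary
largest = max_mantissa * 2 ** MAX_EXP
smallest = -largest  # sign bit set, same magnitude

print(largest)   # 2143289344
print(smallest)  # -2143289344
```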

b).

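Smallest non-zero number (closest to zero) : 0.100000000 (binary) x 2^-31 = 2^-32 = 0.00000000023283064365386962890625 (about 2.33 x 10^-10)

Smallest non-zero number (Binary) : 0 100000000 1 11111

Its negative, 1 100000000 1 11111, is equally close to zero.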
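A short check of this value, assuming as above that the smallest normalized mantissa is 0.100000000 (binary) = 1/2 and the most negative exponent is -31. The `decimal` module is used only to print the exact decimal expansion of 2^-32:

```python
from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 50  # enough digits to print 2**-32 exactly

min_mantissa = Fraction(1, 2)   # 0.100000000 binary, smallest normalized mantissa
min_exponent = -(2 ** 5 - 1)    # exponent sign = 1, magnitude = 11111 binary
closest_to_zero = min_mantissa * Fraction(2) ** min_exponent

print(closest_to_zero == Fraction(1, 2 ** 32))  # True
print(Decimal(closest_to_zero.numerator) / Decimal(closest_to_zero.denominator))
```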

c).

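Smallest difference between two adjacent numbers :

Adjacent numbers are closest together where the exponent is smallest (-31). The smallest positive number is 0.100000000 x 2^-31 = 2^-32 (0 100000000 1 11111). The next number in sequence increments the last mantissa bit: 0.100000001 x 2^-31 = 2^-32 + 2^-9 x 2^-31 (0 100000001 1 11111).

So the smallest difference is one unit in the last mantissa place at that exponent: 2^-9 x 2^-31 = 2^-40 ≈ 9.09 x 10^-13. The gap between 0 and 2^-32 is larger, so it does not change this result.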
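Because the format has only a few thousand positive values, the minimum gap can also be confirmed by brute force. A sketch under the same assumed reading (mantissa 0.b1...b9 with b1 = 1, exponent -31..+31, where the +0 and -0 exponents coincide); `positive_values` is a hypothetical helper:

```python
from fractions import Fraction

def positive_values():
    """All positive values, assuming mantissa = 0.b1...b9 with b1 = 1
    and a sign-magnitude exponent in -31..+31."""
    values = set()
    for exponent in range(-31, 32):
        for m in range(256, 512):  # 9-bit patterns 100000000 .. 111111111
            values.add(Fraction(m, 512) * Fraction(2) ** exponent)
    return sorted(values)

vals = positive_values()
gaps = [b - a for a, b in zip(vals, vals[1:])]
print(min(gaps) == Fraction(1, 2 ** 40))  # True
```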

d).

This representation has 9 mantissa bits, so numbers have 9 significant binary digits. Each binary digit is worth log10(2) ≈ 0.301 decimal digits, so 9 bits correspond to 9 x 0.301 ≈ 2.7, i.e. roughly 2 to 3 significant decimal digits.
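The conversion from binary to decimal digits is a single multiplication by log10(2):

```python
import math

mantissa_bits = 9
decimal_digits = mantissa_bits * math.log10(2)  # each bit is worth ~0.301 decimal digits
print(round(decimal_digits, 2))  # 2.71
```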

e).

Smallest difference between two adjacent numbers : 2^-40, as computed in c).

Largest difference between two adjacent numbers :

The maximum difference between two successive numbers occurs at the extremes of the range. Adjacent numbers with the same exponent differ by one unit in the last mantissa place, 2^-9, multiplied by 2^exponent, so the gap grows as the exponent grows. (The minimum difference occurs at the smallest exponent, -31.) The largest difference therefore occurs at exponent +31: 2^-9 x 2^31 = 2^22 = 4194304.
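The rule "one unit in the last mantissa place, scaled by 2^exponent" can be written as a one-line function. A sketch assuming the 9-bit fractional mantissa used throughout; `spacing` is a hypothetical name:

```python
from fractions import Fraction

MANTISSA_BITS = 9

def spacing(exponent: int) -> Fraction:
    """Gap between adjacent values sharing this exponent: 2**-9 * 2**exponent."""
    return Fraction(2) ** (exponent - MANTISSA_BITS)

print(spacing(-31))  # 1/1099511627776  (2**-40)
print(spacing(31))   # 4194304          (2**22)
```

The gap from the top of one exponent to the bottom of the next is the same 2^(exponent - 9), so these two calls give the overall minimum and maximum.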

f).

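The largest difference, 2^22 = 4194304, occurs between adjacent numbers with exponent +31, for example the two largest numbers:

0 111111110 0 11111 = (1 - 2^-8) x 2^31 = 2139095040

0 111111111 0 11111 = (1 - 2^-9) x 2^31 = 2143289344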
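A brute-force search can report both numbers and their bit patterns. This sketch assumes the same reading of the fields (mantissa 0.b1...b9, exponent -31..+31); the helper `fields` is hypothetical and only formats the four fields with spaces:

```python
from fractions import Fraction

def fields(m: int, exponent: int) -> str:
    """Bit pattern 'sign mantissa exp-sign exp-magnitude' for m/512 * 2**exponent."""
    exp_sign = "1" if exponent < 0 else "0"
    return f"0 {m:09b} {exp_sign} {abs(exponent):05b}"

# Every positive value paired with its bit pattern, sorted by value.
entries = sorted(
    (Fraction(m, 512) * Fraction(2) ** e, fields(m, e))
    for e in range(-31, 32)
    for m in range(256, 512)
)

gaps = [(b[0] - a[0], a, b) for a, b in zip(entries, entries[1:])]
max_gap = max(g for g, _, _ in gaps)

# All adjacent pairs at exponent +31 share this gap; report the topmost pair.
_, lo, hi = [t for t in gaps if t[0] == max_gap][-1]

print(max_gap)       # 4194304
print(lo[1], lo[0])  # 0 111111110 0 11111 2139095040
print(hi[1], hi[0])  # 0 111111111 0 11111 2143289344
```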

g).

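The decimal number 0.1 is not representable exactly in binary floating point of any finite precision; its binary expansion repeats the pattern "1100" endlessly:

0.1 = 0.000110011001100110011... (binary) = 0.110011001100110011... x 2^-3

so the exponent is -3 and the mantissa is 0.110011001100... . Only nine mantissa bits fit. The bits after the ninth are 1001..., which is more than half a unit in the last place, so rounding to nearest gives

mantissa = 0.110011010, exponent = -3, i.e. 0 110011010 1 00011

which is actually 410/4096 = 0.10009765625 in decimal. (Truncating instead gives 0 110011001 1 00011 = 409/4096 = 0.099853515625.)

What happened: translating back does not give 0.1; the stored value is off by about 1 x 10^-4. Why: 1/10 has a factor of 5 in its denominator, so its binary expansion never terminates and has to be cut off after nine bits.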
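The conversion can be reproduced with a small encoder. This is a sketch, not a reference implementation: it assumes the 0.b1...b9 mantissa reading and rounds to nearest with ties rounded up, since the question does not specify a rounding rule. `encode_positive` is a hypothetical name.

```python
from fractions import Fraction

MANTISSA_BITS = 9

def encode_positive(x: Fraction) -> tuple[str, Fraction]:
    """Encode x > 0; return the bit pattern and the value it actually stores."""
    exponent = 0
    # Normalize so that 1/2 <= x / 2**exponent < 1.
    while x * Fraction(2) ** -exponent >= 1:
        exponent += 1
    while x * Fraction(2) ** -exponent < Fraction(1, 2):
        exponent -= 1
    mantissa = x * Fraction(2) ** -exponent
    m = int(mantissa * 2 ** MANTISSA_BITS + Fraction(1, 2))  # round to nearest, ties up
    if m == 2 ** MANTISSA_BITS:  # rounding carried out to 1.0: renormalize
        m //= 2
        exponent += 1
    if abs(exponent) > 31:
        raise OverflowError("exponent out of range")
    exp_sign = "1" if exponent < 0 else "0"
    bits = f"0 {m:09b} {exp_sign} {abs(exponent):05b}"
    return bits, Fraction(m, 2 ** MANTISSA_BITS) * Fraction(2) ** exponent

bits, value = encode_positive(Fraction(1, 10))
print(bits)          # 0 110011010 1 00011
print(float(value))  # 0.10009765625
```

Replacing the rounding line with `int(mantissa * 2 ** MANTISSA_BITS)` gives the truncated pattern instead.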

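h).

A fraction p/q in lowest terms has a terminating decimal expansion exactly when the denominator q has no prime factors other than 2 and 5 (the prime factors of 10). For example 3/8 = 0.375 and 7/20 = 0.35 terminate, but 1/3 = 0.333... does not.

i).

A fraction p/q in lowest terms has a terminating binary expansion exactly when q is a power of 2 (2 is the only prime factor of the base). For example 3/8 = 0.011 (binary) terminates, but 1/10 does not, which is why 0.1 in g) cannot be stored exactly.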
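Both rules are one test: reduce the fraction, then remove from the denominator every prime factor it shares with the base; the expansion terminates exactly when nothing is left. A minimal sketch (`terminates` is a hypothetical name):

```python
from math import gcd

def terminates(p: int, q: int, base: int) -> bool:
    """True if p/q has a terminating expansion in the given base."""
    if q == 0:
        raise ZeroDivisionError("denominator is zero")
    q = abs(q) // gcd(p, q)  # lowest terms
    # Strip every prime factor that q shares with the base.
    while (g := gcd(q, base)) > 1:
        q //= g
    return q == 1

print(terminates(7, 20, 10))  # True  (20 = 2^2 x 5)
print(terminates(1, 10, 2))   # False (5 is not a factor of 2)
```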

