Ibrahim Dogan

Book: Advanced PIC Microcontroller Projects in C

1.21 Floating Point Numbers

Floating point numbers are used to represent noninteger (fractional) numbers, for example, 3.256, 2.1, and 0.0036, and they appear in most engineering and technical calculations. The most common floating point standard is IEEE 754, according to which floating point numbers are represented with 32 bits (single precision) or 64 bits (double precision).

This section looks only at the format of 32-bit floating point numbers and at how mathematical operations can be performed with such numbers.

According to the IEEE standard, 32-bit floating point numbers are represented as:

    31   30     23   22                     0
    X    XXXXXXXX    XXXXXXXXXXXXXXXXXXXXXXX
    ↑       ↑                   ↑
    sign exponent            mantissa

The most significant bit indicates the sign of the number, where 0 indicates the number is positive, and 1 indicates it is negative.

The 8-bit exponent gives the power of 2 by which the number is scaled. To keep the calculations simple, the sign of the exponent is not stored; instead, the excess-127 numbering system is used. Thus, to find the real exponent we have to subtract 127 from the stored exponent. For example, if the exponent is “10000000,” the real value of the exponent is 128 – 127 = 1.
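The field layout and the subtraction of 127 can be sketched in C as follows. This is a minimal sketch, assuming a compiler whose `float` is the 32-bit IEEE format (this is not guaranteed on every microcontroller toolchain); the names `float_fields`, `split_float`, and `real_exponent` are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three raw fields of a 32-bit float. */
typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30..23, stored with an offset of 127 */
    uint32_t mantissa;  /* bits 22..0 */
} float_fields;

/* Split a float into its fields. Assumes float is 32-bit IEEE format. */
static float_fields split_float(float x)
{
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &x, sizeof bits);       /* copy the raw bit pattern */
    f.sign     = bits >> 31;              /* top bit */
    f.exponent = (bits >> 23) & 0xFFu;    /* next 8 bits */
    f.mantissa = bits & 0x7FFFFFu;        /* low 23 bits */
    return f;
}

/* Real exponent: subtract 127 from the stored exponent. */
static int real_exponent(float_fields f)
{
    return (int)f.exponent - 127;
}
```

For the value 2.0 the stored exponent is 10000000 (128), so `real_exponent(split_float(2.0f))` gives 1, matching the example in the text.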

The mantissa is 23 bits wide and represents the increasing negative powers of 2. For example, if we assume that the mantissa is “11100000000000000000000,” the value of this mantissa is calculated as $2^{-1} + 2^{-2} + 2^{-3} = 7/8$.
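The weighting of the mantissa bits can be illustrated with a short C function. This is only a sketch; the function name `mantissa_fraction` is hypothetical, and the mantissa is assumed to be passed as the low 23 bits of an unsigned integer.

```c
#include <stdint.h>

/* Value of a 23-bit mantissa field as a fraction.
 * Bit 22 is worth 2^-1, bit 21 is worth 2^-2, ..., bit 0 is worth 2^-23. */
static double mantissa_fraction(uint32_t mantissa)
{
    double value  = 0.0;
    double weight = 0.5;   /* 2^-1, the weight of bit 22 */
    int bit;

    for (bit = 22; bit >= 0; bit--) {
        if ((mantissa >> bit) & 1u) {
            value += weight;
        }
        weight /= 2.0;     /* the next lower bit is worth half as much */
    }
    return value;
}
```

A mantissa whose three leading bits are set and whose remaining bits are zero is `0x700000` in hexadecimal, and `mantissa_fraction(0x700000u)` gives 0.875, which is 7/8.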

The decimal equivalent of a floating point number can be calculated using the formula:

$$\text{Number} = (-1)^{s} \times 2^{e-127} \times 1.f$$

where

s = 0 for positive numbers, 1 for negative numbers

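e = stored exponent (between 0 and 255; the values 0 and 255 are reserved for special cases, so the formula applies for e between 1 and 254)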

f = mantissa

As shown in this formula, there is a hidden 1 in front of the mantissa (i.e., the mantissa is interpreted as 1.f).
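The formula can be evaluated directly from a 32-bit pattern, as in the following C sketch. The names `power_of_two` and `decode_normalized` are hypothetical, and the sketch assumes a stored exponent between 1 and 254; the result is returned as a `double` so that the arithmetic itself does not lose precision.

```c
#include <stdint.h>

/* 2^n for a small integer n, by repeated doubling or halving. */
static double power_of_two(int n)
{
    double r = 1.0;

    while (n > 0) { r *= 2.0; n--; }
    while (n < 0) { r /= 2.0; n++; }
    return r;
}

/* Evaluate (-1)^s * 2^(e-127) * 1.f for a 32-bit pattern.
 * Assumes the stored exponent e is between 1 and 254. */
static double decode_normalized(uint32_t bits)
{
    uint32_t s = bits >> 31;                    /* sign bit */
    int      e = (int)((bits >> 23) & 0xFFu);   /* stored exponent */
    uint32_t f = bits & 0x7FFFFFu;              /* 23-bit mantissa */

    /* 1.f: the hidden 1 plus the mantissa scaled by 2^-23 */
    double significand = 1.0 + (double)f / 8388608.0;
    double magnitude   = significand * power_of_two(e - 127);

    return s ? -magnitude : magnitude;
}
```

As a check, the pattern 1 10000000 11000000000000000000000 (hexadecimal `0xC0600000`) has s = 1, e = 128, and 1.f = 1.75, so `decode_normalized(0xC0600000u)` gives –3.5.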

The largest number in 32-bit floating point format is:

0 11111110 11111111111111111111111

This number is $(2 - 2^{-23}) \times 2^{127}$, or approximately $3.403 \times 10^{38}$ in decimal. Numbers in this format keep their precision to about 6 to 7 significant decimal digits.
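Both statements can be checked on a host compiler with a short C sketch. It assumes that `float` is the 32-bit IEEE format and that `FLT_MAX` from `<float.h>` describes that format; the helper names are hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes 32-bit IEEE float). */
static float float_from_bits(uint32_t bits)
{
    float x;

    memcpy(&x, &bits, sizeof x);
    return x;
}

/* 0 11111110 11111111111111111111111 is 0x7F7FFFFF in hexadecimal. */
static int largest_pattern_is_flt_max(void)
{
    return float_from_bits(0x7F7FFFFFu) == FLT_MAX;
}

/* Returns 1 if adding 1 to x is lost to rounding in single precision. */
static int increment_is_lost(float x)
{
    volatile float y = x + 1.0f;   /* volatile forces rounding to float */

    return y == x;
}
```

With these assumptions, `largest_pattern_is_flt_max()` returns 1. The limited precision shows up at 16777216 (that is, $2^{24}$): `increment_is_lost(16777216.0f)` returns 1 because 16777217 has no exact 32-bit representation, while `increment_is_lost(1000.0f)` returns 0.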

The smallest positive (normalized) number in 32-bit floating point format is:

0 00000001 00000000000000000000000

This number is $2^{-126}$, or approximately $1.175 \times 10^{-38}$ in decimal.
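Going in the other direction, a number can be assembled from its three fields. The C sketch below is an illustration under the assumption that `float` is the 32-bit IEEE format; `pack_float` and `smallest_normalized` are hypothetical names.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Assemble a float from sign (1 bit), exponent (8 bits), mantissa (23 bits). */
static float pack_float(uint32_t sign, uint32_t exponent, uint32_t mantissa)
{
    uint32_t bits = ((sign & 1u) << 31)
                  | ((exponent & 0xFFu) << 23)
                  | (mantissa & 0x7FFFFFu);
    float x;

    memcpy(&x, &bits, sizeof x);   /* reinterpret the pattern as a float */
    return x;
}

/* 0 00000001 00000000000000000000000: sign 0, exponent 1, mantissa 0. */
static float smallest_normalized(void)
{
    return pack_float(0u, 1u, 0u);
}
```

Under these assumptions `smallest_normalized()` equals `FLT_MIN` from `<float.h>`, and `pack_float(0u, 254u, 0x7FFFFFu)` equals `FLT_MAX`, the largest number described earlier.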
