
Numeric Datatypes in RPG : FLOAT


  1. FLOAT is also a binary data type.
  2. Because most platforms follow the IEEE 754 standard, most programming languages handle FLOAT in the same way.
  3. The INTEGER data type holds only whole numbers (numbers without a decimal point).
  4. In theory, FLOAT is meant for real numbers, which cover every possible number (with or without a decimal point).
  5. In practice, FLOATs implemented in BINARY format behave quite differently from that ideal.
    1. A FLOAT can hold values ranging from very small to very large.
    2. FLOAT calculations are only approximately correct. So, if you need exact calculation results, DO NOT USE FLOAT.
      • More on this later.

 FLOAT

There are many ways to write the same number. For example, the number 6257 can be written as-

  1. 625.7  * 10^1
  2. 62.57  * 10^2
  3. 6.257 * 10^3
  4. 0.6257 * 10^4

The same is true for binary numbers. For example, the number 1010₂ can be written as-

  1. 101.0 * 2^1
  2. 10.10 * 2^2
  3. 1.010 * 2^3
  4. and so on.

Normalization

Since there are multiple ways to represent the same number, the IEEE standard fixes a single representation. This is called “Normalization”.

  1. The rule is to write the number with exactly one non-zero digit to the left of the point (e.g. X.XXXXXX). So-
    1. The normalized form of 6257 is + 6.257 * 10^3
    2. Similarly, the normalized form of 1010₂ is + 1.010 * 2^3
  2. FLOATs are saved in BINARY format, and in binary the only non-zero digit is 1, so the digit to the left of the point is always 1 for every normalized number. (See examples below-)
    1. 14₁₀ ==> 1110₂ ==> + 1.110 * 2^3
    2. 6₁₀ ==> 0110₂ ==> + 1.10 * 2^2
  3. Since the digit to the left of the point is always 1 in binary format, FLOATs do not store it; it is implied.
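
The normalization rule can be tried out with a small sketch. It is written in Python rather than RPG, purely as an illustration; the helper names `normalize` and `binary_significand` are made up for this example, and the sketch only handles positive, finite numbers.

```python
import math

def normalize(x):
    """Return (significand, exponent) so that x == significand * 2**exponent
    with 1 <= significand < 2. Assumes x is positive and finite."""
    m, e = math.frexp(x)      # frexp returns 0.5 <= m < 1
    return m * 2, e - 1       # move the point one place to get the 1.xxx form

def binary_significand(significand, digits=3):
    """Write a significand in [1, 2) as binary text, e.g. 1.75 -> '1.110'."""
    frac = significand - 1    # drop the leading 1
    bits = []
    for _ in range(digits):
        frac *= 2
        bit = 1 if frac >= 1 else 0
        bits.append(str(bit))
        frac -= bit
    return "1." + "".join(bits)

for number in (14.0, 6.0, 10.0):
    significand, exponent = normalize(number)
    print(number, "=", binary_significand(significand), "* 2^" + str(exponent))
```

Running it for 14, 6 and 10 gives the same normalized forms as the examples above, with the leading 1 always present.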

FLOAT FORMAT

  1. A FLOAT is saved in memory in three parts
    1. s : Sign bit, which stores the sign as
      1. 0 (Zero) = Positive
      2. 1 (One) = Negative.
      3. Because the sign bit is kept separate from the magnitude-
        1. FLOAT can have both POSITIVE ZERO and NEGATIVE ZERO.
        2. and both POSITIVE INFINITY and NEGATIVE INFINITY.
    2. e : Exponent
      1. The number used as the power of 2 (2^exponent).
      2. In 1.110 * 2^3  ==> Exponent is 3 (It is saved slightly differently, using an offset. We will discuss it later in more detail.)
      3. The number of bits used by the exponent depends on the type of FLOAT.
      4. The exponent itself is a signed number.
    3. m : Mantissa (also known as significand)
      1. These are the digits to the right of the binary point.
      2. In 1.110 * 2^3  ==> Mantissa is 110
      3. The number of bits used by the mantissa also depends on the type of FLOAT.
      4. Always saved in normalized form.
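
As a rough illustration of the separate sign bit, the following Python sketch (not RPG) reads the top bit of a value packed as a 32-bit float. The helper name `sign_bit` is an assumption made for this example.

```python
import math
import struct

def sign_bit(x):
    """Return the sign bit (0 or 1) of x when stored as a 32-bit float."""
    # Pack as big-endian single precision, then read back as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31         # the sign is the leftmost of the 32 bits

print(sign_bit(0.0), sign_bit(-0.0))            # positive zero and negative zero
print(0.0 == -0.0)                              # the two zeros still compare as equal
print(sign_bit(math.inf), sign_bit(-math.inf))  # positive and negative infinity
```

The two zeros differ only in the sign bit, which is why they can both exist and still compare as equal.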

Types of FLOAT

There are two types-

  1. Single-precision (32 bit)
  2. Double-precision (64 bit)

Single-precision FLOAT

In RPG, a single-precision float is declared with length 4 and data type “F”, as shown below-

D FloatNum        S              4F      (Fixed format syntax)
dcl-s FloatNum float(4);                 (Free format syntax)

It uses 32 bits, as follows-
  1. SIGN : 1 bit
    • 0 : Positive
    • 1  : Negative
  2. Exponent : 8 bits (with offset of 127)
  3. Mantissa : 23 bits

Bit positions within each part (highest bit first)-

 SIGN     : 1
 Exponent : 8  7  6  5  4  3  2  1
 Mantissa : 23  22  21  20  19  18  17  16  15  14  13  12  11  10  9  8  7  6  5  4  3  2  1
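
The 1 + 8 + 23 split can be seen on a real value with a short Python sketch (used here only for illustration, not RPG). The function name `float32_fields` is hypothetical. Python's own floats are double precision, so the value is first packed into 4 bytes.

```python
import struct

def float32_fields(x):
    """Split x, stored as a 32-bit float, into (sign, exponent, mantissa) bit strings."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    text = format(bits, "032b")          # all 32 bits as text, leftmost bit first
    return text[0], text[1:9], text[9:]  # 1 sign bit, 8 exponent bits, 23 mantissa bits

sign, exponent, mantissa = float32_fields(1.0)
print(sign, exponent, mantissa)
```

For 1.0 the mantissa is all zeros, because the leading 1 is implied and not stored.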

Exponent and Offset

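The exponent is a signed number. Signed numbers are normally saved in two’s complement form, in which an 8 bit signed number can hold -128 to 127. However, two’s complement is inconvenient for a FLOAT’s exponent, because a negative exponent would then look larger than a positive one when the bits are compared as plain unsigned values.

So an offset was introduced: a fixed value is added to every exponent before it is saved, so that NEGATIVE exponents are also saved as POSITIVE numbers, while the order between them and the genuinely POSITIVE exponents is preserved.

In single-precision floats this offset is +127.

NOTE : The saved values 0 (all 0s, i.e. exponent −127) and 255 (all 1s, i.e. exponent +128) are reserved for special numbers, so the usable exponents run from −126 to +127.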

So an exponent of

  1. -3 will be saved as (-3 + 127) = 124.
  2. +3 will be saved as (3 + 127) = 130.
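
The offset arithmetic can be checked against real 32-bit floats. This is a minimal Python sketch for illustration only; `BIAS`, `encode_exponent` and `stored_exponent` are names invented for this example.

```python
import struct

BIAS = 127  # offset for single precision

def encode_exponent(exponent):
    """Return the value saved in the 8 exponent bits for a normal number."""
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the usable single-precision range")
    return exponent + BIAS

def stored_exponent(x):
    """Read the 8 exponent bits of x when stored as a 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF   # skip the 23 mantissa bits, keep 8 bits

print(encode_exponent(-3), stored_exponent(2.0 ** -3))
print(encode_exponent(3), stored_exponent(2.0 ** 3))
```

Both columns should agree, since 2^-3 and 2^3 have exponents of -3 and +3.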

Examples:

Number               : 12.00
In Binary format     : 1100
In Normalized format : 1.1 * 2^3
Sign bit (Positive)  : 0
Exponent             : 3 + 127 = 130 ==> 1 0 0 0 0 0 1 0
Mantissa             : 10000000000000000000000

 

Number               : -0.625
In Binary format     : 0.101
In Normalized format : 1.01 * 2^-1
Sign bit (Negative)  : 1
Exponent             : -1 + 127 = 126 ==> 0 1 1 1 1 1 1 0
Mantissa             : 01000000000000000000000
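
Going the other way, the three parts from the two examples can be turned back into a number. The Python sketch below is only an illustration (not RPG). The function name `decode_float32` is hypothetical, and it deliberately handles normal numbers only, not the reserved exponent values.

```python
def decode_float32(sign, exponent_bits, mantissa_bits):
    """Rebuild the value from sign, exponent and mantissa bit strings (normal numbers only)."""
    stored = int(exponent_bits.replace(" ", ""), 2)
    if stored in (0, 255):
        raise ValueError("reserved exponent: not a normal number")
    fraction = int(mantissa_bits, 2) / 2 ** 23    # the digits to the right of the point
    value = (1 + fraction) * 2.0 ** (stored - 127)  # put back the implied 1, remove the offset
    return -value if sign == "1" else value

print(decode_float32("0", "1 0 0 0 0 0 1 0", "10000000000000000000000"))
print(decode_float32("1", "0 1 1 1 1 1 1 0", "01000000000000000000000"))
```

The two calls use exactly the bit patterns from the examples and should give back 12.0 and -0.625.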

The following link lets you test how a decimal number is converted into FLOAT format.

https://www.h-schmidt.net/FloatConverter/IEEE754.html

 

Double-precision FLOAT

In RPG, a double-precision float is declared with length 8 and data type “F”-

D FloatNum        S              8F      (Fixed format syntax)
dcl-s FloatNum float(8);                 (Free format syntax) 

They use 64 bits, as follows-

  1. SIGN : 1 bit
    • 0 : Positive
    • 1  : Negative
  2. Exponent : 11 bits (with offset of 1023)
  3. Mantissa : 52 bits

Double precision behaves the same as single precision; only the sizes are different.
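
The same kind of split works for the 64-bit layout. The Python sketch below is for illustration only (not RPG), and `float64_fields` is a hypothetical name.

```python
import struct

def float64_fields(x):
    """Split x, stored as a 64-bit float, into (sign, exponent, mantissa) bit strings."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    text = format(bits, "064b")            # all 64 bits as text, leftmost bit first
    return text[0], text[1:12], text[12:]  # 1 sign bit, 11 exponent bits, 52 mantissa bits

sign, exponent, mantissa = float64_fields(12.0)
print(sign, exponent, int(exponent, 2), mantissa)
```

For 12.0 (1.1 * 2^3) the saved exponent is 3 + 1023 = 1026, and the mantissa starts with the same 1 as in the single-precision example, followed by more zeros.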


Float and Accuracy.

Floats are not recommended for calculations where a high level of accuracy is required (e.g. financial calculations).

Here is an example-

Suppose I paid 10 cents to one person and 20 cents to another. What is the total amount (in dollars) I paid? The answer is simply 0.30 dollars.

0.10 + 0.20 = 0.30

But when we do the same calculation with (double-precision) FLOAT numbers, the result looks a little different (see below)-

0.10 + 0.20 = 0.30000000000000004

So, in a program, a condition like “If (0.10 + 0.20 = 0.30)” will not be satisfied when the calculation is done with double-precision floating-point numbers.
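
The comparison is easy to reproduce. The sketch below uses Python (whose floats are IEEE double precision) purely as an illustration. Comparing with a tolerance and using an exact decimal type are shown as two common workarounds; they are suggestions, not something specific to RPG.

```python
import math
from decimal import Decimal

total = 0.10 + 0.20

print(repr(total))                 # the value actually stored for the sum
print(total == 0.30)               # exact comparison of the two floats
print(math.isclose(total, 0.30))   # comparison that allows a tiny difference

# With an exact decimal type, the cents add up as expected.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))
```

This is why amounts of money are better kept in an exact decimal data type than in a FLOAT.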

The reason is simple. Many decimal fractions with only a few digits (e.g. 0.1₁₀ and 0.2₁₀) cannot be represented exactly in binary format.

When the system converts 0.1₁₀ into binary, it becomes a so-called “infinite binary fraction”, which means that a group of binary digits keeps repeating without end (see example below).

0.1₁₀ = 0.00011001100110011 . . .₂

So the system has to compromise and save approximate values of such numbers. When these approximate values are used in calculations, the results are approximations too (not 100% accurate).
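
The repeating pattern, and the approximation that ends up being saved, can both be shown with a short Python sketch (for illustration only, not RPG). The helper name `binary_fraction_digits` is made up for this example; it uses exact fractions so that the doubling steps themselves do not round.

```python
from decimal import Decimal
from fractions import Fraction

def binary_fraction_digits(value, count):
    """Return the first `count` binary digits after the point of a value in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2                     # shift one binary place to the left
        bit = 1 if value >= 1 else 0   # the digit that crossed the point
        digits.append(str(bit))
        value -= bit
    return "".join(digits)

# 1/10 as an exact fraction: the group 0011 keeps repeating.
print("0." + binary_fraction_digits(Fraction(1, 10), 17))

# The exact decimal value of the double-precision float closest to 0.1.
print(Decimal(0.1))
```

The second line of output is not exactly 0.1, which is the approximation described above.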

Same issue with Base 10 number system:

This approximation issue is not limited to FLOATs. The ordinary DECIMAL (base 10) number system has the same issue. For example, if we divide 10 by 3, we get an approximate value, because the result is an infinite series of repeating 3s, as shown below-

10/3 = 3.3333…

but 3.3333 * 3 = 9.9999 (not 10)
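
The base 10 case can be reproduced with a decimal type limited to a fixed number of digits. This is a small Python sketch for illustration only; the function name `divide_then_multiply` and the choice of 5 significant digits are assumptions made to match the example.

```python
from decimal import Decimal, localcontext

def divide_then_multiply(digits):
    """Divide 10 by 3 and multiply back, keeping only `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits              # limit every result to this many digits
        quotient = Decimal(10) / Decimal(3)
        product = quotient * 3
    return quotient, product

quotient, product = divide_then_multiply(5)
print(quotient, product)
```

However many digits are kept, the repeating 3s have to be cut off somewhere, so the product never gets back to exactly 10.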


