Numeric Datatypes in RPG : FLOAT
 FLOAT is also a binary data type.
 Because FLOAT follows the IEEE 754 standard, most programming languages handle it in the same way.
 The INTEGER data type holds only whole numbers (numbers without a decimal point).
 In theory, FLOAT is meant for real numbers (with or without a decimal point), but it can only approximate most of them.
 FLOATs are stored in BINARY format in a completely different way from INTEGERs.
 FLOAT can hold values ranging from very small to very large.
 FLOAT calculations are correct only up to an approximation. So, if you need exact results, DO NOT USE FLOAT.
FLOAT
There are numerous ways to write a number. For example, 6257 can be written as
 625.7 * 10^{1}
 62.57 * 10^{2}
 6.257 * 10^{3}
 .6257 * 10^{4}
The same is true for binary numbers. For example, 1010_{2} can be written as
 101.0 * 2^{1}
 10.10 * 2^{2}
 1.010 * 2^{3}
 and so on.
Normalization
Since the same number can be written in many ways, the IEEE standard defines one standard representation, called the normalized form ("Normalization").
 The rule is simple: write the number so that exactly one non-zero digit appears to the left of the point (e.g. X.XXXXXX). So
 the normalized form of 6257 is +6.257 * 10^{3}.
 Similarly, the normalized form of 1010_{2} is +1.010 * 2^{3}.
 FLOATs are stored in BINARY format, where the only non-zero digit is 1, so after normalization the digit to the left of the point is always 1. (See examples below.)
 14_{10} ==> 1110_{2} ==> +1.110 * 2^{3}
 6_{10} ==> 0110_{2} ==> +1.10 * 2^{2}
 Since the digit to the left of the binary point is always 1, floating-point formats do not store it; it is implied (often called the "hidden bit").
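The normalization rule above can be sketched in a few lines of Python (used here only for illustration; the same arithmetic applies to RPG's FLOAT). The function name `normalize_binary` is hypothetical, and this sketch handles positive whole numbers only.

```python
def normalize_binary(n: int) -> tuple[str, int]:
    """Normalize a positive whole number n into the form 1.xxx (binary) * 2**exponent.

    Returns the normalized digits as a string and the exponent.
    Hypothetical helper for illustration; positive integers only.
    """
    if n <= 0:
        raise ValueError("this sketch handles positive integers only")
    bits = bin(n)[2:]            # e.g. 14 -> '1110' (bin() adds a '0b' prefix, which is dropped)
    exponent = len(bits) - 1     # move the binary point left past every digit except the leading 1
    fraction = bits[1:] or "0"   # digits after the leading 1; only these need to be stored
    return "1." + fraction, exponent


for value in (14, 6, 10):
    digits, exp = normalize_binary(value)
    print(f"{value} = +{digits} * 2^{exp}")
```

Every result starts with the same `1.`, which is exactly why the stored format can leave that digit out.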
FLOAT FORMAT
 A FLOAT is stored in memory in three parts:
 s : Sign bit, which stores the sign as
 0 (Zero) = Positive
 1 (One) = Negative
 Because the sign bit is separate from the rest of the value,
 FLOAT has both a POSITIVE ZERO and a NEGATIVE ZERO,
 as well as POSITIVE INFINITY and NEGATIVE INFINITY.
 e : Exponent
 The power of 2 that the normalized number is multiplied by (2^{exponent}).
 In 1.110 * 2^{3} ==> Exponent is 3 (it is stored slightly differently, using an offset; this is discussed in more detail later).
 The number of bits used by the exponent depends on the type of FLOAT.
 The exponent itself is a signed number.
 m : Mantissa (also known as significand)
 These are the digits to the right of the binary point.
 In 1.110 * 2^{3} ==> Mantissa is 110
 The number of bits used by the mantissa also depends on the type of FLOAT.
 Always stored in normalized form (without the leading 1).
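To see that the sign bit really is stored separately from the value, the following Python sketch packs values into the 4-byte single-precision layout with the standard `struct` module and reads the sign bit back. `sign_bit` is a hypothetical helper name; Python is used only because it makes the bit pattern easy to inspect.

```python
import math
import struct


def sign_bit(x: float) -> int:
    """Return the sign bit of x as stored in a 32-bit (single-precision) float."""
    # '>f' packs x as a big-endian IEEE 754 single; '>I' reads the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31  # the leftmost bit is the sign


pos_zero, neg_zero = 0.0, -0.0
print(pos_zero == neg_zero)                     # True: the two zeros compare equal
print(sign_bit(pos_zero), sign_bit(neg_zero))   # 0 1: but their sign bits differ
print(math.copysign(1.0, neg_zero))             # -1.0: the sign of -0.0 is still observable
print(sign_bit(math.inf), sign_bit(-math.inf))  # 0 1: infinities carry a sign too
```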
Types of FLOAT
There are two types:
 Single-precision (32-bit)
 Double-precision (64-bit)
Single-precision FLOAT
In RPG, a single-precision float is declared with length 4 and data type "F", as shown below:
D FloatNum S 4F (fixed-format syntax)
dcl-s FloatNum float(4); (free-format syntax)
It uses 32 bits as follows:
 SIGN : 1 bit
 0 : Positive
 1 : Negative
 Exponent : 8 bits (with offset of 127)
 Mantissa : 23 bits
Bit layout (left to right): SIGN (1 bit) | Exponent (8 bits, numbered 8 down to 1) | Mantissa (23 bits, numbered 23 down to 1)
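The layout above can be checked in Python (for illustration only) by packing a value as a 4-byte float with `struct` and slicing the resulting 32-bit integer into its three fields. `float32_fields` is a hypothetical helper; it assumes the standard IEEE 754 single-precision layout described in this article for a 4-byte RPG FLOAT.

```python
import struct


def float32_fields(x: float) -> tuple[int, str, str]:
    """Split x, encoded as an IEEE 754 single-precision float, into its three parts.

    Returns (sign bit, 8 exponent bits, 23 mantissa bits), matching the layout above.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as a 32-bit integer
    sign = bits >> 31                                    # leftmost bit
    exponent = (bits >> 23) & 0xFF                       # next 8 bits (stored with the +127 offset)
    mantissa = bits & 0x7FFFFF                           # last 23 bits (leading 1 not stored)
    return sign, format(exponent, "08b"), format(mantissa, "023b")


print(float32_fields(14.0))  # 14 = +1.110 * 2^3
```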
Exponent and Offset
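The exponent is a signed number. Signed numbers are normally stored in two's complement form, where an 8-bit signed number can hold −128 to 127. However, two's complement is awkward for the exponent of a FLOAT: negative exponents would have their leftmost bit set, so a number such as 0.5 (negative exponent) would look larger than 2.0 (positive exponent) if the bit patterns were compared.
So an offset (also called a bias) is added to the exponent instead. This stores every exponent, NEGATIVE or POSITIVE, as a POSITIVE number while keeping them in the same order.
In single-precision floats, this offset is +127.
NOTE: Exponents of −127 (stored as all 0s) and +128 (stored as all 1s) are reserved for special numbers (all 0s for zero and subnormal numbers, all 1s for infinity and NaN), so normal numbers use exponents from −126 to +127.
So if the exponent is
 −3, it will be stored as (−3 + 127) = 124.
 +3, it will be stored as (3 + 127) = 130.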
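The offset arithmetic above is easy to sketch in Python; the helper names `store_exponent` and `actual_exponent` are hypothetical. Note that the stored values keep the same order as the real exponents, which is one benefit of using an offset rather than two's complement.

```python
SINGLE_BIAS = 127  # exponent offset for single-precision floats


def store_exponent(e: int) -> int:
    """Return the biased (stored) exponent for a normal single-precision number."""
    # Stored values 0 (all 0s) and 255 (all 1s) are reserved, so normal exponents run from -126 to +127.
    if not -126 <= e <= 127:
        raise ValueError("exponent outside the normal single-precision range")
    return e + SINGLE_BIAS


def actual_exponent(stored: int) -> int:
    """Undo the offset: recover the real exponent from the stored 8-bit value."""
    return stored - SINGLE_BIAS


for e in (-3, 3):
    stored = store_exponent(e)
    print(f"{e:+d} is stored as {stored} = {stored:08b}")
```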
Examples:
Number: 12.00
 In binary format: 1100
 In normalized format: +1.1 * 2^{3}
 Sign bit: 0 (positive)
 Exponent: 3 + 127 = 130 = 10000010
 Mantissa: 10000000000000000000000
Number: −0.625
 In binary format: −0.101
 In normalized format: −1.01 * 2^{-1}
 Sign bit: 1 (negative)
 Exponent: −1 + 127 = 126 = 01111110
 Mantissa: 01000000000000000000000
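Both worked examples can be cross-checked in Python (used here purely as a calculator) by packing each value into four bytes and printing the result as hex and as split bit fields. The helper names `float32_hex` and `float32_bits` are hypothetical.

```python
import struct


def float32_hex(x: float) -> str:
    """Return the 32-bit IEEE 754 single-precision encoding of x as 8 hex digits."""
    return struct.pack(">f", x).hex().upper()  # big-endian, so the sign bit comes first


def float32_bits(x: float) -> str:
    """Return the same encoding split as 'sign exponent mantissa'."""
    bits = format(int(float32_hex(x), 16), "032b")
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


for value in (12.0, -0.625):
    print(value, float32_hex(value), float32_bits(value))
```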
The following converter can be used to check how a decimal number is converted into FLOAT format:
https://www.hschmidt.net/FloatConverter/IEEE754.html
Double-precision FLOAT
In RPG, a double-precision float is declared with length 8 and data type "F":
D FloatNum S 8F (fixed-format syntax)
dcl-s FloatNum float(8); (free-format syntax)
It uses 64 bits as follows:
 SIGN : 1 bit
 0 : Positive
 1 : Negative
 Exponent : 11 bits (with offset of 1023)
 Mantissa : 52 bits
Double-precision floats behave the same way as single-precision floats; only the sizes of the exponent and mantissa (and the exponent offset) are different.
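The same field-splitting idea works for the 8-byte layout; only the widths and the offset change. This Python sketch (hypothetical helper `float64_fields`) assumes the standard IEEE 754 double-precision layout of 1 sign bit, 11 exponent bits and 52 mantissa bits.

```python
import struct

DOUBLE_BIAS = 1023  # exponent offset for double-precision floats


def float64_fields(x: float) -> tuple[int, int, int]:
    """Split x, encoded as an IEEE 754 double, into (sign, real exponent, 52-bit mantissa).

    The real exponent is only meaningful for normal numbers (not zero, subnormals, inf or NaN).
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # 8 bytes as one 64-bit unsigned integer
    sign = bits >> 63                                    # 1 sign bit
    stored_exponent = (bits >> 52) & 0x7FF               # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)                    # 52 mantissa bits
    return sign, stored_exponent - DOUBLE_BIAS, mantissa


sign, exponent, mantissa = float64_fields(12.0)
print(sign, exponent, format(mantissa, "052b")[:4])
```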
Float and Accuracy
Floats are not recommended for calculations that require exact results (e.g. financial calculations).
Here is an example.
If I pay 10 cents to one person and 20 cents to another, how much have I paid in total (in dollars)? Simply, 0.30 dollars.
0.10 + 0.20 = 0.30
But when we do the same calculation with double-precision FLOAT numbers, the result is slightly different:
0.10 + 0.20 = 0.30000000000000004
So, in a program, a condition like "If (0.10 + 0.20 = 0.30)" will be false when the calculation uses double-precision floating-point numbers.
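The comparison can be reproduced in Python, whose `float` is an IEEE 754 double on typical platforms. The sketch also shows two common workarounds, chosen here for illustration: comparing with a tolerance (`math.isclose`), or using a decimal type. In RPG, the analogous choice for money would be a packed or zoned decimal field rather than a FLOAT.

```python
import math
from decimal import Decimal

total = 0.10 + 0.20              # Python floats are IEEE 754 doubles
print(repr(total))               # 0.30000000000000004
print(total == 0.30)             # False

# Comparing with a tolerance instead of '=' accepts the tiny representation error.
print(math.isclose(total, 0.30))  # True

# A decimal type keeps 0.10 and 0.20 exact, so the sum is exactly 0.30.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```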
The reason is simple: many short decimal fractions (e.g. 0.1_{10} and 0.2_{10}) cannot be represented exactly in binary.
When the system converts 0.1_{10} into binary, the result is a so-called "infinite binary fraction": a group of digits repeats forever without end (see example below).
0.1_{10} = 0.00011001100110011..._{2}
Since the mantissa has only a fixed number of bits, the system has to compromise and store the nearest value it can represent. When these approximate values are used in calculations, the results are also approximate (not 100% accurate).
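The repeating pattern can be generated with exact integer arithmetic: double the remainder, and the whole part that falls out is the next binary digit. This Python sketch uses a hypothetical helper `binary_fraction_digits`, and then uses the standard `decimal` module to display the exact value of the double that is actually stored for 0.1.

```python
from decimal import Decimal


def binary_fraction_digits(numerator: int, denominator: int, count: int) -> str:
    """Return the first `count` binary digits after the point of numerator/denominator.

    Assumes 0 <= numerator < denominator, i.e. a fraction between 0 and 1.
    """
    digits = []
    remainder = numerator
    for _ in range(count):
        remainder *= 2
        digits.append(str(remainder // denominator))  # 1 if doubling crossed a whole unit, else 0
        remainder %= denominator
    return "".join(digits)


print("0." + binary_fraction_digits(1, 10, 17) + "...")  # 0.1 in binary; '0011' keeps repeating
print(Decimal(0.1))  # the exact value of the double nearest to 0.1, slightly above 0.1
```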
The same issue in the base-10 number system:
This approximation issue is not limited to FLOATs. The ordinary DECIMAL (base 10) number system has the same issue. For example, if we divide 10 by 3, we get an approximate value (it creates an infinite series of repeating 3s, as shown below).
10/3 = 3.3333…
but 3.3333 * 3 = 9.9999 (not 10)
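Python's `decimal` module can mimic this by limiting arithmetic to five significant digits, a precision chosen here only to match the 3.3333 above.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 5                      # keep only 5 significant digits, like writing 3.3333 by hand
    third = Decimal(10) / Decimal(3)  # the repeating 3s are cut off after 5 digits
    back = third * 3                  # multiplying back does not restore 10
    print(third, back, back == 10)    # 3.3333 9.9999 False
```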
Sumit, this is interesting reading–thanks for putting it together.
A logical followup discussion might center on which data formats are processed most efficiently by SQL and in arithmetic operations. There’s a potential performance gain in knowing when and where to use signed and unsigned integer, zoned decimal, packed decimal, and float database *and work* variables.
A question for IBM: do changes to the POWER family affect “performance coding” practices?
Thanks, Reeve. That's a good idea to follow up with "which data formats are processed most efficiently by SQL and in arithmetic operations".