Mitigating Precision Loss in Big Data Analytics

Our big data analytics software has to represent and perform calculations on floating point numbers. To keep the results of these calculations as accurate as possible, we maintain the greatest available precision in intermediate floating point operations and round only on the last operation (usually the one that returns the presented result to the UI).

Passing and manipulating floating point numbers without losing precision is a common problem area in computer programming.

Whenever a floating point number, or the result of a floating point calculation, is stored in memory, precision can be lost, because the value has to be rounded to fit into a finite number of bits.

Also, unless rounding is delayed until the last possible operation of a set of floating point operations, precision can be lost along the way with any intermediate rounding.

Inherent Inaccuracy in Storing Floating Point Numbers

The inherent inaccuracy in storing floating point numbers can be seen clearly with simple Java code:

double a = 0.41;  
double b = 0.31;  
double c = a - b;  
System.out.println(c);  

This prints 0.09999999999999998 rather than the 0.1 one might expect.
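To see why the subtraction misbehaves, it can help to look at the values that are actually stored. The following is a minimal sketch, not part of our codebase: the class name StoredValueDemo and the helper exactValueOf are made up for illustration. It relies on the standard BigDecimal(double) constructor, which returns the exact decimal expansion of the binary value a double holds.

```java
import java.math.BigDecimal;

// Hypothetical demo class; the names are illustrative only.
class StoredValueDemo {

    // Exact decimal expansion of the binary value held by the double.
    static BigDecimal exactValueOf(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        // Each line prints a long decimal that is close to, but not equal to,
        // the literal written in the source code.
        System.out.println(exactValueOf(0.41));
        System.out.println(exactValueOf(0.31));
        System.out.println(exactValueOf(0.41 - 0.31));

        // The stored value differs from the decimal 0.41, so this prints false.
        System.out.println(exactValueOf(0.41).compareTo(new BigDecimal("0.41")) == 0);
    }
}
```

Neither 0.41 nor 0.31 has a finite binary expansion, so both operands are already approximations before the subtraction takes place.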

Inaccuracy if Rounding Not Delayed

The scale of the inaccuracy that can occur if rounding is not delayed became apparent to me in a small piece of functionality that we were developing recently.

This functionality calculates a percentage delta between a previous value and a current value. Both previous and current values are themselves the result of a preceding floating point operation.

A single delta is calculated with the following formula:

100 * (currentValue - previousValue) / previousValue  

For example's sake, let's express currentValue and previousValue in terms of the sub-values A and B, so that they are calculated from:

currentValue = 100 * currentValueA / (currentValueA + currentValueB)

previousValue = 100 * previousValueA / (previousValueA + previousValueB)  

Now, some values:

currentValueA = 30       currentValueB = 270  
previousValueA = 20     previousValueB = 280  

Calculating Without Delayed Rounding

previousValue = 100 * 20/300 = 6.66666666667   rounded = 7  
currentValue = 100 * 30/300 = 10

delta = 100 * (10 - 7)/7 = 42.8571428571  

rounded = 43

Calculating With Delayed Rounding

previousValue = 100 * 20/300 = 6.66666666667  
currentValue = 100 * 30/300 = 10

delta = 100 * (10 - 6.66666666667)/6.66666666667 = 49.9999999999  

rounded = 50

Accuracy Improvement

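The exact answer is 50. By delaying rounding until the last calculation, we arrive at a value that is out by a negligible 0.0000000001 before the final rounding and is exactly 50 after it. Rounding previousValue early instead gives an end result of 43, which is out by 7.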
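The two calculations above can be reproduced in a few lines of Java. This is a sketch under assumptions: the class and method names (DeltaRounding, percentage, delta, earlyRounded, delayedRounded) are invented for this post and are not taken from our production code.

```java
// Hypothetical demo class; the names are illustrative only.
class DeltaRounding {

    // Share of a in (a + b), expressed as a percentage.
    static double percentage(double a, double b) {
        return 100.0 * a / (a + b);
    }

    // Percentage change from previous to current.
    static double delta(double previous, double current) {
        return 100.0 * (current - previous) / previous;
    }

    // Rounds the intermediate percentages before calculating the delta.
    static long earlyRounded(double prevA, double prevB, double currA, double currB) {
        long previous = Math.round(percentage(prevA, prevB));
        long current = Math.round(percentage(currA, currB));
        return Math.round(delta(previous, current));
    }

    // Keeps full double precision and rounds only the final delta.
    static long delayedRounded(double prevA, double prevB, double currA, double currB) {
        return Math.round(delta(percentage(prevA, prevB), percentage(currA, currB)));
    }

    public static void main(String[] args) {
        System.out.println(earlyRounded(20, 280, 30, 270));   // prints 43
        System.out.println(delayedRounded(20, 280, 30, 270)); // prints 50
    }
}
```

The only difference between the two methods is where Math.round is applied, which is enough to move the result by 7 percentage points.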

Alternative Approaches to Use With or Instead of Delayed Rounding

Decide on a generous precision that can be stored safely, and always round to that precision.

For example, without any rounding:

double a = 0.41;  
double b = 0.31;  
double c = a - b;  
System.out.println(c);  

This prints 0.09999999999999998 rather than the 0.1 one might expect.

But choosing a precision of 9.…

double a = 0.41;  
double b = 0.31;  
// 1E9 scales by 10^9, i.e. 9 decimal places (10E9 would be 10^10)  
double c = Math.round((a - b) * 1E9) / 1E9;  
System.out.println(c);  

This prints 0.1, as one would expect.
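The fixed-precision rounding shown above can be wrapped in a small helper so that the chosen precision lives in one place. This is a sketch only; FixedPrecision and roundTo are hypothetical names, and the approach assumes that value * 10^places stays within the range of a long, since Math.round(double) returns a long.

```java
// Hypothetical helper class; the names are illustrative only.
class FixedPrecision {

    // Rounds value to the given number of decimal places.
    // Assumes value * 10^places fits in a long.
    static double roundTo(double value, int places) {
        double scale = Math.pow(10, places);
        return Math.round(value * scale) / scale;
    }

    public static void main(String[] args) {
        System.out.println(roundTo(0.41 - 0.31, 9)); // prints 0.1
    }
}
```

Note that the returned value is still a double, so it is the nearest representable value to the rounded decimal rather than an exact decimal.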

Alternative Floating Point Representations

Some programming languages make it easy to delay rounding by providing a Rational data type (the type name varies across languages) and supporting arithmetic operations on it.

e.g. Clojure, Julia, Haskell and Ruby, to name a few.

Some even support Rational literals, e.g. the Ratio type in Clojure. Dividing two integers

(/ 2 3) 

will return the Ratio 2/3, which can also be written directly as the literal 2/3.

There are also libraries that offer this support in other languages, e.g. the Apache Commons Math library for Java, with its Fraction class.
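To illustrate the idea without pulling in a dependency, here is a minimal sketch of a rational type built on java.math.BigInteger, applied to the percentage delta example from earlier. The Rational and RationalDelta classes are hypothetical and written for this post; they are not the Apache Commons Math Fraction API, which a real project would more likely use.

```java
import java.math.BigInteger;

// Hypothetical demo: the percentage delta calculated with exact rationals.
class RationalDelta {

    static Rational percentage(long a, long b) {
        return new Rational(100 * a, a + b);
    }

    static Rational delta(Rational previous, Rational current) {
        return new Rational(100, 1).times(current.minus(previous)).dividedBy(previous);
    }

    public static void main(String[] args) {
        Rational previous = percentage(20, 280); // 20/3, held exactly
        Rational current = percentage(30, 270);  // 10/1
        System.out.println(delta(previous, current)); // prints 50/1
    }
}

// Hypothetical minimal rational type, for illustration only.
final class Rational {
    final BigInteger num;
    final BigInteger den;

    Rational(long num, long den) {
        this(BigInteger.valueOf(num), BigInteger.valueOf(den));
    }

    Rational(BigInteger num, BigInteger den) {
        if (den.signum() == 0) {
            throw new ArithmeticException("zero denominator");
        }
        // Keep the sign on the numerator and reduce to lowest terms.
        if (den.signum() < 0) {
            num = num.negate();
            den = den.negate();
        }
        BigInteger gcd = num.gcd(den);
        this.num = num.divide(gcd);
        this.den = den.divide(gcd);
    }

    Rational minus(Rational other) {
        return new Rational(
                num.multiply(other.den).subtract(other.num.multiply(den)),
                den.multiply(other.den));
    }

    Rational times(Rational other) {
        return new Rational(num.multiply(other.num), den.multiply(other.den));
    }

    Rational dividedBy(Rational other) {
        return new Rational(num.multiply(other.den), den.multiply(other.num));
    }

    @Override
    public String toString() {
        return num + "/" + den;
    }
}
```

Because 20/3 is never converted to a finite number of digits, no intermediate rounding takes place and the delta comes out as exactly 50.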

Conclusion

Storing floating point numbers in a finite number of bits is inherently inexact.

The effect on results can be reduced by:

  • Delaying rounding of floating point numbers until the last possible moment.
  • Picking a generous fixed precision and always rounding to it.
  • Using a Rational representation instead of floating point.

Tom Prior
