Floating Point Precision

The problem with numbers is they always look right.

If your DAQ card says that the temperature is 23.1 degrees, who are you to argue! All the way from the sensor to the screen, the quality of the information typically degrades as it is converted and recalculated.

One such source of degradation is rounding errors due to floating point precision.

Whilst floating point numbers look continuous, they are not: they have rounding errors too. I’m not going to dig into how they work in too much detail here, as there are plenty of resources on that (Wikipedia has more than I can bear to read right now!), but I do want to talk about how the format trades off precision against range.

LabVIEW helps us by using the double-precision format by default, which gives approximately 16 significant decimal figures, compared with the standard single-precision float in many languages, which only gives around 7.
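To put some numbers on that, here is a rough check using Python and NumPy, with float32 and float64 standing in for SGL and DBL:

    import numpy as np

    value = 1.2345678901234567   # more digits than either format can hold

    print(np.float64(value))     # 1.2345678901234567 -> roughly 16 figures survive
    print(np.float32(value))     # 1.2345679           -> only about 7 survive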

But as with everything, there is a cost. Doubles weigh in at 64 bits vs. the single's 32 bits, which adds up when you're storing a lot of data. I had such a case recently where I wanted to store timestamps in as small a space as possible with sub-millisecond precision, so the question arose: can it fit in a single?

Machine Epsilon

The first thing you will find when you go searching for the precision of floating point numbers is the mystical Machine Epsilon.

This is the gap between 1.0 and the next representable floating point number, i.e. the smallest relative change the format can capture, and there is a LabVIEW constant for it.

[Image: machine epsilon constant]
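If you want the same numbers outside LabVIEW, NumPy will report them too (a quick sketch, with float64 and float32 standing in for DBL and SGL):

    import numpy as np

    print(np.finfo(np.float64).eps)   # ~2.22e-16, the DBL machine epsilon
    print(np.finfo(np.float32).eps)   # ~1.19e-7,  the SGL machine epsilon

    # Adding half an epsilon (or less) to 1.0 is rounded away entirely
    print(1.0 + np.finfo(np.float64).eps == 1.0)       # False
    print(1.0 + np.finfo(np.float64).eps / 2 == 1.0)   # True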

This describes the best possible precision, but in practice it can be worse. Floating point numbers are stored as a combination of a significand and an exponent (like scientific notation at school, e.g. 5.2 x 10^5), which lets the format trade off range against precision (hence the "floating" point). The consequence is that as the magnitude of the number increases, the precision reduces.
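If you're curious, you can pull a single-precision value apart and see that structure directly, here with the 5.2 x 10^5 example from above (a throwaway Python sketch):

    import struct

    # IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 significand bits
    bits = struct.unpack('>I', struct.pack('>f', 5.2e5))[0]
    sign        = bits >> 31
    exponent    = ((bits >> 23) & 0xFF) - 127   # remove the exponent bias
    significand = bits & 0x7FFFFF
    print(sign, exponent, significand)          # 0 18 8251392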

For my example this loss of precision at larger magnitudes was particularly important, as timestamps in a floating point format are extremely large values (seconds since 1904), which means they lose precision. That is what makes this piece of code break the laws of maths:

[Image: timestamp with machine epsilon]
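In text form, that snippet boils down to something like this (a Python sketch, with 3.5e9 standing in for the number of seconds since 1904):

    import numpy as np

    eps = np.finfo(np.float64).eps   # DBL machine epsilon
    t = 3.5e9                        # roughly the seconds since 1904 today

    print(t + eps == t)              # True - the addition is lost completely
    print(np.spacing(t))             # ~4.77e-7, the real gap between doubles here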

So I went in search of a definition of how precise these numbers actually are, which was surprisingly difficult to find! I think there are two reasons why this doesn’t appear to be defined in many places:

  1. Maybe it’s just obvious to everyone else?
  2. A factor must be that the following formula makes assumptions about the optimum representation; some numbers can be represented in multiple ways, which means there is no single answer.

Eventually I came across a Stack Overflow question which covered this.

In essence the rules are:

  1. For a given exponent, the spacing between representable numbers is the same (e.g. if the significand were a whole number being multiplied by 2^2, the smallest possible change for every such number would be 4).
  2. The exponent is set by the size of the number (e.g. if the number is 6 it can be stored as 0.75 x 2^3, so the exponent should be 3, as that gives the best precision).
  3. So, from the size of the number we can work out the exponent, and from the exponent and the width of the floating point type (single or double) we can work out the smallest change.

The maths in the post is based on a function available in MATLAB, eps(), that gives the epsilon value for a given number. Translated into LabVIEW, it looks like this:

[Image: calculate epsilon]
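For anyone who prefers text to block diagrams, here is the same calculation sketched in Python (the function name and the 3.5e9 test value are just for illustration):

    import math

    def eps_for(value, significand_bits):
        """Smallest representable change near value:
        2 ** (floor(log2(|value|)) - significand_bits).
        Use 23 bits for single precision (SGL) and 52 for double precision (DBL)."""
        exponent = math.floor(math.log2(abs(value)))
        return 2.0 ** (exponent - significand_bits)

    print(eps_for(1.0, 52))      # 2.22e-16 - matches the DBL machine epsilon
    print(eps_for(3.5e9, 52))    # ~4.8e-7 seconds for a timestamp held in a DBL
    print(eps_for(3.5e9, 23))    # 256 seconds for the same timestamp held in a SGL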

With this I could see the answer to my problem: the resolution of timestamps stored as singles is abysmal!

[Image: time precision]
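To put a figure on "abysmal": near a present-day timestamp of roughly 3.5 billion seconds, neighbouring single-precision values are 256 seconds apart, so even a change of a minute or two simply vanishes:

    import numpy as np

    t_now = np.float32(3.5e9)                  # ~seconds since 1904, stored as SGL
    print(t_now + np.float32(100) == t_now)    # True - a 100 second change is lost
    print(np.spacing(t_now))                   # 256.0 - neighbouring SGL values are 256 s apart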

3 Comments

  • Brian Powell

    July 14, 2015

    Hi, James. Great post. A couple of things of note that I’d like to add…

    In case it’s not clear to your readers, single precision numbers are great when you don’t need large magnitude values and high precision at the same time. If you were storing small numbers, say a relative time of less than a second, you could easily store with millisecond accuracy. (That is, if I did the math right in my head. 🙂 )

    As you point out, the problem with timestamps is that you have this giant offset between 1904 and now of about 3.5 billion seconds. So, you spend all your precision on the 3.5 billion part, and don’t have any left over for the subsecond resolution.

    The second thing to remind your readers is that the native LabVIEW time stamp format is a 64.64 fixed-point value. When we created it, I wanted to move away from floating point, so that we’d consistently know how much precision we had from second to second, as the number of seconds grew larger. I wanted femto-second resolution now, and I still wanted femto-second resolution 10 years from now, despite the fact that the overall value was 315 million seconds larger.

    If you want to compress timestamps, you might consider storing a t0 as a regular timestamp, and then use a smaller (double- or single-precision) value to store relative time from your t0. This works well for small finite acquisitions. It might not work as well if you’re logging data for long periods of time.
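    For illustration, that layout could be as simple as the following (the values here are just a sketch):

        import numpy as np

        # One full-precision timestamp stored once...
        t0 = 3.5e9                                                    # seconds since 1904 (DBL)
        # ...and small per-sample offsets stored in single precision.
        offsets = np.array([0.0, 0.0015, 0.0030], dtype=np.float32)   # seconds since t0

        # Near small values the SGL spacing is tiny, so sub-millisecond detail survives:
        print(np.spacing(np.float32(0.003)))        # ~2.3e-10 seconds

        # Rebuild absolute timestamps in double precision only when needed:
        absolute = t0 + offsets.astype(np.float64)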

    I’m curious what was driving you to compress timestamps in the first place. Memory is (relatively) cheap, so I’m wondering what the benefit was of using 32-bit vs. 64-bit or 128-bit storage for timestamps. It’d be interesting to hear you explore other solutions to this problem.

    • James McNally

      July 14, 2015

      Hi Brian,

      Thanks for the detailed response, it’s interesting to hear the decision behind the current 64.64 timestamp format.

      This is driven by a system of event recorders. There will be 100 in total with each recording locally and forwarding to a server when it has a connection.

      The compression is mainly driven by a desire to record as many events as possible locally in case we have network downtime, these are cRIOs so still only 4GB of storage 🙁 It is also desirable to reduce size on the server as this will be a significant part of the ongoing cost.

      These events unfortunately are not evenly spaced, so we must store a timestamp for every data point. I had considered using a t0 for another type of event (which will be waveform based, so we don't have to store multiple timestamps) but I'm not sure why I didn't look at using it on these events. I think I wasn't too concerned about the size, but this change would halve the size of the events (the channel data could be SGL as well) so it may be worth revisiting.

      Thanks!

  • Yair

    July 14, 2015

    I won’t claim to understand floating point numbers well, but I find that a good rule of thumb for understanding what the resolution will be (which is what your code basically does) is to look at the power of two of where the number is. So if you’re using SGL (which uses 23 bits), then the resolution will jump in powers of two as:

    value    resolution
    2^20     2^-3
    2^21     2^-2
    2^22     2^-1

    Leading to a resolution of 1 at 2^23, or ~8 million. At 16 million it will be 2 and so on.

