When is equality transitive? and other floating point curiosities
Floating point is a giant mess. There are known best practices for most issues that come up in everyday use (e.g., using Kahan summation or adding stochastic noise to reduce aggregate numerical error), but there are still corner cases. Different libraries and implementations give different results because there’s no generally accepted standard.
That’s even true for basic things like transitivity: if a == b and b == c, then a == c. It’s so fundamental that we even assumed it as axiomatic in a number of math classes I’ve taken.
What happens when we check this on our computer using our handy Haskell REPL?
$ ghci
Prelude> -- Hi! I'm a comment!
Prelude> -- 2^53 == 2.0^53
Prelude> 9007199254740992 == 9007199254740992.0
True
Prelude> -- 2.0^53 == 2^53 + 1
Prelude> 9007199254740992.0 == 9007199254740993
True
Prelude> -- 2^53 == 2^53 + 1
Prelude> 9007199254740992 == 9007199254740993
False
That’s weird. Looks like == isn’t transitive in Haskell. What happens if we try another language?
$ julia
julia> 9007199254740992 == 9007199254740992.0
true
julia> 9007199254740992.0 == 9007199254740993
false
julia> 9007199254740993 == 9007199254740992
false
Ok, so == is transitive over floats and ints in Julia but not in Haskell. How about Octave/Matlab?
$ octave
octave:1> true
ans = 1
octave:2> false
ans = 0
octave:3> 9007199254740992 == 9007199254740993.0
ans = 1
octave:4> 9007199254740992.0 == 9007199254740993
ans = 1
octave:5> 9007199254740992 == 9007199254740993
ans = 1
That’s pretty weird, different ints seem to be equal. According to the docs, you can explicitly cast to an int type. Let’s try that!
octave:6> uint64(9007199254740992) == uint64(9007199254740993)
ans = 1
octave:7> eq(uint64(9007199254740992),uint64(9007199254740993))
ans = 1
octave:8> 0 == 1
ans = 0
Hmmm.
Let’s try a few more languages to see what we get. In the table below, Transitive indicates whether the particular operations we tried are transitive, Unique indicates whether these ints are unique, and Allowed indicates whether ints and floats can be compared with ==.
| Language           | Transitive | Unique | Allowed |
|--------------------|------------|--------|---------|
| C gcc 4.6.4        | No         | Yes    | Yes     |
| Clojure 1.1        | No         | Yes    | Yes     |
| C# Mono 2.10.8.1   | No         | Yes    | Yes     |
| Haskell 7.4.1      | No         | Yes    | Yes     |
| Java 1.7           | No         | Yes    | Yes     |
| perl 5.14.2        | No         | Yes    | Yes     |
| Ruby 1.8.7         | No         | Yes    | Yes     |
| Scala 2.9.2        | No         | Yes    | Yes     |
| Scheme (racket)    | No         | Yes    | Yes     |
| Go 1.3             | Yes        | Yes    | Yes     |
| Julia 0.3 rc1      | Yes        | Yes    | Yes     |
| Lisp SBCL 1.0.55   | Yes        | Yes    | Yes     |
| Python 2.7.3       | Yes        | Yes    | Yes     |
| Ruby 2.0           | Yes        | Yes    | Yes     |
| JavaScript         | Yes        | No     | Yes     |
| Lua                | Yes        | No     | Yes     |
| Octave 3.6.2       | Yes        | No     | Yes     |
| Ocaml 3.12.1       |            |        | No      |
Hmm. After trying a handful of languages, it looks like there are four groups of languages.

1. Languages where == isn’t transitive over ints and floats, like C and Haskell.
2. Languages where == is transitive over ints and floats (at least on this one example), like Go and Julia.
3. Languages where == is transitive, but only because different ints are equal, like JavaScript^{1}.
4. Languages where == isn’t allowed between ints and floats.
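As a quick sanity check on the second group, here’s the same comparison chain in Python, which compares int and float values exactly rather than lossily promoting the int:

```python
# Python compares ints and floats by exact value, with no lossy
# promotion of the int to a float, so == stays transitive here.
a = 2**53        # 9007199254740992, an int
b = 2.0**53      # 9007199254740992.0, a float
c = 2**53 + 1    # 9007199254740993, an int

print(a == b)  # True
print(b == c)  # False: no promotion, so the values really differ
print(a == c)  # False, as transitivity demands
```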
What’s going on here? Well, there are two things that combine to cause this weird behavior.
First, (double precision) floating point numbers are represented by an exponent multiplied by a 52-bit fraction. As a more familiar example, let’s think about how that would work with decimal numbers in base 10 for a second. Say we had a floating point decimal with a 2-digit (signed) exponent and a 3-digit fraction. Our number would then be 10^(ee) * 1.(fff). So, if our exponent were 01 and our fraction were 000, we’d get 10^01 * 1.000 = 10.00. If we were to increment the fraction to 001, we’d get 10^01 * 1.001 = 10.01. If, instead, the exponent were 02, we’d have 10^02 * 1.000 = 100.0 and 10^02 * 1.001 = 100.1. Notice that the smallest possible increment is the value of the exponent times the least significant digit in the fraction, i.e., (value of exponent) * 10^-(number of digits in fraction); in the last example, that works out to be 10^2 * 10^-3 = 10^-1 = 0.1.
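The toy decimal format above can be sketched in a few lines of Python. (The name toy is mine; Fraction is used so the demonstration isn’t itself distorted by binary floating point rounding.)

```python
from fractions import Fraction

def toy(e, fff):
    """Value of a toy decimal float 10^e * 1.fff (3 fractional digits)."""
    return Fraction(10)**e * (1 + Fraction(fff, 1000))

print(toy(1, 0))              # 10
print(toy(1, 1))              # 1001/100, i.e., 10.01
# The spacing between neighbors at exponent 2 is 10^2 * 10^-3 = 0.1:
print(toy(2, 1) - toy(2, 0))  # 1/10
```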
Instead of base 10, floating point numbers operate in base 2. The limit on the smallest possible change we can make still holds, except that for a 64-bit floating point value, we have 11 bits (base-2 digits) of exponent and 52 bits of fraction (plus 1 bit for the sign). This implies that the smallest increment is (value of exponent) * 2^-52. Now, if we set our exponent to 53 (an exponent value of 2^53) and our fraction to 0, we get 9007199254740992. The next representable floating point value is 2^53 * 2^-52 = 2 more, 9007199254740994.
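We can check that spacing directly in Python (math.ulp and math.nextafter need Python 3.9+):

```python
import math

x = 2.0**53  # 9007199254740992.0
# The gap to the next representable double is 2^53 * 2^-52 = 2.
print(math.ulp(x))                  # 2.0
print(math.nextafter(x, math.inf))  # 9007199254740994.0
```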
Second, you can’t compare an int and a float with == without converting them to the same type. It’s natural to want to convert both to floats and compare, since floats are a more general type; 1.0 + 2 will give you 3.0 in many languages. But if we promote the int to a float, distinct ints can get promoted to the same float, resulting in a violation of transitivity, as we saw in Haskell^{2}. If, instead, we convert floats to ints, the problem goes away, as we saw in Julia.
Well, that problem goes away, at least. But that still leaves a couple of other problems. First, we end up with statements like 9007199254740993.0 == 9007199254740993 and int(9007199254740993.0) > 9007199254740992 both being false.
Second, what’s supposed to happen when you compare a float to a value that’s larger than any int? For a 64-bit float, the exponent can go up to 2^1023, which is way more than we can represent in a 64-bit int (which tops out around 2^63). Languages like Python and Ruby, which convert values larger than a machine int into some kind of BigInt by default, avoid the issue at the cost of some performance. Languages like Julia, which give you machine ints by default, have to deal with overflow.
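Python’s arbitrary-precision ints sidestep the overflow question entirely; comparisons stay exact well past 2^63:

```python
big = 2**200                  # far beyond any 64-bit int
print(big == float(big))      # True: 2^200 is a power of two, exactly representable
print(big + 1 == float(big))  # False: the comparison is exact, not via promotion
```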
On overflow, Julia throws an exception if you try to convert a float that’s bigger than 2^63 to an int. However, the == comparison always returns false if you compare an int to a float that’s bigger than 2^63. It might seem surprising that int(a) == b throws an exception while a == b is false; that’s part of the case for forcing explicit conversions on comparisons. But given that mixed-type comparisons are allowed in a language, throwing an exception on conversion and returning the correct result without an exception on comparison is the only way to avoid incorrect results.
And then there are the languages that make all numbers floats, which is fine if you don’t mind that 9007199254740992 == 9007199254740993. Personally, I mind that 9007199254740992 == 9007199254740993.
Since there’s no way to handle this well, how about just not allowing the comparison to happen? Of the languages I tried, only OCaml actually considers that to be an error. I like this approach a lot, but people often complain about how explicit you have to be with numeric types in OCaml. Technically, languages like Go and Haskell will also error out if you compare an int variable to a float variable, but if you just type an int in, as in the examples above, they’ll helpfully convert to a float for the comparison.
With the == example, it’s possible to maintain consistency by not allowing cross-type comparisons, but there are strange corners in floating point math even in pure floating point land.
What happens when we try to compute 0/0 or Infinity / Infinity? We get a special value reserved for results that are not a number, NaN. NaN is supposed to fail all comparisons, leading to all kinds of fun stuff, like:
$ ghci
Prelude> let nan = 0/0
Prelude> nan == nan
False
Prelude> nan < nan || nan == nan || nan > nan
False
One of the justifications for having a special NaN value is that NaN is supposed to propagate, indicating that an error occurred at some point in the calculation.
$ ghci
Prelude> let nan = 0/0
Prelude> min nan 1
1.0
Prelude> min 1 nan
NaN
But it doesn’t. The NaN gets suppressed, sometimes, depending on the order of the arguments. This makes sense if you think of min as being implemented by an if/else comparison.
function min(a, b)
    if a < b
        return a
    else
        return b
    end
end
Since the comparison always fails, the else clause always wins.
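A direct Python translation behaves the same way: the failed comparison always sends us to the else branch, so whatever sits in the second slot wins. (naive_min is my name for this sketch, not a standard function.)

```python
import math

nan = float("nan")

def naive_min(a, b):
    # Every comparison with NaN is false, so a NaN in `a` falls
    # through to the else branch and gets dropped.
    if a < b:
        return a
    else:
        return b

print(naive_min(nan, 1.0))              # 1.0 -- the NaN is silently suppressed
print(math.isnan(naive_min(1.0, nan)))  # True -- the NaN survives
```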
Not all languages do that. Kahan once proposed that the NaN always get suppressed, matching C’s behavior and treating NaN as a missing value. Other folks think it’s better to propagate the NaN to avoid suppressing errors. But no matter what happens, reasonable people can disagree on whether or not the result makes sense. Some people want min to be a consistent drop-in replacement for the naive way of coding it up, and some people want to avoid the possibility of missing an error. Some people even want to suppress the error. I don’t understand that one, but Kahan’s smarter than I am and has thought about this more than I have, so there’s probably a good reason that I just don’t understand.
We’ve covered Haskell and C. Let’s see what happens in other languages when we give min a NaN for one argument and 1.0 for the other. The languages we’ve tried do one of the following: always return the 1st argument, always return the 2nd argument, always return a NaN, always return the non-NaN, or throw an exception.
| Language           | 1st | 2nd | NaN | non-NaN | Exception |
|--------------------|-----|-----|-----|---------|-----------|
| Scheme (racket)    |     |     | X   |         |           |
| Clojure 1.1        |     | X   |     |         |           |
| C# Mono 2.10.8.1   |     |     | X   |         |           |
| Ruby 1.8.7         |     |     | X   |         | X         |
| Java 1.7           |     |     | X   |         |           |
| Scala 2.9.2        |     |     | X   |         |           |
| Haskell 7.4.1      |     | X   |     |         |           |
| C gcc 4.6.4        |     |     |     | X       |           |
| perl 5.14.2        | X   |     |     |         |           |
| Julia 0.3 rc1      |     |     |     | X       |           |
| Go 1.3             |     |     | X   |         |           |
| Lisp SBCL 1.0.55   |     |     |     |         | X         |
| Ruby 2.0           |     |     |     |         | X         |
| Python 2.7.3       | X   |     |     |         |           |
| JavaScript         |     |     | X   |         |           |
| Lua                | X   |     |     |         |           |
| Octave 3.6.2       |     |     | X   |         |           |
| Ocaml 3.12.1       |     | X   |     |         |           |
And then there are the folks who just want whatever’s fastest^{3}, i.e., whatever the hardware does. On x86, the SSE/AVX min and max instructions act like a normal if/else comparison and return the second operand if there’s a NaN. To get either of the alternatives above (NaN propagation or NaN suppression), you need to turn an instruction like

MINPD

into an entire sequence of instructions like

MINPD
PCMPNEQD
BLENDPD

If that’s in your critical path on x86, getting NaN to propagate is a noticeable performance hit. The equivalent ARM instruction, VPMIN, propagates NaN, so you’re forced to give up on cross-platform consistency if you want the best possible performance.
If you want to propagate NaN, this corner case has its own corner case: what to do for min(Infinity, NaN). The argument for propagating NaN is the same as before. The case for propagating Infinity is that min(Infinity, foo) should be Infinity for all foo. This is analogous to a decades-old unsettled debate in the hardware design world, X-optimism vs. X-pessimism. Unlike in hardware design, where X-ism is central to verifying the correctness of hardware, NaN vs. Infinity is just a corner case of a corner case, so we haven’t had decades of debate, but that doesn’t make the issue any simpler. I don’t want to replay the entire debate, but suffice it to say that it’s possible to come up with arbitrarily many examples that make either decision look unreasonable.
To sum it up, different languages pick different, totally reasonable behaviors. And if you want good performance, you’ll get different results on different hardware. And that doesn’t even scratch the surface of floating point weirdness. In addition to the problems mentioned in Goldberg’s classic What Every Computer Scientist Should Know About Floating-Point Arithmetic, there are tens or hundreds of weird corner cases that can cause bugs^{4}.
Thanks to Carter for kicking off a discussion with Stefan about NaN comparisons, to Stefan for his comments on NaN comparisons as well as floating point transitivity, to Erik for comments on some floating point nuances that I missed, and to Alyssa for pointing out some omissions in my comments about Haskell.

This table just refers to this specific case. == is famously not transitive in js for all sorts of things. ↩
Note that our in-between value, 9007199254740993, will get converted to 9007199254740992.0 and not 9007199254740994.0 under the default rounding mode on most platforms. This is because the normal default rounding mode is round to nearest, breaking ties by making the last digit even (i.e., 0 and not 1). That, in turn, is because the standard grade school algorithm of rounding ties up introduces a rounding bias. ↩
If you read through the IEEE standards meeting notes, you can see that they were pretty concerned about performance: “Roger’s experiments (on Itanium, x87, & SSE/SSE2) suggest that the cost of special handling of +/-0 & NaNs is high (3x, 2x, & 12x, respectively)” ↩

Here’s something to think about next time someone proposes we standardize on just high-enough precision floating point for all calculations, like in js. What happens when you calculate log(x+1) with log1p(x) = if (1.0 + x == 1.0) then x else log(1.0 + x) * x / ((1.0 + x) - 1.0) on a machine with extended precision floating point vs. one with single or double precision? ↩