I want to find the first double, counting up from 0d, that deviates from the long of the "same value" by some delta, say 1e-8. I'm failing here, though. I'm trying to do this in C, although I usually use managed languages. Please help.


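#include <stdio.h>
#include <limits.h>
#define DELTA 1e-8
int main() {
    double d = 0; // checked, the literal is fine
    long i;
    for (i = 0L; i < LONG_MAX; i++) {
        d = i; // implicit conversion from long to double
        if (d-i > DELTA || d-i <-DELTA) {
            printf("%f", d);
            break;
        }
    }
}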

I'm guessing that the issue is that d-i casts i to double and therefore d==i and then the difference is always 0. How else can I detect this properly -- I'd prefer fun C casting over comparing strings, which would take forever.


ANSWER: It is exactly as we expected. 2^53+1 = 9007199254740993 is the first point of difference according to standard C/UNIX/POSIX tools. Many thanks to pax for his program. I guess mathematics wins again.


Doubles in IEEE 754 have a precision of 52 bits (53 if you count the implicit leading 1), which means they can store integers exactly up to (at least) 2^51.


If your longs are 32-bit, they will only have the (positive) range 0 to 2^31, so there is no 32-bit long that cannot be represented exactly as a double. For a 64-bit long, the first failure will be at (roughly) 2^52, so I'd be starting around there, not at zero.


You can use the following program to detect where the failures start to occur. An earlier version I had relied on the fact that the last digit in a number that continuously doubles follows the sequence {2,4,8,6}. However, I opted eventually to use a known trusted tool (bc) for checking the whole number, not just the last digit.


Keep in mind that this may be affected by the actions of sprintf() rather than the real accuracy of doubles (I don't think so personally since it had no troubles with certain numbers up to 2^143).


This is the program:



#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    FILE *fin;
    double d = 1.0; // 2^n-1 to avoid exact powers of 2.
    int i = 1;
    char ds[1000];
    char tst[1000];

    // Loop forever, rely on break to finish.
    while (1) {
        // Get C version of the double.
        sprintf (ds, "%.0f", d);

        // Get bc version of the double: run bc and read its answer back.
        sprintf (tst, "echo '2^%d - 1' | bc >tmpfile", i);
        system (tst);
        fin = fopen ("tmpfile", "r");
        fgets (tst, sizeof (tst), fin);
        fclose (fin);
        tst[strlen (tst) - 1] = '\0';  // Strip the trailing newline.

        // Check them.
        if (strcmp (ds, tst) != 0) {
            printf( "2^%d - 1 <-- bc failure\n", i);
            printf( "   got       [%s]\n", ds);
            printf( "   expected  [%s]\n", tst);
            break;
        }

        // Output for status then move to next.
        printf( "2^%d - 1 = %s\n", i, ds);
        d = (d + 1) * 2 - 1;  // Again, 2^n - 1.
        i++;
    }

    return 0;
}

This keeps going until:


2^51 - 1 = 2251799813685247
2^52 - 1 = 4503599627370495
2^53 - 1 = 9007199254740991
2^54 - 1 <-- bc failure
   got       [18014398509481984]
   expected  [18014398509481983]

which is about where I expected it to fail.


As an aside, I originally used numbers of the form 2^n but that got me up to:


2^136 = 87112285931760246646623899502532662132736
2^137 = 174224571863520493293247799005065324265472
2^138 = 348449143727040986586495598010130648530944
2^139 = 696898287454081973172991196020261297061888
2^140 = 1393796574908163946345982392040522594123776
2^141 = 2787593149816327892691964784081045188247552
2^142 = 5575186299632655785383929568162090376495104
2^143 <-- bc failure
   got       [11150372599265311570767859136324180752990210]
   expected  [11150372599265311570767859136324180752990208]

with the size of a double being 8 bytes (checked with sizeof). It turned out these numbers were of the binary form "1000...", which can be represented for far longer with doubles. That's when I switched to using 2^n - 1 to get a better bit pattern: all one bits.



The first long to be 'wrong' when cast to a double will not be off by 1e-8, it will be off by 1. As long as the double can fit the long in its significand, it will represent it accurately.


I forget exactly how many bits a double has for precision vs offset, but that would tell you the max size it could represent. The first long to be wrong should sit right next to a value of the binary form 10000..., so you can find it much quicker by starting at 1 and left-shifting than by counting up from zero.


Wikipedia says 52 bits in the significand, not counting the implicit starting 1. That should mean the first long to be cast to a different value is 2^53 + 1 (2^53 itself still fits exactly).



Although I'm hesitant to mention Fortran 95 and its successors in this discussion, Fortran has, since the 1990 standard, offered a SPACING intrinsic function, which tells you the difference between adjacent representable REALs near a given REAL. You could do a binary search on this, stopping when SPACING(X) > DELTA. For compilers that use the same floating-point model as the one you are interested in (likely to be the IEEE 754 standard), you should get the same results.



Offhand, I thought that doubles could represent all integers (within their bounds) exactly.


If that is not the case, then you're going to want to cast both i and d to something with MORE precision than either of them. Perhaps a long double will work.


