热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

大量的乘法,如何捕捉溢出。-multiplicationoflargenumbers,howtocatchoverflow

Iamlookingforanefficient(optionallystandard,elegantandeasytoimplement)solutiontomulti

I am looking for an efficient (optionally standard, elegant and easy to implement) solution to multiply relatively large numbers,and store the result into one or several integers :

我正在寻找一种高效的(可选的、优雅的、易于实现的)解决方案,将相对较大的数字相乘,并将结果存储为一个或几个整数:

Let say I have two 64 bits integers declared like this :

假设我有两个64位整数,像这样声明:

uint64_t a = xxx, b = yyy; 

When I do a*b, how can I detect if the operation result in an overflow and in this case store the carry somewhere ?

当我做一个*b时,我怎么能检测到操作是否导致溢出,并且在这个案例中存储的是某个地方?

Please note that I don't want to use any large-number library since I have constraints on the way I store the numbers.

请注意,我不想使用任何大型库,因为我在存储数字的方式上有限制。

9 个解决方案

#1


59  

Detection:

检测方法:

x = a * b;
if (a != 0 && x / a != b) {
    // overflow handling
}

Edit: Fixed division by 0 (thanks Mark!)

编辑:固定除法0(谢谢马克!)

Computing the carry is quite involved. One approach is to split both operands into half-words, then apply long multiplication to the half-words:

计算进位是相当复杂的。一种方法是将两个操作数拆分为半字,然后将长乘法应用于半字:

uint64_t hi(uint64_t x) {
    return x >> 32;
}

uint64_t lo(uint64_t x) {
    return ((1L <<32) - 1) & x;
}

void multiply(uint64_t a, uint64_t b) {
    // actually uint32_t would do, but the casting is annoying
    uint64_t s0, s1, s2, s3; 

    uint64_t x = lo(a) * lo(b);
    s0 = lo(x);

    x = hi(a) * lo(b) + hi(x);
    s1 = lo(x);
    s2 = hi(x);

    x = s1 + lo(a) * hi(b);
    s1 = lo(x);

    x = s2 + hi(a) * hi(b) + hi(x);
    s2 = lo(x);
    s3 = hi(x);

    uint64_t result = s1 <<32 | s0;
    uint64_t carry = s3 <<32 | s2;
}

To see that none of the partial sums themselves can overflow, we consider the worst case:

我们认为最坏的情况是:没有部分和本身可以溢出。

        x = s2 + hi(a) * hi(b) + hi(x)

Let B = 1 <<32. We then have

让B = 1 <<32。然后,我们有

            x <= (B - 1) + (B - 1)(B - 1) + (B - 1)
              <= B*B - 1
               

I believe this will work - at least it handles Sjlver's test case. Aside from that, it is untested (and might not even compile, as I don't have a C++ compiler at hand anymore).

我相信这是可行的——至少它能处理Sjlver的测试用例。除此之外,它是未经测试的(甚至可能不会编译,因为我手头没有c++编译器)。

#2


23  

The idea is to use following fact which is true for integral operation:

这个想法是用以下事实来做积分运算:

a*b > c if and only if a > c/b

a* b> c如果且仅当> c/b。

/ is integral division here.

/是这里的积分。

The pseudocode to check against overflow for positive numbers follows:

用于检查正数的伪代码如下:

if (a > max_int64 / b) then "overflow" else "ok".

如果(一个> max_int64 / b)然后“溢出”else“ok”。

To handle zeroes and negative numbers you should add more checks.

要处理零和负数,你应该增加更多的检查。

C code for non-negative a and b follows:

非负a和b的C代码如下:

if (b > 0 && a > 18446744073709551615 / b) {
     // overflow handling
}; else {
    c = a * b;
}

Note:

注意:

18446744073709551615 == (1<<64)-1

To calculate carry we can use approach to split number into two 32-digits and multiply them as we do this on the paper. We need to split numbers to avoid overflow.

为了计算进位,我们可以使用方法将数字分割为两个32位数字,并将它们相乘,就像我们在纸上做的那样。我们需要分开数字以避免溢出。

Code follows:

代码如下:

// split input numbers into 32-bit digits
uint64_t a0 = a & ((1LL<<32)-1);
uint64_t a1 = a >> 32;
uint64_t b0 = b & ((1LL<<32)-1);
uint64_t b1 = b >> 32;


// The following 3 lines of code is to calculate the carry of d1
// (d1 - 32-bit second digit of result, and it can be calculated as d1=d11+d12),
// but to avoid overflow.
// Actually rewriting the following 2 lines:
// uint64_t d1 = (a0 * b0 >> 32) + a1 * b0 + a0 * b1;
// uint64_t c1 = d1 >> 32;
uint64_t d11 = a1 * b0 + (a0 * b0 >> 32); 
uint64_t d12 = a0 * b1;
uint64_t c1 = (d11 > 18446744073709551615 - d12) ? 1 : 0;

uint64_t d2 = a1 * b1 + c1;
uint64_t carry = d2; // needed carry stored here

#3


19  

Although there have been several other answers to this question, I several of them have code that is completely untested, and thus far no one has adequately compared the different possible options.

虽然这个问题还有其他几个答案,但我有几个人的代码是完全未经测试的,到目前为止还没有人充分地比较不同的可能选项。

For that reason, I wrote and tested several possible implementations (the last one is based on this code from OpenBSD, discussed on Reddit here). Here's the code:

出于这个原因,我编写并测试了几个可能的实现(最后一个是基于OpenBSD的代码,在Reddit上讨论过)。这是代码:

/* Multiply with overflow checking, emulating clang's builtin function
 *
 *     __builtin_umull_overflow
 *
 * This code benchmarks five possible schemes for doing so.
 */

#include 
#include 
#include 
#include 
#include 

#ifndef BOOL
    #define BOOL int
#endif

// Option 1, check for overflow a wider type
//    - Often fastest and the least code, especially on modern compilers
//    - When long is a 64-bit int, requires compiler support for 128-bits
//      ints (requires GCC >= 3.0 or Clang)

#if LONG_BIT > 32
    typedef __uint128_t long_overflow_t ;
#else
    typedef uint64_t long_overflow_t;
#endif

BOOL 
umull_overflow1(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        long_overflow_t prod = (long_overflow_t)lhs * (long_overflow_t)rhs;
        *result = (unsigned long) prod;
        return (prod >> LONG_BIT) != 0;
}

// Option 2, perform long multiplication using a smaller type
//    - Sometimes the fastest (e.g., when mulitply on longs is a library
//      call).
//    - Performs at most three multiplies, and sometimes only performs one.
//    - Highly portable code; works no matter how many bits unsigned long is

BOOL 
umull_overflow2(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        const unsigned long HALFSIZE_MAX = (1ul <> LONG_BIT/2;
        unsigned long lhs_low  = lhs & HALFSIZE_MAX;
        unsigned long rhs_high = rhs >> LONG_BIT/2;
        unsigned long rhs_low  = rhs & HALFSIZE_MAX;

        unsigned long bot_bits = lhs_low * rhs_low;
        if (!(lhs_high || rhs_high)) {
            *result = bot_bits;
            return 0; 
        }
        BOOL overflowed = lhs_high && rhs_high;
        unsigned long mid_bits1 = lhs_low * rhs_high;
        unsigned long mid_bits2 = lhs_high * rhs_low;

        *result = bot_bits + ((mid_bits1+mid_bits2) <> LONG_BIT/2) != 0
            || (mid_bits2 >> LONG_BIT/2) != 0;
}

// Option 3, perform long multiplication using a smaller type (this code is
// very similar to option 2, but calculates overflow using a different but
// equivalent method).
//    - Sometimes the fastest (e.g., when mulitply on longs is a library
//      call; clang likes this code).
//    - Performs at most three multiplies, and sometimes only performs one.
//    - Highly portable code; works no matter how many bits unsigned long is

BOOL 
umull_overflow3(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
        const unsigned long HALFSIZE_MAX = (1ul <> LONG_BIT/2;
        unsigned long lhs_low  = lhs & HALFSIZE_MAX;
        unsigned long rhs_high = rhs >> LONG_BIT/2;
        unsigned long rhs_low  = rhs & HALFSIZE_MAX;

        unsigned long lowbits = lhs_low * rhs_low;
        if (!(lhs_high || rhs_high)) {
            *result = lowbits;
            return 0; 
        }
        BOOL overflowed = lhs_high && rhs_high;
        unsigned long midbits1 = lhs_low * rhs_high;
        unsigned long midbits2 = lhs_high * rhs_low;
        unsigned long midbits  = midbits1 + midbits2;
        overflowed = overflowed || midbits  HALFSIZE_MAX;
        unsigned long product = lowbits + (midbits < 0 && (SIZE_MAX / rhs) = MUL_NO_OVERFLOW || rhs >= MUL_NO_OVERFLOW) &&
            rhs > 0 && SIZE_MAX / rhs 

Here are the results, testing with various compilers and systems I have (in this case, all testing was done on OS X, but results should be similar on BSD or Linux systems):

这里是测试结果,测试使用了不同的编译器和系统(在本例中,所有测试都是在OS X上完成的,但是在BSD或Linux系统上的结果应该是类似的):

+------------------+----------+----------+----------+----------+----------+
|                  | Option 1 | Option 2 | Option 3 | Option 4 | Option 5 |
|                  |  BigInt  | LngMult1 | LngMult2 |   Div    |  OptDiv  |
+------------------+----------+----------+----------+----------+----------+
| Clang 3.5 i386   |    1.610 |    3.217 |    3.129 |    4.405 |    4.398 |
| GCC 4.9.0 i386   |    1.488 |    3.469 |    5.853 |    4.704 |    4.712 |
| GCC 4.2.1 i386   |    2.842 |    4.022 |    3.629 |    4.160 |    4.696 |
| GCC 4.2.1 PPC32  |    8.227 |    7.756 |    7.242 |   20.632 |   20.481 |
| GCC 3.3   PPC32  |    5.684 |    9.804 |   11.525 |   21.734 |   22.517 |
+------------------+----------+----------+----------+----------+----------+
| Clang 3.5 x86_64 |    1.584 |    2.472 |    2.449 |    9.246 |    7.280 |
| GCC 4.9 x86_64   |    1.414 |    2.623 |    4.327 |    9.047 |    7.538 |
| GCC 4.2.1 x86_64 |    2.143 |    2.618 |    2.750 |    9.510 |    7.389 |
| GCC 4.2.1 PPC64  |   13.178 |    8.994 |    8.567 |   37.504 |   29.851 |
+------------------+----------+----------+----------+----------+----------+

Based on these results, we can draw a few conclusions:

基于这些结果,我们可以得出以下结论:

  • Clearly, the division-based approach, although simple and portable, is slow.
  • 显然,基于分区的方法虽然简单且便于携带,但是速度很慢。
  • No technique is a clear winner in all cases.
  • 在所有情况下,任何技术都不是赢家。
  • On modern compilers, the use-a-larger-int approach is best, if you can use it
  • 在现代编译器中,如果您可以使用,则使用-大-整数方法是最好的方法。
  • On older compilers, the long-multiplication approach is best
  • 在旧的编译器中,长乘法的方法是最好的。
  • Surprisingly, GCC 4.9.0 has performance regressions over GCC 4.2.1, and GCC 4.2.1 has performance regressions over GCC 3.3
  • 令人惊讶的是,GCC 4.9.0对GCC 4.2.1有性能回归,而GCC 4.2.1有性能回归问题。

#4


8  

A version that also works when a == 0:

当A == 0时也可以使用的版本:

    x = a * b;
    if (a != 0 && x / a != b) {
        // overflow handling
    }

#5


6  

If you need not just to detect overflow but also to capture the carry, you're best off breaking your numbers down into 32-bit parts. The code is a nightmare; what follows is just a sketch:

如果您不仅需要检测溢出,还需要捕捉进位,那么最好将您的数据分解为32位的部分。代码是一场噩梦;接下来的只是一个草图:

#include 

uint64_t mul(uint64_t a, uint64_t b) {
  uint32_t ah = a >> 32;
  uint32_t al = a;  // truncates: now a = al + 2**32 * ah
  uint32_t bh = b >> 32;
  uint32_t bl = b;  // truncates: now b = bl + 2**32 * bh
  // a * b = 2**64 * ah * bh + 2**32 * (ah * bl + bh * al) + al * bl
  uint64_t partial = (uint64_t) al * (uint64_t) bl;
  uint64_t mid1    = (uint64_t) ah * (uint64_t) bl;
  uint64_t mid2    = (uint64_t) al * (uint64_t) bh;
  uint64_t carry   = (uint64_t) ah * (uint64_t) bh;
  // add high parts of mid1 and mid2 to carry
  // add low parts of mid1 and mid2 to partial, carrying
  //    any carry bits into carry...
}

The problem is not just the partial products but the fact that any of the sums can overflow.

问题不只是部分产品,而是任何一个和都可以溢出。

If I had to do this for real, I would write an extended-multiply routine in the local assembly language. That is, for example, multiply two 64-bit integers to get a 128-bit result, which is stored in two 64-bit registers. All reasonable hardware provides this functionality in a single native multiply instruction—it's not just accessible from C.

如果我必须这样做,我将在本地汇编语言中编写一个扩展乘法例程。例如,将两个64位整数相乘得到一个128位的结果,该结果存储在两个64位寄存器中。所有合理的硬件都以单一的本地乘法指令来提供此功能——它不仅可以从C访问。

This is one of those rare cases where the solution that's most elegant and easy to program is actually to use assembly language. But it's certainly not portable :-(

这是一种罕见的情况,其中最优雅、最容易编程的解决方案实际上是使用汇编语言。但它肯定不是便携的:-(

#6


1  

I've been working with this problem this days and I have to say that it has impressed me the number of times I have seen people saying the best way to know if there has been an overflow is to divide the result, thats totally inefficient and unnecessary. The point for this function is that it must be as fast as possible.

我这几天一直在研究这个问题,我不得不说它给我留下了深刻的印象,我看到人们说,最好的方法是知道是否有溢出来划分结果,这是完全没有效率的,没有必要的。这个函数的重点是它必须尽可能快。

There are two options for the overflow detection:

溢出检测有两种选择:

1º- If possible create the result variable twice as big as the multipliers, for example:

1 .如果可能的话,创建一个比乘数大两倍的结果变量,例如:

struct INT32struct {INT16 high, low;};
typedef union
{
  struct INT32struct s;
  INT32 ll;
} INT32union;

INT16 mulFunction(INT16 a, INT16 b)
{
  INT32union result.ll = a * b; //32Bits result
  if(result.s.high > 0) 
      Overflow();
  return (result.s.low)
}

You will know inmediately if there has been an overflow, and the code is the fastest possible without writing it in machine code. Depending on the compiler this code can be improved in machine code.

如果出现溢出,您将立即知道,并且代码是最快的,而不是在机器代码中编写。根据编译器的不同,这段代码可以在机器代码中得到改进。

2º- Is impossible to create a result variable twice as big as the multipliers variable: Then you should play with if conditions to determine the best path. Continuing with the example:

2º-是不可能创建一个结果变量两倍乘数变量:那么你应该玩如果条件来决定最好的路径。继续这个示例:

INT32 mulFunction(INT32 a, INT32 b)
{

  INT32union s_a.ll = abs(a);
  INT32union s_b.ll = abs(b); //32Bits result
  INT32union result;
  if(s_a.s.hi > 0 && s_b.s.hi > 0)
  {
      Overflow();
  }
  else if (s_a.s.hi > 0)
  {
      INT32union res1.ll = s_a.s.hi * s_b.s.lo;
      INT32union res2.ll = s_a.s.lo * s_b.s.lo;
      if (res1.hi == 0)
      {
          result.s.lo = res1.s.lo + res2.s.hi;
          if (result.s.hi == 0)
          {
            result.s.ll = result.s.lo <<16 + res2.s.lo;
            if ((a.s.hi >> 15) ^ (b.s.hi >> 15) == 1)
            {
                result.s.ll = -result.s.ll; 
            }
            return result.s.ll
          }else
          {
             Overflow();
          }
      }else
      {
          Overflow();
      }
  }else if (s_b.s.hi > 0)
{

   //Same code changing a with b

}else 
{
    return (s_a.lo * s_b.lo);
}
}

I hope this code helps you to have a quite efficient program and I hope the code is clear, if not I'll put some coments.

我希望这段代码能帮助您实现一个非常高效的程序,我希望代码是清晰的,如果不是的话,我将会添加一些内容。

best regards.

致以最亲切的问候。

#7


0  

Perhaps the best way to solve this problem is to have a function, which multiplies two UInt64 and results a pair of UInt64, an upper part and a lower part of the UInt128 result. Here is the solution, including a function, which displays the result in hex. I guess you perhaps prefer a C++ solution, but I have a working Swift-Solution which shows, how to manage the problem:

解决这个问题的最好方法也许是拥有一个函数,它可以将两个UInt64相乘,结果是UInt64,一个上半部分和UInt128结果的一个下半部分。这里是解决方案,包括一个函数,它显示了十六进制的结果。我想你可能更喜欢c++的解决方案,但是我有一个工作的快速解决方案,它显示了如何管理这个问题:

func hex128 (_ hi: UInt64, _ lo: UInt64) -> String
{
    var s: String = String(format: "%08X", hi >> 32)
                  + String(format: "%08X", hi & 0xFFFFFFFF)
                  + String(format: "%08X", lo >> 32)
                  + String(format: "%08X", lo & 0xFFFFFFFF)
    return (s)
}

func mul64to128 (_ multiplier: UInt64, _ multiplicand : UInt64)
             -> (result_hi: UInt64, result_lo: UInt64)
{
    let x: UInt64 = multiplier
    let x_lo: UInt64 = (x & 0xffffffff)
    let x_hi: UInt64 = x >> 32

    let y: UInt64 = multiplicand
    let y_lo: UInt64 = (y & 0xffffffff)
    let y_hi: UInt64 = y >> 32

    let mul_lo: UInt64 = (x_lo * y_lo)
    let mul_hi: UInt64 = (x_hi * y_lo) + (mul_lo >> 32)
    let mul_carry: UInt64 = (x_lo * y_hi) + (mul_hi & 0xffffffff)
    let result_hi: UInt64 = (x_hi * y_hi) + (mul_hi >> 32) + (mul_carry >> 32)
    let result_lo: UInt64 = (mul_carry <<32) + (mul_lo & 0xffffffff)

    return (result_hi, result_lo)
}

Here is an example to verify, that the function works:

这里有一个例子来验证,这个函数是工作的:

var c: UInt64 = 0
var d: UInt64 = 0

(c, d) = mul64to128(0x1234567890123456, 0x9876543210987654)
// 0AD77D742CE3C72E45FD10D81D28D038 is the result of the above example
print(hex128(c, d))

(c, d) = mul64to128(0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF)
// FFFFFFFFFFFFFFFE0000000000000001 is the result of the above example
print(hex128(c, d))

#8


0  

Here is a trick for detecting whether multiplication of two unsigned integers overflows.

这里有一个技巧,用来检测两个无符号整数的乘法是否会溢出。

We make the observation that if we multiply an N-bit-wide binary number with an M-bit-wide binary number, the product does not have more than N + M bits.

我们观察到,如果我们将一个N位宽的二进制数乘以一个M-位宽的二进制数,这个乘积就不会有超过N + M位。

For instance, if we are asked to multiply a three-bit number with a twenty-nine bit number, we know that this doesn't overflow thirty-two bits.

例如,如果要求我们将一个3位的数字与一个29位的数字相乘,我们知道这不会溢出32位。

#include 
#include 

int might_be_mul_oflow(unsigned long a, unsigned long b)
{
  if (!a || !b)
    return 0;

  a = a | (a >> 1) | (a >> 2) | (a >> 4) | (a >> 8) | (a >> 16) | (a >> 32);
  b = b | (b >> 1) | (b >> 2) | (b >> 4) | (b >> 8) | (b >> 16) | (b >> 32);

  for (;;) {
    unsigned long na = a <<1;
    if (na <= a)
      break;
    a = na;
  }

  return (a & b) ? 1 : 0;
}

int main(int argc, char **argv)
{
  unsigned long a, b;
  char *endptr;

  if (argc <3) {
    printf("supply two unsigned long integers in C form\n");
    return EXIT_FAILURE;
  }

  a = strtoul(argv[1], &endptr, 0);

  if (*endptr != 0) {
    printf("%s is garbage\n", argv[1]);
    return EXIT_FAILURE;
  }

  b = strtoul(argv[2], &endptr, 0);

  if (*endptr != 0) {
    printf("%s is garbage\n", argv[2]);
    return EXIT_FAILURE;
  }

  if (might_be_mul_oflow(a, b))
    printf("might be multiplication overflow\n");

  {
    unsigned long c = a * b;
    printf("%lu * %lu = %lu\n", a, b, c);
    if (a != 0 && c / a != b)
      printf("confirmed multiplication overflow\n");
  }

  return 0;
}

A smattering of tests: (on 64 bit system):

少量测试:(64位系统):

$ ./uflow 0x3 0x3FFFFFFFFFFFFFFF
3 * 4611686018427387903 = 13835058055282163709

$ ./uflow 0x7 0x3FFFFFFFFFFFFFFF
might be multiplication overflow
7 * 4611686018427387903 = 13835058055282163705
confirmed multiplication overflow

$ ./uflow 0x4 0x3FFFFFFFFFFFFFFF
might be multiplication overflow
4 * 4611686018427387903 = 18446744073709551612

$ ./uflow 0x5 0x3FFFFFFFFFFFFFFF
might be multiplication overflow
5 * 4611686018427387903 = 4611686018427387899
confirmed multiplication overflow

The steps in might_be_mul_oflow are almost certainly slower than just doing the division test, at least on mainstream processors used in desktop workstations, servers and mobile devices. On chips without good division support, it could be useful.

might_be_mul_oflow中的步骤几乎肯定比仅仅进行划分测试要慢,至少在桌面工作站、服务器和移动设备中使用的主流处理器上是如此。在没有良好部门支持的芯片上,它可能是有用的。


It occurs to me that there is another way to do this early rejection test.

我突然想到还有另外一种方法来做这个早期的拒绝测试。

  1. We start with a pair of numbers arng and brng which are initialized to 0x7FFF...FFFF and 1.

    我们从一对数字arng和brng开始,它们被初始化为0x7FFF…飞行符和1。

  2. If a <= arng and b <= brng we can conclude that there is no overflow.

    如果a <= arng和b <= brng,我们可以得出结论:没有溢出。

  3. Otherwise, we shift arng to the right, and shift brng to the left, adding one bit to brng, so that they are 0x3FFF...FFFF and 3.

    否则,我们将arng移到右边,将brng移到左边,向brng中添加一点,以便它们是0x3FFF…飞行符和3。

  4. If arng is zero, finish; otherwise repeat at 2.

    如果arng为零,完成;否则重复在2。

The function now looks like:

这个函数现在看起来是这样的:

int might_be_mul_oflow(unsigned long a, unsigned long b)
{
  if (!a || !b)
    return 0;

  {
    unsigned long arng = ULONG_MAX >> 1;
    unsigned long brng = 1;

    while (arng != 0) {
      if (a <= arng && b <= brng)
        return 0;
      arng >>= 1;
      brng <<= 1;
      brng |= 1;
    }

    return 1;
  }
}

#9


0  

If you just want to detect overflow, how about converting to double, doing the multiplication and if

如果你只是想检测溢出,那么转换为double,做乘法和If。

|x| <2^53, convert to int64

| |

|x| <2^63, make the multiplication using int64

| |

otherwise produce whatever error you want?

否则会产生任何你想要的错误?

This seems to work:

这似乎工作:

int64_t safemult(int64_t a, int64_t b) {
  double dx;

  dx = (double)a * (double)b;

  if ( fabs(dx) <(double)9007199254740992 )
    return (int64_t)dx;

  if ( (double)INT64_MAX 

推荐阅读
author-avatar
墨镜小靖
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有