## Signed overflow in C++ and undefined behaviour (UB)

I'm wondering about the use of code like the following:

```cpp
int result = 0;
int factor = 1;
for (...) {
    result = ...
    factor *= 10;
}
return result;
```

If the loop is iterated `n` times, then `factor` is multiplied by `10` exactly `n` times. However, `factor` is only ever used after having been multiplied by `10` a total of `n-1` times. If we assume that `factor` never overflows except possibly on the last iteration of the loop, should such code be acceptable? In this case, the value of `factor` would provably never be used after the overflow has happened.

I'm having a debate on whether code like this should be accepted. It would be possible to put the multiplication inside an if-statement and simply skip it on the last iteration of the loop, when it could overflow. The downside is that this clutters the code and adds an unnecessary branch that would have to be checked on every previous loop iteration. I could also iterate the loop one fewer time and replicate the loop body once after the loop; again, this complicates the code.

The actual code in question is used in a tight inner-loop that consumes a large chunk of the total CPU time in a real-time graphics application.

Compilers do assume that a valid C++ program does not contain UB. Consider for example:

```cpp
if (x == nullptr) {
    *x = 3;
} else {
    *x = 5;
}
```

If `x == nullptr`, then dereferencing it and assigning a value is UB. Hence the only way this could be part of a valid program is if `x == nullptr` never yields true, so under the as-if rule the compiler may treat the above as equivalent to:

```cpp
*x = 5;
```

Now in your code

```cpp
int result = 0;
int factor = 1;
for (...) {  // loop until factor overflows, but not more
    result = ...
    factor *= 10;
}
return result;
```

the last multiplication of `factor` cannot happen in a valid program (signed overflow is undefined). Hence the assignment to `result` cannot happen either. As there is no way to branch out before the last iteration, the previous iterations cannot happen either. Eventually, the part of the code that is guaranteed correct (i.e., in which no undefined behaviour ever happens) is:

```cpp
// nothing :(
```


The behaviour of `int` overflow is undefined.

It doesn't matter if you read `factor` outside the loop body; if it has overflowed by then, then the behaviour of your code on, after, and somewhat paradoxically **before** the overflow is undefined.

One issue that might arise in keeping this code is that compilers are getting more and more aggressive when it comes to optimisation. In particular, they are developing a habit of assuming that undefined behaviour never happens. With that assumption, they may remove the `for` loop altogether.

Can't you use an `unsigned` type for `factor`, although then you'd need to worry about unwanted conversions of `int` to `unsigned` in expressions containing both?


It might be insightful to consider real-world optimizers. Loop unrolling is a well-known technique. The basic idea of loop unrolling is that

```cpp
for (int i = 0; i != 3; ++i) foo();
```

might be better implemented behind the scenes as

```cpp
foo();
foo();
foo();
```

This is the easy case, with a fixed bound. But modern compilers can also do this for variable bounds:

```cpp
for (int i = 0; i != N; ++i) foo();
```

becomes

```
__RELATIVE_JUMP(3-N);
foo();
foo();
foo();
```

Obviously this only works if the compiler can prove that `N <= 3`. And that's where we get back to the original question. Because the compiler knows that **signed overflow does not occur**, it knows that the loop can execute at most 9 times on 32-bit architectures: after `n` iterations `factor` is `10^n`, and `10^10 > INT_MAX`. It can therefore emit a 9-iteration unroll. **But the intended maximum was 10 iterations!**

What might happen is that you get a relative jump to the assembly instruction at offset `(9-N)` with `N = 10`, so an offset of `-1`, which is the jump instruction itself. Oops. This is a perfectly valid loop optimization for well-defined C++, but the given example turns into a tight infinite loop.


Any signed integer overflow results in undefined behaviour, regardless of whether or not the overflowed value is or might be read.

Maybe in your use-case you can lift the first iteration out of the loop, turning this

```cpp
int result = 0;
int factor = 1;
for (int n = 0; n < 10; ++n) {
    result += n + factor;
    factor *= 10;
}
// factor "is" 10^10 > INT_MAX, UB
```

into this

```cpp
int factor = 1;
int result = 0 + factor;  // first iteration
for (int n = 1; n < 10; ++n) {
    factor *= 10;
    result += n + factor;
}
// factor is 10^9 < INT_MAX
```

With optimization enabled, the compiler might unroll the second loop above into one conditional jump.


This is UB; in ISO C++ terms, the behaviour of the entire program is completely unspecified for an execution that *eventually* hits UB. The classic example is that, as far as the C++ standard cares, it can make demons fly out of your nose. (I recommend against using an implementation where nasal demons are a real possibility.) See the other answers for more details.

Compilers can "cause trouble" at compile time for paths of execution they can see leading to compile-time-visible UB, e.g. assume those basic blocks are never reached.

See also What Every C Programmer Should Know About Undefined Behavior (LLVM blog). As explained there, signed-overflow UB lets compilers prove that `for(... i <= n ...)` loops are not infinite loops, even for unknown `n`. It also lets them "promote" `int` loop counters to pointer width instead of redoing sign-extension. (So the consequence of UB in that case could be accessing outside the low 64k or 4G elements of an array, if you were expecting signed wrapping of `i` into its value range.)

In some cases compilers will emit an illegal instruction like x86 `ud2` for a block that provably causes UB if ever executed. (Note that a function might *not* ever be called, so compilers can't in general go berserk and break other functions, or even possible paths through a function that don't hit UB; i.e. the machine code it compiles to must still work for all inputs that don't lead to UB.)

**Probably the most efficient solution is to manually peel the last iteration so the unneeded `factor *= 10` can be avoided.**

```cpp
int result = 0;
int factor = 1;
for (... i < n-1) {  // stop 1 iteration early
    result = ...
    factor *= 10;
}
result = ...     // another copy of the loop body, using the last factor
// factor *= 10; // and optimize away this dead operation
return result;
```

Or if the loop body is large, **consider simply using an unsigned type for factor.** Then you can let the unsigned multiply overflow and it will just do well-defined wrapping to some power of 2 (the number of value bits in the unsigned type).

This is fine even if you use it *with* signed types, especially if your unsigned->signed conversion never overflows.

Conversion between unsigned and 2's complement signed is free (same bit-pattern for all values); the modulo wrapping for int -> unsigned specified by the C++ standard simplifies to just using the same bit-pattern, unlike for one's complement or sign/magnitude.

And unsigned->signed is similarly trivial, although it is implementation-defined for values larger than `INT_MAX`. If you aren't *using* the huge unsigned result from the last iteration, you have nothing to worry about. But if you are, see Is conversion from unsigned to signed undefined?. The value-doesn't-fit case is *implementation-defined*, which means an implementation must pick *some* behaviour; sane ones just truncate (if necessary) the unsigned bit pattern and use it as signed, because that works for in-range values the same way with no extra work. And it's definitely not UB. So big unsigned values can become negative signed integers. E.g. after `int x = u;`, gcc and clang don't optimize away `x >= 0` as always being true, even without `-fwrapv`, because they have defined the behaviour.
