Is multiplication and division using shift operators in C actually faster?

Multiplication and division can be achieved using bit operators, for example:

i*2 = i<<1
i*3 = (i<<1) + i
i*10 = (i<<3) + (i<<1)

and so on.

Is it actually faster to use say (i<<3)+(i<<1) to multiply with 10 than using i*10 directly? Is there any sort of input that can't be multiplied or divided in this way?

Short answer: Not likely.

Long answer: Your compiler has an optimizer in it that knows how to multiply as quickly as your target processor architecture is capable. Your best bet is to tell the compiler your intent clearly (i.e. i*2 rather than i << 1) and let it decide what the fastest assembly/machine code sequence is. It's even possible that the processor itself has implemented the multiply instruction as a sequence of shifts & adds in microcode.

Bottom line--don't spend a lot of time worrying about this. If you mean to shift, shift. If you mean to multiply, multiply. Do what is semantically clearest--your coworkers will thank you later. Or, more likely, curse you later if you do otherwise.
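To make the comparison concrete, here is a minimal sketch of the two spellings of "multiply by 10" from the question. The function names mul10_plain and mul10_shift are hypothetical, and unsigned 32-bit values are assumed so that overflow wraps identically in both forms. It only shows that the two compute the same value; which one ends up faster is for the compiler and a profiler to decide.

```c
#include <stdint.h>

/* Plain multiplication: states the intent directly and leaves
   instruction selection to the compiler. */
uint32_t mul10_plain(uint32_t i)
{
    return i * 10u;
}

/* Hand-written shift-and-add form from the question: 8*i + 2*i.
   For unsigned operands this wraps modulo 2^32 exactly like i * 10u. */
uint32_t mul10_shift(uint32_t i)
{
    return (i << 3) + (i << 1);
}
```

Comparing the assembly generated for both functions at your usual optimisation level is a quick way to see whether the hand-written form buys anything on your target.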

Just a concrete point of measure: many years back, I benchmarked two versions of my hashing algorithm:

unsigned
hash( char const* s )
{
    unsigned h = 0;
    while ( *s != '\0' ) {
        h = 127 * h + (unsigned char)*s;
        ++ s;
    }
    return h;
}

and

unsigned
hash( char const* s )
{
    unsigned h = 0;
    while ( *s != '\0' ) {
        h = (h << 7) - h + (unsigned char)*s;
        ++ s;
    }
    return h;
}

On every machine I benchmarked it on, the first was at least as fast as the second. Somewhat surprisingly, it was sometimes faster (e.g. on a Sun Sparc). When the hardware didn't support fast multiplication (and most didn't back then), the compiler would convert the multiplication into the appropriate combinations of shifts and add/sub. And because it knew the final goal, it could sometimes do so in fewer instructions than when you explicitly wrote the shifts and the add/subs.
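As a sanity check that the two versions really are interchangeable, the sketch below restates them under the hypothetical names hash_mul and hash_shift (the original gives both the name hash). Because h is unsigned, 127 * h and (h << 7) - h both wrap modulo the same power of two, so the results should match for any input string.

```c
/* Version 1: plain multiplication by 127. */
unsigned hash_mul(const char *s)
{
    unsigned h = 0;
    while (*s != '\0') {
        h = 127 * h + (unsigned char)*s;
        ++s;
    }
    return h;
}

/* Version 2: 127*h spelled as 128*h - h using a shift.
   Unsigned arithmetic wraps, so this matches version 1 bit for bit. */
unsigned hash_shift(const char *s)
{
    unsigned h = 0;
    while (*s != '\0') {
        h = (h << 7) - h + (unsigned char)*s;
        ++s;
    }
    return h;
}
```

This only demonstrates equivalence of the results; it says nothing about timing, which is what the benchmark described above measured.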

Note that this was something like 15 years ago. Hopefully, compilers have only gotten better since then, so you can pretty much count on the compiler doing the right thing, probably better than you could. (Also, the reason the code looks so C'ish is because it was over 15 years ago. I'd obviously use std::string and iterators today.)

In addition to all the other good answers here, let me point out another reason to not use shift when you mean divide or multiply. I have never once seen someone introduce a bug by forgetting the relative precedence of multiplication and addition. I have seen bugs introduced when maintenance programmers forgot that "multiplying" via a shift is logically a multiplication but not syntactically of the same precedence as multiplication. x * 2 + z and x << 1 + z are very different! Because + binds more tightly than <<, the second one parses as x << (1 + z).

If you're working on numbers then use arithmetic operators like + - * / %. If you're working on arrays of bits, use bit twiddling operators like & ^ | >>. Don't mix them; an expression that has both bit twiddling and arithmetic is a bug waiting to happen.
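The precedence trap described above can be sketched as follows. The function names are hypothetical, and the "wrong" version is written with explicit parentheses showing how the compiler groups x << 1 + z, so the example compiles cleanly even with precedence warnings enabled.

```c
/* What the programmer meant: 2*x + z. */
int twice_plus(int x, int z)
{
    return x * 2 + z;
}

/* What `x << 1 + z` actually means: + binds tighter than <<,
   so the shift count becomes 1 + z. */
int shift_plus_as_parsed(int x, int z)
{
    return x << (1 + z);
}

/* The shift spelling only matches the multiplication once
   the shift is parenthesised explicitly. */
int shift_plus_fixed(int x, int z)
{
    return (x << 1) + z;
}
```

With x = 3 and z = 2, the intended expression gives 8, while the unparenthesised shift form gives 3 << 3, i.e. 24.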

This depends on the processor and the compiler. Some compilers already optimize code this way, others don't. So you need to check each time your code needs to be optimized this way.

Unless you desperately need to optimize, I would not scramble my source code just to save an assembly instruction or processor cycle.

Is it actually faster to use say (i<<3)+(i<<1) to multiply with 10 than using i*10 directly?

It might or might not be on your machine - if you care, measure in your real-world usage.

A case study - from 486 to Core i7

Benchmarking is very difficult to do meaningfully, but we can look at a few facts. From http://www.penguin.cz/~literakl/intel/s.html#SAL and http://www.penguin.cz/~literakl/intel/i.html#IMUL we get an idea of x86 clock cycles needed for arithmetic shift and multiplication. Say we stick to "486" (the newest one listed), 32 bit registers and immediates: IMUL takes 13-42 cycles and IDIV 44. Each SAL takes 2 cycles and each ADD 1, so even with a few of those together shifting superficially looks like a winner.

These days, with the Core i7:

(from http://software.intel.com/en-us/forums/showthread.php?t=61481)

The latency is 1 cycle for an integer addition and 3 cycles for an integer multiplication. You can find the latencies and throughput in Appendix C of the "Intel® 64 and IA-32 Architectures Optimization Reference Manual", which is located on http://www.intel.com/products/processor/manuals/.

(from some Intel blurb)

Using SSE, the Core i7 can issue simultaneous add and multiply instructions, resulting in a peak rate of 8 floating-point operations (FLOP) per clock cycle

That gives you an idea of how far things have come. The optimisation trivia - like bit shifting versus * - that was taken seriously even into the 90s is just obsolete now. Bit-shifting is still faster, but for non-power-of-two mul/div, by the time you do all your shifts and add the results, it's slower again. Then, more instructions mean more cache faults, more potential issues in pipelining, and more use of temporary registers, which may mean more saving and restoring of register content from the stack... it quickly gets too complicated to quantify all the impacts definitively but they're predominantly negative.

Functionality in source code vs implementation

More generally, your question is tagged C and C++. As 3rd generation languages, they're specifically designed to hide the details of the underlying CPU instruction set. To satisfy their language Standards, they must support multiplication and shifting operations (and many others) even if the underlying hardware doesn't. In such cases, they must synthesize the required result using many other instructions. Similarly, they must provide software support for floating point operations if the CPU lacks it and there's no FPU. Modern CPUs all support * and <<, so this might seem absurdly theoretical and historical, but the significant thing is that the freedom to choose implementation goes both ways: even if the CPU has an instruction that implements the operation requested in the source code in the general case, the compiler's free to choose something else that it prefers because it's better for the specific case the compiler's faced with.

Examples (with a hypothetical assembly language)

source           literal approach         optimised approach
#define N 0
int x;           .word x                  xor registerA, registerA
x *= N;          move x -> registerA
                 move x -> registerB
                 A = B * immediate(0)
                 store registerA -> x
  ...............do something more with x...............

Instructions like exclusive or (xor) have no relationship to the source code, but xor-ing anything with itself clears all the bits, so it can be used to set something to 0. Source code that implies memory addresses may not entail any being used.

These kinds of hacks have been used for as long as computers have been around. In the early days of 3GLs, to secure developer uptake the compiler output had to satisfy the existing hardcore hand-optimising assembly-language dev. community that the produced code wasn't slower, more verbose or otherwise worse. Compilers quickly adopted lots of great optimisations - they became a better centralised store of it than any individual assembly language programmer could possibly be, though there's always the chance that they miss a specific optimisation that happens to be crucial in a specific case - humans can sometimes nut it out and grope for something better while compilers just do as they've been told until someone feeds that experience back into them.

So, even if shifting and adding is still faster on some particular hardware, the compiler writer's likely to have worked out exactly when it's both safe and beneficial.

Maintainability

If your hardware changes you can recompile and it'll look at the target CPU and make another best choice, whereas you're unlikely to ever want to revisit your "optimisations" or list which compilation environments should use multiplication and which should shift. Think of all the non-power-of-two bit-shifted "optimisations" written 10+ years ago that are now slowing down the code they're in as it runs on modern processors...!

Thankfully, good compilers like GCC can typically replace a series of bitshifts and arithmetic with a direct multiplication when any optimisation is enabled (e.g. ...main(...) { return (argc << 4) + (argc << 2) + argc; } -> imull $21, 8(%ebp), %eax) so a recompilation may help even without fixing the code, but that's not guaranteed.
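The arithmetic behind that imull $21 is easy to check: 16x + 4x + x = 21x. The sketch below uses the hypothetical names times21_shift and times21_plain, and assumes unsigned operands (the snippet above uses argc, a signed int) so that overflow is well-defined. Whether a given compiler actually folds the shift form into a single multiply is something to confirm by reading its assembly output, not something this code proves.

```c
/* Shift-and-add spelling, as in the snippet above: 16*x + 4*x + x. */
unsigned times21_shift(unsigned x)
{
    return (x << 4) + (x << 2) + x;
}

/* The multiplication the optimiser is reported to recover from it. */
unsigned times21_plain(unsigned x)
{
    return x * 21u;
}
```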

Strange bitshifting code implementing multiplication or division is far less expressive of what you were conceptually trying to achieve, so other developers will be confused by that, and a confused programmer's more likely to introduce bugs or remove something essential in an effort to restore seeming sanity. If you only do non-obvious things when they're really tangibly beneficial, and then document them well (but don't document other stuff that's intuitive anyway), everyone will be happier.

General solutions versus partial solutions

If you have some extra knowledge, such as that your int will really only be storing values x, y and z, then you may be able to work out some instructions that work for those values and get you your result more quickly than the compiler can, since it doesn't have that insight and needs an implementation that works for all int values. For example, consider your question:

Multiplication and division can be achieved using bit operators...

You illustrate multiplication, but how about division?

int x;
x >> 1;   // divide by 2?

According to the C++ Standard 5.8:

-3- The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.

So, your bit shift has an implementation-defined result when x is negative: it may not work the same way on different machines. But, / works far more predictably. (It may not be perfectly consistent either, as different machines may have different representations of negative numbers, and hence different ranges even when there are the same number of bits making up the representation.)
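A small sketch of the difference, using hypothetical function names. It assumes C99 or later, where integer division is specified to truncate toward zero; the result of right-shifting a negative value is left implementation-defined, so the code deliberately makes no promise about it.

```c
/* Division: since C99 the quotient truncates toward zero,
   so half_div(-7) is -3 on every conforming implementation. */
int half_div(int x)
{
    return x / 2;
}

/* Right shift of a signed value: fine for non-negative x, but
   implementation-defined for negative x. On common two's-complement
   targets with an arithmetic shift, -7 >> 1 yields -4, not -3. */
int half_shift(int x)
{
    return x >> 1;
}

/* Right shift of an unsigned value is always floor(x / 2). */
unsigned half_shift_unsigned(unsigned x)
{
    return x >> 1;
}
```

So even where the shift is well-behaved, it rounds negative values differently from /, which is why a compiler cannot blindly substitute one for the other on signed operands.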

You may say "I don't care... that int is storing the age of the employee, it can never be negative". If you have that kind of special insight, then yes - your safe >> optimisation might be passed over by the compiler unless you explicitly do it in your code. But, it's risky and rarely useful as much of the time you won't have this kind of insight, and other programmers working on the same code won't know that you've bet the house on some unusual expectations of the data you'll be handling... what seems a totally safe change to them might backfire because of your "optimisation".

Is there any sort of input that can't be multiplied or divided in this way?

Yes... as mentioned above, negative numbers have implementation-defined behaviour when "divided" by bit-shifting.

Often a multiply or divide can be decomposed to a series of shifts and adds, and if that series of operations will be faster than the multiply or divide, the compiler will use it. For division by a constant, the compiler can often convert the operation to a multiply by a 'magic number' followed by a shift. This can be a major clock-cycle saver.
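To illustrate the "magic number" idea, here is a minimal sketch for unsigned 32-bit division by 3. The function name div3_magic is hypothetical. The constant 0xAAAAAAAB equals (2^33 + 1) / 3, so multiplying by it in 64-bit arithmetic and shifting right by 33 gives floor(x / 3) for every 32-bit x. This is a hand-written instance of the technique, not a claim about the exact sequence any particular compiler emits.

```c
#include <stdint.h>

/* Unsigned division by 3 without a divide instruction.
   0xAAAAAAAB == (2^33 + 1) / 3, so (x * 0xAAAAAAAB) >> 33 is
   floor(x/3 + x/(3 * 2^33)); the error term is below 1/3 for
   every 32-bit x, so the floor is unaffected. */
uint32_t div3_magic(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 33);
}
```

In ordinary code you would simply write x / 3 and let the compiler pick the constant; the sketch is only meant to show why the transformation is valid.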

Comments
  • Actually, cheap division by a constant other than a power of two is possible, but a tricky subject to which you are not doing justice with "/Division … /divided" in your question. See for instance hackersdelight.org/divcMore.pdf (or get the book "Hacker's delight" if you can).
  • It sounds like something that could easily be tested.
  • As usual - it depends. Once upon a time I tried this in assembler on an Intel 8088 (IBM PC/XT) where a multiplication took a bazillion clocks. Shifts and adds executed a lot faster, so it seemed like a good idea. However, while multiplying the bus unit was free to fill the instruction queue and the next instruction could then start immediately. After a series of shifts and adds the instruction queue would be empty and the CPU would have to wait for the next instruction to be fetched from memory (one byte at a time!). Measure, measure, measure!
  • Also, beware that right-shifting is only well-defined for unsigned integers. If you have a signed integer, it's not defined whether 0 or the highest bit is padded from the left. (And don't forget the time it takes for someone else (even yourself) to read the code a year later!)
  • Actually, a good optimizing compiler will implement multiplication and division with shifts when they are faster.
  • Yep, as said, the possible gains for almost every application will be totally outweighed by the obscurity introduced. Don't worry about this kind of optimisation prematurely. Build what is semantically clear, identify bottlenecks and optimise from there...
  • Agreed, optimizing for readability and maintainability will probably net you more time to spend actually optimizing things that the profiler says are hot code paths.
  • These comments make it sound like you're giving up on potential performance from telling the compiler how to do its job. This is not the case. You actually get better code from gcc -O3 on x86 with return i*10 than from the shift version. As someone who looks at compiler output a lot (see many of my asm / optimization answers), I'm not surprised. There are times when it can help to hand-hold the compiler into one way of doing things, but this isn't one of them. gcc is good at integer math, because it's important.
  • Just downloaded an Arduino sketch that has millis() >> 2; Would it have been too much to ask to just divide?
  • I tested i / 32 vs i >> 5 and i / 4 vs i >> 2 on gcc for cortex-a9 (which has no hardware division) with optimisation -O3 and the resulting assembly was exactly the same. I didn't like using divisions at first but it describes my intention and the output is the same.
  • You may be interested in the following blog post, in which the author notes that modern optimizing compilers seem to reverse-engineer common patterns that programmers might use thinking them more efficient into their mathematical forms so as to really generate the most efficient instruction sequence for them. shape-of-code.coding-guidelines.com/2009/06/30/…
  • @PascalCuoq Nothing really new about this. I discovered pretty much the same thing for Sun CC close to 20 years ago.