Why are log2 and log1p so much faster than log and log10, in numpy?
when to use log1p
negative log python
In : %%timeit x = np.random.rand(100000) ....: np.log2(x) ....: 1000 loops, best of 3: 1.31 ms per loop In : %%timeit x = np.random.rand(100000) np.log(x) ....: 100 loops, best of 3: 3.64 ms per loop In : %%timeit x = np.random.rand(100000) np.log10(x) ....: 100 loops, best of 3: 3.93 ms per loop
np.log2 is about 3x faster than
np.log10. Perhaps even more counter-intuitively,
np.log1p(x), which computes ln(x + 1), is on par with
In : %%timeit x = np.random.rand(100000) np.log1p(x) ....: 1000 loops, best of 3: 1.46 ms per loop
I obtained almost identical timings in numpy v1.10.1 and v1.8.2.
Is there an intuitive explanation for these discrepancies in runtime performance?
I found your question extremely interesting, so I spent a few hours doing a bit of reserach; I think I have found an explanation for the performance difference as it applies to integers (thank you Matteo Italia for your note) - It is unclear how that reasoning can be extended to floats though:
Computers use base 2 - According to the articles linked in reference, the computation of log2 is a 4 processor cycles process - that of log10 requires to multiply log2(val) by 1/log2(10) which adds another 5 cycles.
Finding log2 is a matter of finding the index of the least significant bit of a value. (video at about the 23rd minute mark onward).
bit hacks: Find integer log base 10 of an integer
The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By the relationship log10(v) = log2(v) / log2(10), we need to multiply it by 1/log2(10), which is approximately 1233/4096, or 1233 followed by a right shift of 12. Adding one is needed because the IntegerLogBase2 rounds down. Finally, since the value t is only an approximation that may be off by one, the exact value is found by subtracting the result of v < PowersOf10[t].
This method takes 6 more operations than IntegerLogBase2. It may be sped up (on machines with fast memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is computed for t (that is, pre-add, -mulitply, and -shift). Doing so would require a total of only 9 operations to find the log base 10, assuming 4 tables were used (one for each byte of v).
Of note: the use of DeBruijn sequences lookup & bit shifting techniques to calculate log2 in this MIT video: Lec 2 | MIT 6.172 Performance Engineering of Software Systems, Fall 2010 (video at the 36th minute mark onward).
Of note this StackOverflow post which demonstrates a method to make efficient log2 calculations truly cross platform with C++
Caveat: I have not checked numpy source code to verify that it indeed implements similar techniques, but it would be surprising that it did not. In fact, from the comments under the OP's post, Fermion Portal have checked:
Actually numpy uses math.h from glibc, you'll see the same difference in C/C++ if you use math.h/cmath.h. You can find the nicely commented source code for the two functions e.g. ic.unicamp.br/~islene/2s2008-mo806/libc/sysdeps/ieee754/dbl-64/… and ic.unicamp.br/~islene/2s2008-mo806/libc/sysdeps/ieee754/dbl-64/… – Fermion Portal
`log2` way slower than `log` or `log10` · Issue #13836 · numpy , Theoretically it should not be slower than np.log10 anyway Related SO: Why are log2 and log1p so much faster than log and log10? Looks like the performance of log and log10 is consistent between numpy and my implementation, but log2 is way slower than them. Maybe there isn't a good enough SIMD for that? But anyway the performance of np.log2 is indeed slower than expected.
This is just a note, but longer than a comment. Apparently this has to do with your particular install:
import numpy as np import numexpr as ne x = np.random.rand(100000)
I get the same timings with numpy 1.10 from conda and a version compiled with icc:
%timeit np.log2(x) 1000 loops, best of 3: 1.24 ms per loop %timeit np.log(x) 1000 loops, best of 3: 1.28 ms per loop
I thought it might have something to with grabbing the MKL VML package, but looks like thats a no:
%timeit ne.evaluate('log(x)') 1000 loops, best of 3: 218 µs per loop
Looks like your numpy install is grabbing its log/log2 implementation from two different places which is odd.
numpy.log, The natural logarithm log is the inverse of the exponential function, so that The natural logarithm is logarithm in base e . log10 , log2 , log1p , emath.log. Fast computing of log2 for 64-bit integers. Why are log2 and log1p so much faster than log and log10, in numpy? Duplicate bits in a big numpy array. 3.
Disclaimer: I'm neither a credible nor an official source.
I'm almost certain that any implementation of log to the base e function can be made as fast as the log2 function, because to convert one to the other you require a single division by a constant. This is of course assuming that a single division operation is a tiny fraction of the other computations; which in accurate implementations of logarithms is true.
In most cases, numpy uses
glibc, you'll see the same difference in C/C++ if you use
cmath.h. In the comments, some people observe same speeds for
np.log2; I suspect this may come from different builds / platforms.
You can find the nicely commented source code for the two functions in files
e_log2f.c in the
flt-32/ subdirectories of this GitHub repo.
For double precision, in
log function is implementing a completely different algo (compared to
log2) from IBM from ~2001, which was included in their
libultim library. Whereas
log2 is from Sun Microsystems from ~1993. Just looking at the code one can see two different approximations are being implemented. In contrast, for single precision, both functions
log2 are the same apart from division by
ln2 in the
log2 case, hence the same speed.
For even more background on the underlying algorithms, alternatives and discussions on which to include in
glibc in the future, see here.
Numpy log base 10, The base can be any positive value other than 0 or 1 and the number can be any If `x` was a scalar, so is `out`, NumPy array can be multiplied by each other using log base 2 The Python log10 function is one of the Python Math functions that NumPy Ndarray. exp(a Computation on NumPy arrays can be very fast, or it Why are log2 and log1p so much faster than log and log10? (3) Whilst playing around with this question I noticed something I couldn't explain regarding the relative performance of np.log2, np.log and np.log10:
(Should probably be a comment but will be too long...)
To make this more interesting, in 2018 on a Windows 10 64 bit machine, the results are reversed.
Python 3.6.3 |Anaconda, Inc.| (default, Oct 15 2017, 03:27:45) [MSC v.1900 64 bit (AMD64)] Type 'copyright', 'credits' or 'license' for more information IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. In : import numpy as np; np.random.seed(0); x = np.random.rand(100000) ...: %timeit np.log2(x) ...: %timeit np.log1p(x) ...: %timeit np.log(x) ...: %timeit np.log10(x) ...: 1.48 ms ± 18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 1.33 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 840 µs ± 7.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 894 µs ± 2.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Python 3.6.3 |Intel Corporation| (default, Oct 17 2017, 23:26:12) [MSC v.1900 64 bit (AMD64)] Type 'copyright', 'credits' or 'license' for more information IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help. In : import numpy as np; np.random.seed(0); x = np.random.rand(100000) ...: %timeit np.log2(x) ...: %timeit np.log1p(x) ...: %timeit np.log(x) ...: %timeit np.log10(x) ...: 1.01 ms ± 2.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 236 µs ± 6.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) 161 µs ± 1.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) 171 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Python for Data Analysis, Unary ufuncs Function abs, fabs sqrt square exp log, log10, log2, log1p sign ceil floor Use fabs as a faster alternative for non-complex-valued data Compute the i.e. the smallest integer greater than or equal to each element Compute the as separate array Return boolean array indicating whether each value is NaN For complex-valued input, log2 is a complex analytical function that has a branch cut [-inf, 0] and is continuous from above on it. log2 handles the floating-point negative zero as an infinitesimal negative number, conforming to the C99 standard. Examples >>>
Numpy log10 example, NaNs are returned where x is negative. log1p (x, /, out=None, *, where=True, The base 10 log of 0. log10(), rather than as a method of a Math object you created Solution: NumPy is a Python package providing fast, flexible, and expressive 6 Feb 2018 If we plot the ln (log e), log10 and log2, it becomes more clear: for log, log2, log10, log1p, logb with a minimum of 10-bits of accuracy i.e. an ULP value less than or equal to whom the Materials are furnished to do so, subject
Numpy log1p base 10, For real-valued input data types, log1p always exp, expm1, log, log10, log2 python Why are log2 and log1p so much faster than log and log10? in 2018 on a The following are code examples for showing how to use numpy.log10().They are from open source Python projects. You can vote up the examples you like or vote down the ones you don't like.
Numpy log1p base 10, Issues 1,721. exp, expm1, log, log10, log2 and log1p are S4 generic and are members NumPy arrays also use much less memory than built-in Python sequences. The natural logarithm log is the inverse of the exponential function, so that it fast is to use vectorized operations, generally implemented through NumPy's expm1: math.expm1 is 1.8 times faster than numpy.expm1. atanh: numpy.arctanh is 1.6 times faster than math.atanh. log1p: math.log1p is 1.3 times faster than numpy.log1p. In all cases, the difference between the assembler code for each is just the use of the platform's function vs. the npymath.* version.
- this answer in math SE seems to say that some methods reduce by
log2for calculating any log. this may mean that the implementation of the log functions of np depend, in one way or another, on log2 and/or ln(x+1). I think this has to do with the taylor series of both of them as well
- This is a very interesting observation. I am in no means an expert in low level implementation of efficient computing routines. Intuitively I would guess that this has to do with the fact that all logarithms are conceptually related. If you know one, you basically know them all by simple transformations. So at some point you have to decide which one can be efficiently calculated on a processor. Calculating others via transformation then would obviously take a little more time. But I would love to see an expert answer here.
- Perhaps since binary data is base 2, there are some optimisation tricks available with log2
- that probably has to do with the relative simplicity of the taylor series of
- @FermionPortal Would you be interested in writing your comments up as an answer? I could have a go myself, but it seems a shame to let the bounty go to waste ;-)
- Careful, here we are talking about logarithms on floating point numbers, the "bit hacks" above do not apply.
- Oh shoot! Argh! Just when I thought my contribution could be useful! O_O... I'll add a note that says it applies to integers; maybe it can still be useful to some? Thank you for pointing it out Matteo - I learned something anyway! :)
- That's interesting - I'm still seeing very different timings for
np.log2using numpy 1.10.1 or 1.9.3 from conda, although both of these seem to have been compiled using gcc 4.4.7 rather than icc. I don't have access to a version compiled with icc in order to test this further.
- I got hold of a copy of ICC 16.0.1 and built numpy 1.10.1 from source following the instructions here. I'm now seeing slightly worse overall performance, with
np.log10still being about a factor of 2 slower than
- @ali_m Even more curious. Are you by chance running on a AMD cpu?
- Nope - so far I've tried this on two Intel machines. What is your setup?
- @ali_m All intel machine also. Have you tried
- Thanks for digging into this. I'm going to award you the bounty, since I think those links have so far provided the most useful insight into the issue. However, I still hesitate to mark this question as closed in case someone else wants to step in with a more comprehensive answer.