multiplying an integer numpy array with a float scalar without intermediary float array

I'm dealing with very large image arrays of uint16 data that I would like to downscale and convert to uint8.

My initial way of doing this caused a MemoryError because of an intermediary float64 array:

img = numpy.ones((29632, 60810, 3), dtype=numpy.uint16) 

if img.dtype == numpy.uint16:
    multiplier = numpy.iinfo(numpy.uint8).max / numpy.iinfo(numpy.uint16).max
    img = (img * multiplier).astype(numpy.uint8, order="C")

I then tried to do the multiplication in place, in the following way:

if img.dtype == numpy.uint16:
    multiplier = numpy.iinfo(numpy.uint8).max / numpy.iinfo(numpy.uint16).max
    img *= multiplier
    img = img.astype(numpy.uint8, order="C")

But I ran into the following error:

TypeError: Cannot cast ufunc multiply output from dtype('float64') to dtype('uint16') with casting rule 'same_kind'

Do you know of a way to perform this operation while minimizing the memory footprint?

Where can I change the casting rule mentioned in the error message?

You could also use Numba or Cython in such cases.

With either of them you can explicitly avoid any temporary arrays. The code is a bit longer, but it is easy to understand and faster.

Example

import numpy as np
import numba as nb

@nb.njit(parallel=True)
def conv_numba(img):
    multiplier = np.iinfo(np.uint8).max / np.iinfo(np.uint16).max
    img_out = np.empty(img.shape, dtype=np.uint8)
    for i in nb.prange(img.shape[0]):
        for j in range(img.shape[1]):
            for k in range(img.shape[2]):
                img_out[i, j, k] = img[i, j, k] * multiplier
    return img_out

# img_in has to be contiguous, otherwise the reshape will fail
@nb.njit(parallel=True)
def conv_numba_opt(img_in):
    multiplier = np.iinfo(np.uint8).max / np.iinfo(np.uint16).max
    shape = img_in.shape

    img = img_in.reshape(-1)
    img_out = np.empty(img.shape, dtype=np.uint8)

    for i in nb.prange(img.shape[0]):
        img_out[i] = img[i] * multiplier
    return img_out.reshape(shape)

def conv_numpy(img):
    multiplier = np.iinfo(np.uint8).max / np.iinfo(np.uint16).max
    np.multiply(img, multiplier, out=img, casting="unsafe")
    img = img.astype(np.uint8, order="C")
    return img

Timings

img = np.ones((29630, 6081, 3), dtype=np.uint16)

%timeit res_1=conv_numpy(img)
#990 ms ± 2.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit res_2=conv_numba(img)
#with parallel=True
#122 ms ± 17.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#with parallel=False
#571 ms ± 2.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Q : "Do you know of a way to perform this operation while minimizing the memory footprint?"

First, let's get the memory sizing right. The base array is a 29632 x 60810 x RGB in-memory object at 2 bytes per element:

>>> 29632 * 60810 * 3 * 2 / 1E9         ~ 10.81 [GB]

so the base array alone has already eaten some 11 [GB] of RAM.

Any operation will need some additional working space on top of that, so unless you have a TB-class RAM budget to spend on purely in-memory numpy-vectorised tricks, we are done here.

Given the OP's task was to minimise the memory footprint, moving all the arrays and their operations into numpy.memmap() objects will solve it.
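
A minimal sketch of that idea, assuming the uint16 pixels already sit in a raw binary file on disk (the file names, shape and block size below are placeholders, not taken from the original post):

import numpy as np

shape = (29632, 60810, 3)

# Source pixels are mapped from disk instead of being loaded into RAM (file name is hypothetical)
src = np.memmap("img_uint16.raw", dtype=np.uint16, mode="r", shape=shape)

# Destination is mapped to a new file; only the rows currently being processed touch RAM
dst = np.memmap("img_uint8.raw", dtype=np.uint8, mode="w+", shape=shape)

multiplier = np.iinfo(np.uint8).max / np.iinfo(np.uint16).max

# Work in row blocks so the float64 temporary stays around ~190 MB instead of ~43 GB
block = 128
for start in range(0, shape[0], block):
    stop = min(start + block, shape[0])
    dst[start:stop] = (src[start:stop] * multiplier).astype(np.uint8)

dst.flush()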

Using integer division (instead of multiplying by the inverse) should prevent the intermediary floating-point array (correct me if I'm wrong) and allow you to do the operation in place.

divisor = numpy.iinfo(numpy.uint16).max // numpy.iinfo(numpy.uint8).max
img //= divisor
img = img.astype(numpy.uint8, order="C")
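
A quick, hedged sanity check of the arithmetic (not from the original answer): the divisor is exactly 257, and as long as it stays a Python integer the in-place floor division keeps the uint16 dtype, so no float64 temporary is created and the final astype is the only copy.

import numpy as np

divisor = np.iinfo(np.uint16).max // np.iinfo(np.uint8).max
print(divisor)                             # 257
print(np.iinfo(np.uint16).max // divisor)  # 255, so every result already fits into uint8

small = np.array([0, 257, 65535], dtype=np.uint16)
small //= divisor                          # stays uint16, no float64 temporary
print(small, small.dtype)                  # [  0   1 255] uint16
print(small.astype(np.uint8))              # the only copy in the whole operation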

I finally found a solution that works after some reading of the numpy ufunc documentation.

multiplier = numpy.iinfo(numpy.uint8).max / numpy.iinfo(numpy.uint16).max
numpy.multiply(img, multiplier, out=img, casting="unsafe")
img = img.astype(numpy.uint8, order="C")

I should have found this earlier, but it's not an easy read if you are not familiar with some of the technical vocabulary.

How about getting a 4.3 ~ 4.7x faster solution?

Polishing PiRK's numpy.ufunc-based in-place processing solution, sketched above, a bit further, there is an approximately 4.3 ~ 4.7x faster modification of it:

>>> import numpy as np
>>> from zmq import Stopwatch; aClk = Stopwatch() # a trivial [us]-resolution clock
>>> fMUL = np.iinfo( np.uint8 ).max / np.iinfo( np.uint16 ).max # the uint16 -> uint8 scaling factor
>>> ###
>>> ############################################### ORIGINAL ufunc()-code:
>>> ###
>>> ###   np.ones( ( 29632, 608, 3 ), dtype = np.uint16 ) ## SIZE >> CPU CACHE SIZE
>>> I   = np.ones( ( 29632, 608, 3 ), dtype = np.uint16 )
>>> #mg = np.ones( ( 29632, 608, 3 ), dtype = np.uint16 ); aClk.start(); _ = np.multiply( img, fMUL, out = img, casting = 'unsafe' ); img = img.astype( np.uint8, order = 'C' );aClk.stop() ########## a one-liner for fast re-testing on CLI console
>>> img = I.copy();aClk.start();_= np.multiply( img,
...                                             fMUL,
...                                             out     =  img,
...                                             casting = 'unsafe'
...                                             ); img  =  img.astype( np.uint8,
...                                                                    order = 'C'
...                                                                    );aClk.stop()

312802 [us]
320087 [us]
329401 [us]
317346 [us]

Using some more of the documented ufunc kwargs right in the first call, the performance grows by about 4.3 ~ 4.7x:

>>> ### = I.copy(); aClk.start(); _ = np.multiply( img, fMUL, out = img, casting = 'unsafe', dtype = np.uint8, order = 'C'  ); aClk.stop() ########## a one-liner for fast re-testing on CLI console
>>> img = I.copy(); aClk.start(); _ = np.multiply( img,
...                                                fMUL,
...                                                out     =  img,
...                                                casting = 'unsafe',
...                                                dtype   =  np.uint8,
...                                                order   = 'C'
...                                                ); aClk.stop()
69812 [us]
71335 [us]
73112 [us]
70171 [us]

Q : Where can I change the casting rule mentioned in the error message?

The implicit (default) value of the casting argument changed, IIRC, somewhere between numpy versions 1.10 and 1.11, yet it is quite well documented in the published numpy.ufunc API documentation.
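
As a quick, hedged illustration (not from the original answers) of how that casting rule behaves and where it can be changed:

import numpy as np

img = np.ones((4, 4, 3), dtype=np.uint16)
multiplier = np.iinfo(np.uint8).max / np.iinfo(np.uint16).max

# The default casting="same_kind" refuses to write float64 results into a uint16 buffer
try:
    np.multiply(img, multiplier, out=img)
except TypeError as exc:   # UFuncTypeError is a subclass of TypeError
    print(exc)

# np.can_cast shows which rule would permit the conversion
print(np.can_cast(np.float64, np.uint16, casting="same_kind"))  # False
print(np.can_cast(np.float64, np.uint16, casting="unsafe"))     # True

# Explicitly opting into casting="unsafe" makes the in-place write legal again
np.multiply(img, multiplier, out=img, casting="unsafe")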

Comments
  • Great solution, thanks! I just had to write a second numba function with only 2 levels of loops to also handle grayscale (2D) image arrays. I'm wondering if numba can also provide a solution for any number of dimensions, i.e. without the explicit loops?
  • @PiRK Yes, but a bit more information is needed. E.g. you can flatten the array (reshape(-1)) and then reshape it once again at the end, but only contiguous arrays are supported. If you do the reshaping outside of the numba function, flattening a strided array will lead to a copy, which you want to avoid. So for a full solution you will likely need two codepaths, one for a strided array and one for a contiguous array (see the sketch after these comments).
  • Yes, that makes sense. I will have to keep the loops for now, until I find a low level library to read in the data in the right order. I'm reading in CZI files with a library that returns all the data with weird shapes (2, 3, 29632, 60810, 1) (axes "SCYX0"), and I have to perform some transposition to end up with my RGB images. Thanks anyway.
  • I will keep this answer in mind if I encounter even larger images that won't fit in my 16 GB of RAM. :)
  • There is an improved solution, with about 4.3 ~ 4.8x faster in-place processing, benchmarked and posted in a separate answer.
  • Thanks for the suggestion. But I get a similar error. TypeError: ufunc 'floor_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'H') according to the casting rule ''same_kind''
  • This does not work. The array type is not changed. It looks like the dtype argument is ignored when the out argument is provided. It makes sense that you can't change the output array's type in-place; a copy is unavoidable for this step.
  • @PiRK Mea culpa - I did test via .flags, and only the "F" to "C" ordering was confirmed to get converted in-place, not the dtype.
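
A minimal sketch of the two-codepath idea from the comments above, reusing conv_numba and conv_numba_opt from the Numba answer (the wrapper name is made up for illustration):

def to_uint8(img):
    """Pick the fast flattened path only when reshape(-1) would not trigger a copy."""
    if img.ndim != 3:
        raise ValueError(f"expected a 3D image array, got shape {img.shape}")
    if img.flags["C_CONTIGUOUS"]:
        # Contiguous data: the flattened single-loop conv_numba_opt applies
        return conv_numba_opt(img)
    # Strided data: the explicit triple loop in conv_numba honours the strides
    return conv_numba(img)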