How to vectorize a function which contains an if statement?

Let's say we have the following function:

def f(x, y):
    if y == 0:
        return 0
    return x/y

This works fine with scalar values. Unfortunately, when I try to use NumPy arrays for x and y, the comparison y == 0 is treated as an array operation, which results in an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-9884e2c3d1cd> in <module>()
----> 1 f(np.arange(1,10), np.arange(10,20))

<ipython-input-10-fbd24f17ea07> in f(x, y)
      1 def f(x, y):
----> 2     if y == 0:
      3         return 0
      4     return x/y

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
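The error comes from the if itself. With an array y, y == 0 is an elementwise comparison that returns a boolean array, but if needs a single True/False. A minimal sketch reproducing this with the same y as in the traceback:

```python
import numpy as np

y = np.arange(10, 20)
mask = y == 0                   # elementwise comparison -> boolean array, not one bool
print(mask.dtype, mask.shape)   # bool (10,)

try:
    if mask:                    # implicit bool(mask) is ambiguous for 10 elements
        pass
except ValueError as exc:
    print("ValueError:", exc)

# Reducing the array to a single bool makes the intent explicit
print(mask.any())               # False: no element of y is zero
```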

I tried to use np.vectorize, but it didn't make a difference; the code still failed with the same error. (Update: np.vectorize does turn out to be one option that gives the result I expect.)

The only solution that I can think of is to use np.where on the y array with something like:

def f(x, y):
    return np.where(y == 0, 0, x/y)

which doesn't work for scalars.

Is there a better way to write a function which contains an if statement? It should work with both scalars and arrays.

One way is to convert x and y to numpy arrays inside your function:

import numpy as np

def f(x, y):
    x = np.array(x)
    y = np.array(y)
    return np.where(y == 0, 0, x/y)

This will work when one of x or y is a scalar and the other is a numpy array. It will also work if they are both arrays that can be broadcast. It won't work if they're arrays of incompatible shapes (e.g., 1D arrays of different lengths), but it's not clear what the desired behavior would be in that case anyway.
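Because np.where evaluates x/y everywhere, it still triggers a divide-by-zero warning. One alternative is to let np.divide skip those positions through its out= and where= arguments, so the division is never performed where y is 0. A minimal sketch, assuming float output is acceptable (the name safe_divide is hypothetical):

```python
import numpy as np

def safe_divide(x, y):
    """Return x / y elementwise, with 0 wherever y == 0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Output pre-filled with 0, in the broadcast shape of x and y
    out = np.zeros(np.broadcast(x, y).shape)
    # Divide only where y != 0; the other slots keep their 0
    np.divide(x, y, out=out, where=(y != 0))
    # out[()] turns a 0-d result into a NumPy scalar and
    # returns n-d arrays unchanged
    return out[()]

print(safe_divide(3, 0))                      # 0.0
print(safe_divide([1, 2, 3], [0, 1, 2]))
print(safe_divide([[3], [10]], [0, 1, 2, 0]))
```

The out[()] step is what lets the same function return a scalar for scalar inputs and an array for array inputs, without any explicit type check.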

I wonder what the problem is you're facing with np.vectorize. It works fine on my system:

In [145]: def f(x, y):
     ...:     if y == 0:
     ...:         return 0
     ...:     return x/y

In [146]: vf = np.vectorize(f)

In [147]: vf([[3],[10]], [0,1,2,0])
Out[147]: 
array([[ 0,  3,  1,  0],
       [ 0, 10,  5,  0]])

Note that the result dtype is determined by the result of the first element. You can also set the desired output yourself:

In [148]: vf = np.vectorize(f, otypes=[float])

In [149]: vf([[3],[10]], [0,1,2,0])
Out[149]: 
array([[  0. ,   3. ,   1.5,   0. ],
       [  0. ,  10. ,   5. ,   0. ]])

There are more examples in the docs.
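The same session as a plain Python 3 script, for reference. In Python 3, x/y is always true division, so without otypes the int output type inferred from the first call, f(3, 0) == 0, silently truncates 1.5 to 1. Note that np.float was removed in NumPy 1.24, so the builtin float is used here:

```python
import numpy as np

def f(x, y):
    if y == 0:
        return 0
    return x / y

# Output dtype inferred from the first call, f(3, 0) -> int 0
vf_int = np.vectorize(f)
# Output dtype fixed explicitly, so 3/2 stays 1.5
vf_float = np.vectorize(f, otypes=[float])

x = [[3], [10]]        # shape (2, 1)
y = [0, 1, 2, 0]       # shape (4,), broadcast together to (2, 4)
print(vf_int(x, y))    # 1.5 is truncated to 1
print(vf_float(x, y))
print(vf_float(5, 0))  # scalars work too (the result is a 0-d array)
```

Keep in mind that np.vectorize is essentially a Python-level loop, so it is convenient rather than fast.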

You can use a masked array that will perform the division only where y!=0:

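import numpy as np

def f(x, y):
    x = np.atleast_1d(np.asarray(x))
    y = np.asarray(y)  # so that y == 0 is elementwise even if y is a list
    # Mask the zero entries of y so no real division happens there
    y = np.atleast_1d(np.ma.array(y, mask=(y == 0)))
    ans = x/y
    ans[ans.mask] = 0  # write 0 into the masked (y == 0) slots
    return np.asarray(ans)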
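A shorter variant of the same masked-array idea, assuming NumPy's np.ma.divide (which masks positions where the divisor is zero or too small to divide safely, plus any non-finite results) and np.ma.filled to replace the masked entries with 0. The name f_ma is just for illustration:

```python
import numpy as np

def f_ma(x, y):
    # np.ma.divide masks the positions where y is 0 instead of dividing there
    quotient = np.ma.divide(x, y)
    # np.ma.filled swaps masked entries for 0 and returns a plain ndarray
    return np.ma.filled(quotient, 0)

print(f_ma([1, 2, 3], [0, 1, 2]))  # the masked slot comes back as 0
print(f_ma(3, 0))                  # scalar inputs give a 0-d array
```

One caveat of this design: because non-finite results are masked too, an inf or nan coming from x would also be replaced by 0.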

A kind of clunky but effective way is to basically pre-process the data:

import numpy as np

def f(x, y):
    # Two scalars: a plain if statement is enough
    if np.isscalar(x) and np.isscalar(y):
        return 0 if y == 0 else x/y

    # Change scalars to arrays; np.array() copies its input, so the
    # in-place edits below don't modify the caller's arrays
    x = np.array(x)
    y = np.array(y)
    if x.ndim == 0: x = np.full(y.shape, x)
    if y.ndim == 0: y = np.full(x.shape, y)

    # Change all divide by zero operations to 0/1
    div_zero_idx = (y==0)
    x[div_zero_idx] = 0
    y[div_zero_idx] = 1

    return x/y

I timed all the different approaches:

import timeit
import numpy as np

def f_mask(x, y):
    x = np.ma.array(x, mask=(y==0))
    y = np.array(y)
    ans = x/y
    ans[ans.mask]=0
    return np.asarray(ans)

def f_where(x, y):
    x = np.array(x)
    y = np.array(y)
    return np.where(y == 0, 0, x/y)

def f_vect(x, y):
    if y == 0:
        return 0
    return x/y

vf = np.vectorize(f_vect)

print(timeit.timeit('f(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f; import numpy as np; array_length=1000"))
print(timeit.timeit('f_mask(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_mask; import numpy as np; array_length=1000"))
print(timeit.timeit('f_where(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import f_where; import numpy as np; array_length=1000"))
print(timeit.timeit('vf(np.random.randint(10, size=array_length), np.random.randint(10, size=array_length))', number=10000, setup="from __main__ import vf; import numpy as np; array_length=1000"))

# f
# 0.760189056396

# f_mask
# 2.24414896965

# f_where
# RuntimeWarning: divide by zero encountered in divide return np.where(y == 0, 0, x/y)
# 1.08176398277

# f_vect
# 3.45374488831

The first function, f, is the quickest and produces no warnings. The time ratios are similar if x or y is a scalar. For higher-dimensional arrays, the masked array approach gets relatively faster (though it is still the slowest apart from np.vectorize).

Suppose you have a predicted vector (a NumPy array) [0,1,0,1,1,0] and you want to convert it to the sequence ['N', 'Y', 'N', 'Y', 'Y', 'N']:

import numpy as np

y_pred = np.array([0,1,0,1,1,0])

def toYN(x):
    if x > 0:
        return "Y"
    else:
        return "N"

vf_YN = np.vectorize(toYN)
Loan_Status = vf_YN(y_pred)

Loan_Status will contain ['N', 'Y', 'N', 'Y', 'Y', 'N']
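For a simple two-way mapping like this, np.where can replace the if/else entirely, since the condition is evaluated on the whole array at once. A minimal sketch using the same y_pred:

```python
import numpy as np

y_pred = np.array([0, 1, 0, 1, 1, 0])

# The comparison y_pred > 0 is elementwise, so no Python-level
# if statement (and no np.vectorize) is needed
loan_status = np.where(y_pred > 0, "Y", "N")
print(loan_status.tolist())  # ['N', 'Y', 'N', 'Y', 'Y', 'N']
```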

Comments
  • Are you saying you want to pass a numpy array for y but just a single number for x? Or vice versa, or both?
  • If you wrap your y and x in np.asarray, the where version will work. But note that x/y is evaluated everywhere, so you may get a warning or an exception (depending on your floating-point error settings) if any element of y is 0.
  • @BrenBarn both x and y are arrays in the second case. Edited my answer to make that more explicit.
  • np.vectorize works fine here :)
  • @moarningsun Can you post an answer with the code?
  • Change np.array(x) to np.asarray(x) (likewise for y) and you've got it.
  • Add with np.errstate(divide='ignore'): before the return (and indent the return), to silence the warnings.
  • otypes=[np.float] was the piece that I was missing.
  • Assuming you meant the last line to return x/y, this sets all values where y==0 to 1.
  • @PokeyMcPokerson Thank you for the comment... I wrote it in a hurry earlier, and now I've fixed it.
  • If you mask the x array instead of the y, i.e. x = np.ma.array(x, mask=(y==0)) and y = np.array(y), it runs at about double the speed. It also gets rid of the warnings.
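Putting two of the comment suggestions together (np.asarray instead of np.array, and np.errstate around the return), the np.where version might look like this sketch. It also ignores invalid, which is not in the comments, because 0/0 produces an "invalid value" warning rather than a "divide by zero" one:

```python
import numpy as np

def f(x, y):
    # np.asarray avoids a copy when the input is already an ndarray
    x = np.asarray(x)
    y = np.asarray(y)
    # x/y is still computed everywhere, so silence the divide-by-zero
    # and 0/0 warnings; np.where then replaces those entries with 0
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(y == 0, 0, x / y)

print(f(3, 0))
print(f([1, 2, 3], [0, 1, 2]))
```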