Python/NumPy first occurrence of subarray

In Python or NumPy, what is the best way to find out the first occurrence of a subarray?

For example, I have

a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]

What is the fastest way (run-time-wise) to find out where b occurs in a? I understand for strings this is extremely easy, but what about for a list or numpy ndarray?

Thanks a lot!

[EDITED] I prefer a NumPy solution, since in my experience NumPy vectorization is much faster than a Python list comprehension. Also, the big array is huge, so I don't want to convert it into a string; the string would be too long.

I'm assuming you're looking for a numpy-specific solution, rather than a simple list comprehension or for loop. One approach might be to use the rolling window technique to search for windows of the appropriate size. Here's the rolling_window function:

>>> def rolling_window(a, size):
...     shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
...     strides = a.strides + (a.strides[-1],)
...     return numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
... 

Then you could do something like

>>> a = numpy.arange(10)
>>> numpy.random.shuffle(a)
>>> a
array([7, 3, 6, 8, 4, 0, 9, 2, 1, 5])
>>> rolling_window(a, 3) == [8, 4, 0]
array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

To make this really useful, you'd have to reduce it along axis 1 using all:

>>> numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
array([False, False, False,  True, False, False, False, False], dtype=bool)

Then you could use that however you'd use a boolean array. A simple way to get the index out:

>>> bool_indices = numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
>>> numpy.mgrid[0:len(bool_indices)][bool_indices]
array([3])
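
As a hedged alternative to the hand-written rolling_window helper: NumPy 1.20 and later ship numpy.lib.stride_tricks.sliding_window_view, which builds the same kind of windowed view. The sketch below assumes a 1-D input; the function name first_occurrence and the -1 "not found" convention are illustrative choices, not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

def first_occurrence(haystack, needle):
    """Return the start index of the first match of needle, or -1 if absent.

    Hypothetical helper for illustration; assumes 1-D inputs.
    """
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)
    if needle.size == 0 or needle.size > haystack.size:
        return -1
    # Read-only view of shape (len(haystack) - len(needle) + 1, len(needle))
    windows = sliding_window_view(haystack, needle.size)
    # The comparison still allocates a full boolean array of that shape
    matches = np.all(windows == needle, axis=1)
    hits = np.flatnonzero(matches)
    return int(hits[0]) if hits.size else -1

a = np.array([7, 3, 6, 8, 4, 0, 9, 2, 1, 5])
print(first_occurrence(a, [8, 4, 0]))  # 3
```

Like the rolling_window version, the view itself costs no extra memory, but the == comparison allocates one boolean per window element.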

For lists you could adapt one of these rolling window iterators to use a similar approach.
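
One possible shape for such an iterator-based version, as a minimal sketch: it uses only the standard library and works on any iterable, not just lists. The names rolling_window_iter and first_index are made up for this example.

```python
from collections import deque
from itertools import islice

def rolling_window_iter(iterable, size):
    """Yield successive tuples of `size` consecutive items (illustrative helper)."""
    it = iter(iterable)
    window = deque(islice(it, size), maxlen=size)
    if len(window) == size:
        yield tuple(window)
    for item in it:
        window.append(item)  # the oldest item drops out automatically
        yield tuple(window)

def first_index(seq, sub):
    """Return the index of the first window equal to sub, or None."""
    target = tuple(sub)
    for i, window in enumerate(rolling_window_iter(seq, len(target))):
        if window == target:
            return i
    return None

print(first_index([1, 2, 3, 4, 5, 6], [2, 3, 4]))  # 1
```

This stops at the first match, so it never scans more of the input than it has to, but each step runs in the interpreter rather than in vectorized NumPy code.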

For very large arrays and subarrays, you could save memory like this:

>>> windows = rolling_window(a, 3)
>>> sub = [8, 4, 0]
>>> hits = numpy.ones((len(a) - len(sub) + 1,), dtype=bool)
>>> for i, x in enumerate(sub):
...     hits &= numpy.isin(windows[:,i], [x])
... 
>>> hits
array([False, False, False,  True, False, False, False, False], dtype=bool)
>>> hits.nonzero()
(array([3]),)

On the other hand, this will probably be slower. How much slower isn't clear without testing; see Jamie's answer for another memory-conserving option that has to check false positives. I imagine that the speed difference between these two solutions will depend heavily on the nature of the input.

The following code should work:

[x for x in xrange(len(a)) if a[x:x+len(b)] == b]

This returns a list of every index at which the pattern starts; the first element of that list is the first occurrence.
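
If only the first occurrence is needed, a variant along these lines stops at the first hit instead of building the whole list. It is a sketch for plain Python lists (slice comparison with == does not give a single boolean for NumPy arrays), and the name find_first is made up here.

```python
def find_first(a, b):
    """Return the first index where list b occurs in list a, or None."""
    n, m = len(a), len(b)
    # Only start positions that leave room for all of b need checking
    return next((i for i in range(n - m + 1) if a[i:i + m] == b), None)

a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]
print(find_first(a, b))  # 1
```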

A convolution-based approach that should be more memory-efficient than the stride_tricks-based approach:

import numpy as np

def find_subsequence(seq, subseq):
    # A true match correlates to exactly dot(subseq, subseq)
    target = np.dot(subseq, subseq)
    candidates = np.where(np.correlate(seq,
                                       subseq, mode='valid') == target)[0]
    # Some of the candidate entries may be false positives, so double-check
    # each one element by element
    check = candidates[:, np.newaxis] + np.arange(len(subseq))
    mask = np.all((np.take(seq, check) == subseq), axis=-1)
    return candidates[mask]

With really big arrays it may not be possible to use a stride_tricks approach, but this one still works:

haystack = np.random.randint(1000, size=10**6)
needle = np.random.randint(1000, size=100)
# Hide 10 needles in the haystack
place = np.random.randint(10**6 - 100 + 1, size=10)
for idx in place:
    haystack[idx:idx+100] = needle

In [3]: find_subsequence(haystack, needle)
Out[3]: 
array([253824, 321497, 414169, 456777, 635055, 879149, 884282, 954848,
       961100, 973481], dtype=int64)

In [4]: np.all(np.sort(place) == find_subsequence(haystack, needle))
Out[4]: True

In [5]: %timeit find_subsequence(haystack, needle)
10 loops, best of 3: 79.2 ms per loop
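
One of the comments on this answer suggests correlating against a randomized pattern of the same length as the needle rather than against the needle itself. A rough sketch of that idea follows, assuming 1-D integer inputs with len(subseq) <= len(seq). The function name, the seed parameter, and the weight range 1..999 are arbitrary choices for illustration, and whether this actually produces fewer false positives depends on the data.

```python
import numpy as np

def find_subsequence_random(seq, subseq, seed=0):
    """Variant of find_subsequence using random integer weights (illustrative)."""
    seq = np.asarray(seq, dtype=np.int64)
    subseq = np.asarray(subseq, dtype=np.int64)
    rng = np.random.default_rng(seed)
    # Random positive integer weights, one per needle position
    weights = rng.integers(1, 1000, size=len(subseq))
    target = np.dot(weights, subseq)
    # scores[k] = sum_i seq[k + i] * weights[i]
    scores = np.correlate(seq, weights, mode='valid')
    candidates = np.flatnonzero(scores == target)
    # Candidates can still be false positives, so verify them exactly
    check = candidates[:, np.newaxis] + np.arange(len(subseq))
    mask = np.all(seq[check] == subseq, axis=-1)
    return candidates[mask]

print(find_subsequence_random([1, 2, 3, 4, 5, 6, 2, 3, 4], [2, 3, 4]))
```

The exact verification step is kept, so the result is correct regardless of the weights; the weights only affect how many candidates need verifying.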

You can call the array's tobytes() method (called tostring() in older NumPy versions, now deprecated) to convert it to a byte string, and then use fast string search. This method may be faster when you have many subarrays to check. Both arrays must have the same dtype.

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([2, 3, 4])
print(a.tobytes().index(b.tobytes()) // a.itemsize)
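
One caveat with the byte-string search: the needle's bytes can also match at an offset that is not a multiple of the item size, which is not a real element-wise match. The sketch below skips such hits. The function name find_via_bytes and the -1 return value are illustrative assumptions, and the needle is cast to the haystack's dtype so the byte layouts agree.

```python
import numpy as np

def find_via_bytes(a, b):
    """Return the first element index where b occurs in a, or -1 (illustrative)."""
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)  # same dtype, same byte layout
    hay, needle = a.tobytes(), b.tobytes()
    step = a.itemsize
    pos = hay.find(needle)
    while pos != -1:
        if pos % step == 0:      # match starts on an element boundary
            return pos // step
        pos = hay.find(needle, pos + 1)  # unaligned hit, keep searching
    return -1

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([2, 3, 4])
print(find_via_bytes(a, b))  # 1
```

For floating-point data the comparison is on raw bytes, so 0.0 and -0.0 are treated as different and identical NaN bit patterns are treated as equal.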


Comments
  • Could you just convert the list to a string to make the comparison? x=''.join(str(x) for x in a) Then use the find method with the resulting strings? Or do they have to remain lists?
  • The problem with this approach is that, while the return value of rolling_window doesn't require any new memory and reuses that of the original array, the == operation instantiates a new boolean array that is size times as large as your original array. If the array is big enough, this can kill performance big time.
  • That's true. In fact, my main intent in using the rolling windows function was not to save memory, but to quickly generate an array of the required structure. But I added my own memory-conserving solution; yours looks promising as well. I don't have the motivation to test them against each other!
  • This might not be the fastest solution, but +1 for the simplest answer. This might fit the needs of many users, especially if numpy is not available.
  • In Python 3 use range instead of xrange.
  • For improved performance, you could replace len(a) with len(a) - len(b) + 1
  • While I really like this approach, I should note that in general finding candidates by l2 norm is not better than finding a particular symbol from needle. But after a small modification by computing dot product with randomized pattern of the same length as needle, this method will be just awesome.
  • This solution is very quick and elegant, thanks a lot! Slightly related: I had a project grabbing np arrays of about 1e8 elements from C++ using a SWIG wrapper, and the array creation was very slow. Working with them as strings boosted performance into real time.
  • Two problems: This would also match [1, 3, 2, 4, 5, 6] (sets are not ordered; arrays are), and it doesn't report the location of the match (which should be index 1).
  • Yeah my bad, answered too quickly :-/
  • You can simplify your code a bit by replacing first_occurence=i with return i, and return first_occurence with return None.
  • You would probably add in some logic to take care of cases where there are no matches...