How can I find a subsequence in a &[u8] slice?

I have a &[u8] slice over a binary buffer. I need to parse it, but a lot of the methods that I would like to use (such as str::find) don't seem to be available on slices.

I've seen that I can convert both my buffer slice and my pattern to str by using from_utf8_unchecked(), but that seems a little dangerous (and also really hacky).

How can I find a subsequence in this slice? I actually need the index of the pattern, not just a slice view of the parts, so I don't think split will work.

Here's a simple implementation based on the windows iterator.

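fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows` panics when the window size is 0, so handle an empty needle
    // up front (it matches at index 0, as with `str::find`).
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}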

fn main() {
    assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
    assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
}
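
If you need every match rather than just the first (for example, when scanning a buffer for several message boundaries), the same windows approach can be extended. This is only a sketch: find_all_subsequences is a hypothetical name, and treating an empty needle as "no matches" is an assumption made here to avoid the zero-size window panic.

```rust
/// Hypothetical helper (not part of the original answer): returns the start
/// index of every match of `needle` in `haystack`, overlapping matches included.
fn find_all_subsequences(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    // Assumption: an empty needle yields no matches. This also avoids the
    // panic that `windows(0)` would cause.
    if needle.is_empty() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        // Keep only the windows that are equal to the needle...
        .filter(|(_, window)| *window == needle)
        // ...and report the index at which each of them starts.
        .map(|(index, _)| index)
        .collect()
}
```

For instance, find_all_subsequences(b"abcabc", b"bc") should give the indices 1 and 4.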

The find_subsequence function can also be made generic:

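fn find_subsequence<T>(haystack: &[T], needle: &[T]) -> Option<usize>
    where for<'a> &'a [T]: PartialEq
{
    // `windows` panics when the window size is 0, so handle an empty needle first.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}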

I don't think the standard library contains a function for this. Some libcs have memmem, but at the moment the libc crate does not wrap this. You can use the twoway crate, however. rust-bio implements some pattern matching algorithms, too. All of those should be faster than using haystack.windows(..).position(..).
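
If you would rather not pull in a crate, a linear-time search can also be written by hand. The following is a minimal sketch of Knuth-Morris-Pratt using only the standard library. It is not what twoway or rust-bio do internally, the names kmp_find and failure_table are made up for this example, and no claim is made here about how its speed compares in practice.

```rust
/// Builds the KMP failure table: `table[i]` is the length of the longest
/// proper prefix of `needle[..=i]` that is also a suffix of it.
fn failure_table(needle: &[u8]) -> Vec<usize> {
    let mut table = vec![0; needle.len()];
    let mut k = 0; // length of the prefix matched so far
    for i in 1..needle.len() {
        // On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 && needle[i] != needle[k] {
            k = table[k - 1];
        }
        if needle[i] == needle[k] {
            k += 1;
        }
        table[i] = k;
    }
    table
}

/// Hypothetical helper: index of the first occurrence of `needle`, or `None`.
/// An empty needle is treated as matching at index 0 (an assumption that
/// mirrors `str::find`).
fn kmp_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    let table = failure_table(needle);
    let mut k = 0; // number of needle bytes currently matched
    for (i, &byte) in haystack.iter().enumerate() {
        // Reuse the partial match instead of restarting from scratch.
        while k > 0 && byte != needle[k] {
            k = table[k - 1];
        }
        if byte == needle[k] {
            k += 1;
        }
        if k == needle.len() {
            // The match ends at `i`, so it starts `k - 1` bytes earlier.
            return Some(i + 1 - k);
        }
    }
    None
}
```

Each haystack byte is visited once and the fallback steps are bounded by the number of bytes matched, so the search takes O(|haystack| + |needle|) time at the cost of one extra Vec the size of the needle.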

How about regex on bytes, via the regex crate's regex::bytes::Regex? That looks very powerful. See this Rust playground demo.

use regex::bytes::Regex;

// This shows how to find all null-terminated strings in a slice of bytes
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);

I found the memmem crate useful for this task:

use memmem::{Searcher, TwoWaySearcher};

let search = TwoWaySearcher::new("dog".as_bytes());
assert_eq!(
    search.search_in("The quick brown fox jumped over the lazy dog.".as_bytes()),
    Some(41)
);

Comments
  • There is interest in expanding the concept of Pattern to arbitrary slices: comment, RFC.
  • @FrancisGagné Sorry, I meant I needed the index of the subarray, not just a slice of it. Concretely, I'm looking for boundaries in a network packet to see if I have a full message.
  • Very nice. I think I basically did it by hand with two nested for loops. The subarrays I'm looking for are all very small, so doing something more complex like KMP would be useless for my issues.
  • While this is a short and nice solution, please note that the algorithm runs in O(|haystack| * |needle|). This won't matter in most cases, but for more advanced and (asymptotically) faster algorithms, see String searching algorithm (Wikipedia).
  • This winds up being unacceptably slow. windows().position() is 100x slower than two nested loops.
  • @JasonN 100x sounds extreme. Are you sure you're compiling with optimizations?
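
For reference, the "two nested loops" approach mentioned in the comments might look roughly like the sketch below. The commenters did not post their code, so this is a guess at the general shape rather than their implementation; find_subsequence_loops is a hypothetical name, and no performance claim is intended.

```rust
/// Hypothetical hand-written search: an outer loop over candidate start
/// positions and an inner loop comparing byte by byte.
fn find_subsequence_loops(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // Assumption: an empty needle matches at index 0, as with `str::find`.
    if needle.is_empty() {
        return Some(0);
    }
    // A needle longer than the haystack can never match; checking this first
    // also keeps the subtraction below from underflowing.
    if needle.len() > haystack.len() {
        return None;
    }
    // Outer loop: every start position where the needle could still fit.
    'outer: for start in 0..=(haystack.len() - needle.len()) {
        // Inner loop: move on to the next start position at the first mismatch.
        for (offset, &byte) in needle.iter().enumerate() {
            if haystack[start + offset] != byte {
                continue 'outer;
            }
        }
        return Some(start);
    }
    None
}
```

When comparing this against the windows version, build with optimizations (cargo build --release), since debug builds can make iterator chains look much slower than they are.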