Best way to determine if a sequence is in another sequence?
This is a generalization of the "string contains substring" problem to (more) arbitrary types.
Given a sequence (such as a list or tuple), what's the best way of determining whether another sequence is inside it? As a bonus, it should return the index of the element where the subsequence starts:
Example usage (Sequence in Sequence):
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1 # or None, or whatever
So far, I just rely on brute force and it seems slow, ugly, and clumsy.
I second the Knuth-Morris-Pratt algorithm. By the way, your problem (and the KMP solution) is exactly recipe 5.13 in Python Cookbook 2nd edition. You can find the related code at http://code.activestate.com/recipes/117214/
It finds all the correct subsequences in a given sequence, and should be used as an iterator:
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,6]): print(s)
3
>>> for s in KnuthMorrisPratt([4,'a',3,5,6], [5,7]): print(s)
(nothing)
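For readers without the Cookbook at hand, here is a minimal Python 3 sketch of the KMP idea. It is not the recipe's code: `kmp_search` is a hypothetical name, and it only assumes the elements support `==`. Like the iterator above, it takes the text first and yields every start index, so it never moves backwards in the text.

```python
from collections.abc import Iterator, Sequence


def kmp_search(text: Sequence, pattern: Sequence) -> Iterator[int]:
    """Yield every start index at which `pattern` occurs in `text` (KMP sketch)."""
    m = len(pattern)
    if m == 0:
        # Convention assumed here: the empty pattern matches at every position.
        yield from range(len(text) + 1)
        return
    # failure[q] = length of the longest proper prefix of pattern[:q + 1]
    # that is also a suffix of it.
    failure = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[q] != pattern[k]:
            k = failure[k - 1]
        if pattern[q] == pattern[k]:
            k += 1
        failure[q] = k
    # Scan the text exactly once.
    k = 0  # number of pattern elements currently matched
    for i, item in enumerate(text):
        while k > 0 and item != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading the text
        if item == pattern[k]:
            k += 1
        if k == m:
            yield i - m + 1
            k = failure[k - 1]  # keep going so overlapping matches are found
```

To get the `seq_in_seq` behaviour from the question, take the first match or a default: `next(kmp_search([4,'a',3,5,6], [5,6]), -1)` gives `3`.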
Here's a brute-force O(n*m) approach (similar to @mcella's answer). For small input sequences it might be faster than the pure-Python implementation of the O(n+m) Knuth-Morris-Pratt algorithm (see @Gregg Lind's answer).
#!/usr/bin/env python
def index(subseq, seq):
    """Return an index of `subseq`uence in the `seq`uence.

    Or `-1` if `subseq` is not a subsequence of the `seq`.

    The time complexity of the algorithm is O(n*m), where

        n, m = len(seq), len(subseq)

    >>> index([1,2], list(range(5)))
    1
    >>> index(list(range(1, 6)), list(range(5)))
    -1
    >>> index(list(range(5)), list(range(5)))
    0
    >>> index([1,2], [0, 1, 0, 1, 2])
    3
    """
    i, n, m = -1, len(seq), len(subseq)
    try:
        while True:
            # jump to the next candidate position: an occurrence of subseq's first element
            i = seq.index(subseq[0], i + 1, n - m + 1)
            if subseq == seq[i:i + m]:
                return i
    except ValueError:
        # seq.index found no further candidate
        return -1

if __name__ == '__main__':
    import doctest
    doctest.testmod()
I wonder how large "small" is in this case?
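The only reliable answer is to time the candidates on your own data. Below is a minimal `timeit` harness, assuming every search function takes `(subseq, seq)`; `brute_force_index` and `time_search` are hypothetical helper names, and you would plug in the other implementations from this thread next to it (adapting argument order where needed). No timings are claimed here.

```python
import timeit


def brute_force_index(subseq, seq):
    """Hypothetical naive O(n*m) search: compare every window of len(subseq)."""
    m = len(subseq)
    for i in range(len(seq) - m + 1):
        if seq[i:i + m] == subseq:
            return i
    return -1


def time_search(search, subseq, seq, number=1000):
    """Return the total seconds taken by `number` calls of search(subseq, seq)."""
    return timeit.timeit(lambda: search(subseq, seq), number=number)


if __name__ == "__main__":
    # Many near-matches (long runs of 0 before the 1), which is unfavourable
    # to the naive scan; increase n to see where another method starts to win.
    subseq = [0] * 10 + [1]
    for n in (10, 100, 1000):
        seq = [0] * n + [1]
        print(n, time_search(brute_force_index, subseq, seq, number=200))
```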
Same thing as string matching, sir: Knuth-Morris-Pratt string matching.
A simple approach: Convert to strings and rely on string matching.
Example using lists of strings:
>>> f = ["foo", "bar", "baz"]
>>> g = ["foo", "bar"]
>>> ff = str(f).strip("[]")
>>> gg = str(g).strip("[]")
>>> gg in ff
True
Example using tuples of strings:
>>> x = ("foo", "bar", "baz")
>>> y = ("bar", "baz")
>>> xx = str(x).strip("()")
>>> yy = str(y).strip("()")
>>> yy in xx
True
Example using lists of numbers:
>>> f = [1, 2, 3, 4, 5, 6, 7]
>>> g = [4, 5, 6]
>>> ff = str(f).strip("[]")
>>> gg = str(g).strip("[]")
>>> gg in ff
True
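One caveat with the string trick: once everything is a string, element boundaries are lost, so a match can straddle or cut into elements. The sketch below (hypothetical helper names `str_contains` and `window_contains`) shows a false positive with numbers and, for comparison, a plain element-wise window check that compares with `==` instead.

```python
def str_contains(subseq, seq):
    """The string-conversion check from the answer above, applied to lists."""
    return str(list(subseq)).strip("[]") in str(list(seq)).strip("[]")


def window_contains(subseq, seq):
    """Element-wise check: compare every window of len(subseq) with ==."""
    subseq, seq = list(subseq), list(seq)
    m = len(subseq)
    return any(seq[i:i + m] == subseq for i in range(len(seq) - m + 1))


# [4, 5] is not a contiguous run of [14, 5], but "4, 5" is a substring of "14, 5".
print(str_contains([4, 5], [14, 5]))     # True (false positive)
print(window_contains([4, 5], [14, 5]))  # False
```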
>>> def seq_in_seq(subseq, seq):
...     offset = 0  # position of seq[0] within the original sequence
...     while subseq[0] in seq:
...         index = seq.index(subseq[0])
...         if subseq == seq[index:index + len(subseq)]:
...             return offset + index
...         else:
...             offset += index + 1
...             seq = seq[index + 1:]
...     else:
...         return -1
...
>>> seq_in_seq([5,6], [4,'a',3,5,6])
3
>>> seq_in_seq([5,7], [4,'a',3,5,6])
-1
Sorry, I'm not an algorithm expert; it's just the fastest thing I could think of at the moment. At least I think it looks nice (to me), and I had fun coding it. ;-)
Most probably it's the same thing your brute-force approach is doing.
- Note that the KMP implementation given on code.activestate was demonstrably 30-500 times slower on some (perhaps unrepresentative) input. Benchmarking to see whether the dumb built-in methods outperform it seems like a good idea!
- KMP is known to be about twice as slow as the naive algorithm in practice. Hence, for most purposes it’s completely inappropriate, despite its good asymptotic worst-case runtime.
- I like it! For quick & dirty stuff, anyway. Generally:
  def is_in(seq1, seq2):
      return str(list(seq1))[1:-1] in str(list(seq2))[1:-1]
  Not a good way to find the index of the match, I guess.
- It is nice and clean, but brute-forcy: O(mn).
- Aho-Corasick would be great. I'm specifically looking for python, or pythonish solutions... so if there were an implementation, that would be great. I'll poke around.
- The `tee` calls don't seem to be good for anything, since the other element of tee's output 2-tuple is ignored. `seq1` and `seq2` are each copied to two new generators, one of which gets instantiated into a list and the other of which gets ignored.
- This solution is not reliable when the elements of the sequences have non-unique lengths: it becomes unclear how to translate the returned index into an index in the original sequences. Note also that the backtick `d` syntax was removed in Python 3 and is discouraged.
- An example of unreliability even when all elements have the same size: sub = 'ab', full = 'aa', 'bb'.
- This merely finds out whether the set is a subset of the sequence, not whether its elements actually appear in that order in the sequence.
- That could be a fast first test, however: check that all elements are in the full list.
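The fast first test from the last comment can be sketched as a necessary-but-not-sufficient filter: if it fails, no search is needed; if it passes, you still have to run a real search. `could_contain` is a hypothetical name; it assumes hashable elements for the set path and falls back to linear membership tests otherwise.

```python
def could_contain(subseq, seq):
    """Cheap pre-check: does every element of subseq occur somewhere in seq?

    True does NOT mean subseq occurs contiguously or in order.
    """
    try:
        return set(subseq) <= set(seq)  # O(n + m) for hashable elements
    except TypeError:
        # Unhashable elements (e.g. lists): fall back to O(n * m) membership tests.
        return all(item in seq for item in subseq)


print(could_contain([5, 6], [4, 'a', 3, 5, 6]))  # True
print(could_contain([5, 7], [4, 'a', 3, 5, 6]))  # False: 7 never occurs
print(could_contain([6, 5], [4, 'a', 3, 5, 6]))  # True, yet [6, 5] is not in order
```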