Determine if a list of words is in a sentence?

Is there a way (Pattern, Python, NLTK, etc.) to detect whether a sentence has a list of words in it?


The cat ran into the hat, box, and house. | The list would be hat, box, and house

This could be handled with plain string processing, but we may have more general lists:


The cat likes to run outside, run inside, or jump up the stairs. |

List=run outside, run inside, or jump up the stairs.

The list could be in the middle of a paragraph or at the end of a sentence, which further complicates things.

I've been working with Pattern for Python for a while and I'm not seeing a way to go about this, so I was curious whether there is a way to do it with Pattern or NLTK (Natural Language Toolkit).

What about using sent_tokenize from NLTK to split the text into sentences first?

from nltk.tokenize import sent_tokenize

sent_tokenize("Hello SF Python. This is NLTK.")
["Hello SF Python.", "This is NLTK."]

Then you can use that list of sentences in this way:

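# my_list is the output of sent_tokenize; words is the list of words to look for
for sentence in my_list:
    # all() is a built-in function: True only if every word is found in the sentence
    if all(word in sentence for word in words):
        print(sentence)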


In Python, another approach is to use map(): it applies a function to each sentence in a list of sentences, and that function splits the list of words and checks whether all of them are contained in the sentence. It returns a boolean value for each sentence, and the results are stored in 'res'.
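The map() idea described above can be sketched roughly as follows. This is only an illustration: the helper name contains_all_words and the sample sentences are made up for the example, and it assumes the words are given as one space-separated string and that the sentences have no punctuation attached to their tokens.

```python
def contains_all_words(sentence, words):
    """Return True if every word in `words` is a whitespace-separated token of `sentence`."""
    tokens = set(sentence.lower().split())
    return all(word.lower() in tokens for word in words.split())

# Hypothetical sample data, punctuation removed for simplicity
sentences = [
    "the cat ran into the hat box and house",
    "the cat ran into the house",
]
words = "hat box house"

# One boolean per sentence, collected in res
res = list(map(lambda s: contains_all_words(s, words), sentences))
print(res)  # [True, False]
```

Because the sentence is split into tokens first, "hat" will not match inside "that", but "hat," with a trailing comma will not match either unless punctuation is stripped beforehand.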

From what I understood of your question, I think you want to check whether all the words in your list are present in a sentence.

In general, to check for list elements in a sentence, you can use the built-in all function. It returns True if every element of the iterable passed to it is true.

listOfWords = ['word1', 'word2', 'word3', 'two words']
sentence = "word1 as word2 a fword3 af two words"

if all(word in sentence for word in listOfWords):
    print("All words in sentence")
else:
    print("Missing")


"All words in sentence"

I think this might serve your purpose. If not, then you can clarify.
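One caveat with the snippet above: `word in sentence` is a substring test, so 'word3' is found inside "fword3". If whole-word matches are what you need, a possible variation uses `\b` word boundaries from the re module. The function name contains_all_whole is just something I made up for this sketch.

```python
import re

def contains_all_whole(sentence, phrases):
    """True if every phrase occurs in `sentence` as whole words, not inside a longer word."""
    return all(
        re.search(r"\b" + re.escape(phrase) + r"\b", sentence) is not None
        for phrase in phrases
    )

listOfWords = ['word1', 'word2', 'word3', 'two words']
sentence = "word1 as word2 a fword3 af two words"

# Substring test: True, because "word3" matches inside "fword3"
print(all(word in sentence for word in listOfWords))
# Whole-word test: False, because "word3" never appears as its own word
print(contains_all_whole(sentence, listOfWords))
```

re.escape is there so that a phrase containing regex metacharacters is matched literally.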

all(word in sentence for word in listOfWords)

  • Is your sentence the whole string or do you want to match a sentence within a larger text and return that sentence only?
  • Must the words occur in the order given? I.e., are you looking for a subset or a subsequence?
  • Are strings that merely contain those words valid matches? For example, "that" contains the word "hat" as a substring.
  • Sorry for the delay, but I am not looking for a list of known words in a sentence. I am interested in whether it is possible to take a sentence and determine if there is a list of words in it. For example, if I had "I love to walk, run, and bike", I'd like my code to look at this sentence and tell me that a list exists and that it is: walk, run, and bike. To further complicate things, I am not guaranteed to have that exact format. I could have two words in a list, etc. My initial thought is to just look at the second-to-last word and, if it's "and" or "or", start reading backwards using ","s as delimiters.
  • This is the closest solution I could come up with. I simply use Pattern (or you can use NLTK) to split the text into sentences. But beforehand I check whether there is a list by finding out if the next word after the last "," is "and" or "but". If it is, I back up and read to the previous ",". I set a three-word maximum for my list items, and this gets me as close as I can to a solution. The catch is that if I have "the cat jumped over the hat, box, and cup", the first item of the list might be "over the hat", which is acceptable for what I'm doing.
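The comma-and-conjunction heuristic from the last two comments might look roughly like the sketch below. It is not the commenter's actual code: find_list, CONJUNCTIONS, and MAX_WORDS are names invented here, the conjunction set is an assumption you can adjust, and the function expects a single sentence that has already been split out of the text.

```python
CONJUNCTIONS = ("and", "or")  # assumed set; add "but" if needed
MAX_WORDS = 3                 # cap on words per list item, as in the comment above

def find_list(sentence, max_words=MAX_WORDS):
    """Return the items of a comma-separated list ending in ', and X' or ', or X', else []."""
    parts = [p.strip() for p in sentence.rstrip(".!?").split(",")]
    if len(parts) < 2:
        return []  # no comma, so no list by this heuristic
    last = parts[-1].split()
    if len(last) < 2 or last[0].lower() not in CONJUNCTIONS:
        return []  # text after the last comma does not start with a conjunction
    # Final item: the words right after the conjunction, capped at max_words
    items = [" ".join(last[1:1 + max_words])]
    # Walk backwards over the earlier chunks, keeping the trailing words of each
    for chunk in reversed(parts[:-1]):
        items.append(" ".join(chunk.split()[-max_words:]))
    items.reverse()
    return items

print(find_list("I love to walk, run, and bike"))
# ['love to walk', 'run', 'bike']
print(find_list("The cat jumped over the hat, box, and cup."))
# ['over the hat', 'box', 'cup']
```

As noted in the comment, the first item picks up extra leading words ("over the hat"). Other known gaps: two-item lists without a comma ("walk and bike") are not detected, and a comma earlier in the sentence that is not part of the list is treated as another item.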
