How to use re.compile() with a list in Python

I have a list of strings that I want to filter, keeping only the strings that contain certain keywords.

I want to do something like:

fruit = re.compile(['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi'])

so I can then use re.search(fruit, list_of_strings) to get only the strings containing fruits, but I'm not sure how to use a list with re.compile. Any suggestions? (I'm not set on using re.compile, but I think regular expressions would be a good way to do this.)

You need to turn your fruit list into the string apple|banana|peach|plum|pineapple|kiwi so that it is a valid regex. The following should do this for you:

import re

fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit = re.compile('|'.join(fruit_list))
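Note that re.search() takes a single string, not a list, so re.search(fruit, list_of_strings) from the question won't work as written. A minimal sketch, assuming list_of_strings is a plain list of str (the sample data here is made up), that keeps only the strings in which the compiled pattern finds a match:

```python
import re

fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit = re.compile('|'.join(fruit_list))

# Hypothetical sample data standing in for the question's list_of_strings
list_of_strings = ['green apple', 'yellow car', 'ripe kiwi', 'red bus']

# Pattern.search() scans one string at a time and returns None when nothing matches
matches = [s for s in list_of_strings if fruit.search(s)]
print(matches)  # ['green apple', 'ripe kiwi']
```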

edit: As ridgerunner pointed out in the comments, you will probably want to add word boundaries to the regex; otherwise it will also match words like plump, which contain a fruit name as a substring.

fruit = re.compile(r'\b(?:%s)\b' % '|'.join(fruit_list))
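To see what the word boundaries change, here is a quick side-by-side comparison using findall() (the sample sentence is made up for illustration):

```python
import re

fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
plain = re.compile('|'.join(fruit_list))
bounded = re.compile(r'\b(?:%s)\b' % '|'.join(fruit_list))

text = 'a plump apple'

# Without \b, 'plum' is found inside the word 'plump'
print(plain.findall(text))    # ['plum', 'apple']
# With \b, only whole words count, so 'plump' is skipped
print(bounded.findall(text))  # ['apple']
```

Because the alternation is wrapped in a non-capturing group (?:...), findall() returns the whole matched text rather than a group.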

Since you only want plain substring matches, there's no real need for a regex, IMO:

fruits = ['apple', 'cherry']
sentences = ['green apple', 'yellow car', 'red cherry']
for s in sentences:
    if any(f in s for f in fruits):
        print(s, 'contains a fruit!')
# green apple contains a fruit!
# red cherry contains a fruit!

EDIT: If you need access to the strings that matched:

from itertools import compress

fruits = ['apple', 'banana', 'cherry']
s = 'green apple and red cherry'

list(compress(fruits, (f in s for f in fruits)))
# ['apple', 'cherry']

You can create a single regular expression that matches when any of the terms is found:

>>> s, t = "A kiwi, please.", "Strawberry anyone?"
>>> import re
>>> pattern = re.compile('apple|banana|peach|plum|pineapple|kiwi', re.IGNORECASE)
>>> pattern.search(s)
<re.Match object; span=(2, 6), match='kiwi'>
>>> pattern.search(t) # won't find anything
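search() returns a match object (or None), and that object also tells you which fruit was found. A small sketch using the standard Match.group() and Match.span() methods; the upper-case sample string is my own, chosen to show that re.IGNORECASE matches regardless of case while group() returns the text as it appears in the input:

```python
import re

pattern = re.compile('apple|banana|peach|plum|pineapple|kiwi', re.IGNORECASE)

m = pattern.search("A KIWI, please.")
if m is not None:
    # group() is the matched text, span() its (start, end) indices
    print(m.group(), m.span())  # KIWI (2, 6)

print(pattern.search("Strawberry anyone?"))  # None
```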

Code:

import re
fruits = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit_re = [re.compile(fruit) for fruit in fruits]  # one compiled pattern per fruit
fruit_test = lambda x: any(pattern.search(x) for pattern in fruit_re)  # True if any pattern is found in x

Example usage:

fruits_veggies = ['this is an apple', 'this is a tomato']
print([fruit_test(s) for s in fruits_veggies])  # [True, False]

Edit: I realized Andrew's solution is better. You could improve fruit_test by using Andrew's compiled regular expression (called andrew_re here):

fruit_test = lambda x: andrew_re.search(x) is not None

Python 3.x update:

fruit_list = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
fruit = re.compile(r'\b(?:{0})\b'.format('|'.join(fruit_list)))
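On Python 3.6+ the same pattern can be built with an f-string. The sketch below also folds in two refinements raised in this thread: re.escape() for keywords containing regex metacharacters, and findall() to get the fruits that actually matched. compile_keywords is a hypothetical helper name, and sorting longest-first is a common extra precaution (at a given position the alternatives are tried in order), not something the answers above require. Note that \b only behaves as expected when each keyword starts and ends with a word character.

```python
import re

def compile_keywords(words):
    """Hypothetical helper: build one word-bounded, case-insensitive alternation."""
    # Escape metacharacters; try longer keywords before shorter ones
    alternation = '|'.join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.compile(rf'\b(?:{alternation})\b', re.IGNORECASE)

fruit = compile_keywords(['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi'])

# findall() returns every matched keyword, in order of appearance
print(fruit.findall('Plum jam, kiwi tart and a plump peach'))  # ['Plum', 'kiwi', 'peach']
```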

Comments
  • +1 But I would add word boundaries like so: fruit = re.compile('\\b(?:' + '|'.join(fruit_list) + ')\\b')
  • @ridgerunner - Good point! In fact the way it is written now 'pineapple' in the string would always match 'apple', adding word boundaries to my answer.
  • @user808545 - No problem, click on the outline of the check mark next to my answer to mark it as the accepted solution.
  • Efficient, +1. Don't be alarmed if I upvote a few of your answers, taking a break from answering this month and using the time to read some old stuff.
  • Depending on what your list of strings is, you may need to escape them: fruit = re.compile(r'\b(?:%s)\b' % '|'.join([re.escape(x) for x in fruit_list]))
  • In this scenario, regex is more efficient than doing several separate substring tests.
  • @Andrew: depends on the number of fruits and sentences, and even so we are talking 2x in a matter of milliseconds.
  • @hop - I am pretty confident regex will be faster regardless of number of fruits or sentences. With regex you also have access to the fruit that was matched.
  • @Andrew: Re efficiency: noted. Re access to matches: that's easy, check my update.
  • @Andrew: I will not dispute that regexes are faster, but the non-regex solution might be sufficient on small data sets and easier to understand, especially if you have trouble with regex anyway.
  • Or if you need the strings themselves: return [s for s in fruits_veggies if fruit_test(s)]
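The efficiency question in these comments is easy to check on your own data. A rough sketch using the standard timeit module. The sentence list and run count are arbitrary placeholders, and no particular speedup is implied, since the result depends on the number of fruits, sentence length, and Python version:

```python
import re
import timeit

fruits = ['apple', 'banana', 'peach', 'plum', 'pineapple', 'kiwi']
# Placeholder data: swap in your real sentences before drawing conclusions
sentences = ['green apple', 'yellow car', 'red cherry', 'ripe kiwi'] * 1000

pattern = re.compile('|'.join(map(re.escape, fruits)))

def with_substrings():
    return [s for s in sentences if any(f in s for f in fruits)]

def with_regex():
    return [s for s in sentences if pattern.search(s)]

# Both approaches do plain substring matching, so they must agree
assert with_substrings() == with_regex()

for func in (with_substrings, with_regex):
    seconds = timeit.timeit(func, number=20)
    print(f'{func.__name__}: {seconds:.3f}s for 20 runs')
```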