Python regular expressions 0 or more words from set

I have a big block of text within which I am trying to look for a phrase. The phrase can be structured in a number of different ways.

  1. First I want to look for a word from a set of words, let's call it set 1.
  2. After that there must be a space or comma (or maybe something else that separates words)
  3. Then there may be 0 or more words from set 2
  4. Again followed by the word separation characters as in point 2 above
  5. Finally, there should be a word from set 3

Ideally all of these should be in the same sentence.

set 1 = (Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)

set 2 = (for|to|of|full|a|be|complete|Internal)

set 3 = (renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)

So I have this regex:

(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]*(for|to|of|full|a|be|complete|Internal)[ ,]*(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)

Now this will match a phrase where there are 0 or 1 words from set 2, but not if there are several, e.g. "provides a wonderful opportunity for someone to add their own stamp as the property needs complete renovation throughout."

As soon as I add 'a' before 'complete', it fails. The same happens if I add another 'complete'.

How do I specify to look for 0 or multiple words from a set?

Set 1: Matches any of the words in set 1 with 1 separator.

(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]

Set 2: Matches any of the words in set 2 with 1 separator, 0 or more times.

((for|to|of|full|a|be|complete|Internal)[ ,])*

Set 3: Matches any of the words in set 3.

(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)

Full:

(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]((for|to|of|full|a|be|complete|Internal)[ ,])*(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)
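A minimal sketch of running the full pattern above with Python's re module on the example sentence from the question. It uses re.search rather than re.match, because re.match only tries to match at the very start of the string; the last two calls illustrate the case-sensitivity issue raised in the comments (the lowercase input "potential to modernise" is a made-up test phrase).

```python
import re

# The full pattern from this answer, split over three lines for readability.
PATTERN = (
    r"(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]"
    r"((for|to|of|full|a|be|complete|Internal)[ ,])*"
    r"(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)"
)

sentence = ("provides a wonderful opportunity for someone to add their own stamp "
            "as the property needs a complete renovation throughout.")

m = re.search(PATTERN, sentence)  # scan the whole string, not just its start
print(m.group(0))                 # needs a complete renovation
print(m.group(1), m.group(4))     # needs renovation  (set-1 word, set-3 word)

# The alternatives are case-sensitive: "potential" only matches "Potential" with IGNORECASE.
print(re.search(PATTERN, "potential to modernise"))                       # None
print(re.search(PATTERN, "potential to modernise", re.IGNORECASE).group(0))  # potential to modernise
```

Since re.search returns None when nothing matches, check the result before calling .group() or .groups() on it.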

Long alternations in regular expressions can be quite slow. I'd suggest taking another approach: first segment the text (split it into words), then iterate over the array of words, checking whether each run of three consecutive words fulfils your requirements.

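Something like this, where set1, set2 and set3 are Python sets of lowercase words:

import re

def check(text, set1, set2, set3):
    words = re.findall(r"\w+", text.lower())  # segment the text into lowercase words
    for i in range(len(words) - 2):  # stop two short of the end, since words[i + 2] is read
        if words[i] in set1 and words[i + 1] in set2 and words[i + 2] in set3:
            return True
    return False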
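A three-word window only accepts exactly one set-2 word. Below is a sketch of the same segment-and-scan idea extended to zero or more set-2 words. It assumes ASCII words, a rough sentence split on ".", "!" and "?" (so a phrase stays within one sentence), and lowercase copies of the question's sets; the names SET1, SET2, SET3 and find_phrases are illustrative.

```python
import re

# Lowercase copies of the three word sets from the question.
SET1 = {"potential", "ability", "possibility", "need", "requires", "needs",
        "plenty", "for", "needing", "requiring"}
SET2 = {"for", "to", "of", "full", "a", "be", "complete", "internal"}
SET3 = {"renovate", "improve", "modernise", "modernize", "update",
        "renovating", "improving", "modernising", "modernizing", "updating",
        "potential", "project", "renovation"}


def find_phrases(text):
    """Return each set-1 word + zero or more set-2 words + set-3 word run."""
    found = []
    # Rough sentence split so that a phrase never spans two sentences.
    for sentence in re.split(r"[.!?]+", text):
        # Segment into lowercase ASCII words; spaces, commas etc. are dropped.
        words = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(words):
            if word not in SET1:
                continue
            j = i + 1
            # Skip over any number (including zero) of set-2 words.
            while j < len(words) and words[j] in SET2:
                j += 1
            if j < len(words) and words[j] in SET3:
                found.append(" ".join(words[i:j + 1]))
    return found


print(find_phrases("provides a wonderful opportunity for someone to add their own "
                   "stamp as the property needs a complete renovation throughout."))
# ['needs a complete renovation']
```

Unlike the regex answers, this treats any non-letter character as a separator, not just spaces and commas. Because "for" is in both set 1 and set 2, overlapping results are possible: "need for renovation" also yields "for renovation".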

You have to use this regex:

(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,](for|to|of|full|a|be|complete|Internal)*[ ,](renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)

This is because you have one word from the first set, followed by one space or comma. Next you have 0 or more words from set 2, then another space or comma, and finally one word from the last set.
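A quick check of what this pattern accepts, using Python's re.search (the test phrases are made up for illustration). Because the * applies to the word group alone, repeated set-2 words would have to follow each other with no separator, and exactly two separators surround the set-2 part even when it is empty.

```python
import re

# The pattern from this answer, split over three lines for readability.
PATTERN = (
    r"(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]"
    r"(for|to|of|full|a|be|complete|Internal)*[ ,]"
    r"(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)"
)

# One set-2 word between single spaces: matched.
print(re.search(PATTERN, "needs complete renovation").group(0))  # needs complete renovation
# Two set-2 words separated by a space: not matched.
print(re.search(PATTERN, "needs a complete renovation"))         # None
# No set-2 word and a single space: not matched (two separators are required).
print(re.search(PATTERN, "needs renovation"))                    # None
```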

Just in case you didn't know, you can use sites like https://regex101.com/ to test your regular expressions and see why they do or don't work.

In this case, you need the "zero or more" (*) operator on your second group. The result would be:

(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]*(for|to|of|full|a|be|complete|Internal)*[ ,]*(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)

However, considering you probably want the words to be separated, just include the separator in the repeated part (you can use a non-capturing group for that), resulting in:

(Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]*(?:(for|to|of|full|a|be|complete|Internal)[ ,]*)*(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation)
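If the word sets are kept as Python lists, the pattern can be generated from them instead of typed by hand. This sketch is a variation on the regex above, not its exact form: it assumes \b word boundaries, one-or-more separators ([ ,]+) so that words cannot run together (the issue raised in the comments), and re.IGNORECASE. The names SET1, SET2, SET3 and alternation are illustrative.

```python
import re

SET1 = ["Potential", "Ability", "Possibility", "need", "requires", "needs",
        "plenty", "for", "Needing", "Requiring"]
SET2 = ["for", "to", "of", "full", "a", "be", "complete", "Internal"]
SET3 = ["renovate", "improve", "modernise", "modernize", "update",
        "renovating", "improving", "modernising", "modernizing", "updating",
        "potential", "project", "renovation"]


def alternation(words):
    # re.escape keeps any regex metacharacters in the word lists literal.
    return "|".join(re.escape(w) for w in words)


PATTERN = re.compile(
    rf"\b({alternation(SET1)})[ ,]+"         # one set-1 word, then separators
    rf"((?:(?:{alternation(SET2)})[ ,]+)*)"  # zero or more set-2 words, each followed by separators
    rf"({alternation(SET3)})\b",             # one set-3 word, ending on a word boundary
    re.IGNORECASE,
)

text = "The property needs a complete renovation. Plenty of potential here."
print([m.group(0) for m in PATTERN.finditer(text)])
# ['needs a complete renovation', 'Plenty of potential']
```

Because [ ,]+ never crosses a full stop, a match also stays within one sentence; other separators such as hyphens or semicolons would need adding to the character class.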

Comments
  • {1} is useless.
  • I ran this in an online regex tester, which came out fine, but I've just tried to run it in a Python script with the following: 'import re; text = "potential to modernise"; regex = re.match("((Potential|Ability|Possibility|need|requires|needs|plenty|for|Needing|Requiring)[ ,]((for|to|of|full|a|be|complete|Internal)[ ,])*(renovate|improve|modernise|modernize|update|renovating|improving|modernising|modernizing|updating|potential|project|renovation))", text); match = regex.groups(); print(match)' and I get the error ......
  • ..... 'Traceback (most recent call last): File "/Users/Charlie/Documents/python/regex_potential.py", line 5, in <module> match = regex.groups() AttributeError: 'NoneType' object has no attribute 'groups' '
  • Case sensitivity... oops
  • for i in range(0, len(text)-2): then, since you're doing i+2
  • This will match Potential,,,,fortooffullabecompleteInternal, ,,, , fortooffullabecompleteInternalfortooffullabecompleteInternalfortooffullabecompleteInternalrenovate
  • I know, but since that's what he has on his regex I decided to modify it as little as possible (just in case he wants that behaviour for some reason)