How can I modify this regex pattern so that once it finds a match it returns the whole sentence, not just the words it matched?

As the title explains, this regex pattern checks the description variable for matching word combinations from set, e.g.:

set = ["oak", "wood"]

If it finds those two words with at most five words between them, it returns those words. However, I need it to return the matching sentence. For example, if the description was:

description = "...would be a lovely addition to any home. This lovely oak hard wood table comes in a variety of sizes. Another great reason to consider..." 

instead of just returning the matching words I want it to return the entire sentence that contains the keywords.

This is what I'm working with at the moment, which just returns the matching keyword pair:

import re

re.findall(r"\b(?:(%s)\W+(?:\w+\W+){0,5}?(%s)|(%s)\W+(?:\w+\W+){0,5}?(%s))\b" % (set[0], set[1], set[1], set[0]), description)

I also believe this pattern can look beyond a single sentence, so it might find a match spanning two different sentences. If possible, I'd also like to restrict matches to a single sentence.

I'd appreciate any help I can get with this.

EDIT: To clarify, my desired output is:

"This lovely oak hard wood table comes in a variety of sizes."

As this is the sentence which contains the matching keyword pair.

Thanks!


As per my comment, here is some dummy code using nltk (I don't have access to Python right now):

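from nltk import sent_tokenize

# sent_tokenize needs NLTK's Punkt sentence model, e.g. nltk.download('punkt') ('punkt_tab' on recent versions)
for sent in sent_tokenize(description):
    # keep only sentences that contain every keyword
    if all(word in sent for word in ['oak', 'wood']):
        print(sent)  # do something useful here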

You could also apply your original regex to each sent (it's just a string, after all).
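If you'd rather not depend on nltk, here is a minimal sketch of the same idea using only the standard library. It assumes (a simplification, not nltk's behaviour) that a sentence ends at '.', '!' or '?' followed by whitespace, splits on that with re.split, and then runs the question's keyword-pair regex (with its capture groups dropped) on each sentence separately, so a match can never span two sentences. The names keywords and matching_sentences are just illustrative.

```python
import re

description = ("...would be a lovely addition to any home. This lovely oak hard wood "
               "table comes in a variety of sizes. Another great reason to consider...")
keywords = ["oak", "wood"]

# the question's pattern: both keywords, in either order, with 0-5 words between them
pair = r"\b(?:%s\W+(?:\w+\W+){0,5}?%s|%s\W+(?:\w+\W+){0,5}?%s)\b" % (
    keywords[0], keywords[1], keywords[1], keywords[0])

def matching_sentences(text):
    # naive splitter (assumption): a sentence ends at . ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text)
    # search each sentence on its own, so a match cannot cross a sentence boundary
    return [s for s in sentences if re.search(pair, s)]

print(matching_sentences(description))
```

Abbreviations such as "Mr." or "e.g." will trip up this naive splitter; handling those is what nltk's Punkt tokenizer is for.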


You can use the following RegEx:

import re

print(re.findall(r"(^|(?<=\.))([a-zA-Z0-9\s]*oak[a-zA-Z0-9\s]*wood.*?(?=\.|$)).*?|([a-zA-Z0-9\s]*wood[a-zA-Z0-9\s]*oak.*?(?=\.|$))", description))

where:

r"(^|(?<=\.))" # matches at the start of the string or right after a '.'
r"([a-zA-Z0-9\s]*oak[a-zA-Z0-9\s]*wood.*?(?=\.|$)).*?" # any letters/digits/whitespace, then 'oak', then more letters/digits/whitespace, then 'wood', then anything up to (not including) the first '.' or the end of the string
r"([a-zA-Z0-9\s]*wood[a-zA-Z0-9\s]*oak.*?(?=\.|$))" # the same with the keywords swapped; this alternative after the | handles the wood-then-oak case

Output:

[('', ' This lovely oak hard wood table comes in a variety of sizes', '')]
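Because the pattern has three capture groups, re.findall returns one tuple per match rather than a plain string. A small sketch, assuming the description from the question, that pulls the non-empty sentence group out of each tuple:

```python
import re

description = ("...would be a lovely addition to any home. This lovely oak hard wood "
               "table comes in a variety of sizes. Another great reason to consider...")

# the answer's pattern, split over two lines (adjacent raw strings are concatenated)
pattern = (r"(^|(?<=\.))([a-zA-Z0-9\s]*oak[a-zA-Z0-9\s]*wood.*?(?=\.|$)).*?"
           r"|([a-zA-Z0-9\s]*wood[a-zA-Z0-9\s]*oak.*?(?=\.|$))")

# each match is a (start, oak_first, wood_first) tuple; unmatched groups come back as ''
sentences = [(oak_first or wood_first).strip()
             for _, oak_first, wood_first in re.findall(pattern, description)]
print(sentences)
```

Note that only letters, digits and whitespace are allowed before the keywords, so a sentence with, say, a comma before "oak" is not matched at all, and the returned text does not include the final period.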


Do you have to use regex? I found it more straightforward to just use the following:

keywords = ["oak", "wood"]  # the question's `set`, renamed so the built-in set() isn't shadowed
description = "...would be a lovely addition to any home. This lovely oak hard wood table comes in a variety of sizes. Another great reason to consider..."

description2 = "...would be a lovely addition to any home. This is NOT oak however we do make other varieties that use cherry for a different style of hard wood."

def test_result(desc):
    for sent in desc.split(". "):
        # compare whole words, ignoring surrounding punctuation, so "wood." still counts as "wood"
        words = [w.strip(".,;:!?") for w in sent.split()]
        if all(k in words for k in keywords):
            # keep the sentence only if the two keywords are at most 5 words apart
            if abs(words.index(keywords[0]) - words.index(keywords[1])) <= 5:
                print(sent)

test_result(description)
test_result(description2)

Result:

This lovely oak hard wood table comes in a variety of sizes


You could try the following regex:

[^.]*?\boak(?:\W+[^\W.]+){0,5}?\W+wood(?:\W+[^\W.]+){0,5}?\W+table(?:\W+[^\W.]+){0,5}?\W+variety[^.]*\.+

Demo with several examples

Explained:

[^.]*?                 # Anything but a dot, ungreedy
  \b oak               # First word (with word boundary)
(?:\W+[^\W.]+){0,5}?   # Some (0-5) random words: (separator + word except dot) x 5, ungreedy
 \W+ wood              # Second word. Starts with some separator
(?:\W+[^\W.]+){0,5}?   # Again, random words, ungreedy
 \W+ table             # Third word. Starts with some separator
(?:\W+[^\W.]+){0,5}?   # Again, random words, ungreedy
 \W+ variety           # Final required word
[^.]*                  # The rest of the sentence (non dot characters) up to the end
\.+                    # We match the final dot (or ... if more exist)
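Since the explanation is laid out one piece per line, it can be compiled almost verbatim with Python's re.VERBOSE flag, which ignores whitespace and #-comments inside the pattern. A sketch, assuming the question's description string:

```python
import re

# the explained pattern, compiled with re.VERBOSE so spaces and # comments are ignored
pattern = re.compile(r"""
[^.]*?                 # anything but a dot, ungreedy
  \b oak               # first word (with word boundary)
(?:\W+[^\W.]+){0,5}?   # 0-5 filler words (separator + word), ungreedy
 \W+ wood              # second word, preceded by a separator
(?:\W+[^\W.]+){0,5}?   # filler words again
 \W+ table             # third word
(?:\W+[^\W.]+){0,5}?   # filler words again
 \W+ variety           # final required word
[^.]*                  # rest of the sentence (non-dot characters)
\.+                    # the final dot (or ... if there are several)
""", re.VERBOSE)

description = ("...would be a lovely addition to any home. This lovely oak hard wood "
               "table comes in a variety of sizes. Another great reason to consider...")

print([m.strip() for m in pattern.findall(description)])
```

One caveat: \W also matches '.', so the \W+ separators can still step across a sentence boundary (for example, "Old oak. New wood table in a variety." is matched as one span). Using [^\w.]+ instead of \W+ would keep each match inside one sentence.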


You can capture the entire sentence by having the pattern look for periods at either end. You can also stop a match from crossing a period in the middle by replacing \W (match non-word characters) with [^.\w] (match anything that isn't a period or a word character).

r"(^|\.)([^.]*\b(?:(%s)[^.\w]+(?:\w+[^.\w]+){0,5}?(%s)|(%s)[^.\w]+(?:\w+[^.\w]+){0,5}?(%s))\b[^.]*)(\.|$)"

Write it as a raw string, as above; in a normal string literal "\b" is a backspace character, not a word boundary. The (^|\.) will match the beginning of the input or a period, and the (\.|$) will match a period or the end of the input (in case there is text after the last period). The sentence itself is captured by the second group; fill in the %s placeholders with the keywords as in the question's code.

I can't test this in python right now, but it should point you in the right direction even if I have an error or typo.
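Here is a sketch along the same lines, wrapped in a hypothetical helper find_sentences. It is a variant of the pattern above rather than a copy: it escapes the keywords with re.escape, makes the gap size a parameter, drops the keyword capture groups so re.findall returns plain strings, and replaces (^|\.) with the lookbehind (?<=\.). With the (^|\.)...(\.|$) pair, the period between two sentences can only be consumed once, so of two back-to-back matching sentences only the first is returned; the lookbehind merely checks for that period, so both are found.

```python
import re

def find_sentences(text, first, second, max_gap=5):
    """Return the sentences in which `first` and `second` appear, in either
    order, with at most `max_gap` words between them."""
    a, b = re.escape(first), re.escape(second)
    # separator that may not contain a period, plus 0..max_gap filler words
    gap = r"[^.\w]+(?:\w+[^.\w]+){0,%d}?" % max_gap
    pair = r"(?:%s%s%s|%s%s%s)" % (a, gap, b, b, gap, a)
    # start of text, or a position just after a period (checked, not consumed)
    pattern = r"(?:^|(?<=\.))([^.]*\b" + pair + r"\b[^.]*(?:\.|$))"
    return [s.strip() for s in re.findall(pattern, text)]

description = ("...would be a lovely addition to any home. This lovely oak hard wood "
               "table comes in a variety of sizes. Another great reason to consider...")

print(find_sentences(description, "oak", "wood"))
print(find_sentences("I like oak and wood. Also oak wood here.", "oak", "wood"))
print(find_sentences("The oak. Then wood.", "oak", "wood"))
```

The last call returns an empty list: the keywords sit in different sentences, and the [^.\w]+ separators cannot cross the period between them.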
