Regular Expression Lookbehind doesn't work with quantifiers ('+' or '*')

I am trying to use lookbehinds in a regular expression and they don't seem to work as I expected. This is not my real use case, but to simplify I will give an example. Imagine I want to match "example" in a string that says "this is an example". According to my understanding of lookbehinds, this should work:

(?<=this\sis\san\s*?)example

What this should do is find "this is an", then any whitespace characters, and finally match the word "example". It doesn't work and I don't understand why. Is it impossible to use '+' or '*' inside lookbehinds?

I also tried those two and they work correctly, but don't fulfill my needs:

(?<=this\sis\san\s)example
this\sis\san\s*?example

I am using this site to test my regular expressions: http://gskinner.com/RegExr/

Many regular expression libraries allow only restricted expressions in lookbehind assertions, for example:

  • only strings of the same fixed length: (?<=foo|bar|\s,\s) (three characters each)
  • only strings of fixed lengths: (?<=foobar|\r ) (each branch with a fixed length)
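As a rough illustration of the first restriction, here is a small sketch using Python's built-in re module (my choice of engine; the question does not name one). The helper compiles() and the trailing x in each pattern are made up for the demonstration. Python's re belongs to the strictest group: every alternative inside the lookbehind must have the same width.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Every alternative is three characters wide, so the lookbehind is accepted.
same_width = r"(?<=foo|bar|\s,\s)x"

# The alternatives differ in width (6 vs 2 characters). Python's re rejects
# this, although engines in the second group above would accept it.
mixed_width = r"(?<=foobar|\r\n)x"

print(compiles(same_width))
print(compiles(mixed_width))
```

Other engines draw the line elsewhere, so it is worth running the same check in whatever environment you actually target.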

If you're not using an engine with variable-length lookbehind (such as Python's third-party regex module), you can trick the regex engine with \K, which discards everything matched so far and starts the reported match over from that point.

This site explains it well: http://www.phpfreaks.com/blog/pcre-regex-spotlight-k

In short, when you have an expression to match but only want what comes after it, \K forces the engine to drop what it has matched so far and start the match again from there.

Example:

string = '<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>'

Matching /(\<a).+?(\<div).+?(\>)\K.+?(?=\<\/div)/ makes the regex start over after the > that closes the opening div tag, so nothing before that point is included in the result. The (?=\<\/div) lookahead makes the engine take everything in front of the closing div tag.
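Python's built-in re module does not support \K, so the sketch below is only an approximation of the idea: a capturing group plays the role of "everything after \K". The pattern is a simplified variant of the one above (the unneeded groups and backslashes are dropped), and the lookahead is written against the closing </div> tag.

```python
import re

string = "<a this is a tag> with some information <div this is another tag > LOOK FOR ME </div>"

# Everything before the parentheses corresponds to the text \K would discard;
# the capturing group holds what \K would leave as the overall match.
pattern = re.compile(r"<a.+?<div.+?>(.+?)(?=</div)")

match = pattern.search(string)
print(repr(match.group(1)))  # ' LOOK FOR ME '
```

In an engine that does support \K (PCRE, for instance), the group is unnecessary because the overall match itself starts after the \K.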

Lookbehind has the same effect as lookahead, but works backwards. It tells the regex engine to temporarily step backwards in the string to check whether the text inside the lookbehind can be matched there. (?<!a)b matches a “b” that is not preceded by an “a”, using negative lookbehind. It doesn’t match cab, but matches the b (and only the b) in bed or debt.
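The negative lookbehind just described can be tried out with a few lines of Python (the re module is my choice here; any engine with lookbehind support should behave the same way for this fixed-width pattern). The results dictionary is just a convenience for the demonstration.

```python
import re

# Match a "b" that is not immediately preceded by an "a".
pattern = re.compile(r"(?<!a)b")

results = {}
for word in ("cab", "bed", "debt"):
    m = pattern.search(word)
    # Store the (start, end) span of the match, or None when nothing matches.
    results[word] = m.span() if m else None

print(results)  # {'cab': None, 'bed': (0, 1), 'debt': (2, 3)}
```

The one-character spans confirm that only the b is matched; the character checked by the lookbehind is never part of the result.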

What Amber said is true, but you can work around it with another approach: A non-capturing parentheses group

(?<=this\sis\san)(?:\s*)example

That makes the lookbehind fixed-length, so it should work.
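A quick check in Python's re module (an assumption on my part, since the answer names no engine) suggests the pattern compiles, but that the whitespace consumed by the non-capturing group ends up in the overall match, as several comments at the end of this page point out. The second pattern is one possible adjustment, not part of the original answer: it captures only the word.

```python
import re

text = "this is an example"

# The lookbehind is now fixed-width, so the pattern compiles, but (?:\s*)
# still consumes the space, which therefore belongs to the overall match.
with_noncapturing = re.search(r"(?<=this\sis\san)(?:\s*)example", text)
print(repr(with_noncapturing.group(0)))  # ' example'

# Capturing just the word keeps the whitespace out of the extracted value.
with_capture = re.search(r"(?<=this\sis\san)\s*(example)", text)
print(repr(with_capture.group(1)))  # 'example'
```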

Lookaround is divided into lookbehind and lookahead assertions: (?!...) is negative lookahead and (?=...) is positive lookahead. Lookbehind checks what comes before your regex match, while lookahead checks what comes after it. The presence or absence of an element before or after the matched item determines whether a match is declared.

Most regex engines don't support variable-length expressions for lookbehind assertions.
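Python's built-in re module is one such engine, so it makes a convenient place to see the limitation (the choice of Python is mine; the question itself was tested on RegExr). The sketch below compiles the pattern from the question and, for comparison, the fixed-width variant that the asker reported as working.

```python
import re

variable = r"(?<=this\sis\san\s*?)example"  # quantifier inside the lookbehind
fixed = r"(?<=this\sis\san\s)example"       # exactly one whitespace character

try:
    re.compile(variable)
    error_message = None
except re.error as exc:
    # re refuses to compile a lookbehind whose width is not fixed.
    error_message = str(exc)

print(error_message)

match = re.search(fixed, "this is an example")
print(match.group(0))  # example
```

The failure happens at compile time, before any input is examined, which is why the pattern never matches anything in such engines.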

You can use sub-expressions.

(this\sis\san\s*?)(example)

To retrieve group 2 ("example"), use $2 in engines with $-style references, or \2 in a replacement string (as with Python's re.sub).
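A minimal sketch of this approach in Python, assuming the sample sentence from the question; the square brackets in the replacement are only there to make the substituted part visible.

```python
import re

pattern = re.compile(r"(this\sis\san\s*?)(example)")
text = "this is an example"

match = pattern.search(text)
word = match.group(2)  # the second sub-expression only
print(word)            # example

# In a replacement string, \1 and \2 refer back to the two groups.
replaced = pattern.sub(r"\1[\2]", text)
print(replaced)        # this is an [example]
```

The prefix is still consumed by the match here, so this suits extraction and replacement better than cases where the prefix must stay available for later matches.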

Positive lookahead with (?=...) peeks ahead to ensure that its subpattern could match, and the subpattern in the assertion does not become part of the match. For example, grey(?=hound) matches grey, but only if it is followed by hound. VBScript, JScript, and JavaScript before ES2018 don't support lookbehind.

The really confusing part here is that regular expressions in the strict sense can very much do what the OP wants, but the common language for writing them does not allow it, which leads to (mathematically ugly) workarounds like lookaheads. Please see this answer below and my comment there for the (theoretically aligned) proper way of doing it.

Please note: the lookahead is merely a test; the contents of the parentheses (?=...) are not included in the result. When we look for X(?=Y), the regular expression engine finds X and then checks whether Y comes immediately after it. If not, the potential match is skipped and the search continues.
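To make the X(?=Y) behaviour concrete, here is a short Python sketch with made-up sample text. It also uses a quantifier inside the lookahead, which Python's re accepts; as far as I know, the fixed-width restriction applies only to lookbehind.

```python
import re

# X is "grey"; Y is "optional whitespace followed by hound".
pattern = re.compile(r"grey(?=\s*hound)")
text = "a grey   hound and a grey cat"

first = pattern.search(text)
print(first.group(0), first.span())  # grey (2, 6)

# The second "grey" is followed by " cat", so it is skipped.
all_matches = pattern.findall(text)
print(all_matches)  # ['grey']
```

The span covers only the four letters of grey, showing that the text tested by the lookahead stays out of the result.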

Comments
  • This needs a tag that identifies the language or environment where you use them. .NET's regular expressions handle this without a problem.
  • Note: if your regex worked the way you want, it would also match example in this: this is anexample. If you don't want that, you should remove the ?
  • micha: They should probably just change the * to a +. Removing the ? has no effect in that regard. But indeed, *? as a quantifier is useless and unnecessary in this case as there isn't any more whitespace to match after that, so \s*? is equivalent to \s*.
  • In my answer to this question, I have listed some strategies/workarounds after I ran into this limitation on negative lookbehinds. Hope it can help some others too!
  • this works with ruby 2.x but fails with 1.9 and jruby 1.7.x; original comment: good one, I'm surprised I never knew this feature. Learn to format code in the editor and you'll be priceless
  • It's the same as (?<=this\sis\san)\s*?example, which means that it also matches the spaces. And for your information, (?: ) makes the process slower.
  • micha, I'd worry more about the matching part in that case than about performance. I get on average 0.02451781 ms with the non-capturing group and 0.02370844 ms without it. I don't think that's a significant difference.
  • @micha No. It is not the same. It's a non-capturing group. My regex only matches example (without the leading spaces), but your example includes leading spaces.
  • This regex will match any preceding spaces, e.g. this is an[ example] (square brackets represent the match). Just because it is in a non-capturing group doesn't mean it isn't matched. It just means it isn't captured in a group, as it would be in normal brackets. The right way to do this would be using \K like @Leon said.
  • This doesn't work. Leading spaces are included in the match. Just copy and paste it in regex101.com.
  • It's only the lookbehind that's problematic. Lookahead can be anything in all regex engines that support it.