Recursive regex for matching everything in parentheses (PCRE)

I am surprised that I could not easily find a similar question with an answer on SO. I would like to match everything inside certain function calls. The idea is to remove the functions that are useless.

foo(some (content)) --> some (content)

So I am trying to match everything inside the function call, which can itself include parentheses. Here is my PCRE regex:

(?<name>\w+)\s*\(\K
(?<e>
     [^()]+
     |
     [^()]*
         \((?&e)\)
     [^()]*
)*
(?=\))

https://regex101.com/r/gfMAIM/1

Unfortunately it doesn't work and I don't really understand why.

I have a simple regex without recursion:

(?<=[\w ]{2}\().*(?=\))

So far it copes with unbalanced parentheses, but it does not handle multiple functions on one line. That can be handled if you know the delimiter between the functions, e.g. ; if this is Java code.

Variant 2 (updated for multiple functions on one line):

(?<=[\w ]\()[^;\n]*(?=\))
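
As a quick check, Variant 2 uses only a fixed-width lookbehind, so it can be tried with Python's built-in re module (that is an assumption of this sketch; the answer itself targets PCRE). The sample strings and the helper name call_contents are made up for illustration.

```python
import re

# Variant 2 from above, copied unchanged.
VARIANT_2 = re.compile(r"(?<=[\w ]\()[^;\n]*(?=\))")

def call_contents(line):
    """Return every Variant 2 match found in line."""
    return VARIANT_2.findall(line)

# Calls separated by ';' are matched one by one.
print(call_contents("foo(some (content)); bar(x)"))  # ['some (content)', 'x']

# Without a ';' between the calls, the greedy match runs to the last ')'.
print(call_contents("foo(a) + bar(b)"))  # ['a) + bar(b']
```

The second call shows the limit of this variant: it relies on the delimiter, not on counting parentheses.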

Variant 3 (allowing ; in strings):

(?<=[\w ]\()([^;\n]|".*?")*(?=\))    
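
To see what Variant 3 adds over Variant 2, here is a small comparison, again assuming Python's built-in re module as a stand-in for PCRE. The input line and the helper name matches are invented for the example. Variant 3 contains a capturing group, so the sketch reads the whole match with group(0).

```python
import re

# Variants 2 and 3 from above, copied unchanged (trailing spaces dropped).
VARIANT_2 = re.compile(r"(?<=[\w ]\()[^;\n]*(?=\))")
VARIANT_3 = re.compile(r'(?<=[\w ]\()([^;\n]|".*?")*(?=\))')

def matches(pattern, line):
    """Return the full text of every match of pattern in line."""
    return [m.group(0) for m in pattern.finditer(line)]

line = 'print("a;b"); foo(x)'

# Variant 2 stops at the ';' inside the string and loses the first call.
print(matches(VARIANT_2, line))  # ['x']

# Variant 3 treats the quoted string as one unit, so both calls are found.
print(matches(VARIANT_3, line))  # ['"a;b"', 'x']
```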

Variant 4 (escaping strings):

(?<=[\w \n]\()(?:[^;\n"]|(?:"(?:[^"]|\\")*?(?<!\\)"))*(?=\))

Comments
  • Is the inspected function call always at the start of the line or not?
  • @Predicate Let's assume everything is on one line, but you can find multiple function calls on this line: foo(); bar()\n
  • Is there any way the parentheses could be unbalanced in the desired match?
  • It seems the reason the regex suggested by the OP didn't work was the * immediately after the (?<e> group; adding a non-capturing group inside fixed it: regex101
  • Another solution, more efficient because it avoids backtracking, is to use a possessive quantifier, for example (?<name>\w+)\s*\(\K(?<e>(?:[^()]++|\((?&e)\))*)\)
  • @NahuelFouilleul Yes, that is a bit better. An atomic group can be used, too: (?<name>\w+)\s*\(\K(?<e>(?>[^()]+|\((?&e)\))*)(?=\)). Also, note the last ) must be in a lookahead (well, at least that is the OP's logic).
  • Guys, your solution fails on unbalanced parentheses. They could appear in strings.
  • @Predicate Those cases are out of scope, they cannot be matched with recursion. They can only be handled if specific context is known on both ends of expected matches. You can't have a universal regex for those scenarios.
  • Check Variant 2.
  • You could improve the first lookbehind so that it checks more than one character, but that becomes harder because PCRE does not allow variable-length lookbehinds. Probably more context is needed for that. But I don't see a problem in checking only one character before the (.
  • And because * is greedy, it will go all the way to the last ) and will not match ; either. The balancing problem will not occur, and multiple functions on a line are allowed (if the delimiter is ;).
  • Keep it short and simple ^^
  • @NahuelFouilleul See Variant 3.
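
For the balanced case that the recursive patterns in the comments aim at, the expected output can be cross-checked without recursion by counting nesting depth. This is only a sketch in Python, not the PCRE approach from the question: the function name call_arguments and the CALL_START pattern are invented here, and parentheses inside string literals are deliberately not handled.

```python
import re

# A word followed by an opening parenthesis, e.g. "foo(" (hypothetical helper pattern).
CALL_START = re.compile(r"\w+\s*\(")

def call_arguments(text):
    """Return the text between the outer parentheses of each call in text."""
    results = []
    pos = 0
    while True:
        m = CALL_START.search(text, pos)
        if m is None:
            break
        depth = 1          # we are just past the opening parenthesis
        i = m.end()
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth == 0:
            # i is one past the matching ')', so exclude that character.
            results.append(text[m.end():i - 1])
            pos = i
        else:
            # No matching ')': skip this call and keep searching.
            pos = m.end()
    return results

print(call_arguments("foo(some (content))"))  # ['some (content)']
print(call_arguments("foo(); bar(a(b)c)"))    # ['', 'a(b)c']
```

Unlike the delimiter-based variants, this does not need a ; between calls, but it gives up on any call whose parentheses never close.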