Extract (repeating) groups containing parentheses using regex

My string:

(01) this is value one (02) and this is 2 (03) and this is number 3

Desired result (key/value pair):

(01)    this is value one  
(02)    and this is 2   
(03)    and this is number 3

My code so far:

$s="(01) this is value one (02) and this is 2 (03) and this is number 3" 
$pattern  = '(\(\d\d\))(.*)' 
$m = $s | select-string $pattern -AllMatches | % {$_.matches} | ForEach-Object { $_.Groups[1].Value }

How to accomplish this?

Since you're looking for key-value pairs, it makes sense to collect them in a(n ordered) hashtable.

Splitting can be performed via the regex-based -split operator, which also allows including parts of what the separator regex matches in the output array, via capture groups ((...)).

# Input string
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'

# Initialize the output hashtable
$ht = [ordered] @{}

# Split the input string and fill the hashtable.
$i = 0; 
$s -split '(\(\d+\)) ' -ne '' | ForEach-Object { 
  if (++$i % 2) { $key = $_ } else { $ht[$key] = $_ }
}

# Output the hashtable
$ht

The above yields:

Name                           Value
----                           -----
(01)                           this is value one 
(02)                           and this is 2 
(03)                           and this is number 3

Note: If you don't want to include the enclosing (...) in the keys (shown in the Name column), use -split '\((\d+)\) ' instead of -split '(\(\d+\)) '.

The above splits the string into the elements of an array in which pairs of adjacent elements represent key-value pairs. The ForEach-Object call then adds these key-value pairs to the output hashtable, deciding if the input element is a key or a value based on whether the element index is odd or even.
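
To make that intermediate array visible, here is a minimal sketch that runs only the -split step on the same input string. The variable names $parts and $n are introduced purely for illustration and are not part of the solution above.

```powershell
# Same input string as above
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'

# The capture group makes -split keep each "(nn)" token in the output;
# -ne '' drops the empty element that precedes the first separator.
$parts = $s -split '(\(\d+\)) ' -ne ''

# Print each element with its index, in brackets so trailing spaces show.
for ($n = 0; $n -lt $parts.Count; $n++) {
  '{0}: [{1}]' -f $n, $parts[$n]
}
```

Elements at even positions are the keys and elements at odd positions are the values. The values that precede another key keep a trailing space, which .Trim() can remove if needed.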


As for what you tried:

Your regex '(\(\d\d\))(.*)' is too greedy: after the first index is matched, the .* sub-expression consumes the rest of the line, so each line yields only a single match.

You'll get the desired matches if you use the following regex instead: '(\(\d+\)) ([^(]+)'

That is, after matching an index such as (01) only match up to but not including the subsequent (, if any.
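
As a quick check of this regex outside of Select-String, the following sketch calls the .NET [regex]::Matches() method directly. The names $found and $pairs are illustrative, and the .Trim() call is an addition that drops the trailing space [^(]+ picks up before the next (.

```powershell
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'

# Group 1 captures the index such as "(01)"; group 2 captures everything
# up to, but not including, the next "(".
$found = [regex]::Matches($s, '(\(\d+\)) ([^(]+)')

# Build one "key => value" string per match.
$pairs = foreach ($m in $found) {
  '{0} => {1}' -f $m.Groups[1].Value, $m.Groups[2].Value.Trim()
}
$pairs
```

Because [^(]+ stops at the next opening parenthesis, this approach assumes that the values themselves never contain a ( character.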

In the context of a streamlined version of your original command, which outputs the key-value pairs as an array of custom objects ([pscustomobject] instances):

$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'
$pattern  = '(\(\d+\)) ([^(]+)'
$s | Select-String $pattern -AllMatches | ForEach-Object {
  $_.matches | Select-Object @{ n='Name';  e = { $_.Groups[1].Value } },
                             @{ n='Value'; e = { $_.Groups[2].Value } }
}

The above yields:

Name Value
---- -----
(01) this is value one 
(02) and this is 2 
(03) and this is number 3

Do note, however, that the above outputs an array of custom objects that each represent a key-value pair, which differs from the solution in the top section, which creates a single hashtable containing all key-value pairs.

I was able to achieve your desired output with the following:

PS H:\> $pattern = '(\(\d\d\))([^(]*)'
PS H:\> $results = $s | Select-String $pattern -AllMatches
PS H:\> $results.Matches.Value
(01) this is value one
(02) and this is 2
(03) and this is number 3

Edit: Accessing match groups:

PS H:\> $results.Matches.Captures.Groups[0].value
(01) this is value one
PS H:\> $results.Matches.Captures.Groups[1].value
(01)
PS H:\> $results.Matches.Captures.Groups[2].value
 this is value one
PS H:\> $results.Matches.Captures.Groups[3].value
(02) and this is 2
PS H:\> $results.Matches.Captures.Groups[4].value
(02)
PS H:\> $results.Matches.Captures.Groups[5].value
 and this is 2
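
As the output above shows, $results.Matches.Captures.Groups flattens the groups of all matches into one list, so the index of a given key or value depends on how many matches precede it. A sketch that instead walks the matches one at a time and fills an ordered hashtable, which is closer to the key/value result asked for in the comments; $dict is a hypothetical name, and .Trim() is added here to remove the surrounding spaces:

```powershell
$s = '(01) this is value one (02) and this is 2 (03) and this is number 3'
$results = $s | Select-String '(\(\d\d\))([^(]*)' -AllMatches

$dict = [ordered]@{}
foreach ($m in $results.Matches) {
  # Within a single match, Groups[1] is always the key and Groups[2] the value.
  $dict[$m.Groups[1].Value] = $m.Groups[2].Value.Trim()
}
$dict
```

A value can then be looked up by its key, for example $dict['(02)'].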

here's an alternate method that uses string methods instead of regex. it also stores the output in an ordered hashtable. the [ordered] is merely for convenience - i wanted the display to be in sequence so that i could confirm the output was as expected.

rewrote the "blank items" filter to use Where-Object instead of .Where() since the OP is on a pre-v4 version of PoSh.

# fake reading in a text file
#    in real life, use Get-Content
$InStuff = @'
(01) this is value one (02) and this is 2 (03) and this is number 3
(01) One Bravo (03) Three Bravo
(02) Two Charlie
(111) OneThrice Delta (666) Santa Delta
(01) One Echo (03) Three Echo (05) Five Echo
'@ -split '\r?\n'   # split on CRLF or LF, whichever newline style the script file uses

$LookupTable = [ordered]@{}

foreach ($IS_Item in $InStuff)
    {
    # OP cannot use the ".Where()" array method - that was added in ps4
    #foreach ($Split_Item in $IS_Item.Split('(').Where({$_}))
    $Split_ISI = $IS_Item.Split('(') |
        # this gets rid of the empty items
        Where-Object {$_}

    foreach ($SI_Item in $Split_ISI)
        {
        $Key = $SI_Item.Split(')')[0].Trim()
        $Value = $SI_Item.Split(')')[1].Trim()
        # the leading comma forces the input to be an array
        $LookupTable[$Key] += ,$Value
        }
    }

$LookupTable | Out-Host

$LookupTable['01'][0] | Out-Host
$LookupTable['02'][1] | Out-Host

output ...

Name                           Value
----                           -----
01                             {this is value one, One Bravo, One Echo}
02                             {and this is 2, Two Charlie}
03                             {and this is number 3, Three Bravo, Three Echo}
111                            {OneThrice Delta}
666                            {Santa Delta}
05                             {Five Echo}


this is value one
Two Charlie

the main gotcha here is that the lookup key MUST be a string, so the digits must be quoted for a direct lookup - '01' instead of 01.
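
a small sketch of that gotcha, using a hypothetical two-entry table named $demo: with an [ordered] hashtable, an unquoted number is treated as a positional index rather than as a key, so 01 (the integer 1) selects the second entry instead of the entry whose key is '01'.

```powershell
# hypothetical demo table, not part of the answer's code
$demo = [ordered]@{}
$demo['01'] = 'first'
$demo['02'] = 'second'

$byKey      = $demo['01']   # string key lookup -> the entry stored under '01'
$byPosition = $demo[01]     # 01 parses as the integer 1 -> entry at position 1

$byKey
$byPosition
```

here $byKey holds 'first' while $byPosition holds 'second', which is why the quoted form is the one to use for key lookups.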

Comments
  • do you want the key names to be (##) or ##? the 2nd will be simpler to use since it would not require handling the () ... [grin]
  • the 2nd. Just ##
  • thanks! that will be noticeably easier to deal with.
  • @Lee_Dailey : Can you provide me a sample?
  • yep! [grin] just added an alternate answer to the list of answers ...
  • My pleasure, @JohnDoe; glad it was helpful.
  • Many thanks. Can I split them in groups? To access them by group. e.g group[1]=(01) and group[2]=this is value one etc.
  • @JohnDoe I've updated my answer with a slightly different regex to work with your original code. That work?
  • Sorry for the misunderstanding but access them by group. e.g group[1]=(01) and group[2]=this is value one etc.
  • My bad, I see what you mean. Yeah, you can iterate through the groups, as you can see in my edit, the groups will be the whole match and then the individual captures.
  • Sorry but I've a different result using your code. the result for $results.Matches.Captures.Groups[1].value is (02) and this is 2 I'd like to end up with a dictionary - key-value pair for every line
  • I get an error when I run your code: Method invocation failed because [System.String] doesn't contain a method named 'Where'. At line:15 char:29 + foreach ($Split_Item in $IS_Item.Split('(').Where({$_})) + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidOperation: (:) [], RuntimeException + FullyQualifiedErrorId : MethodNotFound
  • @JohnDoe - that indicates that $IS_Item.Split('(') did not produce the expected array, OR that you are using a version of PS that doesn't support .Where(). [1] what is in $IS_Item at that point? [2] what version of PoSh are you running?
  • $IS_Item contains (01) this is value one (02) and this is 2 (03) and this is number 3 and PoSh version is: PSVersion 3.0, CLRVersion 4.0.30319.42000, BuildVersion 6.2.9200.22198