Regex: Repeated capturing groups

I have to parse some tables from an ASCII text file. Here's a partial sample:

QSMDRYCELL   11.00   11.10   11.00   11.00    -.90      11     11000     1.212
RECKITTBEN  192.50  209.00  192.50  201.80    5.21      34      2850     5.707
RUPALIINS   150.00  159.00  150.00  156.25    6.29       4        80      .125
SALAMCRST   164.00  164.75  163.00  163.25    -.45      80      8250    13.505
SINGERBD    779.75  779.75  770.00  773.00    -.89       8        95      .735
SONARBAINS   68.00   69.00   67.50   68.00     .74      11      3050     2.077

The table consists of 1 column of text and 8 columns of floating point numbers. I'd like to capture each column via regex.

I'm pretty new to regular expressions. Here's the faulty regex pattern I came up with:

(\S+)\s+(\s+[\d\.\-]+){8}

But the pattern captures only the first and the last columns. RegexBuddy also emits the following warning:

You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations.

I've consulted their help file, but I don't have a clue as to how to solve this.

How can I capture each column separately?


In C# (modified from this example):

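using System;
using System.Text.RegularExpressions;

string input = "QSMDRYCELL   11.00   11.10   11.00   11.00    -.90      11     11000     1.212";
// Group 2 is repeated 8 times; .NET keeps every iteration in Group.Captures.
string pattern = @"^(\S+)\s+(\s+[\d.-]+){8}$";
Match match = Regex.Match(input, pattern, RegexOptions.Multiline);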
if (match.Success) {
   Console.WriteLine("Matched text: {0}", match.Value);
   for (int ctr = 1; ctr < match.Groups.Count; ctr++) {
      Console.WriteLine("   Group {0}:  {1}", ctr, match.Groups[ctr].Value);
      int captureCtr = 0;
      foreach (Capture capture in match.Groups[ctr].Captures) {
         Console.WriteLine("      Capture {0}: {1}", 
                           captureCtr, capture.Value);
         captureCtr++; 
      }
   }
}

Output:

Matched text: QSMDRYCELL   11.00   11.10   11.00   11.00    -.90      11     11000     1.212
...
    Group 2:      1.212
         Capture 0:  11.00
         Capture 1:    11.10
         Capture 2:    11.00
...etc.

How to capture multiple repeated groups? Alternatively, expand your regex so that the pattern contains one capturing group for each value you want in the result: ^([A-Z]+),([A-Z]+)

When creating a regular expression that needs a capturing group to grab part of the text matched, a common mistake is to repeat the capturing group instead of capturing a repeated group. The difference is that a repeated capturing group captures only the last iteration, while a group that wraps another, repeated group captures all iterations.


Unfortunately you need to repeat the (…) 8 times to get each column separately.

^(\S+)\s+([-.\d]+)\s+([-.\d]+)\s+([-.\d]+)\s+([-.\d]+)\s+([-.\d]+)\s+([-.\d]+)\s+([-.\d]+)\s+([-.\d]+)$
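A minimal Python sketch of the same idea, assuming the rows look like the sample in the question. The names NUM, ROW and sample are illustrative and not part of the original answer. Instead of typing the numeric group eight times, the pattern is assembled from one piece, and each match then exposes nine separate groups:

```python
import re

# Two rows from the question's sample table.
sample = """\
QSMDRYCELL   11.00   11.10   11.00   11.00    -.90      11     11000     1.212
RUPALIINS   150.00  159.00  150.00  156.25    6.29       4        80      .125
"""

NUM = r"([-.\d]+)"  # one numeric column, same character class as above
# Equivalent to writing the numeric group out eight times by hand.
ROW = re.compile(r"^(\S+)\s+" + r"\s+".join([NUM] * 8) + "$", re.M)

rows = [m.groups() for m in ROW.finditer(sample)]
for row in rows:
    print(row)
```

Each row comes back as a tuple of nine strings: the name followed by the eight numeric columns.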

If you can use code in addition to the regex, you can first match the numeric columns as a whole

>>> import re
>>> rx1 = re.compile(r'^(\S+)\s+((?:[-.\d]+\s+){7}[-.\d]+)$', re.M)
>>> allres = rx1.findall(theAsciiText)

then split the columns on whitespace

>>> [[p] + q.split() for p, q in allres]
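The match-then-split approach can be sketched as a self-contained script. The variable the_ascii_text and the helper parse_rows are hypothetical names, and converting the columns to float is an assumption about what the parsed values are needed for:

```python
import re

# Two rows from the question's sample table.
the_ascii_text = """\
SALAMCRST   164.00  164.75  163.00  163.25    -.45      80      8250    13.505
SINGERBD    779.75  779.75  770.00  773.00    -.89       8        95      .735
"""

# Group 1 is the name; group 2 is the whole run of eight numeric columns.
rx1 = re.compile(r'^(\S+)\s+((?:[-.\d]+\s+){7}[-.\d]+)$', re.M)

def parse_rows(text):
    """Return (name, [eight floats]) for every line that matches rx1."""
    return [(name, [float(x) for x in nums.split()])
            for name, nums in rx1.findall(text)]

parsed = parse_rows(the_ascii_text)
for name, values in parsed:
    print(name, values)
```

Lines that do not have exactly one text column followed by eight numeric columns are skipped rather than reported, which may or may not be what you want for a real file.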

Regex Capture Groups and Back-References: For instance, the regex \b(\w+)\b\s+\1\b matches repeated words, such as regex regex, because the parentheses in (\w+) capture a word to Group 1, then the back-reference \1 matches that same text again. … Quantified capturing parentheses do not create new groups. Rather, they repeatedly refer to Group 1, Group 1, Group 1… If you try such a regex on 1234 (assuming your regex flavor even allows it), Group 1 will contain 4, i.e. the last capture. In essence, Group 1 gets overwritten every time the regex iterates through the capturing parentheses.
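Both behaviours can be checked quickly in Python. This is a small sketch; the test strings are made up, and (\d)+ is an assumed stand-in for the quantified group the excerpt refers to:

```python
import re

# \1 must repeat exactly what Group 1 captured, so this finds doubled words.
doubled = re.compile(r"\b(\w+)\b\s+\1\b")
hit = doubled.search("this regex regex matches")

# A quantified group is still just Group 1; each iteration overwrites it.
# The pattern (\d)+ is an assumed example of such a group.
digits = re.fullmatch(r"(\d)+", "1234")

print(hit.group(0))     # the doubled word pair
print(digits.group(1))  # the last digit only
```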


If you want to know why the warning appears, it's because your capture group matches multiple times (8, as you specified), but the capture variable can hold only one value. It is assigned the last value matched.
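The effect is easy to reproduce with Python's re module, which behaves the same way here. A short sketch using the question's pattern on its first sample row (the variable names are illustrative):

```python
import re

line = "QSMDRYCELL   11.00   11.10   11.00   11.00    -.90      11     11000     1.212"

# The pattern from the question: group 2 is repeated eight times.
m = re.match(r"(\S+)\s+(\s+[\d\.\-]+){8}", line)

# Only two groups exist, however many times group 2 iterated.
print(m.groups())
first, last = m.group(1), m.group(2).strip()
```

Group 2 holds the final column (with its leading whitespace); the seven earlier iterations are no longer available after the match.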

As described in question 1313332, retrieving these multiple matches is generally not possible with a regular expression, although .NET and Perl 6 have some support for it.

The warning suggests that you could put another group around the whole set, like this:

(\S+)\s+((\s+[\d\.\-]+){8})

You would then be able to see all the columns, but of course they would not be separated. Because it's generally not possible to capture them separately, the more common intention is to capture all of it, and the warning helps remind you of this.
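To illustrate, here is a brief Python sketch of the wrapped pattern applied to one row from the question. The variable names are illustrative, and the final split() is an extra step outside the regex, not something the pattern itself provides:

```python
import re

line = "RECKITTBEN  192.50  209.00  192.50  201.80    5.21      34      2850     5.707"

# Group 2 wraps the repeated group 3, so it spans all eight numeric columns.
m = re.match(r"(\S+)\s+((\s+[\d\.\-]+){8})", line)

all_columns = m.group(2)       # one string containing every number
last_column = m.group(3)       # still only the final iteration
columns = all_columns.split()  # separating them takes a step outside the regex
print(columns)
```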

Capturing groups: As we can see, a domain consists of repeated words, with a dot after each one except the last. As a regular expression, that's (\w+\.)+\w+. Parentheses group together a part of the regular expression so that the quantifier applies to it as a whole. Parenthesized groups are numbered left to right and can optionally be named with (?<name>...). The content matched by a group can be obtained from the result: the method str.match returns capturing groups only when the g flag is not set.


Regex TIL: Repeating capture groups (June 19, 2016). I consider myself decent with regexes; this comes from my data science days, when I was doing a lot of … You can access captured groups in four ways: by using the backreference construct within the regular expression (the matched subexpression is referenced in the same …); by using the named backreference construct within the regular expression (the matched subexpression is referenced in the …); by using …


RegEx how to make one group match multiple times: For some reason I can't make one group match multiple times … In the second pattern "(w)+" is a repeated capturing group. Backtracking information is discarded when a match is found, so there's no way to tell after the fact that the group had a previous iteration that matched abc. (The only exception to this is the .NET regex engine, which does preserve backtracking information for capturing groups after the match attempt.)


Regular Expression Reference: Capturing Groups and Backreferences. A typical use is when you wish to look for adjacent, repeated words in some text. The first part of the match could use a pattern that extracts a single word. Capturing group (regex): Parentheses group the regex between them. They capture the text matched by the regex inside them into a numbered group that can be reused with a numbered backreference. They allow you to apply regex operators to the entire grouped regex. (abc){3} matches abcabcabc. First group matches abc.