Regular expression: split a line into several lines, each carrying the same prefix

I am trying to use sed to turn this string:

1234    dog, hat, cat

into this multi-line string:

1234 dog
1234 hat
1234 cat

Essentially, each comma should become a newline, and the number should be repeated at the start of every new line. I am trying to do this with a sed command. My problem is that I want to capture the number and reuse it through a backreference without interfering with the plain comma matching. I read about \K, but I am lost. Can anyone give me a sed -E command that accomplishes this?

sed does not understand \d and similar shorthands. You need to use [0-9], \{0,1\}, \+ and so on instead. Also, if the number of "words" can vary, awk with a for loop is a much better fit, as already suggested.

sed 's/^\([0-9]\+\)[ \t]\+\([^,]\+\),[ \t]\+\([^,]\+\),[ \t]\+\([^,]\+\)/\1 \2\n\1 \3\n\1 \4/'
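
As a small illustration of the point about \d, here is a sketch that uses POSIX bracket classes instead. It only normalizes the gap after the leading number and does not solve the whole problem. The function name normalize_key is hypothetical, and the sketch assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# normalize_key is a hypothetical helper name used only for this illustration.
normalize_key() {
  # [[:digit:]] and [[:space:]] are POSIX bracket classes; they stand in
  # for the \d and \s shorthands that sed does not treat as classes.
  sed -E 's/^([[:digit:]]+)[[:space:]]+/\1 /'
}

# Collapse the run of spaces after the leading number to a single space.
printf '1234    dog, hat, cat\n' | normalize_key
```

With the sample line from the question, this should print the line with a single space after 1234.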

It is easier done using awk:

s='1234    dog, hat, cat'

awk -F '[, ]+' '{for(i=2; i<=NF; i++) print $1, $i}' <<< "$s"
1234 dog
1234 hat
1234 cat

This might work for you (GNU sed):

sed -r 's/ +/ /g;/\n/!s/^((\S+)\s*[^,]*),/\1\n\2/;P;D' file

Reduce each run of spaces to a single space. If the pattern space does not contain a newline, replace the first comma with a newline followed by the key. Then print up to the first newline, delete that part, and repeat on what remains.
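
The same one-liner can be spread over several lines with a comment per command, which may make the print/delete loop easier to follow. This is a sketch, not a different solution: it assumes GNU sed (for \S, \s and \n in the replacement), and the function name split_with_prefix is hypothetical.

```shell
# split_with_prefix is a hypothetical helper name; GNU sed is assumed.
split_with_prefix() {
  sed -E '
    # Squeeze each run of spaces into a single space.
    s/ +/ /g
    # Only when no newline is pending: turn the first comma into
    # "newline + key", where \2 is the first whitespace-free field.
    /\n/!s/^((\S+)\s*[^,]*),/\1\n\2/
    # Print the pattern space up to the first newline.
    P
    # Delete up to that newline and rerun the script on the remainder
    # without reading a new input line.
    D
  '
}

printf '1234    dog, hat, cat\n' | split_with_prefix
```

Because D restarts the script on the remainder, the number of comma-separated items does not have to be known in advance.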

Comments
  • I was trying this without success: sed -E 's/(\d)?\K?,/\n\1/g'
  • Thanks, zolo. This looks complicated, and it suggests to me that the problem cannot really be solved with sed alone, especially if the number of comma-separated elements varies. We need control structures. Thanks!
  • Thank you, anubhava. Yes, with program logic it is not so hard. I was not sure whether it could be solved with regular expressions alone. Thank you for the awk example.
  • awk is indeed the best solution for this problem. The sed solution provided can break in many cases. If this answer worked for you, you may mark it as accepted by clicking the tick mark at the top left of my answer.