Including optional character in regex with sed

I have the following strings:

setenv run_area1 root/test1/Apr14_2019_10_32_39/dummy
setenv area2 root/test2/Aug23_2017_14_25_56/dummy
setenv run_area3 testRun/test1/blue_Apr14_2019_08_56_48/dummy/
setenv area4 testRun/test2/Aug23_2017_14_26_03/thing2

I want to replace the date with [DATE], as follows:

setenv run_area1 root/test1/[DATE]/dummy
setenv area2 root/test2/[DATE]/dummy
setenv run_area3 testRun/test1/blue[DATE]/dummy/
setenv area4 testRun/test2/[DATE]/thing2

I have to use sed, so I wrote the following command:

sed 's|[A-Z][a-z]*[0-9]*_[0-9]*_[0-9]*_[0-9]*_[0-9]*|[DATE]|g'

It works for all of the strings except the following one:

setenv run_area3 testRun/test1/blue_Apr14_2019_08_56_48/dummy/

I get:

setenv run_area3 testRun/test1/blue_[DATE]/dummy/

I'm looking for a way to include the _ in the match when it is present. In Perl I know that I can use something like (_|) so that _ is optional, or I could use ?. From previous threads I saw that basic sed does not include those options and that I need to use \{0,1\} instead (link). The problem is that I can't seem to understand how \{0,1\} solves it. Are there other solutions?

\{0,1\} in a BRE is an interval expression that means 0 to 1 repetitions of the preceding expression, i.e. the preceding expression is optional. That is the same thing ? means in an ERE (technically an ERE defines ? as 0 _or_ 1 repetitions, but that is the same set of values).

With any POSIX sed:

$ sed 's/_\{0,1\}[[:upper:]][[:lower:]]*[0-9]*\(_[0-9]*\)\{4\}/[DATE]/' file
setenv run_area1 root/test1/[DATE]/dummy
setenv area2 root/test2/[DATE]/dummy
setenv run_area3 testRun/test1/blue[DATE]/dummy/
setenv area4 testRun/test2/[DATE]/thing2
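
To see why the optional underscore ends up inside the match, it may help to run the same substitution with and without it. The following is a minimal sketch, assuming two made-up inputs and hypothetical variable names (`input`, `with_optional`, `without_optional`). sed takes the leftmost match, so when an underscore sits directly before the capital letter the match starts at the underscore, and when there is none, `\{0,1\}` simply matches nothing.

```shell
# Two hypothetical inputs: one with an underscore before the month, one without.
input='blue_Apr14
/Apr14'

# BRE interval: _\{0,1\} matches zero or one underscore before the capital letter.
with_optional=$(printf '%s\n' "$input" |
    sed 's/_\{0,1\}[[:upper:]][[:lower:]]*[0-9]*/[DATE]/')

# Same expression without the optional underscore, for comparison.
without_optional=$(printf '%s\n' "$input" |
    sed 's/[[:upper:]][[:lower:]]*[0-9]*/[DATE]/')

printf '%s\n' "$with_optional"      # blue[DATE] and /[DATE]
printf '%s\n' "$without_optional"   # blue_[DATE] and /[DATE]
```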

If the month and date follow the MMMDD format, you could treat such an expression as unique in each record and base the whole script on that assumption, along these lines:

sed -E 's/^(.*)([[:alpha:]]{3}[[:digit:]]{2})([^/]+)\/(.*)$/\1[DATE]\/\4/;s/_\[DATE\]/[DATE]/' filename

Output

setenv run_area1 root/test1/[DATE]/dummy
setenv area2 root/test2/[DATE]/dummy
setenv run_area3 testRun/test1/blue[DATE]/dummy/
setenv area4 testRun/test2/[DATE]/thing2

Note: The -E option enables extended regular expressions in sed. If your sed does not support -E, use -r instead.
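
If a script has to run on machines where it is unclear which flag is available, one possible approach is to probe for -E first and fall back to -r. This is only a sketch: the variable names `ere_flag`, `line` and `result` are made up for illustration, and the pattern is the ERE spelling of the optional-underscore expression rather than the command shown above.

```shell
# Probe whether this sed accepts -E; otherwise fall back to -r.
if printf 'x\n' | sed -E 's/(x)?/\1/' >/dev/null 2>&1; then
    ere_flag=-E
else
    ere_flag=-r
fi

line='setenv run_area3 testRun/test1/blue_Apr14_2019_08_56_48/dummy/'

# ERE: _? is the optional underscore, (_[0-9]*){4} the four numeric fields.
result=$(printf '%s\n' "$line" |
    sed "$ere_flag" 's/_?[[:upper:]][[:lower:]]*[0-9]*(_[0-9]*){4}/[DATE]/')

printf '%s\n' "$result"
```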

sed uses basic regular expressions (BRE) by default, in which ? is an ordinary character. For the difference between basic and extended regular expressions, please refer to this link.

If you want to use features supported by extended regular expressions, you have to tell sed explicitly with the -r option.

So with GNU sed, the two scripts below do the same thing:

sed 's|_\?[A-Z][a-z]*[0-9]*_[0-9]*_[0-9]*_[0-9]*_[0-9]*|[DATE]|g' textfile

sed -r 's|_?[A-Z][a-z]*[0-9]*_[0-9]*_[0-9]*_[0-9]*_[0-9]*|[DATE]|g' textfile
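
The `\?` in the first script is a GNU extension to BRE. As a rough check that a portable spelling behaves the same way, the sketch below runs the `\{0,1\}` form and the ERE `?` form over the four sample lines from the question and compares the results. The variable names (`sample`, `bre_out`, `ere_out`, `same`) are made up for illustration, and -E is assumed to be available.

```shell
# The four sample lines from the question.
sample='setenv run_area1 root/test1/Apr14_2019_10_32_39/dummy
setenv area2 root/test2/Aug23_2017_14_25_56/dummy
setenv run_area3 testRun/test1/blue_Apr14_2019_08_56_48/dummy/
setenv area4 testRun/test2/Aug23_2017_14_26_03/thing2'

# Portable BRE: the interval \{0,1\} makes the underscore optional.
bre_out=$(printf '%s\n' "$sample" |
    sed 's|_\{0,1\}[A-Z][a-z]*[0-9]*_[0-9]*_[0-9]*_[0-9]*_[0-9]*|[DATE]|g')

# ERE: a bare ? plays the same role once -E is given.
ere_out=$(printf '%s\n' "$sample" |
    sed -E 's|_?[A-Z][a-z]*[0-9]*_[0-9]*_[0-9]*_[0-9]*_[0-9]*|[DATE]|g')

# Record whether the two forms agree, then show the result.
if [ "$bre_out" = "$ere_out" ]; then
    same=yes
else
    same=no
fi
printf 'same output: %s\n' "$same"
printf '%s\n' "$bre_out"
```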

Comments
  • Don't have time to test, but I would expect 's|_*[A-Z][a-z]*[0-9]*.......|....| to work. good luck.
  • FYI -E will work with OSX/BSD sed and newer versions of GNU sed so it's better than -r for portability. -r is only for GNU sed and is only required if you're using an older version that doesn't support -E.