Ruby: How to split regular expressions over multiple lines?

I have a 141-character regular expression in my Rails application, and RuboCop doesn't like it.

My regular expression:

URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

This pattern checks for URLs with a one-level path, e.g. http(s)://example.com/path.

  1. Can you safely split a regular expression in Ruby? What is the general mechanism for splitting a regular expression in Ruby?

  2. How do you tell Rubocop to take it easy on regular expressions?

Thanks a lot!

You should try something like this:

regexp = %r{\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+
            ([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z}x

if 'http://example.com/path' =~ regexp
  puts 'matches'
end

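The "x" at the end turns on free-spacing mode: unescaped whitespace (including the line break and the indentation of the second line) and # comments in the pattern are ignored.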
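One way to convince yourself that splitting is safe is to run both versions against the same inputs. A minimal sketch, assuming the sample URLs below (made up for illustration); the %r{} delimiters are a choice that lets `/` go unescaped:

```ruby
# The one-line pattern from the question.
ONE_LINE = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

# The same pattern split after "[a-z0-9]+". The x flag makes the parser
# ignore the line break and the indentation of the second line.
SPLIT = %r{\A(http://www\.|https://www\.|http://|https://)?[a-z0-9]+
           ([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(/[-\w.]+)\z}x

samples = %w[
  http://example.com/path
  https://www.example.com/some-path
  example.co.uk/a.b
  http://example.com:8080/x
  ftp://example.com/path
  http://example.com
]

# Print the result of each pattern side by side.
samples.each do |url|
  puts "#{url}: #{ONE_LINE.match?(url)} / #{SPLIT.match?(url)}"
end
```

Both columns agree for every sample. The last two do not match either pattern: the scheme is not one of the four alternatives, and the original pattern requires a path.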

See the last example in the "Regular Expressions" section of the GitHub Ruby style guide: https://github.com/github/rubocop-github/blob/master/STYLEGUIDE.md#regular-expressions

How do you tell Rubocop to take it easy on regular expressions?

The cop that is complaining about this is most likely Metrics/LineLength (named Layout/LineLength in newer RuboCop versions). There is no configuration option to ignore regular expressions specifically, but if you are okay with the regexp being that long, you can disable the cop inline:

# rubocop:disable Metrics/LineLength
URL_REGEX = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
# rubocop:enable Metrics/LineLength

It is also possible to put a single trailing `# rubocop:disable` comment at the end of the line, but since the line is already very long, that comment could easily be missed, so the disable/enable pair is probably better here.

Yes. You can build parts of a regex separately and interpolate them into the final regex you want.

prefix = %w(http://www. https://www. http:// https://)
prefix = Regexp.union(prefix) # Regexp.union escapes plain strings itself
letters = '[a-z\d]+'          # single quotes keep the backslash in \d
URL_REGEX = /\A(#{prefix})?#{letters}([-.]#{letters})*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-.\w]+)\z/
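The same idea can be pushed further by making every piece its own small Regexp. A sketch under the assumption that you are happy with several extra constants; the names SCHEME, LABEL, TLD, PORT and URL_PATH are hypothetical. Interpolated Regexp objects are inserted as `(?-mix:...)` groups, so each piece keeps its meaning inside the larger pattern. The last lines show two pitfalls when pieces start out as strings: `Regexp.union` already escapes strings, so pre-escaping them escapes twice, and `"\d"` in double quotes silently loses its backslash.

```ruby
# Hypothetical building blocks; each is a small Regexp of its own.
SCHEME   = Regexp.union(%w[http://www. https://www. http:// https://]) # union escapes the strings
LABEL    = /[a-z0-9]+/      # one host label
TLD      = /[a-z]{2,5}/     # top-level domain
PORT     = /:[0-9]{1,5}/    # port, made optional below
URL_PATH = %r{/[-\w.]+}     # one-level path

COMPOSED_URL_REGEX =
  /\A(#{SCHEME})?#{LABEL}([-.]#{LABEL})*\.#{TLD}(#{PORT})?(#{URL_PATH})\z/

puts COMPOSED_URL_REGEX.match?('http://example.com/path')     # true
puts COMPOSED_URL_REGEX.match?('https://www.example.com/a-b') # true
puts COMPOSED_URL_REGEX.match?('ftp://example.com/path')      # false

# Pitfalls when building pieces from strings:
puts Regexp.union('a.b').match?('axb')                # false: '.' escaped once, as intended
puts Regexp.union(Regexp.escape('a.b')).match?('a.b') # false: escaped twice, now needs a literal backslash
puts "[a-z\d]+" == '[a-zd]+'                          # true: "\d" in double quotes is just "d"
```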

Another option would be to use a more concise regex. There are several places where you are repeating patterns when you don't need to.

/\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   (https?:\/\/(www\.)?)?

With that and a few more alterations, I got your regex down to:

/\A(https?:\/\/(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/

It's not exactly equivalent, but here's my test.
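Since the shorter version is not exactly equivalent, it helps to pin down where it differs from the original. A sketch with made-up inputs: the looser host class `[-a-z0-9.]+` accepts hosts the original rejects. It also shows why `\A`/`\z` are safer than `^`/`$` for validation in Ruby, where `^` and `$` match at every line boundary.

```ruby
ORIGINAL = /\A(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/[-\w.]+)\z/
CONCISE  = %r{\A(https?://(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(/[-\w.]+)\z}
# Same as CONCISE but with line anchors instead of string anchors.
LINE_ANCHORED = %r{^(https?://(www\.)?)?[-a-z0-9.]+\.[a-z]{2,5}(:[0-9]{1,5})?(/[-\w.]+)$}

inputs = ['http://example.com/path', 'https://www.example.com/x',
          '-example.com/path', 'example..com/path']

inputs.each do |s|
  puts "#{s.inspect}: original=#{ORIGINAL.match?(s)} concise=#{CONCISE.match?(s)}"
end

# ^ and $ also match around an embedded newline, so a multi-line string
# whose second line is a URL passes the line-anchored version.
multi = "not a url\nhttp://example.com/path"
puts LINE_ANCHORED.match?(multi) # true
puts CONCISE.match?(multi)       # false
```

The first two inputs match both patterns; the last two match only the concise one.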

This elaborates on @Gacha's answer. Yes, free-spacing mode (/x) is what you want. In that mode the regex parser ignores all unescaped whitespace (and # comments) when constructing the regular expression, so any space that is a real part of the pattern has to be protected. That can be done by escaping it (`\ `), putting it in a character class (`[ ]`), or writing \p{Space}, [[:space:]] or \s. The last three match any whitespace character (a space, tab, newline and a few others), which may or may not be wanted.
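The difference between these options is easy to check directly. A small sketch; the sample strings are arbitrary:

```ruby
plain   = /a b/     # no x flag: the space is literal
spaced  = /a b/x    # x flag: the space is ignored, same as /ab/
escaped = /a\ b/x   # escaped space stays literal
klass   = /a[ ]b/x  # a space inside a character class stays literal
any_ws  = /a\sb/x   # \s matches any whitespace, including tab and newline

puts plain.match?('a b')    # true
puts spaced.match?('a b')   # false
puts spaced.match?('ab')    # true
puts escaped.match?('a b')  # true
puts klass.match?('a b')    # true
puts any_ws.match?("a\tb")  # true
puts escaped.match?("a\tb") # false: only a real space matches
```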

The additional benefit of using free-spacing mode is that you can make the regex self-documenting.

Here you might write the following:

URL_REGEX = 
  /
  \A
  (               # open cap group 1
    https?:\/\/   # match 'http:\/\/' or 'https:\/\/'
    (?:www\.)?    # optionally match 'www.' in non-cap group
  )?              # close cap group 1 and optionally match it
  [a-z0-9]+       # match >= 1 lowercase letters or digits
  (               # open cap group 2
    [-.]          # match '-' or '.' ('{1}' not needed and no
                  # need to escape '-' or '.' in a char class)
    [a-z0-9]+     # match >= 1 lowercase letters or digits 
  )*              # close cap group 2 and match it >= 0 times
  \.              # match a period
  [a-z]{2,5}      # match 2-5 lowercase letters
  (:[0-9]{1,5})?  # optionally match ':' followed by 1-5 
                  # digits in cap group 3
  (               # open cap group 4
    \/            # match '\/'
    [-\w.]+       # match '-', word char or '.' >= 1 times
  )               # close cap group 4
  \z              # match end of string
  /x              # free spacing regex definition mode

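You'll see that I've made a few changes to simplify your regex. Note that forward slashes to the right of # must still be escaped: Ruby finds the end of the /.../ literal before the regex engine ever sees the comments, so an unescaped / inside a comment would terminate the regex.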
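If escaping slashes in comments gets tedious, a different delimiter sidesteps it. A sketch of the same commented pattern using %r{...}x, where `/` needs no escaping anywhere; the assumption is that any braces in the pattern or comments stay balanced, since the literal ends at the matching `}`. The constant name URL_REGEX_BRACES is hypothetical.

```ruby
URL_REGEX_BRACES = %r{
  \A
  (                 # cap group 1: optional scheme
    https?://       # 'http://' or 'https://', no escaping needed here
    (?:www\.)?      # optional 'www.' in a non-capturing group
  )?
  [a-z0-9]+         # first host label
  ([-.][a-z0-9]+)*  # more labels, each after a '-' or '.'
  \.[a-z]{2,5}      # top-level domain
  (:[0-9]{1,5})?    # optional port
  (/[-\w.]+)        # one-level path, e.g. '/path'
  \z
}x

puts URL_REGEX_BRACES.match?('http://example.com/path')        # true
puts URL_REGEX_BRACES.match?('https://www.example.com/a-b')    # true
puts URL_REGEX_BRACES.match?('example.com:8080/p')             # true
puts URL_REGEX_BRACES.match?('http://example.com')             # false: no path
```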

Comments
  • @Jörg I don't think this is a duplicate of that question. The title suggests it is duplicate, but actually, the question asks for alternative ways to handle such long regexes.
  • This is what I was looking for. Would you be able to edit with the regular expression from above to make it more obvious? Can you break the line anywhere at all?
  • This is pretty great! Thank you.
  • That's pretty useful, only requires extra variables and doesn't seem to be a good fit for constants. But maybe I'm wrong? @sawa
  • I figured in my case URL_REGEX is a class constant. If I wanted to assign parts of the regular expressions I'll want to create a few additional constants.
  • I matched 100% of OP's examples: /^http(s)?://example.com/path$/
  • @Rogue: Heh. I think you will find the OP only gave one example that's matched by: /^http\(s\):\/\/example\.com\/path$/. But point taken.
  • There wasn't actually a point I just felt like being cheeky :p
  • It's perfect, one of these days I'll set some time aside to study regular expressions in depth. Thanks!
  • This is the first answer that got at what tripped me up: if a space is a significant part of the pattern to be matched, escape those spaces before using the x modifier or you'll change your pattern. Good answer!
  • @DanielDoherty, thanks for reminding me about simply escaping the space. I've done an edit.