Regular Expressions: How to get the effect of an AND THEN operator in a compound expression?

I'm struggling to work with regular expressions. I think I understand the individual expressions but combining something together has me completely stumped. I don't grasp the use of something equivalent to an AND operator to connect the pieces I want together into a "full" match expression.

For example, I'd like to split a string into an array breaking on any values of <1> to <57> and </1> to </57>.

So, I thought I needed something like:

( '<' or '<\/' ) and ( [1-9] or [1-4][0-9] or [5][0-7] ) and '>'

I can get separately <[1-4][0-9]> to work or </[1-4][0-9]>, but when combined together with a '|' it returns partial matches or undefined in between full matches.

Could you please tell me what I am not understanding? Attached is my example.

If you click 'try' for the first expression, it produces empty values after each <21> or </21>. These print as undefined in console.log when I test it. The second expression produces < and </ after each tag. I don't understand this, let alone how to convert the fuller expression from earlier in this question into a RegExp.

The desired output is:

'This is a', '<21>', 'test', '<\/21>', '.'

Thank you.

ADDITION: After receiving Georg's answer to this question, I became interested in finding a way to escape these tags, especially since negative lookbehind is currently supported only in Chrome. By that I mean \<21> would be treated as regular text and would not split the string at that point. If you are interested in something similar, you may find the answer Revo provided to my follow-up question quite helpful.

let b, B = document.querySelectorAll('button');

for ( b of B ) b.addEventListener( 'click', split_str, false );

function split_str( evt )
 {
   let e = evt.currentTarget,
       r = new RegExp( e.previousElementSibling.value ),
       s = e.parentNode.previousElementSibling.value;
   e.parentNode.lastElementChild.textContent = s.split(r);   
 }
div > div  { border: 1px solid rgb(150,150,150); width: 500px; height: 200px;padding: 5px; }

input { border: 1px solid rgb(150,150,150); width: 500px; margin-bottom: 20px; padding:5px; }
<input type='text' value="This is a<21>test</21>.">

<div>

<input type='text' value="(<[1-4][0-9]>)|(<\/[1-4][0-9]>)"> <button>try</button>

<input type='text' value="((<|<\/)[1-4][0-9]>)"> <button>try</button>

<div></div>

</div> 

Ok, let's start with the number part. It's fine, except there's technically no need to bracket a single symbol, as in [5]:

 [1-9] | [1-4][0-9] | 5[0-7]

(using spaces here and below for clarity).

For the first part, an alternation like a | ab reads better when written as ab?, that is, "a, and then, optionally, b". That gives us

 < \/ ?
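As a small illustration of why the optional form is preferable, the sketch below compares it with the alternation form on the two openers from the question. The name openerPattern is made up for this example.

```javascript
// "<", and then, optionally, "/". Anchored so the whole string must match.
const openerPattern = /^<\/?$/;

console.log(openerPattern.test("<"));   // true
console.log(openerPattern.test("</"));  // true
console.log(openerPattern.test("<//")); // false

// In the alternation form the first alternative that succeeds wins,
// so on its own it stops after "<" even when "/" follows.
console.log("</".match(/<|<\//)[0]); // "<"
console.log("</".match(/<\/?/)[0]);  // "</"
```

Inside a longer pattern the engine backtracks into the second alternative when needed, so both forms end up matching the same tags; the optional form is just shorter and easier to read.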

Now, the "and" (or rather "and then") operator you were looking for, is very simple in the regex language - it's nothing. That is, a and then b is just ab.
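To make the "and then is just concatenation" idea concrete, here is a minimal sketch that glues three small patterns together by joining their sources. The variable names are invented for illustration, and the number part is deliberately the simpler [1-4][0-9] from the question.

```javascript
// Three pieces that each work on their own.
const openPart = /</;             // matches "<"
const numberPart = /[1-4][0-9]/;  // matches 10 to 49
const closePart = />/;            // matches ">"

// "openPart and then numberPart and then closePart" is plain concatenation.
const simpleTag = new RegExp(openPart.source + numberPart.source + closePart.source);

console.log(simpleTag.source);          // <[1-4][0-9]>
console.log(simpleTag.test("a<21>b"));  // true
console.log(simpleTag.test("a<21b"));   // false, the ">" piece never follows
```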

However, if we combine both parts simply like this

a  x | y | z

that would be a mistake, because | has low priority, so that would be interpreted as

ax | y | z

which is not what we want. So we need to put the number part in parens. For reasons explained below, these parens also have to be non-capturing:

<\/?  (?: [1-9] | [1-4][0-9] | 5[0-7] )  >
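The precedence problem is easy to reproduce with the placeholder letters used above. This is only a sketch with single-letter stand-ins, not the real tag pattern.

```javascript
// Without a group, "|" divides the whole pattern: "ax" OR "y" OR "z".
const ungrouped = /ax|y|z/;
// With a non-capturing group: "a", and then one of "x", "y", "z".
const grouped = /a(?:x|y|z)/;

console.log("y".match(ungrouped)[0]);  // "y", matched without any "a"
console.log(grouped.test("y"));        // false
console.log("ay".match(ungrouped)[0]); // "y", only the second alternative matched
console.log("ay".match(grouped)[0]);   // "ay"
```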

This matches our delimiters, but we also need everything in between, so we're going to split the input. split normally returns an array of strings that do not match the delimiter:

"a,b,c".split(/,/) => a b c

If we want to include the delimiter too, it has to be placed in a capturing group:

"a,b,c".split(/(,)/) => a , b , c

so we have to wrap everything in parens once again:

(  <\/?  (?: [1-9] | [1-4][0-9] | 5[0-7] )  >  )

and that's the reason for ?: - we want the whole thing to be captured, but not the number part.
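This also explains the output of the two expressions in the question. The sketch below runs them against the sample string: split adds one array entry per capturing group for every match, and a group that took no part in the match contributes undefined.

```javascript
const sample = "This is a<21>test</21>.";

// First expression from the question: two capturing groups, only one of
// which takes part in any given match. The other one yields undefined.
const twoGroups = sample.split(/(<[1-4][0-9]>)|(<\/[1-4][0-9]>)/);
console.log(twoGroups);
// [ 'This is a', '<21>', undefined, 'test', undefined, '</21>', '.' ]

// Second expression: the inner group captures "<" or "</" as an extra entry.
const nestedGroups = sample.split(/((<|<\/)[1-4][0-9]>)/);
console.log(nestedGroups);
// [ 'This is a', '<21>', '<', 'test', '</21>', '</', '.' ]

// Making the inner group non-capturing leaves only the whole tag.
const outerOnly = sample.split(/((?:<|<\/)[1-4][0-9]>)/);
console.log(outerOnly);
// [ 'This is a', '<21>', 'test', '</21>', '.' ]
```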

Putting it all together seems to do the trick:

s = "This is a<21>test</21>."


console.log(s.split(/(<\/?(?:[1-9]|[1-4][0-9]|5[0-7])>)/))
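One way to gain some confidence in the number alternation is to try every value around the intended range of 1 to 57. The sketch below uses an anchored copy of the same pattern; the names tagPattern and accepted are made up for this check.

```javascript
// Anchored copy of the delimiter pattern, so the whole string must be one tag.
const tagPattern = /^<\/?(?:[1-9]|[1-4][0-9]|5[0-7])>$/;

// Collect every number from 0 to 60 accepted as both an opening and a closing tag.
const accepted = [];
for (let n = 0; n <= 60; n++) {
  if (tagPattern.test(`<${n}>`) && tagPattern.test(`</${n}>`)) accepted.push(n);
}

console.log(accepted[0], accepted[accepted.length - 1], accepted.length); // 1 57 57
console.log(tagPattern.test("<07>")); // false, leading zeros are not accepted
```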

You've almost got it. It's really as simple as replacing 'or' with | and replacing 'and' with concatenation. Then make sure your groups are non-capturing by adding ?: to the beginning of each:

(?:<|<\/)(?:[1-9]|[1-4][0-9]|[5][0-7])>

MDN has an explanation on the interaction of split and regex. But the short example-explanation is:

'hi_joe'.split('_'); // ['hi', 'joe']
'hi_joe'.split(/_/); // ['hi', 'joe']
'hi_joe'.split(/(_)/); // ['hi', '_', 'joe']
'hi_joe'.split(/(?:_)/); // ['hi', 'joe']

Update per comment: if you'd like the <##> tags in your results array as well, wrap the regex in an additional set of parens.

((?:<|<\/)(?:[1-9]|[1-4][0-9]|[5][0-7])>)

The way I understand regex is that, unless specified otherwise intentionally e.g. an OR clause, everything you define as a regex is in the form of an AND. [a-z] will match one character, whereas [a-z][a-z] will match one character AND another character.

Depending on your use case the regex below could be what you need. As you can see it captures everything between <number></number>.

<[1-5][0-9]>([\s\S]*?)<\/[1-5][0-9]>

<[1-5][0-9]> matches <number> where number is between 10 and 59.
[\s\S]*? matches any character at all, including newlines, zero or more times, taking as few characters as possible.
<\/[1-5][0-9]> matches </number> where number is between 10 and 59.

Here is a snippet returning everything between <number></number>. It converts the matches to an array and reads the first capture group of the first match. The first capture group is everything between <number></number>, as you can see from the parentheses in the regex itself.

let str = '<10>Hello, world!</10>';

let reg = /<[1-5][0-9]>([\s\S]*?)<\/[1-5][0-9]>/g;

let matches = Array.from( str.matchAll(reg) );

console.log(matches[0][1]);
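Note that this pattern does not require the closing number to equal the opening one, so <10>…</20> would also match. If that matters for your use case, one possible variation (my own assumption, not something the question asked for) is to capture the number and repeat it with a backreference:

```javascript
// Group 1 captures the number; \1 demands the same digits in the closing tag.
// Group 2 is the text between the tags.
const pairedTags = /<([1-5][0-9])>([\s\S]*?)<\/\1>/;

console.log(pairedTags.exec("<10>Hello, world!</10>")[2]); // Hello, world!
console.log(pairedTags.test("<10>Hello, world!</20>"));    // false
```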

Comments
  • a good resource for building regex is regexr.com. Play with it.
  • given This is a<21>test</21>, what is the expected output? How about <21>test</51>?
  • Maybe you are looking for something like this: /<\/?[1-9]>|<\/?[1-4][0-9]>|<\/?[5][0-7]>/
  • @georg Right, I should have included the desired output. I added it now. Thank you.
  • @Shidersz Thanks. I wasn't clear. I need what you have plus the complete tags of '<21>' and '<\/21>'. I should have been clear in the question.
  • Thank you very much for taking the time to explain all of this to me. I had read over a number of resources on regExp but none addressed the 'and then' concept or, more likely, I simply failed to properly recognize it. This works exactly as I need it to and after a little more study of capturing and non-capturing I likely will understand it, too. This was holding my work back; so, I greatly appreciate your help.
  • Thanks for responding to my question. I would have thought so too but it doesn't work. The expression you provided returns '<' , '21', '</' which are not needed and never a complete tag like '<21>'. I don't understand why.
  • added ?: to address that.
  • I apologize for not being clear in the question. I need the tags <21> and </21> to be part of the array.
  • @Gary, in that case, encapsulate the entire regex in an additional set of parens.
  • Thanks. The simplification works but still didn't produce the desired result, and the added parentheses didn't either. Tricky little thing. I keep messing with the parentheses and get interesting results, such as more or fewer splits. I just don't get the matching concept. It appears that the results are either what is termed greedy or not greedy, because I get either every individual match, such as <21>, <, 21, >, and </, or just the first '<' of each tag.