Add exception to complicated regex

I have a fairly complex regular expression.

It has one problem: the # and ++ characters are removed when more text follows them directly, as in C#/.Net and C++/.Net.

Question: How can I add an exception to the current regex for the C++ and C# tokens?

I've used the following regex:

import re

text = 'Must-have skills: -.Net programming experience; -2 years experience in C++; C#/.Net, C++/.Net, C./.Net.'
text = re.sub(r'[!,.:;—](?= |$)', ' ', text)
print(re.sub(r'(?i)(?:(?!\.net\b|\b-\b)[^\w\s])+(?=[^\w\s]*\b)', ' ', text))

And I got the following result:

'Must-have skills   .Net programming experience   2 years experience in C++  C .Net  C .Net  C .Net '

Desired result:

'Must-have skills   .Net programming experience   2 years experience in C++  C# .Net  C++ .Net  C .Net '

Current regex details

  • (?i) - case insensitive mode on
  • (?:(?!\.net\b|\b-\b)[^\w\s])+ - any punctuation char ([^\w\s]), 1 or more occurrences, as many as possible, that does not start any of the sequences:
    • \.net\b - .net as whole word
    • | - or
    • \b-\b - a hyphen enclosed with word chars
  • (?=[^\w\s]*\b) - a positive lookahead that requires 0+ punctuation chars followed by a word boundary position immediately to the right of the current location.
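The breakdown above can be checked piece by piece. The following is a minimal sketch that rewrites the same pattern with re.VERBOSE so each part carries its own comment; the name PUNCT is only illustrative and does not appear in the original code.

```python
import re

# Same pattern as in the question, spread out with re.VERBOSE.
# PUNCT is a hypothetical name used only for this sketch.
PUNCT = re.compile(r'''
    (?:
        (?! \.net\b | \b-\b )   # do not start at ".net" or at a hyphen between word chars
        [^\w\s]                 # one punctuation char
    )+                          # one or more, as many as possible
    (?= [^\w\s]* \b )           # optional punctuation, then a word boundary must follow
''', re.IGNORECASE | re.VERBOSE)

samples = ['Must-have', '-.Net', 'C++', 'C#/.Net', 'C++/.Net']
for s in samples:
    print(repr(s), '->', repr(PUNCT.sub(' ', s)))

# 'Must-have' is kept: the hyphen sits between word chars.
# '-.Net' loses only the leading hyphen: '.Net' is protected.
# 'C++' is kept: no word boundary follows the '++'.
# 'C#/.Net' and 'C++/.Net' both become 'C .Net', which is the problem described above.
```

The last two samples show why the tokens disappear: the run "#/" or "++/" is followed by ".Net", so the trailing lookahead finds its word boundary and the whole run is replaced.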

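The output is not quite identical to yours, but it differs only in whitespace. I reversed the order of the two re.sub calls and added two negative lookbehinds, (?<!C) and (?<!C\+), so that punctuation directly after C or C+ is never matched. The order matters because (?<!C) also protects the "." in "C./.Net", so the trailing-punctuation substitution has to run afterwards to remove it.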

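import re
text = 'Must-have skills: -.Net programming experience; -2 years experience in C++; C#/.Net, C++/.Net, C./.Net.'
# (?<!C) and (?<!C\+) skip punctuation that directly follows "C" or "C+"
text = re.sub(r'(?i)(?:(?!\.net\b|\b-\b)(?<!C)(?<!C\+)[^\w\s])+(?=[^\w\s]*\b)', ' ', text)
# Trailing punctuation (before a space or at the end) is removed last
text = re.sub(r'[!,.:;—](?= |$)', ' ', text)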

Output:

print(text)
Must-have skills   .Net programming experience   2 years experience in C++  C# .Net  C++ .Net  C  .Net 
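One caveat with the lookbehinds: because of (?i), (?<!C) also applies to any word that merely ends in "c", so a string such as "Basic/.Net" would keep its slash. The sketch below is a possible variation, not part of the answer above: it adds \b inside the lookbehinds so only a standalone C is protected. The names STRIP_PUNCT, TRAILING and clean are hypothetical.

```python
import re

# Assumption: only the standalone tokens C# and C++ need protecting,
# so the lookbehinds require a word boundary before the "C".
STRIP_PUNCT = re.compile(
    r'(?:(?!\.net\b|\b-\b)(?<!\bC)(?<!\bC\+)[^\w\s])+(?=[^\w\s]*\b)',
    re.IGNORECASE,
)
TRAILING = re.compile(r'[!,.:;—](?= |$)')


def clean(text):
    """Strip punctuation while keeping .Net, word-internal hyphens, C# and C++."""
    text = STRIP_PUNCT.sub(' ', text)   # main pass, with C# / C++ protected
    return TRAILING.sub(' ', text)      # then trailing punctuation


sample = ('Must-have skills: -.Net programming experience; '
          '-2 years experience in C++; C#/.Net, C++/.Net, C./.Net.')
print(clean(sample).split())
print(clean('Basic/.Net'))  # the slash is removed here because "c" is not a standalone C
```

Splitting on whitespace makes the comparison independent of the number of spaces, which does not matter for this task.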

Comments
  • I'm impressed you were able to get it all on same line and we posted within a minute of each other. The output is similar to mine where there are two spaces between 'C .Net' at the end of the line.
  • I think your solution might ultimately be better. It's more true to form and looks to be more scalable. I actually didn't see that extra space in mine so I'll see if I can come up with an update. Haha. Both for different scenarios I guess.
  • @CT Hal, thank you for your answer. It's really impressive. Also, I should have said that the number of spaces does not matter. The main thing to deal with punctuation marks.
  • @lemon, do you want me to post an explanation (which will be painful, but I will on request), or is it readable enough that it doesn't need one?
  • @FailSafe, That'll be great!