RegEx for parsing "Diagnostic-Code" in a bounced e-mail

I'm trying to read bounced e-mails by connecting via PHP to an IMAP account and fetching all e-mails. I want to retrieve the "Diagnostic-Code" message from each e-mail, so I wrote the following regex:

/Diagnostic-Code:\s+?(.*)/i

The message that I'm trying to parse is this:

Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does
    not exist. Please try 550-5.1.1 double-checking the recipient's email
    address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1
    https://support.google.com/mail/?p=NoSuchUser 63si4621095ybi.465 - gsmtp

The regex only partly works: it captures just the first line of text. I want to capture the entire message, i.e. all four lines.

Is it possible to update the expression to do this matching?

Thanks.

Add the s flag:

/Diagnostic-Code:\s+?(.*)/si

From this question:

In PHP... [t]he s at the end causes the dot to match all characters including newlines.

This will allow your regex to match the whole thing (see this regex101). Just remember to add some way to end it if you have more text after that.
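
One possible way to end the match is sketched below. It assumes that continuation lines are indented, as in the sample from the question, and stops at the first line that does not start with a space or tab (a blank line, a "--" boundary line, or another field). The function name extractDiagnosticCode and the "--boundary123" line are made up for the illustration.

```php
<?php
// Sample text from the question, followed by a hypothetical MIME boundary line.
$body = <<<'EOT'
Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does
    not exist. Please try 550-5.1.1 double-checking the recipient's email
    address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1
    https://support.google.com/mail/?p=NoSuchUser 63si4621095ybi.465 - gsmtp

--boundary123
Content-Type: message/rfc822
EOT;

// Hypothetical helper, not part of any library.
function extractDiagnosticCode(string $text): ?string
{
    // s: dot also matches newlines; m: ^ matches at the start of any line;
    // i: case-insensitive.
    // The lazy (.*?) ends at a line break that is NOT followed by a space or
    // tab (so the next line is not a continuation), or at the end of input.
    $pattern = '/^Diagnostic-Code:\s*(.*?)(?=\R(?![ \t])|\z)/smi';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }

    // Collapse the line breaks and indentation into single spaces.
    return trim(preg_replace('/\s+/', ' ', $m[1]));
}

echo extractDiagnosticCode($body), PHP_EOL;
```

If the diagnostic text in your messages is not indented on continuation lines, the lookahead would need a different stop condition, for example `(?=\R--|\z)` to stop only at a boundary line.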

Comments
  • It's almost perfect :) It seems like for the following message "<x.y@z.com>: host gmail-smtp-in.l.google.com[1.1.1.1] said: 552-5.2.2 The email account that you tried to reach is over quota. Please direct 552-5.2.2 the recipient to 552 5.2.2 support.google.com/mail/?p=OverQuotaPerm u14si4562135ybj.341 - gsmtp (in reply to RCPT TO command)" the parsing stops at "gsmtp" and the "(in reply to RCPT TO command)" is left out.
  • It ought to be including that. What does your code look like?
  • I think it should stop when it reaches the beginning of a new part, meaning the "--" characters.
  • Not sure what you mean by "new part" or "--" characters. I had assumed "Diagnostic-Code:" was a message header. If you're trying to match something in the body, then using \n\s may not be the appropriate way to match continuation lines.
  • I'm not getting this information from the e-mail header, but from its body. So the parsing of the Diagnostic-Code message should stop when it reaches the "--" characters.
  • For me, this is returning all the content after Diagnostic-Code