matlab regexp, match any character X number of times

I want to

  • match a but not b,
  • any character in A-Za-z0-9 may occur any number of times, except the characters \@\,\. (that is, @ , and .), each of which may occur only once.
  • The overall matched string should be at least 3 characters long.

How do I achieve this? My approach doesn't work, and I couldn't find a reference for this in the documentation.

a='em@il';
b='em@@l';
%more examples
a2='em@il, test.'; %<- correct
a3='email, test';  %<- correct
b2='em@il, test,'; %<- incorrect 2x ','
b3='em@ail, test. @bc.'; %<- incorrect 2x '@'
regexp({a,b},'[A-Za-z0-9 (\@\,\.){1}]{3,}','match','once')
ans =

  1×2 cell array

    {'em@il'}    {'em@@l'}
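
A likely reason the attempt accepts both inputs: inside square brackets, the characters ( ) { 1 } are ordinary members of the class rather than grouping or quantifier syntax, so nothing limits how often '@' may repeat. The sketch below illustrates this under the assumption of standard character-class semantics; the variable names and test inputs are made up for illustration.

```matlab
% The original pattern from the question.
expr = '[A-Za-z0-9 (\@\,\.){1}]{3,}';

% Parentheses and braces are accepted as plain members of the class ...
onlyPunct = regexp('(){}', expr, 'match', 'once');

% ... and '@' may repeat freely, because '{1}' inside the brackets
% is not treated as a quantifier.
manyAts = regexp('@@@@', expr, 'match', 'once');

disp(onlyPunct)
disp(manyAts)
```

Limiting how often a character occurs therefore has to happen outside the character class.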

You could use a negative lookahead with a backreference to make sure that the dot, comma, or @ does not occur twice, and use a character class for the allowed characters.

Regular Expressions - MATLAB & Simulink: MATLAB parses each input character vector or string from left to right, attempting to match the text in the character vector or string with the first element of the regular expression. During this process, MATLAB skips over any text that does not match.

Maybe,

^(?!.*@.*@|.*,.*,|.*\..*\.)[A-Za-z0-9 @,.]{3,}$

would be close to what you have in mind, if I understand it right.
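
A minimal sketch of how the expression above could be checked against the examples from the question. The helper isMatch and the variable names are made up for illustration, and b3 is assumed to end in a closing quote after the final dot.

```matlab
% Pattern from the answer: reject any input that contains two '@',
% two ',' or two '.', then require at least three allowed characters.
expr = '^(?!.*@.*@|.*,.*,|.*\..*\.)[A-Za-z0-9 @,.]{3,}$';

% Example inputs from the question.
valid   = {'em@il', 'em@il, test.', 'email, test'};
invalid = {'em@@l', 'em@il, test,', 'em@ail, test. @bc.'};

% With 'match' and 'once', a non-matching input yields an empty result.
isMatch = @(strs) ~cellfun(@isempty, regexp(strs, expr, 'match', 'once'));

validHits   = isMatch(valid);
invalidHits = isMatch(invalid);

disp(validHits)     % expected: all true
disp(invalidHits)   % expected: all false
```

The anchors ^ and $ matter here: without them, regexp could skip ahead and match a shorter valid substring of an invalid input.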

Demo

If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. There you can also watch how it matches against some sample inputs.


Source

MATLAB Regular Expressions

The following tables list the regular expression syntax supported by MATLAB. Note: any character appearing in a regular expression is ordinary, unless a '\' precedes it. Logical operators do not match any specific characters.

  • Matches the preceding element 0 times or 1 time, also minimizes.
  • '\d*' matches any number of consecutive digits.
  • \D: any nondigit character; equivalent to [^0-9]. '\w*\D\>' matches words that do not end with a numeric digit.
  • \oN or \o{N}: character of octal value N. '\o{40}' matches the space character, defined by octal 40.
  • \xN or \x{N}: character of hexadecimal value N. '\x2C' matches the comma character.
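
A few of the operators listed above can be tried as follows. This is only a sketch: the inputs and variable names are made up for illustration, and '\d+' is used instead of '\d*' so that at least one digit is required.

```matlab
% '\d+' pulls the first run of consecutive digits out of a char vector.
digitsText = regexp('abc123def', '\d+', 'match', 'once');
number = str2double(digitsText);

% '\x2C' is the comma (hexadecimal 2C); '\o{40}' is the space (octal 40).
% Without an output option, regexp returns the start index of each match.
commaPos = regexp('a,b', '\x2C');
spacePos = regexp('a b', '\o{40}');

disp(number)
disp(commaPos)
disp(spacePos)
```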

Character Arrays (Strings) (Programming and Data Types): The matched pattern pat can include any of the standard regex operators, including: Match zero or more times ... For example, a template for a floating point number might be [-+. ... Implementation Note: For compatibility with MATLAB, escape sequences in pat (e.g., "\n" => newline) ... Alternatively, use (?-x) in the pattern.

Regular expressions provide a unique way to search a volume of text for a particular subset of characters within that text. Instead of looking for an exact character match as you would do with a function like strfind, regular expressions give you the ability to look for a particular pattern of characters.
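
To make the difference between strfind and regexp concrete, here is a small sketch using one of the strings from the question. The variable names are made up for illustration, and counting hits with numel is just one possible regex-free way to test "at most once".

```matlab
str = 'em@@l';

% strfind looks for an exact substring and returns every start index.
atPositions = strfind(str, '@');

% regexp accepts a pattern, here "any one of @ , ." as a character class.
specialPositions = regexp(str, '[@,.]');

% Counting the strfind hits checks whether '@' occurs at most once.
atMostOneAt = numel(atPositions) <= 1;

disp(atPositions)
disp(specialPositions)
disp(atMostOneAt)
```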

Function Reference: regexp - Octave Forge: ... numbers that match a certain pattern). MATLAB also allows you to use regular expressions with regexpi, which matches a pattern case-insensitively (i.e. A and a are the same). [a-z] matches any single character in that range (a, b, c, d, ..., x, y, z).

To match any number from 1 to 9, the regular expression is simply /[1-9]/. Similarly, you may use /[3-7]/ to match any number from 3 to 7, or /[2-5]/ to match 2, 3, 4, or 5.

Regex for 0 to 10: matching numbers from 0 to 10 is slightly more complicated and needs a different approach. The series is broken into two components. 1 ...

[PDF] 1 Pattern Matching - Regular Expressions: Many applications and programming languages have their own implementation ... It is the most basic pattern, simply matching the literal text regex. ... any character that is not in the character class. q[^x] matches qu in question. The asterisk or star tells the engine to attempt to match the preceding token zero or more times.

Regular Expressions Quick Start: Matching simple expressions; Matching any character; Repeating expressions; Grouping expressions; Choosing one character from many; Matching the beginning or end of a line. Regular expressions are a system for matching patterns in text data, which are widely ... For example, [abc] matches a, b, or c, but not x, y, or z.

Comments
  • Can you provide some examples of strings that would and wouldn't match?
  • {1} is synonymous with simply not typing it.
  • For @, ,, and .: do they only invalidate the match if one of them appears more than once, or would any combination of them, each appearing once, also invalidate the match? i.e. would aaa@b,c be invalid (because it contains two of these characters)? Or just something like aaa@b@c?
  • @CAustin eggcelent question - they don't invalidate each other, I simply want to count the occurrences within the 'any' square brackets.
  • @NickReed updated
  • comment on solution choice: this is easiest to read and expand, all of the solutions should be accepted (upvoted all). Thanks to everyone, it made me realize the potential of negative lookaheads
  • Regex doesn't check for \. and will match strings with two or more . in them. To be fair, OP doesn't specify this in his examples, but does include it in the requirements. "[...] except these characters \@\,\., which may only be contained once."