Matching multiple strings ending with

I'm looking for a more efficient way of matching multiple arbitrary domains from a block of text.

I have a block of text that looks like this:

'''
    foo
    my.domain1
    batman.my.domain1
    superman.my.domain2 foo bar wonderwoman.my.domain1
'''

I want to match all subdomains of my.domain1 and my.domain2

Desired output here would be:

['batman.my.domain1', 'superman.my.domain2', 'wonderwoman.my.domain1']

I have accomplished the task partially by using this monster of a regex that surely can't be the most efficient way to do this:

r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,}[a-zA-Z0-9])?\.)+my.domain1|(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,}[a-zA-Z0-9])?\.)+my.domain2'

Is there a better way to do this?

Example code:

import re

text = '''
    foo
    my.domain1
    batman.my.domain1
    superman.my.domain2 foo bar wonderwoman.my.domain1
'''

pattern = r'(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,}[a-zA-Z0-9])?\.)+my.domain1|(?:[a-zA-Z0-9](?:[a-zA-Z0-9\-]{,}[a-zA-Z0-9])?\.)+my.domain2'
print(re.findall(pattern, text))

# Desired output is:
# ['batman.my.domain1', 'superman.my.domain2', 'wonderwoman.my.domain1']

P.S. my.domain1 and my.domain2 are example domains; the real ones will not have numbers at the end.

str.endswith() comes to the rescue. I respect regular expressions, but checking whether a name is a subdomain maps so directly onto .endswith() logic that I would avoid regexes for this purpose. Besides, regex syntax is in most cases much harder to read than plain string operations.

accepted_domains = ['my.domain1', 'my.domain2']

text = '''
    foo
    my.domain1
    batman.my.domain1
    superman.my.domain2 foo bar wonderwoman.my.domain1
'''

result = []

for dom in text.lower().split():
    for acc_dom in accepted_domains:
        #if dom == acc_dom or dom.endswith('.' + acc_dom):  # if you want 'my.domain1' to be included
        if dom.endswith('.' + acc_dom):
            result.append(dom)

print(result)

Output:

['batman.my.domain1', 'superman.my.domain2', 'wonderwoman.my.domain1']
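
str.endswith() also accepts a tuple of suffixes, so the inner loop can be folded into a single call. A minimal sketch of that variant, assuming the same accepted_domains list; the name `suffixes` is introduced here for illustration only:

```python
accepted_domains = ['my.domain1', 'my.domain2']

text = '''
    foo
    my.domain1
    batman.my.domain1
    superman.my.domain2 foo bar wonderwoman.my.domain1
'''

# Prefix each accepted domain with a dot so that only true subdomains match.
suffixes = tuple('.' + dom for dom in accepted_domains)

# str.endswith accepts a tuple and returns True if any of the suffixes matches.
result = [dom for dom in text.lower().split() if dom.endswith(suffixes)]

print(result)
```

This prints the same three subdomains as the loop above. Like the loop, it relies on whitespace splitting, so a token with trailing punctuation (for example a comma) would not match unless it is stripped first.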

Two improvements I can provide:

  1. Use \w as shorthand for [A-Za-z0-9_], if you don't mind that it also matches underscores

  2. Use (?:pattern1|pattern2) to "generalize" the ending.

import re

text = '''
    foo
    my.domain1
    batman.my.domain1
    superman.my.domain2 foo bar wonderwoman.my.domain1
'''

pattern = r'(?:\w+\.)+(?:my\.domain1|my\.domain2)'
print(re.findall(pattern, text))

If you want to match hyphens inside domain parts:

pattern = r'(?:\w(?:[\w-]?\w)*\.)+(?:my\.domain1|my\.domain2)'

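This will match asdf-ghjkl.my.domain1 but not the whole of asdf--ghjkl.my.domain2 (no consecutive hyphens). Note that re.findall will still return the tail ghjkl.my.domain2 for the second string, because a match can start after the hyphens.

To allow consecutive hyphens inside a domain part as well: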

pattern = r'(?:\w(?:[\w-]*\w)?\.)+(?:my\.domain1|my\.domain2)'
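
If the list of domains changes often, the alternation can be built from a list instead of being typed by hand. The following is only a sketch and not part of the patterns above: `domains`, `label` and `suffix` are names assumed for illustration, re.escape takes care of the literal dots, and the lookbehind and lookahead are an added guard so that a match cannot begin or end in the middle of a longer token.

```python
import re

domains = ['my.domain1', 'my.domain2']  # hypothetical list of accepted domains

# One domain part: starts and ends with a word character, hyphens allowed inside.
label = r'\w(?:[\w-]*\w)?'

# re.escape makes the dots in each domain literal; '|' joins them into one alternation.
suffix = '|'.join(re.escape(d) for d in domains)

# (?<![\w.-]) stops a match from starting mid-token;
# (?!\.?[\w-]) stops it from ending mid-label or right before a further label.
pattern = re.compile(
    r'(?<![\w.-])(?:' + label + r'\.)+(?:' + suffix + r')(?!\.?[\w-])'
)

text = '''
    foo
    my.domain1
    batman.my.domain1
    superman.my.domain2 foo bar wonderwoman.my.domain1
'''

print(pattern.findall(text))
```

With these guards, a token such as batman.my.domain10 or c.my.domain1.com is rejected outright rather than matched in part, while a name followed by a sentence-ending period still matches. Whether that strictness is wanted depends on the real input.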

Operating under the assumption that what you really want is a string with two periods in it...

>>> text = '''
...     foo
...     my.domain1
...     batman.my.domain1
...     superman.my.domain2 foo bar wonderwoman.my.domain1
... '''
>>> data = [x for x in text.split() if x.count('.') == 2 and x.endswith(('2', '1'))]
>>> data
['batman.my.domain1', 'superman.my.domain2', 'wonderwoman.my.domain1']
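
This filter checks only the shape of each token, not the domain itself, so it can accept unrelated hosts and reject deeper subdomains. A small sketch with made-up tokens (the `sample` string and both result names are hypothetical) compares it with an explicit suffix check:

```python
sample = 'batman.my.domain1 evil.other.host1 a.b.c2 deep.sub.my.domain2'

# Shape-based filter: exactly two dots and a trailing '1' or '2'.
by_shape = [x for x in sample.split()
            if x.count('.') == 2 and x.endswith(('2', '1'))]

# Explicit suffix check against the two example domains.
by_suffix = [x for x in sample.split()
             if x.endswith(('.my.domain1', '.my.domain2'))]

print(by_shape)   # includes evil.other.host1 and a.b.c2, misses deep.sub.my.domain2
print(by_suffix)  # only subdomains of the two example domains, at any depth
```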

Comments
  • The code you posted generates the same output as desired - anything wrong?
  • If domain1 and domain2 aren't actually domains, what should we actually be matching for?
  • You can simplify the regex using my.domain[12] instead of repeating the whole thing for domain2.
  • if those are just fake data for the example, you can do something like (firstdomain|second.dom) instead
  • This will work for the example, but may be difficult to generalise.
  • This may return false positives (i.e., links that are not subdomains of OP's example).
  • Didn't see the two-domain requirement, but it still looks pretty simple. Updated my answer.