Regex to extract numbers and trailing letter or white space

I'm trying to extract data from strings that are always in the same format (scraped from social sites with no API support).

Example strings:

53.2k Followers, 11 Following, 1,396 Posts
5m Followers, 83 Following, 1.1m Posts

I'm currently using the regular expression "[0-9]{1,5}([,.][0-9]{1,4})?" to get the numeric sections, preserving the comma and dot separators.

It yields results like:

53.2, 11, 1,396 
5, 83, 1.1

I need a regular expression that also grabs the character after each numeric section, even if it is whitespace, i.e.

53.2k, 11 , 1,396
5m, 83 , 1.1m

Any help is greatly appreciated

R code for reproduction


  library(stringr)  # provides str_extract_all()

  string1 <- "536.2k Followers, 83 Following, 1,396 Posts"
  string2 <- "5m Followers, 83 Following, 1.1m Posts"

  info <- str_extract_all(string1, "[0-9]{1,5}([,.][0-9]{1,4})?")
  info2 <- str_extract_all(string2, "[0-9]{1,5}([,.][0-9]{1,4})?")


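I would suggest the following regex pattern (written as an R string, hence the doubled backslash):

[0-9]{1,3}(?:,[0-9]{3})*(?:\\.[0-9]+)?[A-Za-z]*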

This pattern generates the outputs you expect. Here is an explanation:

[0-9]{1,3}      match 1 to 3 initial digits
(?:,[0-9]{3})*  followed by zero or more optional thousands groups
(?:\\.[0-9]+)?  followed by an optional decimal component
[A-Za-z]*       followed by an optional text unit

I tend to lean towards base R solutions whenever possible, and here is one using gregexpr and regmatches:

txt <- "53.2k Followers, 11 Following, 1,396 Posts"
m <- gregexpr("[0-9]{1,3}(?:,[0-9]{3})*(?:\\.[0-9]+)?[A-Za-z]*", txt)
regmatches(txt, m)

[1] "53.2k"   "11"   "1,396"

We can add an optional character argument in the regex

#[1] "536.2k" "83"     "1,396" 
#[1] "5m"   "83"   "1.1m"
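
A minimal sketch of that idea in base R, assuming the optional character is a single trailing letter. The pattern `[A-Za-z]?` and the helper name `extract_counts` are assumptions made for illustration, not necessarily what was originally used; the results match the outputs shown above.

```r
# Assumed pattern: the question's regex plus one optional trailing letter.
pattern <- "[0-9]{1,5}([,.][0-9]{1,4})?[A-Za-z]?"

string1 <- "536.2k Followers, 83 Following, 1,396 Posts"
string2 <- "5m Followers, 83 Following, 1.1m Posts"

# Hypothetical helper: return every match of `pattern` in a single string.
extract_counts <- function(x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

extract_counts(string1)
extract_counts(string2)
```

Because the optional class only contains letters, a number followed by a space is returned without that space.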

Another stringr option:

strsplit(new_s," , ")

    #[1] "5m"    "83"    "1.1m "


#[1] "83 "  "1.1m"
#[1] "536.2k" "83 "    "1,396" 

If you also want to grab the character after the numeric section even if it is a space, you could use your pattern with an optional character class [mk ]? that includes the space:

[0-9]{1,5}(?:[,.][0-9]{1,4})?[mk ]?

You might expand the range of characters in the character class to [a-zA-Z ]? instead. If you want to match either one or more letters OR a single space, you could use an alternation:

[0-9]{1,5}(?:[,.][0-9]{1,4})?(?:[a-zA-Z]+| )?
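
As a rough sketch, both variants can be tried in base R with `gregexpr(..., perl = TRUE)`. The helper `extract_all` and the variable names are hypothetical. Note that any number followed by a space keeps that space in the match, so `trimws()` may be wanted afterwards.

```r
# Hypothetical helper: all matches of `pattern` in one string (PCRE syntax).
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

# Variant 1: one optional trailing "m", "k", or space.
single_char <- "[0-9]{1,5}(?:[,.][0-9]{1,4})?[mk ]?"
# Variant 2: one or more letters, or a single space.
letters_or_space <- "[0-9]{1,5}(?:[,.][0-9]{1,4})?(?:[a-zA-Z]+| )?"

txt1 <- "53.2k Followers, 11 Following, 1,396 Posts"
txt2 <- "5m Followers, 83 Following, 1.1m Posts"

res1 <- extract_all(single_char, txt1)       # "53.2k" "11 " "1,396 "
res2 <- extract_all(letters_or_space, txt2)  # "5m" "83 " "1.1m"

# Drop the captured spaces once they are no longer needed.
trimws(res1)
```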

(Updated: my earlier version also selected extraneous commas and spaces.) This pattern meets the OP's requirement to extract the trailing letter or whitespace after each numeric section, without the extraneous commas and spaces of the previous version:

(?:[\d]+[.,]?(?=\d*)[\d]*[km ]?)

previous version: \b(?:[\d.,]+[km\s]?)

- (?:          opens a non-capturing group
- [\d]+        matches 1 or more digits
- [.,]?(?=\d*) matches 0 or 1 decimal point or comma; the positive lookahead (?=\d*) checks for zero or more digits, so it always succeeds and does not by itself require a digit after the separator
- [\d]*        matches 0 or more digits
- [km ]?       matches 0 or 1 of the characters within [] (k, m, or a space)
53.2k Followers, 11 Following, 1,396 Posts     
5m Followers, 83 Following, 1.1m Posts  
# 53.2k; 11 ; 1,396
# 5m; 83 ; 1.1m  

Note the spaces matched after 11 and 83, as intended by the OP.
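
A small sketch in base R of how this pattern behaves, assuming PCRE mode (`perl = TRUE`). The `stricter` variant and the helper name `extract` are assumptions added for comparison and are not part of the answer above: because `(?=\d*)` can match nothing, the original pattern also accepts a comma that is not followed by a digit, whereas `(?:[.,]\d+)?` only takes a separator together with the digits after it.

```r
# Hypothetical helper: all matches of `pattern` in one string (PCRE syntax).
extract <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

original <- "(?:[\\d]+[.,]?(?=\\d*)[\\d]*[km ]?)"
# Assumed stricter variant: a separator only counts if digits follow it.
stricter <- "\\d+(?:[.,]\\d+)?[km ]?"

txt <- "53.2k Followers, 11 Following, 1,396 Posts"
on_sample_original <- extract(original, txt)  # "53.2k" "11 " "1,396 "
on_sample_stricter <- extract(stricter, txt)  # same result on this input

# The two differ when a number is directly followed by a comma.
edge_original <- extract(original, "11, 12")  # "11, " "12"
edge_stricter <- extract(stricter, "11, 12")  # "11" "12"
```

On the sample strings from the question the two patterns agree, since no number there is immediately followed by a comma or dot without digits after it.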
