Extract date from strings that contain names+dates

I need to extract the dates from a series of strings like this:

'MIHAI MĂD2Ă3.07.1958'

or

'CLAUDIU-MIHAI17.12.1999'

How can I do this?

I tried this:

for index,row in DF.iterrows():
    try:
        if math.isnan(row['Data_Nasterii']):
            match = re.search(r'\d{2}.\d{2}.\d{4}', row['Prenume'])
            date = datetime.strptime(match.group(), '%d.%m.%Y').date()
            s = datetime.strftime(datetime.strptime(str(date), '%Y-%m-%d'), '%d-%m-%Y')
            row['Data_Nasterii'] = s
    except TypeError:
        pass

In a regex, the . (dot) doesn't mean the character dot; it matches any character, and it needs to be escaped (\.) to match an actual dot. Other than that, your first group is \d{2}, but some of your dates have a single-digit day. I would use the following:

re.search(r'(\d+\.\d+\.\d+)', row['Prenume'])

This means at least one digit, followed by a dot, followed by at least one digit, and so on. If you have some stray characters mixed into your day, you can try the following (subpar) solution:

''.join(re.search(r'(\d*)(?:[^0-9\.]*)(\d*\.\d+\.\d+)', row['Prenume']).groups())

This will filter out up to one block of non-digit characters in your "day". It's not pretty, but it works (and returns a string).
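
For a single string, the pattern above can be combined with datetime.strptime to get a real date object. This is only a minimal sketch: the helper name extract_date is hypothetical, and it assumes the dates are always day.month.year with at most one run of stray characters inside the day.

```python
import re
from datetime import date, datetime
from typing import Optional

# Optional leading day digit(s), a run of non-digit/non-dot characters to
# skip, then the rest of the date. The dots are escaped so they match
# literally.
DATE_RE = re.compile(r'(\d*)(?:[^0-9.]*)(\d*\.\d+\.\d+)')


def extract_date(text: str) -> Optional[date]:
    """Return the first day.month.year date found in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    # Glue the captured pieces back together, e.g. '2' + '3.07.1958'.
    joined = ''.join(match.groups())
    try:
        return datetime.strptime(joined, '%d.%m.%Y').date()
    except ValueError:
        # The digits looked like a date but were not a valid one.
        return None


for s in ['MIHAI MĂD2Ă3.07.1958', 'CLAUDIU-MIHAI17.12.1999']:
    d = extract_date(s)
    print(d.strftime('%d-%m-%Y') if d else None)
```

Since this is a heuristic, any digit sitting just before the stray characters is treated as part of the day, so it is worth spot-checking the results on real data.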

You can use the str accessor along with a regex:

DF['Prenume'].str.extract(r'(\d{1,2}\.\d{2}\.\d{4})')
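
As a rough sketch of how this could fill in the missing dates without looping over rows: str.extract needs a capture group in the pattern, and expand=False makes it return a Series. The sample data below is made up; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample frame; the column names mirror the question.
df = pd.DataFrame({
    'Prenume': ['CLAUDIU-MIHAI17.12.1999', 'ANA', 'ION 5.01.1980'],
    'Data_Nasterii': [None, '01-02-1990', None],
})

# One capture group + expand=False gives a Series (NaN where nothing matched).
extracted = df['Prenume'].str.extract(r'(\d{1,2}\.\d{2}\.\d{4})', expand=False)

# Parse as day.month.year, then format as DD-MM-YYYY text like the question.
formatted = pd.to_datetime(extracted, format='%d.%m.%Y').dt.strftime('%d-%m-%Y')

# Fill only the rows where the date column is missing.
df['Data_Nasterii'] = df['Data_Nasterii'].fillna(formatted)
print(df)
```

Assigning the whole column at once also sidesteps the fact that changing row inside iterrows() does not reliably write back to the DataFrame.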

You need to escape the dot (.) as \. or put it inside a character class - "[.]". It is a metacharacter in regex, which matches any character. If you need more validation, you can refer to this.

e.g.: r'[0-9]{2}[.][0-9]{2}[.][0-9]{4}' or r'\d{2}\.\d{2}\.\d{4}'

import re

text = 'CLAUDIU-MIHAI17.12.1999'
pattern = r'\d{2}\.\d{2}\.\d{4}'  # escaped dots match a literal '.'

if re.search(pattern, text):
    print("yes")
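
To see why the escaping matters, here is a small illustration comparing the unescaped pattern from the question with the two escaped forms. The string '17x12y1999' is made up for the demonstration.

```python
import re

unescaped = r'\d{2}.\d{2}.\d{4}'        # '.' matches any character
escaped = r'\d{2}\.\d{2}\.\d{4}'        # '\.' matches a literal dot
char_class = r'\d{2}[.]\d{2}[.]\d{4}'   # '[.]' also matches a literal dot

not_a_date = '17x12y1999'               # made-up string with no dots
real_date = 'CLAUDIU-MIHAI17.12.1999'

print(bool(re.search(unescaped, not_a_date)))   # True: 'x' and 'y' satisfy '.'
print(bool(re.search(escaped, not_a_date)))     # False: no literal dots
print(re.search(escaped, real_date).group())    # 17.12.1999
print(re.search(char_class, real_date).group()) # 17.12.1999
```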

Another good solution could be using dateutil.parser:

import pandas as pd
import dateutil.parser as dparser

df = pd.DataFrame({'A': ['MIHAI MĂD2Ă3.07.1958',
                         'CLAUDIU-MIHAI17.12.1999']})

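# Drop non-ASCII characters, then let dateutil find the date. dayfirst=True
# keeps ambiguous dates such as 03.12.1999 as day.month.year.
df['userdate'] = df['A'].apply(lambda x: dparser.parse(x.encode('ascii', errors='ignore').decode('ascii'), fuzzy=True, dayfirst=True))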

Output:

                         A   userdate
0     MIHAI MĂD2Ă3.07.1958 1958-07-23
1  CLAUDIU-MIHAI17.12.1999 1999-12-17
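
One thing to watch with dateutil: when both of the first two numbers could be a month, parse() reads the date as month first unless told otherwise. A short sketch of the difference, using a made-up ambiguous string:

```python
from dateutil import parser as dparser

ambiguous = 'asd 03.12.1999'

# Default behaviour: the first number is taken as the month.
month_first = dparser.parse(ambiguous, fuzzy=True)

# dayfirst=True: the first number is taken as the day.
day_first = dparser.parse(ambiguous, fuzzy=True, dayfirst=True)

print(month_first.date())  # 1999-03-12
print(day_first.date())    # 1999-12-03
```

For day.month.year data like the strings in the question, passing dayfirst=True is the safer choice.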

Comments
  • what have you tried? have you tried regular expressions?
  • @Nullman see my edited question
  • . doesn't mean the character dot, it means any character and needs to be escaped. Try this: r'\d+\.\d+\.\d+'
  • Then, you can check my solution.
  • @Nullman Thank you so much! Only one question. For this example : 'MIHAI MĂD2Ă3.07.1958' it takes '3.07.1958' BUT it should be '23.07.1958'. The '2' digit is inside the name
  • thank you! Can I apply this on a single value not on a column?
  • Of course. s1 = 'asd 03.12.1999', then print(dparser.parse(s1,fuzzy=True)) and you get 1999-03-12 00:00:00.