How to cut part of the text and replace each line with Python and RegEx

Hello, I'm a complete beginner with Python and have just started learning it and using RegEx for text manipulation. I'm sorry in advance if I've broken any StackOverflow rules.

I am writing a script in Python that takes (cuts) the date and time from the first line and uses them to replace "Date", "TimeWindowStart" and "TimeWindowEnd" on each of the following lines:

ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000

I know how to select the date with a regex:

([0-9][0-9]|2[0-9])/[0-9][0-9](/[0-9][0-9][0-9][0-9])?

And how to select the time:

([0-9][0-9]|2[0-9]):[0-9][0-9](:[0-9][0-9])?

But I'm stuck on how to select part of the text, copy it, and then find the text I want to replace with the re.sub function.

So the final output would look like this:

ReportDate=, TimeWindowStart=, TimeWindowEnd=

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000

First, you can specify a quantifier in a regex, so if you want four digits you don't need [0-9][0-9][0-9][0-9]; [0-9]{4} does the same thing. To capture part of a match, wrap it in round brackets: value=([0-9]{4}) will give you only the digits.
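A quick check of both points, using the header line from the question as sample input (the variable names are just for illustration):

```python
import re

line = "ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59"

# Four explicit character classes vs. the equivalent {4} quantifier
long_form = re.findall(r"[0-9][0-9][0-9][0-9]", line)
short_form = re.findall(r"[0-9]{4}", line)

# With exactly one capture group, findall returns only the text inside the brackets
date_only = re.findall(r"ReportDate=([0-9]{2}/[0-9]{2}/[0-9]{4})", line)

print(long_form == short_form)  # True
print(date_only)                # ['03/24/2019']
```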

If you want to use re.sub, you just need to give it a pattern, a replacement string and your input string, e.g. re.sub(pattern, replacement, string).

Therefore:

import re

txt = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
"""

pattern_date = 'ReportDate=([0-9]{2}/[0-9]{2}/[0-9]{4})'
report_date = re.findall(pattern_date, txt)[0]

pattern_time_start = 'TimeWindowStart=([0-9]{2}:[0-9]{2}:[0-9]{2})'
start_time = re.findall(pattern_time_start, txt)[0]

pattern_time_end = 'TimeWindowEnd=([0-9]{2}:[0-9]{2}:[0-9]{2})'
end_time = re.findall(pattern_time_end, txt)[0]

splitted = txt.split('\n')  # Split the txt so that we skip the first line

txt2 = '\n'.join(splitted[1:])  # text to perform the sub 

# substitution of your values
txt2 = re.sub('Date', report_date, txt2)
txt2 = re.sub('TimeWindowStart', start_time, txt2)
txt2 = re.sub('TimeWindowEnd', end_time, txt2)

txt_final = splitted[0] + '\n' + txt2
print(txt_final)

Output:

ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000

This is a partial answer, because I don't know the Python APIs for manipulating text files particularly well. You can read the first line of the file and extract the values for the report date and the start/end window times.

import re

first = "ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59"
ReportDate = re.sub(r'ReportDate=([^,]+),.*', '\\1', first)
TimeWindowStart = re.sub(r'.*TimeWindowStart=([^,]+),.*', '\\1', first)
TimeWindowEnd = re.sub(r'.*TimeWindowEnd=(.*)', '\\1', first)

Write out the first line with the values for the three variables removed.
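One way to blank the values, assuming every value runs from its `=` sign to the next comma or the end of the line (true for the sample header, but an assumption about the real files), is a single substitution:

```python
import re

first = "ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59"

# Drop everything between each "=" and the next comma (or end of line),
# keeping the key names and the "=" signs
header_out = re.sub(r"=[^,]*", "=", first)
print(header_out)  # ReportDate=, TimeWindowStart=, TimeWindowEnd=
```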

Then, all you need to do is read in each subsequent line and do the following replacements:

line = "Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"
line = re.sub(r'\bDate\b', ReportDate, line)
line = re.sub(r'\bTimeWindowStart\b', TimeWindowStart, line)
line = re.sub(r'\bTimeWindowEnd\b', TimeWindowEnd, line)

After processing each line in this way, you may write it to the output file.
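Since this answer leaves the file handling open, here is a minimal sketch of the whole read, rewrite and write loop. The function name `rewrite_report` and its two path parameters are hypothetical, and it assumes the header always has the exact `ReportDate=..., TimeWindowStart=..., TimeWindowEnd=...` layout from the question:

```python
import re

def rewrite_report(in_path, out_path):
    """Hypothetical helper: copy in_path to out_path, blanking the header
    values and filling them into every following line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        first = src.readline()
        # Assumed header layout: ReportDate=..., TimeWindowStart=..., TimeWindowEnd=...
        m = re.search(r"ReportDate=([^,]+), TimeWindowStart=([^,]+), TimeWindowEnd=(\S+)", first)
        if m is None:
            raise ValueError("first line does not look like a report header")
        report_date, start, end = m.groups()

        # Header with the three values removed
        dst.write(re.sub(r"=[^,]*", "=", first.rstrip("\n")) + "\n")

        # Every other line gets the placeholders replaced; blank lines pass through
        for line in src:
            line = re.sub(r"\bDate\b", report_date, line)
            line = re.sub(r"\bTimeWindowStart\b", start, line)
            line = re.sub(r"\bTimeWindowEnd\b", end, line)
            dst.write(line)
```

Opening both files in one `with` statement makes sure both are closed even if the header check raises.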

Here is my code:

import re

s = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"""

datereg = r'(\d{2}/\d{2}/\d{4})'
timereg = r'(\d{2}:\d{2}:\d{2})'

dates = re.findall(datereg, s)
times = re.findall(timereg, s)

# replacing one thing at a time
result = re.sub(r'\bDate\b', dates[0],
            re.sub(r'\bTimeWindowEnd\b,', times[1] + ',',
                re.sub(r'\bTimeWindowStart\b,', times[0] + ',',
                    re.sub(timereg, '', 
                        re.sub(datereg, '', s)))))

print(result)

Output:

ReportDate=, TimeWindowStart=, TimeWindowEnd=

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
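The nested calls run from the innermost one outwards, which can be hard to follow. A sketch of the same substitutions as a flat list of (pattern, replacement) pairs, applied in the same order; the input is shortened to two body lines here:

```python
import re

s = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"""

datereg = r'(\d{2}/\d{2}/\d{4})'
timereg = r'(\d{2}:\d{2}:\d{2})'

# Collect the values before anything is removed
dates = re.findall(datereg, s)
times = re.findall(timereg, s)

# Same substitutions as the nested version, listed in the order they run
substitutions = [
    (datereg, ''),
    (timereg, ''),
    (r'\bTimeWindowStart\b,', times[0] + ','),
    (r'\bTimeWindowEnd\b,', times[1] + ','),
    (r'\bDate\b', dates[0]),
]

result = s
for pattern, replacement in substitutions:
    result = re.sub(pattern, replacement, result)

print(result)
```

Adding another field then means adding one pair to the list rather than another level of nesting.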

Try this:

import re

# Open the file (named "a" here) and read it line by line
with open("a") as file:
    # Get and process the first line, without its trailing newline
    first_line = file.readline().rstrip("\n")
    m = re.search("ReportDate=(?P<ReportDate>[0-9/]+), TimeWindowStart=(?P<TimeWindowStart>[0-9:]+), TimeWindowEnd=(?P<TimeWindowEnd>[0-9:]+)", first_line)
    # re.escape makes the captured values match literally, not as patterns
    first_line = re.sub(re.escape(m.group('ReportDate')), "", first_line)
    first_line = re.sub(re.escape(m.group('TimeWindowStart')), "", first_line)
    first_line = re.sub(re.escape(m.group('TimeWindowEnd')), "", first_line)
    print(first_line)

    # Process the rest of the lines
    for line in file:
        line = re.sub(r'\bDate\b', m.group('ReportDate'), line)
        line = re.sub(r'\bTimeWindowStart\b', m.group('TimeWindowStart'), line)
        line = re.sub(r'\bTimeWindowEnd\b', m.group('TimeWindowEnd'), line)
        print(line.rstrip())

Output:

ReportDate=, TimeWindowStart=, TimeWindowEnd=

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
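Named groups can also be read all at once with `Match.groupdict()`. In this sketch the date group is renamed from `ReportDate` to `Date` (an assumption made for illustration) so that every group name equals the placeholder it replaces, and one loop does all three substitutions:

```python
import re

first_line = "ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59"
line = "Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"

# Group names chosen to match the placeholders in the body lines
m = re.search(r"ReportDate=(?P<Date>[0-9/]+), TimeWindowStart=(?P<TimeWindowStart>[0-9:]+), "
              r"TimeWindowEnd=(?P<TimeWindowEnd>[0-9:]+)", first_line)

# groupdict() maps each group name to the text it captured
values = m.groupdict()

# One loop replaces every placeholder with its captured value
for name, value in values.items():
    line = re.sub(r"\b" + name + r"\b", value, line)
print(line)  # 03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
```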

Here is a clear solution:

import re

input_str = """
ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
"""

# Divide input string into two parts: header, body
header = input_str.split('\n')[1]
body = '\n'.join(input_str.split('\n')[2:])

# Find elements to be replaced
ri = re.findall(r'\d{2}/\d{2}/\d{4}', header)
ri.extend(re.findall(r'\d{2}:\d{2}:\d{2}', header))

# Replace elements
new_header = header.replace(ri[0],'')\
                   .replace(ri[1],'')\
                   .replace(ri[2],'')

new_body = body.replace('Date',ri[0])\
               .replace('TimeWindowStart',ri[1])\
               .replace('TimeWindowEnd',ri[2])

# Construct and print the result string
full_string = new_header + '\n\n' + new_body
print(full_string)

Just find the items to be replaced with a regex and then do an ordinary string replace. I think this works well as long as you only have a few elements.
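If the number of placeholders grows, a single-pass alternative is one regex alternation plus a replacement function that looks each match up in a dict. This is a swapped-in technique rather than the answer's own, and the `Date` to `ReportDate` mapping is an assumption based on the sample data:

```python
import re

header = "ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59"
body = "Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000\n" * 2

# Parse the header into {'ReportDate': '03/24/2019', ...}
pairs = dict(re.findall(r"(\w+)=([^,]+)", header))

# Placeholder in the body -> value from the header (assumed mapping)
values = {
    "Date": pairs["ReportDate"],
    "TimeWindowStart": pairs["TimeWindowStart"],
    "TimeWindowEnd": pairs["TimeWindowEnd"],
}

# One alternation matches any placeholder; the lambda returns its value
placeholder = re.compile(r"\b(" + "|".join(map(re.escape, values)) + r")\b")
new_body = placeholder.sub(lambda mo: values[mo.group(1)], body)
print(new_body)
```

Because the body is scanned only once, an inserted value is never rescanned by a later replacement, which chained `.replace()` calls do not guarantee.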

Comments
  • Thank you @dzang, this helped me a lot!
  • @krcha Glad to hear that, good luck with your task! If you are happy with one of the answers, it would be good to mark it as the accepted answer. Cheers