How to remove lines from a text file based on the values in a list?

I have a very large text file (coverage.txt, >2 GB) that looks like this:

#RefName    Pos Coverage
BGC0000001_59320bp  0   0
BGC0000001_59320bp  1   0
BGC0000002_59320bp  2   0
BGC0000002_59320bp  3   0
BGC0000002_59320bp  4   0
BGC0000003_59320bp  5   0
BGC0000003_59320bp  6   0
BGC0000003_59320bp  7   0
BGC0000004_59320bp  8   0
BGC0000004_59320bp  7   0
BGC0000004_59320bp  8   0
BGC0000005_59320bp  7   0
BGC0000005_59320bp  8   0
BGC0000005_59320bp  7   0
BGC0000006_59320bp  8   0
BGC0000006_59320bp  7   0
BGC0000006_59320bp  8   0
BGC0000007_59320bp  7   0
BGC0000007_59320bp  8   0
BGC0000007_59320bp  7   0
BGC0000008_59320bp  8   0
BGC0000008_59320bp  7   0
BGC0000008_59320bp  8   0
BGC0000009_59320bp  7   0
BGC0000009_59320bp  8   0

I have another text file (rmList.txt) like this:

BGC0000002
BGC0000004
BGC0000006
BGC0000008

I want to remove every line from coverage.txt whose ID appears in rmList.txt.

Here's what I tried:

wanted = [line.strip() for line in open('rmList.txt')]
files = 'coverage.txt'

def rmUnwanted(file):
    with open(file) as f, open('out.txt', 'w') as s:
        for line in f:
            pos = line.split()[0].split('_')[0]
            if pos not in wanted:
                s.write(line)

rmUnwanted(files)

But this takes forever for my large files. Is there a better way to do this? Is there anything wrong with my code?

Thank you so much!

It seems to me that your code is not wrong; it does what you want. With large files it will simply take time, but there is still room to make it more efficient.

If you are sure that both your files are already sorted (as it seems from your example), this code should be faster:

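def rmUnwanted(file):
    with open(file) as f, open('out.txt', 'w') as s:
        i = 0            # index of the next ID in wanted that has not been matched yet
        lastwanted = ""  # most recently matched ID; its remaining lines are skipped too
        for line in f:
            pos = line.split()[0].split('_')[0]
            if pos == lastwanted:
                continue
            if i < len(wanted) and pos == wanted[i]:
                lastwanted = pos
                i = i+1
                continue
            s.write(line)

It gives the same result on the example files you provided and should be faster (I did not measure it). The idea is to avoid searching the whole wanted list on every iteration, which is time-consuming if your real rmList.txt is also large: each line is compared only against the next expected ID and the one matched last. Note that this relies on every ID in rmList.txt actually occurring in coverage.txt, in the same order; if one is missing, the IDs after it are never reached.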
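A related idea, shown here only as a sketch and not as the method above: if lines with the same ID sit next to each other, itertools.groupby lets you test each run of equal IDs once instead of once per line. The names line_id and drop_groups are made up for this illustration, and the sample data is a shortened version of the file in the question.

```python
from itertools import groupby

def line_id(line):
    """Return the ID part of a line: the first field, up to the first underscore."""
    fields = line.split(None, 1)
    return fields[0].split("_", 1)[0] if fields else ""

def drop_groups(lines, unwanted):
    """Yield lines, skipping every run of consecutive lines whose ID is unwanted."""
    for ref_id, group in groupby(lines, key=line_id):
        if ref_id not in unwanted:  # one membership test per run of equal IDs
            yield from group

# Shortened sample in the same layout as coverage.txt (assumed tab-separated).
sample = [
    "#RefName\tPos\tCoverage\n",
    "BGC0000001_59320bp\t0\t0\n",
    "BGC0000001_59320bp\t1\t0\n",
    "BGC0000002_59320bp\t2\t0\n",
    "BGC0000002_59320bp\t3\t0\n",
    "BGC0000003_59320bp\t5\t0\n",
]
kept = list(drop_groups(sample, {"BGC0000002"}))
```

With real files you would pass the open file object as lines and write the result with writelines. The ID is still extracted for every line, so the saving is only in the number of lookups; how much that helps depends on how long the runs are.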

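Use a set instead of a list for the membership test. Looking up an item in a set takes roughly constant time, while "in" on a list scans the list from the start.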

wanted = { line.strip() for line in open('rmList.txt') }

....
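Spelled out, the set-based version might look like the sketch below. It keeps the split logic from the question; load_ids and keep_unlisted are hypothetical names, and the functions take any iterable of lines so they can be tried on a small in-memory sample first.

```python
def load_ids(lines):
    """Collect the stripped, non-empty IDs into a set for fast membership tests."""
    return {line.strip() for line in lines if line.strip()}

def keep_unlisted(lines, unwanted):
    """Yield only the lines whose ID (text before the first '_') is not unwanted."""
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            yield line  # pass blank lines through unchanged
            continue
        ref_id = fields[0].split("_", 1)[0]
        if ref_id not in unwanted:
            yield line

# Shortened sample in the same layout as coverage.txt (assumed tab-separated).
sample = [
    "#RefName\tPos\tCoverage\n",
    "BGC0000001_59320bp\t0\t0\n",
    "BGC0000002_59320bp\t2\t0\n",
    "BGC0000003_59320bp\t5\t0\n",
]
unwanted = load_ids(["BGC0000002\n", "\n"])
kept = list(keep_unlisted(sample, unwanted))
```

For the real files, open rmList.txt and coverage.txt, pass the file objects in place of the lists, and hand the generator to writelines on the output file. Because the lines are processed one at a time, the 2 GB file is never held in memory.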

You can do it as follows:

with open("rmList.txt") as f:
    # strip the trailing newline so each entry matches the ID exactly
    rmList = {line.strip() for line in f}

with open("out.txt", "w") as outf, open("coverage.txt") as inf:
    # write header
    outf.write(next(inf))
    # write lines that do not start with a banned ID
    outf.writelines(line for line in inf if line[:line.index("_")] not in rmList)

First, you store all IDs to remove in a set for fast lookup. Then you iterate over the lines and check whether each one starts with a bad ID. Note that instead of running line.split() we can access the ID portion of each line with line[:line.index('_')]. This avoids splitting the whole line into a list of fields and should be faster than split. If all IDs have the same length, you can replace line.index('_') with a number.
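Both ways of extracting the ID can be put side by side in a small sketch. Here ref_id and filter_lines are illustrative names, and id_width=10 is an assumption that holds for the sample IDs shown in the question. str.partition is used instead of index because it does not raise an error on a line without an underscore, such as the header.

```python
def ref_id(line):
    """Return the text before the first '_' (the whole line if there is none)."""
    return line.partition("_")[0]

def filter_lines(lines, unwanted, id_width=None):
    """Yield lines whose ID is not in unwanted.

    If id_width is given, the ID is taken to be the first id_width characters;
    otherwise it is everything before the first underscore.
    """
    for line in lines:
        key = line[:id_width] if id_width is not None else ref_id(line)
        if key not in unwanted:
            yield line

# Shortened sample in the same layout as coverage.txt (assumed tab-separated).
sample = [
    "#RefName\tPos\tCoverage\n",
    "BGC0000001_59320bp\t0\t0\n",
    "BGC0000002_59320bp\t2\t0\n",
    "BGC0000003_59320bp\t5\t0\n",
]
unwanted = {"BGC0000002"}
by_underscore = list(filter_lines(sample, unwanted))
by_width = list(filter_lines(sample, unwanted, id_width=10))
```

The fixed-width variant only works if every ID really has that length, so it is worth checking a few lines of the real file before relying on it.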

Comments
  • It is as well, but the syntax is different. A dict comprehension has the form {key: value for x in y}.