I am looking for a function or script that reads every line of a file, looks for a duplicate or match elsewhere in the same file, and then deletes it.

The uniq -u command does half the job. I tried using a while read loop with grep. It somehow works, but for some reason it outputs the strings it is looking for twice.

This is the code I have so far to identify the matching word. I have not yet written the code to delete the matching word, but I would most likely use sed.

while read line; do
  grep "$line" $filename
done < $filename

and this is the file I have


The expected result should be:


But the result I have is this:


The outcome I would like for this script would be:



The previous example brought some confusion and I apologise for it.

Here is another example:


So the FS would be /.

Lines 2 and 3 should be deleted because they match fields of line 1.

Line 5 should be deleted together with line 7, because it matches the second field.

The output I expect to have is:


I hope this clarifies the issue.

Use awk instead.

BEGIN { FS = "/" }   # / is the field separator.
($NF in a) {         # if the last field is already a key in a,
  delete a[$NF]      # delete that entry
  next               # and skip to the next line;
}
{                    # otherwise,
  a[$NF] = $0        # store the line under its last field.
}
END {                # at the end,
  for (b in a)       # print everything left in a.
    print a[b]
}


awk -F '/' '($NF in a){delete a[$NF];next} {a[$NF]=$0} END{for(b in a) print a[b]}' file
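If lines can carry more than two fields, as in JonasB/JB/JonB, comparing only the last field is no longer enough. One possible approach is to read the file twice, once to count fields and once to filter. The sketch below assumes that a line should be dropped whenever any of its /-separated fields also appears on another line. The function name remove_matching and the array names count and once are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines of file $1 whose fields
# (separated by /) do not occur on any other line of the same file.
remove_matching() {
  awk -F/ '
    # First pass: count on how many lines each field value occurs.
    NR == FNR {
      for (i = 1; i <= NF; i++)
        if (!((FNR, $i) in once)) {  # count a value once per line
          once[FNR, $i] = 1
          count[$i]++
        }
      next
    }
    # Second pass: skip a line as soon as one of its fields is shared.
    {
      for (i = 1; i <= NF; i++)
        if (count[$i] > 1) next
      print
    }
  ' "$1" "$1"
}
```

It would be called as remove_matching "$filename". Because the second pass prints lines as it reads them, the surviving lines keep their input order, whereas the order of for (b in a) in awk is unspecified.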


Try this:

nl -nrz -w6 -s " " "$filename" | sort -k2 | uniq -s7 | sort -n | cut -c8-

nl numbers each line (six zero-padded digits plus one space, so up to 999,999 lines). The first sort orders the lines by content, starting at field 2 and so ignoring the line number. uniq -s7 removes duplicates, skipping the first 7 characters (the line number and the space). sort -n restores the original order. Finally, cut -c8- strips the line numbers.
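The stages can be read one by one in the sketch below, which wraps the same pipeline in a function (the name dedupe_keep_first is made up for illustration). Note what it does and does not do: it keeps the first copy of every repeated line and drops only the later copies, so it does not delete both lines of a matching pair. It also assumes the file has no empty lines, since nl leaves those unnumbered by default.

```shell
#!/bin/sh
# Hypothetical helper: drop repeated lines from file $1, keeping the
# first occurrence of each and preserving the original order.
dedupe_keep_first() {
  nl -nrz -w6 -s ' ' "$1" |  # prefix each line with NNNNNN and a space
    sort -k2 |               # bring identical contents next to each other
    uniq -s7 |               # keep one line per group, skipping the prefix
    sort -n |                # sort by line number: back to input order
    cut -c8-                 # strip the 7-character prefix
}
```

For an input file containing the lines b, a, b, c, a, calling dedupe_keep_first on it prints b, a, c.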

Tested with GNU sed:

sed -nE 'G; /^([[:print:]]+\n)(.+\n)*\1/b; h;P' filename

  • In your while loop you are matching the contents of filename against itself, which is probably not what you want to do.
  • Your question is not fully clear. Does your input file contain quotes (") or not? Do you want to have the output LeylaS/LS because there is no line LS? If there were a line XY without a corresponding XabcY/XY, would you want to have the line XY in the output? Please edit your question to clarify or to add more information.
  • So do you want to find duplicates or delete them? How do you define a "word"?
  • @Bodo Sorry for the confusion; no, there are no quotes in the original file. I quoted them to show what grep was matching. As you said, LeylaS/LS should be in the output because there is no LS or LeylaS line in the file. So if either LS or LeylaS matches LeylaS/LS, both of those lines need to be deleted.
  • @downtheroad Yeah... I am trying to grep a line from [filename] and compare it with the whole file. It seems that in this case I am comparing the lines from filename against the same file.
  • This worked great for this example. If I want this to work with multiple fields, what should I edit? Example: JonasB/JB/JonB JB AmeliaZ/AZ/AmeZ/ZA AmeliaZ AmeZ, and this should delete all entries. Sorry if this goes beyond the main question.
  • @Terrenus Actually it does; you should have mentioned that in your question in the first place.
  • Thanks for the quick response. Unfortunately, when I ran the command, it started duplicating the file multiple times and did not cut the differences.
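
On the point that the loop matches the file against itself: every line is found at least once (by itself), and a short line such as JB is also found inside longer lines such as JonasB/JB, which is why grep prints more than expected. The sketch below makes that visible by counting matches per line instead of printing them. The helper name count_matches and the sample lines in the usage note are illustrative, not taken from the original file.

```shell
#!/bin/sh
# Hypothetical helper: for every line of file $1, print how many lines
# of the same file grep matches it against. Extra arguments are passed
# to grep as options (for example -Fx).
count_matches() {
  file=$1
  shift
  while IFS= read -r line; do
    # -c counts matching lines; -e protects patterns starting with "-".
    printf '%s %s\n' "$(grep -c "$@" -e "$line" "$file")" "$line"
  done < "$file"
}
```

With a file containing JonasB/JB, JB and LeylaS/LS, count_matches "$filename" reports 2 for JB, because it matches both itself and JonasB/JB. With count_matches "$filename" -Fx (fixed string, whole line) every count is 1, so a count above 1 would then indicate a true duplicate line.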