Delete every line if occurrence found

I have a file with content in this format:

1  6  8
1  6  9
1  12 20
1  6
2  8
2  9
2  12
2  20
2  35

I want to delete all lines where a number from the 2nd or 3rd column (but not from the 1st) is found again in the following lines, whether in their 2nd or 3rd column, including the line where the initial number is found.

I should get this as output:

2  35

I've tried using:

awk '{for(i=2;i<=NF;i++){if($i in a){next};a[$i]}} 1' 

but it doesn't seem to work.

What is wrong?
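
A diagnostic aside (assuming the file name is passed as the final argument): $i in a only sees values from earlier lines, so a line whose values recur only later is still printed; meanwhile the bare a[$i] statement creates the array key as a side effect, so later occurrences hit next. The net effect is "keep the first occurrence, drop the repeats", not the desired "drop every occurrence". On the sample it should print:

1  6  8
1  12 20
2  9
2  35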

A one-pass awk that hashes the records to r[NR] and keeps another array, a[$i], of the values seen in fields $2..$NF; records are dropped from r as duplicates turn up.

awk ' {
    for(i=2;i<=NF;i++)       # iterate over fields starting from the second
        if($i in a) {        # if the field value was seen before
            delete r[a[$i]]  # delete the record where it was first seen
            a[$i]=""         # keep the key but drop the stale record pointer
            f=1              # flag this record as containing a duplicate
        } else
            a[$i]=NR         # remember which record the value came from
    if(f!=1)                 # if no field matched an earlier value
        r[NR]=$0             # store the record under its record number
    else                     # if the record contained a duplicate
        f=""                 # just reset the flag and store nothing
}
END {
    for(i=1;i<=NR;++i)
        if(i in r)
            print r[i]       # output remaining
}' file

Output:

2  35

The simplest way is a double-pass algorithm where you read your file twice.

The idea is to store all values in an array a and count how many times each appears. If a value appears 2 or more times, you have found more than a single entry and should not print the line.
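
Hand-computed for the sample, the first pass yields the counts 6→3, 8→2, 9→2, 12→2, 20→2 and 35→1; only 35 appears exactly once, so only the line "2  35" survives the second pass.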

awk '(NR==FNR){a[$2]++; if(NF>2) a[$3]++; next} 
     (NF==2) && (a[$2]==1);
     (NF==3) && (a[$2]==1 && a[$3]==1)' <file> <file>
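
Run against the sample (substituting the real file name twice), this leaves only:

2  35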

In practice, you should avoid expressions such as a[var]==1 when you are not sure whether var is in the array, as merely referencing a[var] creates that array element. Here it is fine to proceed, since the counts are never incremented again after the first pass.
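
To see that side effect in isolation (a standalone snippet, not part of the solution):

awk 'BEGIN {
    t = (a["x"] == 1)        # merely reading a["x"] creates the key
    print ("x" in a)         # prints 1: the element now exists
    print ("y" in a)         # prints 0: the "in" test does not create "y"
}'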

If you want to achieve the same thing with more than three fields, you can do:

awk '(NR==FNR){for(i=2;i<=NF;++i) a[$i]++; next }
     {for(i=2;i<=NF;++i) if(a[$i]>1) next }
     {print}' <file> <file>

While both of these solutions read the file twice, you can also store the full file in memory and read it only a single time. This, however, is exactly the same algorithm:

awk '{for(i=2;i<=NF;++i) a[$i]++; b[NR]=$0}
     END{ for(j=1;j<=NR;++j) {
            $0=b[j]; keep=1
            for(i=2;i<=NF;++i)       # "continue" here would only skip a field,
                if(a[$i]>1) keep=0   # so flag the record instead
            if(keep) print $0
          }
        }' <file>

comment: this single-pass solution is very simple but stores the full file in memory. The solution of James Brown is very clever: it removes records from memory as soon as they are no longer needed. A slightly shorter version of it is:

awk '{ dup=0; for(i=2;i<=NF;++i) if ($i in a) { delete b[a[$i]]; dup=1 } else a[$i]=NR
       if (!dup) b[NR]=$0 }
     END { for(n=1;n<=NR;++n) if(n in b) print b[n] }' <file>

note: you should never strive for the shortest solution, but for the most readable one!

Could you please try the following.

awk '
FNR==NR{
  for(i=2;i<=NF;i++){
    a[$i]++
  }
  next
}
(NF==2 && a[$2]==1) || (NF==3 && a[$2]==1 && a[$3]==1)
'  Input_file  Input_file

Output will be as follows.

2  35

$ cat tst.awk
NR==FNR {
    cnt[$2]++
    cnt[$3]++
    next
}
cnt[$2]<2 && cnt[$NF]<2

$ awk -f tst.awk file file
2  35
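
Two details make this work (unpacked in the comments below): on a 2-field line, cnt[$3]++ harmlessly increments cnt[""], which is never looked up, and $NF is $2, so field 2 is simply tested twice. A quick standalone check of both facts:

awk 'BEGIN { $0 = "2  35"; print NF, "[" $3 "]", $NF }'   # prints: 2 [] 35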

This might work for you (GNU sed):

sed -r 'H;s/^[0-9]+ +//;G;s/\n(.*\n)/\1/;h;$!d;s/^([^\n]*)\n(.*)/\2\n  \1/;:a;/^[0-9]+ +([0-9]+)\n(.*\n)*[^\n]*\1[^\n]*\1[^\n]*$/bb;/^[0-9]+ +[0-9]+ +([0-9]+)\n(.*\n)*[^\n]*\1[^\n]*\1[^\n]*$/bb;/\n/P;:b;s/^[^\n]*\n//;ta;d' file

This is not a serious solution; however, it demonstrates what can be achieved using only matching and substitution.

The solution makes a copy of the original file and, whilst doing so, accumulates all numbers in the second and possibly third fields of each record in a separate line, which it maintains at the head of the copy.

At the end of the file, the first line of the copy contains all the pertinent keys; if any key is duplicated, every line in the file that contains such a key is deleted. This is achieved by moving the keys (the first line) to the end of the file and matching the second (and possibly third) field of each record against those keys.
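
For readers unfamiliar with the hold-space plumbing this relies on, a minimal, unrelated sketch (h overwrites the hold space with the pattern space, H appends to it; g/G go the other way):

printf 'a\nb\nc\n' | sed -n '1h;1!H;${g;p}'   # accumulates the file in the hold space, prints it at end of input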

Comments
  • OP should explain "found in the next lines". Does it mean "following lines"? If true, only the first 3 lines in your example should be removed.
  • Why is your example not working? What is the output you get?
  • For example: the first line contains 6 and 8, and these numbers are also found in the 2nd, 4th and 5th lines. Thus, lines 1, 2, 4 and 5 should be removed, etc. In this case only the last line should remain, i.e. (2,35).
  • Out of curiosity, if your data is big enough and you test all of these solutions, please let me know whether the one-pass or the two-pass solution was faster.
  • Very nice solution. I like the clearing of the buffer to reduce memory. However, take into account that for(i in r) will iterate in an unspecified order, so you might not keep the order intact. You might want to write for(i=1;i<=NR;++i) if(i in r) print r[i]
  • I'm a bit puzzled about the flag. what is its use? It looks like the else statement of the if($i in a) is already doing all the work. Or am I missing something?
  • Maybe I'm doing something wrong but it gave me nothing as output.
  • @inourss, First thing: I am reading Input_file 2 times, so make sure you are copying the code correctly. Then, if your Input_file is the same as the shown sample and 35 does not occur in your Input_file more than once, this should work. If that is not the case, check whether your Input_file has control-M characters in it by doing cat -v Input_file, and let me know then?
  • @RavinderSingh13 It worked for me. I forgot to put the input_file twice.
  • If a[$2]==1 it will print the line and will never test whether a[$3]==1. You should, in theory, have an && instead of ||, but since the number of fields varies, that would fail as well.
  • @kvantour, sure, changed the code now, thanks for letting me know.
  • This is so clever as it uses two nifty ideas! 1. Always add the third field, no matter what; so if there are only 2 fields, it increments the counter of the empty field! 2. Use cnt[$NF] instead of cnt[$3]; this ensures that if you only have 2 fields, you just test field $2 twice! (If I could, I would vote for this 10 times.)
  • Thanks. IMHO NF is an under-used resource. So many problems can be solved easily by using NF or $NF instead of some other approach, e.g. stackoverflow.com/a/52089084/1745001.