Find content of one file from another file in UNIX


I have two files. The first contains a list of row IDs of tuples in a database table, and the second contains SQL queries with those row IDs in the WHERE clause.

For example:

File 1

1610657303
1610658464
1610659169
1610668135
1610668350
1610670407
1610671066

File 2

update TABLE_X set ATTRIBUTE_A=87 where ri=1610668350;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672154;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668135;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672153;

I have to read File 1 and search File 2 for all the SQL commands that match the row IDs from File 1, then dump those queries into a third file.

File 1 has 100,000 entries and File 2 contains 10 times as many, i.e. 1,000,000.

I used grep -f File_1 File_2 > File_3, but it is extremely slow, processing only about 1,000 entries per hour.

Is there any faster way to do this?

One way with awk:

awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File1 File2

This should be pretty quick. On my machine, it took under 2 seconds to create a lookup of 1 million entries and compare it against 3 million lines.

Machine Specs:

Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 cores)
98 GB RAM
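To see the one-liner in action, here is a minimal run on a subset of the sample data from the question (using File_3 as the output name is an assumption):

```shell
# A subset of the sample data from the question
cat > File_1 <<'EOF'
1610657303
1610668135
1610668350
EOF

cat > File_2 <<'EOF'
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668350;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672154;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610668135;
update TABLE_X set ATTRIBUTE_A=87 where ri=1610672153;
EOF

# NR==FNR is true only while reading the first file: store each ID in rows.
# For the second file, FS="[ =]" makes the last field "1610668350;", so
# substr() strips the trailing ";" before the array lookup.
awk -v FS="[ =]" 'NR==FNR{rows[$1]++;next}(substr($NF,1,length($NF)-1) in rows)' File_1 File_2 > File_3
```

After this, File_3 contains only the two statements whose IDs appear in File_1.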


You don't need regexps, so use grep -F -f file1 file2
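A sketch of the fixed-string approach on the question's sample data; adding -w (whole-word matching, supported by GNU and BSD grep) is an extra suggestion here, and prevents one ID from matching inside a longer number:

```shell
# Sample data from the question (subset)
printf '%s\n' 1610668350 1610668135 > File_1
printf 'update TABLE_X set ATTRIBUTE_A=87 where ri=%s;\n' \
    1610668350 1610672154 1610668135 > File_2

# -F: treat each pattern as a fixed string, not a regex (much faster here)
# -w: match whole words only, so ID 161066 cannot match inside 1610668350
grep -Fwf File_1 File_2 > File_3
```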


I suggest using a programming language such as Perl, Ruby or Python.

In Ruby, a solution reading both files (f1 and f2) just once could be:

require 'set'

# A Set gives O(1) membership tests; Array#include? would scan all IDs per line
idxes = File.readlines('f1').map(&:chomp).to_set

File.foreach('f2') do |line|
  next unless line =~ /where ri=(\d+);$/
  puts line if idxes.include?($1)
end

or with Perl

use strict;
use warnings;

my %idxs;
open my $fh, '<', 'f1' or die "f1: $!";
while (<$fh>) { chomp; $idxs{$_} = 1 }
close $fh;

open $fh, '<', 'f2' or die "f2: $!";
while (<$fh>) {
    next unless /where ri=(\d+);$/;
    print if $idxs{$1};
}
close $fh;
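For completeness, a minimal sketch of how the Perl version might be saved and run (the script name filter.pl and the sample file contents are assumptions):

```shell
# Sample inputs named as in the answer (f1 = IDs, f2 = SQL statements)
printf '%s\n' 1610668350 1610668135 > f1
printf 'update TABLE_X set ATTRIBUTE_A=87 where ri=%s;\n' \
    1610668350 1610672154 1610668135 > f2

# The Perl filter: hash lookup of IDs captured from the WHERE clause
cat > filter.pl <<'EOF'
use strict;
use warnings;
my %idxs;
open my $fh, '<', 'f1' or die "f1: $!";
while (<$fh>) { chomp; $idxs{$_} = 1 }
close $fh;
open $fh, '<', 'f2' or die "f2: $!";
while (<$fh>) {
    next unless /where ri=(\d+);$/;
    print if $idxs{$1};
}
close $fh;
EOF

perl filter.pl > File_3
```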


The awk/grep solutions mentioned above were slow or memory-hungry on my machine (file1: 10^6 rows, file2: 10^7 rows), so I came up with an SQL solution using sqlite3.

Turn file2 into a CSV-formatted file where the first field is the value after ri=

gawk -F= '{ print $3","$0 }' file2.txt | sed 's/;,/,/' > file2_with_ids.txt

Create two tables:

sqlite> CREATE TABLE file1(rowId char(10));
sqlite> CREATE TABLE file2(rowId char(10), statement varchar(200));

Import the row IDs from file1:

sqlite> .import file1.txt file1

Import the statements from file2, using the prepared CSV file:

sqlite> .separator ,
sqlite> .import file2_with_ids.txt file2

Select all and only the statements in table file2 with a matching rowId in table file1:

sqlite> SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);

File 3 can be easily created by redirecting output to a file before issuing the select statement:

sqlite> .output file3.txt
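The interactive steps above can also be scripted end to end. A sketch, assuming sqlite3 is on the PATH, using a transient in-memory database, and substituting plain awk for gawk (standard features only are used):

```shell
# Sample inputs (subset of the question's data)
printf '%s\n' 1610668350 1610668135 > file1.txt
printf 'update TABLE_X set ATTRIBUTE_A=87 where ri=%s;\n' \
    1610668350 1610672154 1610668135 1610672153 > file2.txt

# Prepend the ID: with -F=, $3 is "1610668350;"; sed drops the stray ";"
awk -F= '{ print $3","$0 }' file2.txt | sed 's/;,/,/' > file2_with_ids.txt

# Feed both SQL and dot-commands to sqlite3 on stdin
sqlite3 <<'EOF'
CREATE TABLE file1(rowId char(10));
CREATE TABLE file2(rowId char(10), statement varchar(200));
.import file1.txt file1
.separator ,
.import file2_with_ids.txt file2
.output file3.txt
SELECT statement FROM file2 WHERE file2.rowId IN (SELECT file1.rowId FROM file1);
EOF
```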

Test data:

sqlite> select count(*) from file1;
1000000
sqlite> select count(*) from file2;
10000000
sqlite> select * from file1 limit 4;
1610666927
1610661782
1610659837
1610664855
sqlite> select * from file2 limit 4;
1610665680|update TABLE_X set ATTRIBUTE_A=87 where ri=1610665680;
1610661907|update TABLE_X set ATTRIBUTE_A=87 where ri=1610661907;
1610659801|update TABLE_X set ATTRIBUTE_A=87 where ri=1610659801;
1610670610|update TABLE_X set ATTRIBUTE_A=87 where ri=1610670610;

Without creating any indices, the select statement took about 15 seconds on an AMD A8 1.8 GHz 64-bit Ubuntu 12.04 machine.


Maybe try awk, using the numbers from File 1 as keys, with a simple two-pass approach.

First, generate a second awk script from File 1:

awk -f script1.awk File_1

where script1.awk is:

 {
   print "$0 ~ /" $0 "/ { print }" > "script2.awk"
 }

and then invoke the generated script2.awk on File 2:

awk -f script2.awk File_2 > File_3
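A runnable sketch of this two-pass idea on the question's sample data; note that the output file name must be quoted ("script2.awk") inside the generating program, since a bare script2.awk would be an empty awk variable:

```shell
# Sample inputs
printf '%s\n' 1610668350 1610668135 > File_1
printf 'update TABLE_X set ATTRIBUTE_A=87 where ri=%s;\n' \
    1610668350 1610672154 1610668135 > File_2

# Pass 1: turn each ID into a pattern-action rule, e.g.
#   $0 ~ /1610668350/ { print }
awk '{ print "$0 ~ /" $0 "/ { print }" > "script2.awk" }' File_1

# Pass 2: run the generated script against File_2
awk -f script2.awk File_2 > File_3
```

Be aware that awk tests every generated rule against every line of File 2, so with 100,000 IDs this is likely far slower than the single lookup-array one-liner above.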





Comments
  • Do all queries which use the row IDs mentioned in file 1 end up in the same output file, or do you want to have a separate file for each row ID in file 1?
  • Are you allowed to sort the files on the IDs? Or does the order have to be preserved?
  • Just one output file. Each time an entry from File 1 matches an entry in File 2, that entry of File 2 should be dumped in an output file. And the entries in File 1 and File 2 are unique. No duplicate lines in any of the files.
  • @fizzer Order doesn't matter. We can sort it.
  • You probably mean 1,000,000 and 10,000,000 right ? :-)
  • This one-liner is brilliant :)
  • If you're going to specify the time it takes to run the command, might be relevant to include the machine specs.
  • Your machine's a beast. Btw if you add ; to FS i.e. FS="[ =;]", you can simply use $(NF-1) to get the numbers w/o substr and co. Not sure how it'd impact performance though.
  • @doubleDown It's not my personal machine :) It's my test box at workplace. True we can do that too .. just didn't feel like cramming too many Field separators.
  • @JS웃 This one is awesome. Total time consumed on my machine was 46 minutes. Machine specs: SunOS 5.10 Generic_127111-03 sun4v sparc SUNW,SPARC-Enterprise-T5120. Thanks a lot :)
  • The OP says he's already tried this one, but it was too slow.
  • but without the -F flag