retain rows only when column has certain values in a file


I have a file where I need to retain rows whose first column value is equal to brand, city or zipcode. So for this file

    product,0 0,no way
    brand,0 0 0,detergent
    product,0 0 1,sugar
    negative,0 0 1, sight
    city,0 0 2,grind
    zipcode,0 0 1,five

I will need this output

    brand,0 0 0,detergent
    city,0 0 2,grind
    zipcode,0 0 1,five

What is an efficient way to accomplish this if the number of retained values grows, say from 3 here to 20-30? Can we use a file values.txt that holds the values we need to retain

   brand
   city
   zipcode

which can be used?

awk to the rescue!

$ awk 'NR==FNR{v[$1]; next} $1 in v' values.txt FS=, datafile
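A quick demonstration of how this works (a sketch; the filenames `values.txt` and `datafile` match the question):

```shell
# Recreate the question's two input files.
cat > values.txt <<'EOF'
brand
city
zipcode
EOF

cat > datafile <<'EOF'
product,0 0,no way
brand,0 0 0,detergent
product,0 0 1,sugar
negative,0 0 1, sight
city,0 0 2,grind
zipcode,0 0 1,five
EOF

# First pass (NR==FNR is true only while reading values.txt): store each
# value as a key in array v, then "next" skips the second pattern.
# The "FS=," assignment on the command line switches the field separator
# to a comma before datafile is read.
# Second pass: "$1 in v" is a condition with no action, so the default
# action (print the line) fires for every matching row.
awk 'NR==FNR{v[$1]; next} $1 in v' values.txt FS=, datafile
```

Because the lookup is a hash membership test, adding more values to values.txt costs almost nothing per line of the data file.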


Keep lines where the first field matches brand, city or zipcode:

awk

awk -F, '$1~/^(brand|city|zipcode)$/' file

sed

sed -r '/^(brand|city|zipcode),/!d' file

awk reading definitions from file

awk -F, 'a[$1];FNR==NR{a[$1]=1}' values.txt file

This requires unique values in values.txt.
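The uniqueness requirement follows from the clause order, which this sketch illustrates (small inline files stand in for the real ones):

```shell
printf 'brand\ncity\nzipcode\n' > values.txt
printf 'product,0 0,no way\nbrand,0 0 0,detergent\ncity,0 0 2,grind\n' > file

# The condition "a[$1]" is evaluated BEFORE the FNR==NR block sets a[$1]=1.
# While reading values.txt, a[$1] is still unset (0) the first time a value
# appears, so the values.txt line itself is not printed; only afterwards is
# a[$1] set. A duplicate in values.txt would find a[$1] already 1 and print
# that values.txt line into the output -- hence "unique values" is required.
awk -F, 'a[$1];FNR==NR{a[$1]=1}' values.txt file
```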


Grep is a lot easier for this application. Just create a file named "words" containing each word you're interested in on a separate line. Then:

while read -r interesting_word; do grep "^$interesting_word," myfile; done < words

Anchoring on the trailing comma keeps a value like city from also matching intercity.
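The loop rescans myfile once per word. A single-pass variant (a sketch, reusing the answer's "words" and "myfile" names) turns each word into an anchored pattern and hands them all to grep -f at once:

```shell
printf 'brand\ncity\nzipcode\n' > words
printf 'product,0 0,no way\nbrand,0 0 0,detergent\ncity,0 0 2,grind\n' > myfile

# Prefix each word with ^ (start-of-line anchor) and suffix it with the
# comma field separator, so partial matches like "intercity" are excluded.
sed 's/^/^/; s/$/,/' words > patterns

# grep -f reads one pattern per line and matches them all in a single scan.
grep -f patterns myfile
```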

The file that you run the script on has DOS line-endings. It may be that it was created on a Windows machine. Use dos2unix to convert it to a Unix text file.

Comments
  • Do the files have the leading spaces that you are showing?
  • Possible duplicate of Print lines in one file matching patterns in another file
  • Does not seem to work for me. Gives me an empty file.
  • do the records have the leading space as you posted? This script assumes they don't. Also fixed a typo, should check for the array v not a.
  • No leading space.
  • That last one is fragile as it'll behave undesirably (by a reasonable expectation of requirements) if values.txt contains the same value twice. The 1st and 2nd ones will fail if file starts with intercity or other partial matches are present.
  • @EdMorton Yes, I indeed thought of that when I wrote it. But since you see it immediately (the output starts with lines from values.txt - and they are in a different format). I definitely agree on the other two and made the changes. Thanks!
  • Not sure we're talking about the same thing with that awk script. Regardless of order and format, if values.txt contains x twice then x will be printed when the 2nd x-line is read - that's my concern. You could move the , outside of the parens in the sed script btw.
  • @EdMorton Yes, that's what I meant. The output will contain x (and all the other duplicates or triplicates) and then the script will print the matching lines from file, like product,0 0,no way.
  • @EdMorton Yep, I've moved them outside, now that I have my parens ;-)