How to find which value in my CSV file appears the most using a shell script?

I'm learning shell scripting and I have a CSV file containing 5 columns: name, forename, telephone, room, email. I want to find which room contains the most people.

For the moment I have the following code, and I'm stuck at the part where I need to count which room has the most employees, i.e. which room appears most often in the file:

input="x.csv"
while read line; do
    room=$(echo $line | cut -d \; -f 4)
    if [ -n "$room" ]; then

    fi
done < ${input}

Counting the occurrences of unique values is probably best done with uniq -c. So to count the entries for each room individually, you first need to extract the room column. awk is probably the best tool in the bash environment for that. For example:

#!/bin/bash
input="x.csv"

awk -F';' '{print $4}' "$input" | sort | uniq -c

This will return a list with two columns: the first contains the number of occurrences of the respective value in the second, such as:

      2 room1
      4 room1b
      1 room2
      1 room3

For more complex analysis, follow Corentin's lead and extend the awk script.

Parsing CSV files like this with a shell loop is painful.

Use awk:

awk -F';' '       # CSV delimiter set to ;
    $4 {          # this block runs whenever the room value (column 4) is not empty
        n_persons[$4] += 1
        if (n_persons[$4] > max) {
            max = n_persons[$4]    # current maximum of employees per room
            room_max = $4          # room that currently has the most employees
        }
    }
    END {         # this block runs after the whole file has been read
        print room_max
    }
' <file>

If the column separator appears unescaped inside a text field, the line will end up with an extra column. Typically this happens when the CSV file does not use double quotes around fields that contain the separator.
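To illustrate with a hypothetical line where the forename field contains an unquoted ;, field 4 silently becomes the telephone number instead of the room:

# "John;Jr." was meant to be a single forename field, but the stray ';'
# turns the line into six fields, so field 4 is now the phone number:
echo 'Doe;John;Jr.;0123;room1;john@example.com' | cut -d \; -f 4
# prints: 0123

A cheap guard in the awk version above is to additionally require exactly five fields (NF == 5 && $4 { ... }), accepting that malformed rows are then silently dropped.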

If you insist on using Bash, it has associative arrays. I am not entirely sure how you would sort those in plain Bash, and using only Bash sounds a bit complicated for this; perhaps awk would work better?
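That said, here is a minimal pure-Bash sketch (assuming Bash 4+ for declare -A, and the five-column layout from the question); it avoids sorting entirely by tracking the running maximum:

#!/bin/bash
# Count people per room with an associative array (requires Bash 4+).
declare -A count

while IFS=';' read -r name forename telephone room email; do
    if [ -n "$room" ]; then
        count[$room]=$(( ${count[$room]:-0} + 1 ))
    fi
done < x.csv

# Walk the array once, keeping track of the running maximum.
max=0
for r in "${!count[@]}"; do
    if [ "${count[$r]}" -gt "$max" ]; then
        max=${count[$r]}
        room_max=$r
    fi
done
echo "$room_max ($max people)"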

In Bash, instead of the original approach, I would do this with a pipeline of:

  1. cut (to select the column, as you've already done)
  2. sort (to sort the values so they can be processed with uniq)
  3. uniq -c (to count the number of occurrences of each column value)
  4. sort -nr (to sort by the number of occurrences, descending order -- greatest first)
  5. head (to get only the most frequent occurrence)

Something along the lines of (untested):

cut -d \; -f 4 input.csv \
  | sort \
  | uniq -c \
  | sort -nr \
  | head -1

If you need to filter out some lines, I would add grep -v after the cut, as shown below; there is no need for conditionals, while loops, or the read builtin. The \ at the end of a line tells bash that this "line" continues on the next line.
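For example, to drop a (hypothetical) header line whose room column reads room, the filter slots in right after the cut:

cut -d \; -f 4 input.csv \
  | grep -v '^room$' \
  | sort \
  | uniq -c \
  | sort -nr \
  | head -1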

The above is what first occurred to me. It can surely be optimized, but then again, perhaps you should look into other programming languages or paradigms if this needs to be executed often and as fast as possible.

Comments
  • Can you include sample input with the expected output?
  • Thank you for the input, it works like a charm!
  • Hi, thank you very much for the input and for linking awk!
  • Thank you for the great input on how cut, sort, etc. work. It's really helpful!