Iterate over each row and increment the count of the terms associated with the specific researcher in a csv file

I am reading a csv file in python that has many disease names in one column and the associated researchers in the other. The file looks something like this -

[Table 1]
Terms                    Researcher
1.Asthma                 Dr. Roberts
2.Brochial cancer        Dr. Lee
3.HIV                    Dr.Roberts
4.HIV                    Dr. Lee
5.Influenzae             Dr. Wang
6.Bronchial Cancer       Dr. Wang
7.Influenzae             Dr. Roberts
8.dengue                 prof. christopher
9.Arthritis              prof. swaminathan
10.Arthritis             prof. christopher
11.Asthma                Dr. Roberts
12.HIV                   Dr. Lee
13.Bronchial Cancer      Dr. Wang
14.dengue                prof. christopher
15.HIV                   prof. christopher
16.HIV                   Dr. Lee

I want my code to iterate through each row and increment the count of the frequency of a term associated with each researcher so that when the user inputs which term he/she is looking for they should get an output table like this -

Term you are looking for : HIV
Names of the researchers                Frequency
Dr. Roberts                             1
Dr. Lee                                 3
prof. christopher                       1

Now let's look at what I am doing -

In[1]:
import pandas as pd
import numpy as np
data = pd.read_csv("Researchers Title Terms.csv")
data.head()

which is giving me [Table 1] and then I am doing this -

In[2]:
term = input("Enter the term you are looking for:")
term = term.lower()
list_of_terms = []
for row in data: 
    if row[data.Terms] == term
        researcher1 += 1

    elif data.Terms == term
        researcher2 += 1

    elif data.Terms == term
        researcher3 += 1

    else
        print("Sorry!", term, "not found in the database!")
print("Term you are looking for : ", term)
print("Dr. Roberts:", researcher1)
print("Dr. Lee:", researcher2)
print("prof. christopher:", researcher3)

All I am getting here is -

File "<ipython-input-9-b85d0d187059>", line 5
if row[data.Terms] == term
                          ^
SyntaxError: invalid syntax

I am a beginner in python programming so not quite sure if my logic is entirely wrong or there is really some syntactical error here. Any help will be greatly appreciated. After trying a few things and getting no output I am putting this on the community. Thanks in advance!

groupby and value_counts

Simple and intuitive

df.Terms = df.Terms.str.replace(r'\d+\.\s*', '', regex=True).str.upper()
df.Researcher = df.Researcher.str.title()
s = df.groupby('Terms').Researcher.value_counts()

s

Terms             Researcher       
ARTHRITIS         Prof. Christopher    1
                  Prof. Swaminathan    1
ASTHMA            Dr. Roberts          2
BROCHIAL CANCER   Dr. Lee              1
BRONCHIAL CANCER  Dr. Wang             2
DENGUE            Prof. Christopher    2
HIV               Dr. Lee              3
                  Dr.Roberts           1
                  Prof. Christopher    1
INFLUENZAE        Dr. Roberts          1
                  Dr. Wang             1
Name: Researcher, dtype: int64

You can access the varying terms with loc or xs

s.loc['HIV']

Researcher
Dr. Lee              3
Dr.Roberts           1
Prof. Christopher    1
Name: Researcher, dtype: int64

Or

s.xs('HIV')

Researcher
Dr. Lee              3
Dr.Roberts           1
Prof. Christopher    1
Name: Researcher, dtype: int64
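
If the term comes from user input, it has to be normalised the same way as the index (upper case here) and passed to loc with square brackets, since loc is an indexer rather than a method. A minimal sketch, assuming a small made-up DataFrame in place of the CSV and a hypothetical lookup helper that is not part of pandas:

```python
import pandas as pd

# Small stand-in for the CSV (assumed data, mirrors the question's layout).
df = pd.DataFrame({
    "Terms": ["1.Asthma", "3.HIV", "4.HIV", "12.HIV", "15.HIV"],
    "Researcher": ["Dr. Roberts", "Dr.Roberts", "Dr. Lee", "Dr. Lee",
                   "prof. christopher"],
})

# Same clean-up as above: drop the leading "N." and normalise case.
df["Terms"] = df["Terms"].str.replace(r"\d+\.\s*", "", regex=True).str.upper()
df["Researcher"] = df["Researcher"].str.title()

counts = df.groupby("Terms")["Researcher"].value_counts()


def lookup(term):
    """Hypothetical helper: per-researcher counts for one term, or None."""
    key = term.strip().upper()
    # The first index level holds the cleaned term names.
    if key not in counts.index.get_level_values(0):
        return None
    return counts.loc[key]  # square brackets, not parentheses


hiv = lookup("hiv")
print(hiv)
```

Returning None for an unknown term lets the caller decide how to report it, for example with the "not found" message from the question.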

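pd.factorize and np.bincount

import re

# Raw string so the backslashes reach the regex engine unchanged
pat = re.compile(r'\d+\.\s*')

# Factorize the (term, researcher) pairs: f holds an integer code per row,
# u holds the unique pairs in order of first appearance
f, u = pd.factorize(list(zip(
    (re.sub(pat, '', x).upper() for x in df.Terms),
    df.Researcher.str.title()
)))

# bincount counts how often each code occurs, i.e. rows per unique pair
s = pd.Series(dict(zip(u, np.bincount(f))))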

And you can access the same way as above.

In Python, the header line of an if, elif, else, for loop, etc. must end with a colon. So in your code you would need to update it to the following:

    for row in data: 
        if row[data.Terms] == term:
            researcher1 += 1

        elif data.Terms == term:
            researcher2 += 1

        elif data.Terms == term:
            researcher3 += 1

        else:
            print("Sorry!", term, "not found in the database!")

Also, once you correct this, your code will still have a bug. You convert the user input to lowercase but do not do the same to the data read from the CSV file, so any term that contains capital letters (such as HIV) will never equal the user input.
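
Beyond the colons, the loop also needs counters that exist before they are incremented, and iterating over a DataFrame directly yields column names rather than rows. A minimal sketch of a row-by-row count, assuming a small stand-in DataFrame whose terms carry no numeric prefix, and a hypothetical count_by_researcher function, with both sides lower-cased before comparing:

```python
import pandas as pd

# Assumed sample data in place of "Researchers Title Terms.csv".
data = pd.DataFrame({
    "Terms": ["Asthma", "HIV", "HIV", "Influenzae", "HIV"],
    "Researcher": ["Dr. Roberts", "Dr. Lee", "Dr. Roberts", "Dr. Wang",
                   "Dr. Lee"],
})


def count_by_researcher(frame, term):
    """Hypothetical helper: map researcher name -> rows matching the term."""
    wanted = term.strip().lower()
    counts = {}
    # itertuples yields one row at a time; columns become attributes.
    for row in frame.itertuples(index=False):
        if row.Terms.strip().lower() == wanted:
            counts[row.Researcher] = counts.get(row.Researcher, 0) + 1
    return counts


result = count_by_researcher(data, "hiv")
if result:
    print("Term you are looking for :", "hiv")
    for name, n in result.items():
        print(f"{name}: {n}")
else:
    print("Sorry!", "hiv", "not found in the database!")
```

A dictionary keyed on the researcher name avoids having one hard-coded counter variable per researcher.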

You could iterate through your dataframe in a similar way as what you are doing, but since you are using pandas, it might be worth leveraging pandas functions. They are typically much faster than iteration, and the code ends up looking cleaner.

term_of_interest = 'HIV'

(df.groupby('Researcher')
 .apply(lambda x: x.Terms.str.contains(term_of_interest)
        .sum())
 .rename('Frequency').to_frame())

                   Frequency
Researcher                  
Dr. Lee                    3
Dr. Roberts                0
Dr. Wang                   0
Dr.Roberts                 1
prof. christopher          1
prof. swaminathan          0

Read the data into pandas and accept the user's input. Filtering on the term, then applying groupby and size, gives the desired result.

term = input("Enter the term you are looking for:")

data[data.Terms.str.lower() == term.lower()].groupby('Researcher').size()
# Output with term = 'HIV'
Dr. Lee              3
Dr.Roberts           1
prof. christopher    1
dtype: int64

In this method, researchers not associated with a term (i.e. have size == 0) are not shown.

To show researchers with no terms with a count of zero, first set up a dataframe of researchers and outer join the result dataframe with it.

researchers = pd.DataFrame({'Researcher': data.Researcher.unique()})
out = data[data.Terms.str.lower() == term.lower()].groupby('Researcher').agg({'Terms': 'size'})
pd.merge(researchers, out, how='outer').fillna(0).sort_values('Terms', ascending=False)
# outputs:
          Researcher  Terms
1            Dr. Lee    3.0
2         Dr.Roberts    1.0
4  prof. christopher    1.0
0        Dr. Roberts    0.0
3           Dr. Wang    0.0
5  prof. swaminathan    0.0
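
The outer join turns the counts into floats because of the intermediate NaN values. One alternative, sketched here on assumed sample data rather than the real CSV, is to reindex the size() result over all researchers with fill_value=0, which keeps integer counts:

```python
import pandas as pd

# Assumed sample data; column names follow the question's CSV.
data = pd.DataFrame({
    "Terms": ["HIV", "Asthma", "HIV", "Dengue", "hiv"],
    "Researcher": ["Dr. Lee", "Dr. Roberts", "Dr. Roberts", "Dr. Wang",
                   "Dr. Lee"],
})
term = "HIV"

# Keep only the rows whose term matches, ignoring case.
matches = data[data["Terms"].str.lower() == term.lower()]

freq = (
    matches.groupby("Researcher").size()
    # Add researchers with no matching rows, filling their count with 0.
    .reindex(data["Researcher"].unique(), fill_value=0)
    .sort_values(ascending=False)
    .rename("Frequency")
)
print(freq)
```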

from collections import Counter
from pprint import pprint

if __name__ == '__main__':
    docs = ["Dr.Roberts",
            "Dr.Lee",
            "Dr.Roberts",
            "Dr.Lee",
            "Dr.Wang",
            "Dr.Wang",
            "Dr.Roberts",
            "prof.christopher",
            "prof.swaminathan",
            "prof.christopher",
            "Dr.Roberts",
            "Dr.Lee",
            "Dr.Wang",
            "prof.christopher",
            "prof.christopher",
            "Dr.Lee"]
    pprint(Counter(docs).most_common(5))
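
The snippet above counts how often each researcher appears overall. To count per term without pandas, the same Counter can be fed from rows read with the csv module. A minimal sketch, assuming a comma-separated file with Terms and Researcher headers; an in-memory string stands in for the file, and researchers_for is a hypothetical helper name:

```python
import csv
import io
from collections import Counter

# In-memory stand-in for the CSV file (assumed layout and data).
sample = io.StringIO(
    "Terms,Researcher\n"
    "Asthma,Dr. Roberts\n"
    "HIV,Dr. Lee\n"
    "HIV,Dr. Roberts\n"
    "HIV,Dr. Lee\n"
)


def researchers_for(term, handle):
    """Hypothetical helper: count researchers on rows matching the term."""
    wanted = term.strip().lower()
    return Counter(
        row["Researcher"].strip()
        for row in csv.DictReader(handle)
        if row["Terms"].strip().lower() == wanted
    )


hiv_counts = researchers_for("hiv", sample)
print(hiv_counts.most_common())
```

With a real file, the handle would come from open(path, newline="") instead of io.StringIO.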

Comments
  • if row[data.Terms] == term: You are missing colons for elif as well.
  • Works like a charm! Thank you! The only thing is i want to take the input from the user and list only that term that the user wants to look for. The word that user inputs is saved in my variable 'term'. So I am trying -- 's.loc(term)' instead of only 's' but getting this as output -- <pandas.core.indexing._LocIndexer at 0xa0ca320> Can you help? The entire code -
  • term = input("Enter the term you are looking for:") term = term.lower() data.Terms = data.Terms.str.replace('\d+\.\s*', '').str.upper() data.Author = data.Author.str.title() s = data.groupby('Terms').Author.value_counts() /n s.loc(term)