Find the nth most common word and count in python

I am an undergraduate student who is new here and loves programming. I ran into a problem while practicing and would like to ask for help.

Given a string and an integer n, return the nth most common word and its count, ignoring capitalization.

For the word, make sure all the letters are lowercase when you return it!

Hint: The split() function and dictionaries may be useful.

Example:

Input: "apple apple apple blue BlUe call", 2

Output: The list ["blue", 2]
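Following the hint, a minimal sketch that uses only split() and a plain dictionary might look like this. It assumes words are separated by whitespace and that n is between 1 and the number of distinct words (otherwise ranked[n - 1] raises IndexError); tied words keep the order in which they first appear.

```python
def nth_most(str_in, n):
    # Count lowercase words in a plain dict, as the hint suggests
    counts = {}
    for word in str_in.lower().split():
        counts[word] = counts.get(word, 0) + 1
    # Sort (word, count) pairs from most to least frequent; sorted() is stable
    # even with reverse=True, so tied words keep their first-seen order
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    word, count = ranked[n - 1]
    return [word, count]

print(nth_most("apple apple apple blue BlUe call", 2))  # ['blue', 2]
```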

My code is as follows:

from collections import Counter
def nth_most(str_in, n):
    split_it = str_in.split(" ")
    array = []
    for word, count in Counter(split_it).most_common(n):
        list = [word, count]
        array.append(count)
        array.sort()
        if len(array) - n <= len(array) - 1:
            c = array[len(array) - n]
            return [word, c]

The test results are as follows:

Traceback (most recent call last):
  File "/grade/run/test.py", line 10, in test_one
    self.assertEqual(nth_most('apple apple apple blue blue call', 3), ['call', 1])
  File "/grade/run/bin/nth_most.py", line 10, in nth_most
    c = array[len(array) - n]
IndexError: list index out of range

As well as

Traceback (most recent call last):
  File "/grade/run/test.py", line 20, in test_negative
    self.assertEqual(nth_most('awe Awe AWE BLUE BLUE call', 1), ['awe', 3])
AssertionError: Lists differ: ['BLUE', 2] != ['awe', 3]

First differing element 0:
'BLUE'
'awe'

I don't know what's wrong with my code.

Thank you very much for your help!
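Both failures can be reproduced in isolation. First, Counter is case-sensitive, so 'awe', 'Awe' and 'AWE' are counted as three different words. Second, the condition len(array) - n <= len(array) - 1 is true for every n >= 1, so the loop returns during its first iteration, when array holds only the count of the most common word; for n = 3 that means reading array[-2] from a one-element list, and for smaller n it returns the first (most common) word rather than the nth one. A small sketch of both effects:

```python
from collections import Counter

# 1) Without lower(), the three spellings of "awe" are separate keys with count 1 each
raw = Counter('awe Awe AWE BLUE BLUE call'.split(" "))
print(raw.most_common(1))  # [('BLUE', 2)]

# 2) On its first iteration the original loop has array == [3] (apple's count),
#    so for n = 3 it evaluates array[1 - 3], i.e. array[-2], which does not exist
array = [3]
try:
    array[len(array) - 3]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```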

Since you're using Counter, just use it wisely:

import collections

def nth_most(str_in, n):
    c = sorted(collections.Counter(w.lower() for w in str_in.split()).items(), key=lambda x: x[1])
    return list(c[-n])  # convert to list as it seems to be the expected output

print(nth_most("apple apple apple blue BlUe call",2)) 

Build the word frequency dictionary, sort items according to values (2nd element of the tuple) and pick the nth last element.

This prints ['blue', 2].
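Running the same function against the question's test cases is a quick check that lowercasing and ranking work. The last call adds a tie ('call' and 'blue' both occur twice) to show the limitation raised next: only one of the tied words comes back.

```python
import collections

def nth_most(str_in, n):
    # Same approach as the answer above: count lowercase words, sort ascending
    # by count, then take the nth element from the end
    c = sorted(collections.Counter(w.lower() for w in str_in.split()).items(), key=lambda x: x[1])
    return list(c[-n])

# The three test cases from the question
print(nth_most("apple apple apple blue BlUe call", 2))  # ['blue', 2]
print(nth_most("apple apple apple blue blue call", 3))  # ['call', 1]
print(nth_most("awe Awe AWE BLUE BLUE call", 1))        # ['awe', 3]

# A tie: 'call' and 'blue' both occur twice, but only 'blue' is returned
print(nth_most("apple apple apple call blue BlUe call woot", 2))  # ['blue', 2]
```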

What if two words have the same frequency (a tie) in the first or second position? This solution doesn't handle that: it returns only one of the tied words. Instead, sort the occurrence counts, extract the nth most common count, and run through the counter dict again to collect every word with that count.

import collections

def nth_most(str_in, n):
    c = collections.Counter(w.lower() for w in str_in.split())
    nth_occs = sorted(c.values())[-n]  # nth largest count (duplicate counts included)
    return [[k, v] for k, v in c.items() if v == nth_occs]

print(nth_most("apple apple apple call blue BlUe call woot",2))

This time it prints:

[['call', 2], ['blue', 2]]
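Note that sorted(c.values())[-n] counts duplicate values, so with this input n = 3 also lands on the count 2 and returns 'call' and 'blue' again. If "nth most common" is instead meant to rank distinct frequencies (an assumption; the question does not say), a hypothetical variant could deduplicate the counts first:

```python
import collections

def nth_most_distinct(str_in, n):
    # Hypothetical variant: rank the distinct counts, so tied words share one rank
    c = collections.Counter(w.lower() for w in str_in.split())
    distinct_counts = sorted(set(c.values()), reverse=True)
    if not 1 <= n <= len(distinct_counts):
        return []  # there is no nth distinct frequency
    target = distinct_counts[n - 1]
    return [[k, v] for k, v in c.items() if v == target]

print(nth_most_distinct("apple apple apple call blue BlUe call woot", 2))  # [['call', 2], ['blue', 2]]
print(nth_most_distinct("apple apple apple call blue BlUe call woot", 3))  # [['woot', 1]]
```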


Counter returns the most common elements in order, so you can do:

from collections import Counter
list(Counter(str_in.lower().split()).most_common(n)[-1])  # the nth most common word and its count

Counter counts each element of a list, and its most_common() method returns the elements ordered by frequency. Import the Counter class from the collections module, split the string into a list of words with split(), pass that list to Counter, and call most_common() to get the most frequent words together with their counts.
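Those steps can be wrapped into a function. This is a sketch; returning None when there are fewer than n distinct words is an assumption made here, since the question does not say what should happen in that case.

```python
from collections import Counter

def nth_most(str_in, n):
    # Split into words and lowercase them so capitalization is ignored
    words = str_in.lower().split()
    # most_common(n) returns the n most frequent (word, count) pairs, most frequent first;
    # words with equal counts are ordered by first occurrence
    top = Counter(words).most_common(n)
    if n < 1 or len(top) < n:
        return None  # assumption: signal "no nth word" with None instead of raising
    return list(top[-1])

print(nth_most("apple apple apple blue BlUe call", 2))  # ['blue', 2]
print(nth_most("apple apple apple blue BlUe call", 5))  # None
```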

def nth_common(lowered_words, check):
    # Pair every word with its number of occurrences (list.count rescans the list each time)
    m = [(i, lowered_words.count(i)) for i in lowered_words]
    # dict.fromkeys drops duplicate pairs while keeping first-seen order (a set would not)
    for i in dict.fromkeys(m):
        if i[1] == check:  # i[1] is the occurrence count in the (word, count) tuple
            print(i, "found")


words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
lowered_words = [x.lower() for x in words]   # ignoring the uppercase
check = 2   # the check

nth_common(lowered_words, check)

OUTPUT:

('blue', 2) found
('call', 2) found
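The same "all words with a given count" lookup can be done with a single Counter pass instead of calling list.count once per word (which makes the loop above quadratic). The helper name words_with_count is hypothetical:

```python
from collections import Counter

def words_with_count(words, check):
    # Count each lowercase word in one pass over the list
    counts = Counter(w.lower() for w in words)
    # Keep every (word, count) pair whose count equals `check`, in first-seen order
    return [(w, c) for w, c in counts.items() if c == check]

words = ['apple', 'apple', 'apple', 'blue', 'BLue', 'call', 'cAlL']
for pair in words_with_count(words, 2):
    print(pair, "found")
```

With the list from the answer this prints ('blue', 2) found and ('call', 2) found, matching the output above.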


To solve this list index out of range error, just put:

maxN = 1000  # change according to your max length
array = [0] * maxN

You can even do it without the collections module:

paragraph = 'Nory was a Catholic because her mother was a Catholic, and Nory’s mother was a Catholic because her father was a Catholic, and her father was a Catholic because his mother was a Catholic, or had been'

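import re

def nth_common(n, p):
    # Split on runs of non-word characters, so punctuation like ',' is not part of a word
    words = re.split(r'\W+', p.lower())
    word_count = {}
    for i in words:
        if i in word_count:
            word_count[i] += 1
        else:
            word_count[i] = 1

    # Sort (word, count) pairs by count, highest first
    sorted_count = sorted(word_count.items(), key=lambda x: x[1], reverse=True)

    return sorted_count[n - 1]

print(nth_common(3, paragraph))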

The output will be ('catholic', 6).

The word counts, sorted by count: [('was', 6), ('a', 6), ('catholic', 6), ('because', 3), ('her', 3), ('mother', 3), ('nory', 2), ('and', 2), ('father', 2), ('s', 1), ('his', 1), ('or', 1), ('had', 1), ('been', 1)]
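Because 'was', 'a' and 'catholic' all occur 6 times, which of them counts as "3rd" depends only on the order in which they first appear. If a reproducible tie-break is wanted (an assumption, not something the question asks for), one option is to sort by count descending and then alphabetically. This sketch also drops the empty strings re.split produces when the text starts or ends with punctuation; nth_common_sorted is a hypothetical name.

```python
import re

# Paragraph from the answer above
paragraph = ('Nory was a Catholic because her mother was a Catholic, and Nory’s mother '
             'was a Catholic because her father was a Catholic, and her father was a '
             'Catholic because his mother was a Catholic, or had been')

def nth_common_sorted(n, p):
    # Split on runs of non-word characters and drop empty strings left by
    # leading or trailing punctuation
    words = [w for w in re.split(r'\W+', p.lower()) if w]
    word_count = {}
    for w in words:
        word_count[w] = word_count.get(w, 0) + 1
    # Highest count first; ties broken alphabetically so word order does not matter
    ranked = sorted(word_count.items(), key=lambda x: (-x[1], x[0]))
    return ranked[n - 1]

print(nth_common_sorted(3, paragraph))  # ('was', 6)
```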

Comments
  • why not ['apple',3] ?
  • @Jean-FrançoisFabre, the question is about finding the nth most common word. For the test case mentioned in the question, n=2, and blue occurs twice, hence is the output.
  • @Larry Chen, you may mark an answer that helped you solve your problem.
  • Thank you very much! I think your idea helps me a lot.
  • Is there any performance benefit to using c.most_common() instead of sorted?
  • most_common probably does the same thing, but it doesn't output the required answer, so it needs some post-processing to filter out the elements, especially if there are ties.
  • Why not Counter(s.lower().split()).most_common()[n-1]? Also from collections import Counter.
  • If you use most_common()[n-1], you use an O(n log n) algorithm; if you use most_common(k), you use an O(n log k) algorithm.
  • As a good practice in Python, do not reinvent the wheel, especially when it comes to the Python standard library.
  • This would return ('catholic', 3), which is incorrect, since the word catholic appears 6 times. The correct output should have been ('mother', 3).
  • split() splits only on whitespace, so the other occurrences of catholic keep their trailing comma and are counted as a separate word. Replace p.lower().split() with re.split('\W+', p.lower()) and catholic gets a count of 6; since three words in this example share the top count of 6, it returns one of them.
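To illustrate the complexity point from the comments: in CPython, Counter.most_common() with no argument sorts all items, while most_common(k) is implemented with heapq.nlargest, which only keeps the top k. Both give the same nth element here; the sketch below compares them on the question's example.

```python
import heapq
from collections import Counter

text = "apple apple apple blue BlUe call"
counts = Counter(text.lower().split())
k = 2

# Full sort of all d distinct words: O(d log d)
full = counts.most_common()
# Only the top k: O(d log k)
top_k = counts.most_common(k)
# The same idea spelled out with heapq directly
top_k_heap = heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

print(full[k - 1], top_k[-1], top_k_heap[-1])  # ('blue', 2) ('blue', 2) ('blue', 2)
```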