Retrieving the total number of words with 2 or more letters in a document using Python

I have a small Python script that calculates the 10 most frequent words, the 10 least frequent words and the total number of words in a .txt document. According to the assignment, a word is defined as 2 letters or more. The 10 most frequent and the 10 least frequent words print fine. However, when I print the total number of words in the document, it prints the total of all the words, including the single-letter words (such as "a"). How can I get the total to count ONLY the words that have 2 letters or more?

Here is my script:

from string import *
from collections import defaultdict
from operator import itemgetter
import re

number = 10
words = {}
total_words = 0
words_only = re.compile(r'^[a-z]{2,}$')
counter = defaultdict(int)

"""Define function to count the total number of words"""
def count_words(s):
    unique_words = split(s)
    return len(unique_words)

"""Define words as 2 letters or more -- no single letter words such as "a" """
for word in words:
    if len(word) >= 2:
        counter[word] += 1


"""Open text document, strip it, then filter it"""
txt_file = open('charactermask.txt', 'r')

for line in txt_file:
    total_words = total_words + count_words(line)
    for word in line.strip().split():
        word = word.strip(punctuation).lower()
        if words_only.match(word):
            counter[word] += 1


# Most Frequent Words
top_words = sorted(counter.iteritems(),
                    key=lambda(word, count): (-count, word))[:number] 

print "Most Frequent Words: "

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)


# Least Frequent Words:
least_words = sorted(counter.iteritems(),
                    key=lambda (word, count): (count, word))[:number]

print " "
print "Least Frequent Words: "

for word, frequency in least_words:
    print "%s: %d" % (word, frequency)


# Total Unique Words:
print " "
print "Total Number of Words: %s" % total_words

I am not an expert with Python; this is for a Python class I am currently taking. Neatness and proper formatting are graded in this assignment, so if possible, can someone also tell me whether the format of this code is considered "good practice"?


The list comprehension method:

def countWords(s):
    words = s.split()
    return len([word for word in words if len(word)>=2])

The verbose method:

def countWords(s):
    words = s.split()
    count = 0
    for word in words:
        if len(word) >= 2:
            count += 1
    return count

As an aside, kudos on using defaultdict, but I would go with collections.Counter:

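import collections

# Count every whitespace-separated word in the file
words = collections.Counter(word for line in open(filepath) for word in line.split())
# Drop one-letter words; rebuilding a Counter keeps most_common() available
words = collections.Counter(dict((k, v) for k, v in words.items() if len(k) >= 2))
mostFrequent = [w[0] for w in words.most_common(10)]
leastFrequent = [w[0] for w in words.most_common()[-10:]]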

Hope this helps
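For reference, here is a minimal Python 3 sketch of the Counter approach, assuming the text is already in memory as a string. The helper name word_frequencies and the sample text are made up for illustration. It filters on length before counting, so the result stays a Counter and most_common() remains usable:

```python
import collections


def word_frequencies(text, min_length=2):
    """Count whitespace-separated tokens with at least min_length characters."""
    return collections.Counter(
        word for word in text.split() if len(word) >= min_length
    )


# Hypothetical sample text, used only for illustration
sample = "a cat and a dog and a bird and cat"
frequencies = word_frequencies(sample)

most_frequent = [word for word, _ in frequencies.most_common(2)]
# most_common() is ordered from most to least frequent,
# so the tail of the list holds the rarest words
least_frequent = [word for word, _ in frequencies.most_common()[-2:]]

print(most_frequent)
print(least_frequent)
```

This sketch does not strip punctuation or change case; that would have to be added to match the assignment's definition of a word.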

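count_words simply uses split(), so every whitespace-separated token is counted, single letters included.

You should use the words_only regular expression there too:

def count_words(s):
    unique_words = s.split()
    # Count only the tokens that match the words_only pattern
    return len([w for w in unique_words if words_only.match(w)])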

Your style looks great :)
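One caveat: the main loop in the question strips punctuation and lowercases each token before matching, so a total that is meant to agree with the frequency table needs the same normalisation. A sketch in Python 3, with an invented sample line:

```python
import re
from string import punctuation

# Same pattern as in the question: two or more lowercase letters
words_only = re.compile(r'^[a-z]{2,}$')


def count_words(line):
    """Return how many tokens in line are words of two or more letters."""
    total = 0
    for token in line.split():
        # Normalise the token the same way the frequency loop does
        word = token.strip(punctuation).lower()
        if words_only.match(word):
            total += 1
    return total


# Hypothetical sample line; "I", "a" and "2" are not counted
print(count_words("I saw a cat, a dog and 2 birds."))  # prints 5
```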

Personally, I think your code looks fine. I don't know if it's "standard" Python style, but it is easy to read. I'm pretty new to Python as well, but here is my answer.

I'm assuming that your count_words(s) function is what calculates the total number of words. The problem is that split only separates the line on whitespace, so every token is counted, including single-letter words.

You only need to count words of 2 or more characters, so in that function write a loop over the unique_words list that counts only the words with 2+ characters.
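Since the frequency loop in the question already skips one-letter words, the total can also be derived from the counter itself instead of being counted separately. A minimal Python 3 sketch, with an in-memory list of lines standing in for the text file (an assumption made to keep the example self-contained):

```python
import re
from collections import Counter
from string import punctuation

# Same pattern as in the question: two or more lowercase letters
words_only = re.compile(r'^[a-z]{2,}$')

# Hypothetical stand-in for the lines of the text file
lines = ["A cat sat on a mat.", "The cat is a CAT!"]

counter = Counter()
for line in lines:
    for token in line.split():
        word = token.strip(punctuation).lower()
        if words_only.match(word):
            counter[word] += 1

# Every counted occurrence is a word of two or more letters,
# so the sum of the counts is the total the assignment asks for
total_words = sum(counter.values())

print("Total Number of Words: %d" % total_words)  # prints 8 for the sample lines
```

With this arrangement the total and the frequency tables cannot disagree, because both come from the same filtered counts.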
