nltk.trigrams has no count attribute

I'm running Python-3.x on a virtualenv, trying to process text with nltk.

I saw the post "What are ngram counts...", and the most upvoted answer has a bit of code using the count() method. But when I copy/paste it into my own code:

import nltk
from nltk import bigrams
from nltk import trigrams

text="""Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam tempus vitae. Morbi justo mauris,
congue sit amet imperdiet ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam"""

tokens = nltk.word_tokenize(text)
tokens = [token.lower() for token in tokens if len(token) > 1]
bi_tokens = bigrams(tokens)
tri_tokens = trigrams(tokens)

print([(item, tri_tokens.count(item)) for item in sorted(set(tri_tokens))])

I receive this message:

AttributeError: 'generator' object has no attribute 'count'

I see this other post on a monkeypatch for a count method but feel like that's somehow not related. Any idea what I might be doing wrong?


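It's because nltk.ngrams (which nltk.trigrams is built on) returns a generator, i.e. a lazy, single-pass iterator, and generator objects have no count() method. See https://www.python.org/dev/peps/pep-0255/ and What does the "yield" keyword do in Python?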

You should use a collections.Counter:

>>> from nltk import ngrams
>>> from collections import Counter
>>> s = "This is a foo bar sentence".split()
>>> Counter(ngrams(s, 3))
Counter({('This', 'is', 'a'): 1, ('a', 'foo', 'bar'): 1, ('is', 'a', 'foo'): 1, ('foo', 'bar', 'sentence'): 1})
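Once the counts are in a Counter you can look up a single trigram or ask for the most frequent ones. Below is a minimal sketch of that; the sample text is made up for illustration, and the fallback ngrams function is only an assumption so the snippet still runs when NLTK is not installed (it is not NLTK's implementation).

```python
from collections import Counter

try:
    from nltk import ngrams              # the real NLTK helper, if available
except ImportError:
    def ngrams(sequence, n):             # assumption: minimal stand-in for illustration
        return zip(*(sequence[i:] for i in range(n)))

# Made-up sample text with a repeated phrase
text = "lorem ipsum dolor sit amet consectetur lorem ipsum dolor sit"
tokens = text.split()

# Counter consumes the lazy n-gram iterator exactly once
trigram_counts = Counter(ngrams(tokens, 3))

# Look up one trigram; a missing key returns 0 instead of raising KeyError
print(trigram_counts[("lorem", "ipsum", "dolor")])
print(trigram_counts[("no", "such", "trigram")])

# The two most frequent trigrams with their counts
print(trigram_counts.most_common(2))
```

Because the Counter is built in a single pass, it does not matter that the n-gram iterator can only be consumed once.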


You are getting AttributeError: 'generator' object has no attribute 'count' because tri_tokens is a generator, and generator objects have no count() method.

There is a second problem as well: a generator is exhausted after its first use, and tri_tokens is used twice in your code.

print([(item, tri_tokens.count(item)) for item in sorted(set(tri_tokens))])

In the line above, sorted(set(tri_tokens)) runs first and consumes the whole generator. So even if a generator had a count() method, there would be nothing left to count by the time tri_tokens.count(item) is called.
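Here is a small sketch that shows the generator behaviour in isolation. trigrams_sketch is a hypothetical stand-in for nltk.trigrams (an assumption, so the snippet runs without NLTK); the point is only how a generator object behaves, not how NLTK builds its n-grams.

```python
def trigrams_sketch(tokens):
    """Hypothetical stand-in for nltk.trigrams: lazily yields 3-token tuples."""
    for i in range(len(tokens) - 2):
        yield tuple(tokens[i:i + 3])

tokens = "lorem ipsum dolor sit amet lorem ipsum dolor".split()
gen = trigrams_sketch(tokens)

# 1) A generator object has no count() method at all
has_count = hasattr(gen, "count")
print(has_count)

# 2) A generator can be consumed only once
first_pass = list(gen)
second_pass = list(gen)       # already exhausted, so this is empty
print(len(first_pass), len(second_pass))

# 3) A list supports count() and can be traversed as often as needed
as_list = list(trigrams_sketch(tokens))
n = as_list.count(("lorem", "ipsum", "dolor"))
print(n)
```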

So the simplest fix is to convert the generator to a list:

tri_tokens = list(tri_tokens)

Try the code below:

import nltk
from nltk import bigrams
from nltk import trigrams

text="""Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam tempus vitae. Morbi justo mauris,
congue sit amet imperdiet ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam"""

tokens = nltk.word_tokenize(text)
tokens = [token.lower() for token in tokens if len(token) > 1]
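# bigrams() and trigrams() return lazy generators, not lists
bi_tokens = bigrams(tokens)
tri_tokens = trigrams(tokens)

# Materialize the generator so it supports count() and repeated iteration
tri_tokens = list(tri_tokens)

print([(item, tri_tokens.count(item)) for item in sorted(set(tri_tokens))])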
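One thing to keep in mind: calling list.count() inside the comprehension rescans the whole list for every distinct trigram, which gets slow on large texts. As a rough sketch, the same sorted (trigram, count) output can be produced in a single pass. The zip call here is just a stand-in for list(trigrams(tokens)) so the snippet runs without NLTK (an assumption for illustration), and the sample tokens are made up.

```python
from collections import Counter

tokens = "lorem ipsum dolor sit amet lorem ipsum dolor sit amet".split()

# Stand-in for list(trigrams(tokens)): a list of 3-token tuples
tri_tokens = list(zip(tokens, tokens[1:], tokens[2:]))

# Approach from the answer above: one full scan of the list per distinct trigram
slow = [(item, tri_tokens.count(item)) for item in sorted(set(tri_tokens))]

# Single pass: count everything once, then sort the (trigram, count) pairs by trigram
fast = sorted(Counter(tri_tokens).items())

print(slow == fast)
```

For a short text like the one in the question the difference is negligible, so either form is fine there.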


The other answers didn't work for me, so I ended up using:

bi_tokens = list(bigrams(tokens))
tri_tokens = list(trigrams(tokens))

After converting the generators to lists, you can call count() on them.
