How to efficiently calculate prefix sum of frequencies of characters in a string?

Say, I have a string

s = 'AAABBBCAB'

How can I efficiently calculate the prefix sum of frequencies of each character in the string, i.e.:

psum = [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, {'A': 4, 'B': 4, 'C': 1}]

You can do it in one line using itertools.accumulate and collections.Counter:

from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'
psum = list(accumulate(map(Counter, s)))

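This gives you a list of Counter objects. To get the frequencies for any substring of s, subtract two of them; the cost depends on the number of distinct characters rather than on the substring length (effectively O(1) for a fixed alphabet), e.g.: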

>>> psum[6] - psum[1]  # get frequencies for s[2:7]
Counter({'B': 3, 'A': 1, 'C': 1})
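
A minimal sketch of how this subtraction could be wrapped in a helper. The function name `substring_counts` and the leading empty `Counter` (added so that queries starting at index 0 need no special case) are assumptions for illustration, not part of the answer above:

```python
from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'

# Prepend an empty Counter so that prefix[i] holds the counts of s[:i].
prefix = [Counter()] + list(accumulate(map(Counter, s)))


def substring_counts(start, stop):
    """Return the character counts of s[start:stop] (hypothetical helper)."""
    # Counter subtraction keeps only positive counts, so characters that
    # do not occur in the substring are dropped from the result.
    return prefix[stop] - prefix[start]


print(substring_counts(2, 7))  # counts for 'ABBBC'
print(substring_counts(0, 3))  # counts for 'AAA'
```

With this indexing the arguments match Python slice bounds, so `substring_counts(i, j)` should agree with `Counter(s[i:j])`.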

this is an option:

from collections import Counter

c = Counter()
s = 'AAABBBCAB'

psum = []
for char in s:
    c.update(char)
    psum.append(dict(c))

# [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, 
#  {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
#  {'A': 4, 'B': 4, 'C': 1}]

i use collections.Counter in order to keep a 'running sum' and add (a copy of the result) to the list psum. this way i iterate once only over the string s.
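
The copy matters because `c` is mutated on every iteration. A small sketch of what could go wrong without it (the list names `aliased` and `copied` are illustrative only):

```python
from collections import Counter

s = 'AAB'

c = Counter()
aliased = []  # stores a reference to the same Counter every time
copied = []   # stores an independent snapshot every time
for char in s:
    c.update(char)
    aliased.append(c)
    copied.append(dict(c))

# Every entry in `aliased` is the one shared Counter, so all of them
# show the final counts rather than the running counts.
print(aliased)
print(copied)
```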

if you prefer to have collections.Counter objects in your result, you could change the last line to

psum.append(c.copy())

in order to get

[Counter({'A': 1}), Counter({'A': 2}), ...
 Counter({'A': 4, 'B': 4, 'C': 1})]

the same result could also be achieved with this (using accumulate was first proposed in Eugene Yarmash's answer; i just avoid map in favour of a generator expression):

from itertools import accumulate
from collections import Counter

s = "AAABBBCAB"
psum = list(accumulate(Counter(char) for char in s))

just for completeness (as there is no 'pure dict' answer here yet). if you do not want to use Counter or defaultdict you could use this as well:

c = {}
s = 'AAABBBCAB'

psum = []
for char in s:
    c[char] = c.get(char, 0) + 1
    psum.append(c.copy())

although defaultdict is usually more performant than dict.get(key, default).
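
One way to check this on your own machine is a rough `timeit` comparison. This is only a sketch: the function names and the repeated test string are made up for illustration, it measures the counting step alone (not the per-step copies), and results will vary with Python version and input, so no timings are shown.

```python
from collections import defaultdict
from timeit import timeit

s = 'AAABBBCAB' * 1000  # longer input so the timing is measurable


def count_with_get(text):
    """Count characters with a plain dict and dict.get."""
    counts = {}
    for char in text:
        counts[char] = counts.get(char, 0) + 1
    return counts


def count_with_defaultdict(text):
    """Count characters with defaultdict(int)."""
    counts = defaultdict(int)
    for char in text:
        counts[char] += 1
    return counts


print('dict.get   :', timeit(lambda: count_with_get(s), number=20))
print('defaultdict:', timeit(lambda: count_with_defaultdict(s), number=20))
```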

You actually don't even need a counter for this, just a defaultdict would suffice!

from collections import defaultdict

c = defaultdict(int)
s = 'AAABBBCAB'

psum = []

# Iterate over the characters
for char in s:
    # Update the count for this character
    c[char] += 1
    # Append a snapshot of the counts to the output list
    psum.append(dict(c))

print(psum)

The output looks like

[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, 
{'A': 3, 'B': 2}, {'A': 3, 'B': 3}, 
{'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, 
{'A': 4, 'B': 4, 'C': 1}]

Simplest would be to use the Counter object from collections.

from collections import Counter

s = 'AAABBBCAB'

[dict(Counter(s[:i])) for i in range(1, len(s) + 1)]

Yields:

[{'A': 1},  {'A': 2},  {'A': 3},  {'A': 3, 'B': 1},  {'A': 3, 'B': 2},
{'A': 3, 'B': 3},  {'A': 3, 'B': 3, 'C': 1},  {'A': 4, 'B': 3, 'C': 1},
{'A': 4, 'B': 4, 'C': 1}]
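
As a comment on this answer points out, recounting each prefix from scratch makes this approach quadratic in the length of the string. A quick sanity check that it gives the same result as a single-pass running count (a sketch; `by_slicing` and `by_running_count` are illustrative names):

```python
from collections import Counter

s = 'AAABBBCAB'

# Recount every prefix from scratch: O(n) counting work per prefix,
# so O(n^2) overall.
by_slicing = [dict(Counter(s[:i])) for i in range(1, len(s) + 1)]

# Single pass: update one running Counter and snapshot it at each step.
running = Counter()
by_running_count = []
for char in s:
    running[char] += 1
    by_running_count.append(dict(running))

print(by_slicing == by_running_count)  # True
```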

In Python 3.8 or later you can use a list comprehension with an assignment expression (aka "the walrus operator"):

>>> from collections import Counter
>>> s = 'AAABBBCAB'
>>> c = Counter()
>>> [c := c + Counter(x) for x in s]
[Counter({'A': 1}), Counter({'A': 2}), Counter({'A': 3}), Counter({'A': 3, 'B': 1}), Counter({'A': 3, 'B': 2}), Counter({'A': 3, 'B': 3}), Counter({'A': 3, 'B': 3, 'C': 1}), Counter({'A': 4, 'B': 3, 'C': 1}), Counter({'A': 4, 'B': 4, 'C': 1})]

Comments
  • Finally you want one dict or you want a list of dicts for each char while reading?
  • @Vanjith I want a running counter of character frequencies.
  • We don't even need Counter here, a simple defaultdict will do @hiro-protagonist , check my answer below!
  • what makes you say defaultdict is 'simpler' than Counter? simpler in what way?
  • @DeveshKumarSingh they are both subclasses of dict; the data structure of a counter is not more complicated than the one of a dict. or what am i missing?
  • @DeveshKumarSingh, these considerations are misplaced. I've pointed out the time performance difference, but the OP should make his(her) own decision.
  • @DeveshKumarSingh: Your answer came later than this one, it is the exact same structure with a slightly different type, it has the same complexity but with a more verbose output. You shouldn't advertise it here.
  • Just to note, Counter is a subclass of dict, so there's little reason to replace the Counter with a plain dict.
  • I agree, but it's more in line with what the user specified as output. I would keep the Counter objects myself as they have useful functions in addition to being a dict.
  • This is an elegant 1-liner so +1, but is quadratic rather than linear. I suspect that the similar solution by hiro protagonist is more efficient.