How to sum up the counts from a set of ranges into larger ranges that enclose them?

I am working with sequencing data, but I think the problem applies to other kinds of range-value data. I want to combine read counts (values) from several experiments, measured over a set of DNA regions that each have a start and end position (ranges), into summed counts for another set of DNA regions, each of which generally encloses several of the original regions, as in the following example:

Given the following table A with ranges and counts:

feature start end count1 count2 count3
gene1   1     10  100    30     22
gene2   15    40  20     10     6
gene3   50    70  40     11     7
gene4   100   150 23     15     9

and the following table B (with new ranges):

feature  start  end
range1   1      45
range2   55     160

I would like to get the following count table with the new ranges:

feature  start  end  count1  count2  count3
range1   1      45   120     40      28
range2   55     160  63      26      16

To keep it simple, if there is any overlap at all (at least a fraction of a feature in table A is contained in a feature in table B), its counts should be added up. Is there an existing tool that does this, or a script in Perl, Python or R? I am counting the sequencing reads with bedtools multicov, but as far as I could find it has no option that does what I want.

Thanks.

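We can do this by:

  1. Creating an artificial key column in both tables
  2. Performing an outer join on that key, which pairs every row of table A with every row of table B (an m x n cross join)
  3. Keeping the pairs whose intervals overlap: the feature starts at or before the end of the range and ends at or after its start
  4. Using pandas.DataFrame.groupby on the range feature and summing the count columns
  5. Finally merging the sums back onto df2 to get the desired output

import pandas as pd

# df1 is table A (features with counts), df2 is table B (new ranges)
# A constant key makes the merge pair every row of df1 with every row of df2
df1['key'] = 'A'
df2['key'] = 'A'

df3 = pd.merge(df1, df2, on='key', how='outer')

# Keep pairs that overlap at all; columns ending in _x come from df1, _y from df2
df4 = df3[(df3.start_x <= df3.end_y) & (df3.end_x >= df3.start_y)]

# Sum the counts per new range
df5 = df4.groupby('feature_y').agg({'count1':'sum',
                                    'count2':'sum',
                                    'count3':'sum'}).reset_index()

# Attach the sums to df2 by feature name rather than by row position
df_final = df2.drop(['key'], axis=1).merge(
    df5.rename(columns={'feature_y': 'feature'}), on='feature', how='left')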

Output:

print(df_final)
  feature  start  end  count1  count2  count3
0  range1      1   45     120      40      28
1  range2     55  160      63      26      16
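
The steps above assume that df1 and df2 already exist. As a minimal, self-contained sketch, the two tables from the question can be typed in by hand and run through the same idea. The names pairs, overlaps, sums and result are only illustrative, and how="cross" assumes pandas 1.2 or later (it replaces the artificial key column).

```python
import pandas as pd

# Table A: features with per-experiment counts (values copied from the question)
df1 = pd.DataFrame({
    "feature": ["gene1", "gene2", "gene3", "gene4"],
    "start": [1, 15, 50, 100],
    "end": [10, 40, 70, 150],
    "count1": [100, 20, 40, 23],
    "count2": [30, 10, 11, 15],
    "count3": [22, 6, 7, 9],
})

# Table B: the new, larger ranges
df2 = pd.DataFrame({
    "feature": ["range1", "range2"],
    "start": [1, 55],
    "end": [45, 160],
})

count_cols = ["count1", "count2", "count3"]

# Pair every feature with every range (assumes pandas >= 1.2 for how="cross")
pairs = df1.merge(df2, how="cross", suffixes=("_a", "_b"))

# Two closed intervals overlap when each one starts at or before the other ends
overlaps = (pairs["start_a"] <= pairs["end_b"]) & (pairs["end_a"] >= pairs["start_b"])

# Sum the counts of the overlapping features for each range
sums = (
    pairs[overlaps]
    .groupby("feature_b", as_index=False)[count_cols]
    .sum()
    .rename(columns={"feature_b": "feature"})
)

# A left merge keeps ranges with no overlapping feature; their sums become 0
result = df2.merge(sums, on="feature", how="left")
result[count_cols] = result[count_cols].fillna(0).astype(int)
print(result)
```

The overlap test used here also matches a feature that completely spans a range, which a check on only the feature's start or end falling inside the range would miss.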

You can use apply() and pd.concat() with a custom function where a corresponds to your first dataframe and b corresponds to your second dataframe:

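def find_englobed(x):
    # Rows of a whose interval overlaps the range in row x of b
    englobed = a[(a['start'] <= x['end']) & (a['end'] >= x['start'])]
    # Sum the count columns of those rows
    return englobed[['count1','count2','count3']].sum()

# Apply to every row of b and append the sums as new columns
pd.concat([b, b.apply(find_englobed, axis=1)], axis=1)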

Yields:

  feature  start  end  count1  count2  count3
0  range1      1   45     120      40      28
1  range2     55  160      63      26      16
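
Calling a Python function once per row can get slow when b has many ranges. As a rough sketch of an alternative (not part of the answer above), the same sums can be computed with NumPy broadcasting: build a boolean matrix saying which range overlaps which feature, then multiply it by the count matrix. The names mask, sums and summed are illustrative, and the tables are typed in by hand from the question.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({
    "feature": ["gene1", "gene2", "gene3", "gene4"],
    "start": [1, 15, 50, 100],
    "end": [10, 40, 70, 150],
    "count1": [100, 20, 40, 23],
    "count2": [30, 10, 11, 15],
    "count3": [22, 6, 7, 9],
})
b = pd.DataFrame({
    "feature": ["range1", "range2"],
    "start": [1, 55],
    "end": [45, 160],
})
count_cols = ["count1", "count2", "count3"]

# Feature bounds as a row, range bounds as a column, so comparisons broadcast
a_start = a["start"].to_numpy()
a_end = a["end"].to_numpy()
b_start = b["start"].to_numpy()[:, None]
b_end = b["end"].to_numpy()[:, None]

# Shape (len(b), len(a)): entry [i, j] is True when range i overlaps feature j
mask = (a_start <= b_end) & (a_end >= b_start)

# Matrix product adds up the counts of the overlapping features for each range
sums = mask.astype(int) @ a[count_cols].to_numpy()

summed = pd.concat(
    [b, pd.DataFrame(sums, columns=count_cols, index=b.index)], axis=1
)
print(summed)
```

The mask holds len(b) x len(a) entries, so this trades memory for speed and suits tables of moderate size.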

In case it helps somebody: based on @rahlf23's answer, I made it more general, so that it handles any number of count columns and also requires the feature to be on the same chromosome as the range.

So if table "a" is:

feature Chromosome  start   end count1  count2  count3
gene1   Chr1        1       10  100     30      22
gene2   Chr1        15      40  20      10      6
gene3   Chr1        50      70  40      11      7
gene4   Chr1        100     150 23      15      9
gene5   Chr2        5       30  24      17      2
gene5   Chr2        40      80  4       28     16

and table "b" is:

feature Chromosome  start   end
range1  Chr1        1       45
range2  Chr1        55      160
range3  Chr2        10      90
range4  Chr2        100     200

with the following python script:

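import pandas as pd

# a and b are the two tables above, already loaded as DataFrames
def find_englobed(x):
    # Features of a on the same chromosome as range x that overlap it at all
    englobed = a[(a['Chromosome'] == x['Chromosome'])
                 & (a['start'] <= x['end'])
                 & (a['end'] >= x['start'])]
    # Sum every count column (the 5th column onwards)
    return englobed[list(a.columns[4:])].sum()

pd.concat([b, b.apply(find_englobed, axis=1)], axis=1)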

The a['Chromosome'] == x['Chromosome'] condition requires the feature and the range to be on the same chromosome, and list(a.columns[4:]) selects every column from the 5th to the last, so the script does not depend on the number of count columns.

I obtain the following result:

feature Chromosome  start   end count1  count2  count3
range1  Chr1        1       45  120.0   40.0    28.0
range2  Chr1        55      160 63.0    26.0    16.0
range3  Chr2        10      90  28.0    45.0    18.0
range4  Chr2        100     200 0.0     0.0     0.0

I am not sure why the resulting counts are floating-point numbers. Any comments?
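
On the floating-point question: I have not pinned down which step introduces the floats, and it may depend on the pandas version (a range such as range4 that matches no rows is a likely suspect). Whatever the cause, sums of integer counts are whole numbers, so casting the count columns back to integers loses nothing. A minimal sketch, with the tables typed in by hand and count_cols and result as illustrative names:

```python
import pandas as pd

a = pd.DataFrame({
    "feature": ["gene1", "gene2", "gene3", "gene4", "gene5", "gene5"],
    "Chromosome": ["Chr1", "Chr1", "Chr1", "Chr1", "Chr2", "Chr2"],
    "start": [1, 15, 50, 100, 5, 40],
    "end": [10, 40, 70, 150, 30, 80],
    "count1": [100, 20, 40, 23, 24, 4],
    "count2": [30, 10, 11, 15, 17, 28],
    "count3": [22, 6, 7, 9, 2, 16],
})
b = pd.DataFrame({
    "feature": ["range1", "range2", "range3", "range4"],
    "Chromosome": ["Chr1", "Chr1", "Chr2", "Chr2"],
    "start": [1, 55, 10, 100],
    "end": [45, 160, 90, 200],
})

# Every column from the 5th onwards holds counts
count_cols = list(a.columns[4:])

def find_englobed(x):
    # Same chromosome, and the two intervals overlap at all
    englobed = a[(a["Chromosome"] == x["Chromosome"])
                 & (a["start"] <= x["end"])
                 & (a["end"] >= x["start"])]
    return englobed[count_cols].sum()

result = pd.concat([b, b.apply(find_englobed, axis=1)], axis=1)

# Cast the sums back to integers; safe because no value is missing or fractional
result[count_cols] = result[count_cols].astype(int)
print(result)
```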

If you are doing genomics in pandas you might want to look into pyranges:

import pyranges as pr

c = """feature Chromosome  Start   End count1  count2  count3
gene1   Chr1        1       10  100     30      22
gene2   Chr1        15      40  20      10      6
gene3   Chr1        50      70  40      11      7
gene4   Chr1        100     150 23      15      9
gene5   Chr2        5       30  24      17      2
gene5   Chr2        40      80  4       28     16
"""

c2 = """feature Chromosome  Start   End
range1  Chr1        1       45
range2  Chr1        55      160
range3  Chr2        10      90
range4  Chr2        100     200 """

gr, gr2 = pr.from_string(c), pr.from_string(c2)

j = gr2.join(gr).drop(like="_b")
# +------------+--------------+-----------+-----------+-----------+-----------+-----------+
# | feature    | Chromosome   |     Start |       End |    count1 |    count2 |    count3 |
# | (object)   | (category)   |   (int32) |   (int32) |   (int64) |   (int64) |   (int64) |
# |------------+--------------+-----------+-----------+-----------+-----------+-----------|
# | range1     | Chr1         |         1 |        45 |       100 |        30 |        22 |
# | range1     | Chr1         |         1 |        45 |        20 |        10 |         6 |
# | range2     | Chr1         |        55 |       160 |        40 |        11 |         7 |
# | range2     | Chr1         |        55 |       160 |        23 |        15 |         9 |
# | range3     | Chr2         |        10 |        90 |        24 |        17 |         2 |
# | range3     | Chr2         |        10 |        90 |         4 |        28 |        16 |
# +------------+--------------+-----------+-----------+-----------+-----------+-----------+
# Unstranded PyRanges object has 6 rows and 7 columns from 2 chromosomes.
# For printing, the PyRanges was sorted on Chromosome.

df = j.df

fs = {"Chromosome": "first", "Start":
      "first", "End": "first", "count1": "sum", "count2": "sum", "count3": "sum"}
result = df.groupby("feature".split()).agg(fs)
#         Chromosome  Start  End  count1  count2  count3
# feature
# range1        Chr1      1   45     120      40      28
# range2        Chr1     55  160      63      26      16
# range3        Chr2     10   90      28      45      18
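
Note that range4 is missing from the result above, because the join only returns ranges that overlap at least one gene. If rows with zero counts are wanted for such ranges, one option is a left merge against the full list of ranges in plain pandas. This is only a sketch: the summed table is rebuilt by hand from the output above so that pyranges is not needed to run it, and the names summed, ranges and full are illustrative.

```python
import pandas as pd

# Summed counts, copied by hand from the grouped output above
summed = pd.DataFrame({
    "feature": ["range1", "range2", "range3"],
    "count1": [120, 63, 28],
    "count2": [40, 26, 45],
    "count3": [28, 16, 18],
})

# The full list of new ranges, including range4, which overlaps no gene
ranges = pd.DataFrame({
    "feature": ["range1", "range2", "range3", "range4"],
    "Chromosome": ["Chr1", "Chr1", "Chr2", "Chr2"],
    "Start": [1, 55, 10, 100],
    "End": [45, 160, 90, 200],
})

count_cols = ["count1", "count2", "count3"]

# Left merge keeps every range; ranges without a match get NaN counts
full = ranges.merge(summed, on="feature", how="left")

# Replace the missing counts with 0 and restore the integer dtype
full[count_cols] = full[count_cols].fillna(0).astype(int)
print(full)
```

This assumes the feature names of the new ranges are unique, since the merge matches rows on feature alone.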
