## How to transfer (sum up) the counts from a set of ranges to larger ranges that encompass them?

I am working with sequencing data, but I think the problem applies to other range-value data types. I want to combine read counts (values) from several experiments, measured over a set of DNA regions that each have a start and end position (ranges), into summed counts for another set of DNA regions, which generally encompass several of the original regions, as in the following example.

Given the following table A with ranges and counts:

| feature | start | end | count1 | count2 | count3 |
|---|---|---|---|---|---|
| gene1 | 1 | 10 | 100 | 30 | 22 |
| gene2 | 15 | 40 | 20 | 10 | 6 |
| gene3 | 50 | 70 | 40 | 11 | 7 |
| gene4 | 100 | 150 | 23 | 15 | 9 |

and the following table B (with new ranges):

| feature | start | end |
|---|---|---|
| range1 | 1 | 45 |
| range2 | 55 | 160 |

I would like to get the following count table with the new ranges:

| feature | start | end | count1 | count2 | count3 |
|---|---|---|---|---|---|
| range1 | 1 | 45 | 120 | 40 | 28 |
| range2 | 55 | 160 | 63 | 26 | 16 |

To keep it simple: if there is any overlap at all (at least a fraction of a feature in table A is contained in a feature in table B), its counts should be added. Is there a tool that does this, or a script in Perl, Python or R? I am counting the sequencing reads with bedtools multicov, but as far as I could find it has no functionality that does what I want.

Thanks.

We can do this by:

- Creating an artificial `key` column
- Performing an `outer` join on it `(mxn)`
- Filtering on the `start` OR `end` value being between our `ranges`
- `pandas.DataFrame.groupby` on `feature` and `sum` the `count` columns
- Finally, `concat` the output to `df2` to get the desired output

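```python
import pandas as pd

# df1 is table A (features with counts), df2 is table B (the new ranges).
# A constant key makes the merge pair every row of df1 with every row of df2.
df1['key'] = 'A'
df2['key'] = 'A'
df3 = pd.merge(df1, df2, on='key', how='outer')

# Keep pairs where the feature's start or end lies inside the new range.
df4 = df3[(df3.start_x.between(df3.start_y, df3.end_y)) | (df3.end_x.between(df3.start_y, df3.end_y))]

# Sum the counts per new range and attach them to df2.
df5 = df4.groupby('feature_y').agg({'count1':'sum', 'count2':'sum', 'count3':'sum'}).reset_index()
df_final = pd.concat([df2.drop(['key'], axis=1), df5.drop(['feature_y'], axis=1)], axis=1)
```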

**output**

```
print(df_final)

  feature  start  end  count1  count2  count3
0  range1      1   45     120      40      28
1  range2     55  160      63      26      16
```
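One caveat about the filter above: it only checks whether a feature's start or end falls inside the new range, so a feature that completely spans a range (starts before it and ends after it) would not be counted. The following is a minimal sketch of a variant, not the original answer's code, that uses the general interval-overlap test instead. It assumes closed intervals and pandas 1.2 or later (for `how="cross"`); the names `pairs`, `overlaps`, `sums` and `result` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "feature": ["gene1", "gene2", "gene3", "gene4"],
    "start": [1, 15, 50, 100],
    "end": [10, 40, 70, 150],
    "count1": [100, 20, 40, 23],
    "count2": [30, 10, 11, 15],
    "count3": [22, 6, 7, 9],
})
df2 = pd.DataFrame({
    "feature": ["range1", "range2"],
    "start": [1, 55],
    "end": [45, 160],
})

# Cross join: every feature in df1 is paired with every range in df2
# (assumes pandas >= 1.2, where how="cross" is available).
pairs = pd.merge(df1, df2, how="cross", suffixes=("_a", "_b"))

# Two closed intervals overlap when each one starts before the other ends.
overlaps = pairs[(pairs["start_a"] <= pairs["end_b"]) & (pairs["end_a"] >= pairs["start_b"])]

# Sum the count columns per new range.
count_cols = ["count1", "count2", "count3"]
sums = overlaps.groupby("feature_b")[count_cols].sum()

# Left-join so that ranges without any overlapping feature are kept with zero counts.
result = df2.merge(sums, left_on="feature", right_index=True, how="left")
result = result.fillna({c: 0 for c in count_cols}).astype({c: int for c in count_cols})
print(result)
```

On the example tables this gives the same sums as above (120/40/28 for `range1` and 63/26/16 for `range2`). Note that the cross join builds m x n rows, so for large genomic tables an interval-aware tool is likely a better fit.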

You can use `apply()` and `pd.concat()` with a custom function, where `a` corresponds to your first dataframe and `b` corresponds to your second dataframe:

```python
def find_englobed(x):
    englobed = a[(a['start'].between(x['start'], x['end'])) | (a['end'].between(x['start'], x['end']))]
    return englobed[['count1','count2','count3']].sum()

pd.concat([b, b.apply(find_englobed, axis=1)], axis=1)
```

Yields:

```
  feature  start  end  count1  count2  count3
0  range1      1   45     120      40      28
1  range2     55  160      63      26      16
```

In case it helps somebody: based on @rahlf23's answer, I modified it to make it more general, so that it handles any number of count columns and also requires, besides the range overlap, that the features be on the same chromosome.

So if table "a" is:

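| feature | Chromosome | start | end | count1 | count2 | count3 |
|---|---|---|---|---|---|---|
| gene1 | Chr1 | 1 | 10 | 100 | 30 | 22 |
| gene2 | Chr1 | 15 | 40 | 20 | 10 | 6 |
| gene3 | Chr1 | 50 | 70 | 40 | 11 | 7 |
| gene4 | Chr1 | 100 | 150 | 23 | 15 | 9 |
| gene5 | Chr2 | 5 | 30 | 24 | 17 | 2 |
| gene5 | Chr2 | 40 | 80 | 4 | 28 | 16 |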

and table "b" is:

| feature | Chromosome | start | end |
|---|---|---|---|
| range1 | Chr1 | 1 | 45 |
| range2 | Chr1 | 55 | 160 |
| range3 | Chr2 | 10 | 90 |
| range4 | Chr2 | 100 | 200 |

With the following Python script:

```python
import pandas as pd

def find_englobed(x):
    englobed = a[(a['Chromosome'] == x['Chromosome']) & (a['start'].between(x['start'], x['end']) | (a['end'].between(x['start'], x['end'])))]
    return englobed[list(a.columns[4:])].sum()

pd.concat([b, b.apply(find_englobed, axis=1)], axis=1)
```

With `(a['Chromosome'] == x['Chromosome']) &` I require the features to be on the same chromosome (the parentheses matter, because `&` binds more tightly than `==`), and with `list(a.columns[4:])` I select every column from the 5th to the last, so the code does not depend on the number of count columns.

I obtain the following result:

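| feature | Chromosome | start | end | count1 | count2 | count3 |
|---|---|---|---|---|---|---|
| range1 | Chr1 | 1 | 45 | 120.0 | 40.0 | 28.0 |
| range2 | Chr1 | 55 | 160 | 63.0 | 26.0 | 16.0 |
| range3 | Chr2 | 10 | 90 | 28.0 | 45.0 | 18.0 |
| range4 | Chr2 | 100 | 200 | 0.0 | 0.0 | 0.0 |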

I am not sure why the resulting counts are floating-point numbers. Any comments?
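Whatever the cause of the floats, one way to get integer counts back is to cast the count columns after concatenating. The following is a minimal sketch under that assumption. It rebuilds the example tables inline, and the `count_cols` and `result` names are introduced here for illustration only.

```python
import pandas as pd

a = pd.DataFrame({
    "feature": ["gene1", "gene2", "gene3", "gene4", "gene5", "gene5"],
    "Chromosome": ["Chr1", "Chr1", "Chr1", "Chr1", "Chr2", "Chr2"],
    "start": [1, 15, 50, 100, 5, 40],
    "end": [10, 40, 70, 150, 30, 80],
    "count1": [100, 20, 40, 23, 24, 4],
    "count2": [30, 10, 11, 15, 17, 28],
    "count3": [22, 6, 7, 9, 2, 16],
})
b = pd.DataFrame({
    "feature": ["range1", "range2", "range3", "range4"],
    "Chromosome": ["Chr1", "Chr1", "Chr2", "Chr2"],
    "start": [1, 55, 10, 100],
    "end": [45, 160, 90, 200],
})

# Every column from the 5th onwards is treated as a count column.
count_cols = list(a.columns[4:])

def find_englobed(x):
    # Same chromosome, and the feature's start or end lies inside the range.
    englobed = a[(a["Chromosome"] == x["Chromosome"])
                 & (a["start"].between(x["start"], x["end"])
                    | a["end"].between(x["start"], x["end"]))]
    return englobed[count_cols].sum()

result = pd.concat([b, b.apply(find_englobed, axis=1)], axis=1)

# Cast the summed counts to integers, whatever dtype apply() produced.
result = result.astype({c: int for c in count_cols})
print(result)
```

Because the counts are whole numbers to begin with, the cast does not lose information here.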

If you are doing genomics in pandas you might want to look into pyranges:

```python
import pyranges as pr

c = """feature Chromosome Start End count1 count2 count3
gene1 Chr1 1 10 100 30 22
gene2 Chr1 15 40 20 10 6
gene3 Chr1 50 70 40 11 7
gene4 Chr1 100 150 23 15 9
gene5 Chr2 5 30 24 17 2
gene5 Chr2 40 80 4 28 16
"""

c2 = """feature Chromosome Start End
range1 Chr1 1 45
range2 Chr1 55 160
range3 Chr2 10 90
range4 Chr2 100 200
"""

gr, gr2 = pr.from_string(c), pr.from_string(c2)

j = gr2.join(gr).drop(like="_b")
# +------------+--------------+-----------+-----------+-----------+-----------+-----------+
# | feature    | Chromosome   |     Start |       End |    count1 |    count2 |    count3 |
# | (object)   | (category)   |   (int32) |   (int32) |   (int64) |   (int64) |   (int64) |
# |------------+--------------+-----------+-----------+-----------+-----------+-----------|
# | range1     | Chr1         |         1 |        45 |       100 |        30 |        22 |
# | range1     | Chr1         |         1 |        45 |        20 |        10 |         6 |
# | range2     | Chr1         |        55 |       160 |        40 |        11 |         7 |
# | range2     | Chr1         |        55 |       160 |        23 |        15 |         9 |
# | range3     | Chr2         |        10 |        90 |        24 |        17 |         2 |
# | range3     | Chr2         |        10 |        90 |         4 |        28 |        16 |
# +------------+--------------+-----------+-----------+-----------+-----------+-----------+
# Unstranded PyRanges object has 6 rows and 7 columns from 2 chromosomes.
# For printing, the PyRanges was sorted on Chromosome.

df = j.df

fs = {"Chromosome": "first", "Start": "first", "End": "first",
      "count1": "sum", "count2": "sum", "count3": "sum"}

result = df.groupby("feature".split()).agg(fs)
#         Chromosome  Start  End  count1  count2  count3
# feature
# range1        Chr1      1   45     120      40      28
# range2        Chr1     55  160      63      26      16
# range3        Chr2     10   90      28      45      18
```

##### Comments

- pyranges.multioverlap can take a set of ranges to count overlaps in: pyranges.readthedocs.io/en/latest/autoapi/pyranges/multioverlap/…
- Just use `.astype(int)` for those columns