## Calculations from segments (looping) of a data frame

I have two data frames, one short and one long. I want to break the long one into chunks and compare each chunk to the short one using the correlation coefficient.

The splits are fine. However, when I use the chunks in the calculation, it returns NaN for every chunk after the first.

    import pandas as pd

    data_a = {'ID': ["a1","a2","a3","a4","a5","a6","a7","a8","a9","a10",
                     "a11","a12","a13","a14","a15"],
              'Unit_Weight': [178,153,193,195,214,157,205,212,219,166,
                              217,186,170,207,204]}
    df_a = pd.DataFrame(data_a)

    data_b = {'ID': ["b1","b2","b3","b4","b5"],
              'Unit_Weight': [128,123,123,125,204]}
    df_b = pd.DataFrame(data_b)

    size = 5  # rows per chunk of the long data frame

    # .loc slices by label and includes the end label, hence size-1
    list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a), size)]

    for each in list_of_df_a:
        corr_e = each['Unit_Weight'].corr(df_b['Unit_Weight'])
        print(corr_e)

Output:

    0.6797202605786716
    nan
    nan

What went wrong, and how can it be corrected? Thank you.

p.s.: these are the results when manually calculated:

    0.6797202605786716
    -0.5501914564062937
    0.2653370297540246

        ID  Unit_Weight
    0   a1          178
    1   a2          153
    2   a3          193
    3   a4          195
    4   a5          214

        ID  Unit_Weight
    5   a6          157
    6   a7          205
    7   a8          212
    8   a9          219
    9  a10          166

         ID  Unit_Weight
    10  a11          217
    11  a12          186
    12  a13          170
    13  a14          207
    14  a15          204

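`Series.corr` aligns the two `Series` on their index labels before computing, so both need the same index. The first chunk has labels 0 to 4, matching `df_b`, but the second and third chunks keep labels 5 to 9 and 10 to 14, which share nothing with `df_b`, so the result is NaN. Reset the index of each chunk with `Series.reset_index` and `drop=True`: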

    for each in list_of_df_a:
        corr_e = each['Unit_Weight'].reset_index(drop=True).corr(df_b['Unit_Weight'])
        print(corr_e)

Output:

    0.6797202605786716
    -0.5501914564062937
    0.26533702975402457
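To see the index mismatch directly, here is a minimal sketch using the first ten rows of the question's data. The variable names (`first`, `second`, `shared_first`, and so on) are only illustrative; it checks which index labels each chunk shares with `df_b` and compares the result before and after resetting the index.

```python
import pandas as pd

# First ten rows of the question's data are enough to show the effect
df_a = pd.DataFrame({'Unit_Weight': [178, 153, 193, 195, 214,
                                     157, 205, 212, 219, 166]})
df_b = pd.DataFrame({'Unit_Weight': [128, 123, 123, 125, 204]})

first = df_a.loc[0:4, 'Unit_Weight']    # keeps index labels 0..4
second = df_a.loc[5:9, 'Unit_Weight']   # keeps index labels 5..9

# Labels each chunk has in common with df_b (whose index is 0..4)
shared_first = first.index.intersection(df_b.index)
shared_second = second.index.intersection(df_b.index)
print(len(shared_first), len(shared_second))

# No shared labels means nothing to correlate, so the result is NaN
nan_result = second.corr(df_b['Unit_Weight'])

# After resetting, the chunk is labelled 0..4 and lines up with df_b
fixed_result = second.reset_index(drop=True).corr(df_b['Unit_Weight'])
print(nan_result, fixed_result)
```

The second chunk shares no labels with `df_b`, which is why only the first chunk produced a number in the question.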


@jezrael has a very good answer, but another way would be to change:

    list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a), size)]

To:

    list_of_df_a = [df_a.loc[i:i+size-1,:].reset_index(drop=True)
                    for i in range(0, len(df_a), size)]

And now your results would be:

    0.6797202605786716
    -0.5501914564062937
    0.26533702975402457


You can also use `numpy.corrcoef`, which compares the values by position rather than by index label and so sidesteps the indexing problem:

    import numpy as np

    for each in list_of_df_a:
        # [0, 1] picks the off-diagonal entry of the 2x2 correlation matrix
        corr_e = np.corrcoef(each['Unit_Weight'], df_b['Unit_Weight'])[0, 1]
        print(corr_e)

Output:

    0.6797202605786716
    -0.5501914564062937
    0.2653370297540246
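The positional approach can be wrapped up as a small function. This is only a sketch: `chunk_correlations` is a hypothetical helper name, not part of pandas or NumPy, and returning NaN for a trailing chunk shorter than the short series is an assumption made here, since `np.corrcoef` needs inputs of equal length.

```python
import numpy as np
import pandas as pd


def chunk_correlations(long_values, short_values):
    """Correlate each consecutive chunk of long_values with short_values, by position."""
    long_arr = np.asarray(long_values, dtype=float)
    short_arr = np.asarray(short_values, dtype=float)
    size = len(short_arr)
    results = []
    for start in range(0, len(long_arr), size):
        chunk = long_arr[start:start + size]
        if len(chunk) != size:
            # Assumption: an incomplete trailing chunk is reported as NaN
            results.append(float('nan'))
            continue
        # Off-diagonal entry of the 2x2 correlation matrix
        results.append(float(np.corrcoef(chunk, short_arr)[0, 1]))
    return results


df_a = pd.DataFrame({'Unit_Weight': [178, 153, 193, 195, 214, 157, 205, 212,
                                     219, 166, 217, 186, 170, 207, 204]})
df_b = pd.DataFrame({'Unit_Weight': [128, 123, 123, 125, 204]})

correlations = chunk_correlations(df_a['Unit_Weight'], df_b['Unit_Weight'])
print(correlations)
```

Because the values are converted to plain arrays before slicing, the index labels never come into play, so no `reset_index` is needed.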


##### Comments

- good day to you, sir! thank you for the lightning speed and marvelous question-answering skills!
- thank you for sharing! certainly this helps a lot of people!