Calculations from segments (looping) of a data-frame

I have two data frames, one short and one long. I want to break the long one into chunks and compare each chunk with the short one using the correlation coefficient.

The splits are fine. However, when I use them in the calculation, it returns NaN for every chunk after the first.

import pandas as pd

data_a = {'ID': ["a1","a2","a3","a4","a5","a6","a7","a8","a9","a10","a11","a12","a13","a14","a15"], 
'Unit_Weight': [178,153,193,195,214,157,205,212,219,166,217,186,170,207,204]}

df_a = pd.DataFrame(data_a)

data_b = {'ID': ["b1","b2","b3","b4","b5"], 
'Unit_Weight': [128,123,123,125,204]}

df_b = pd.DataFrame(data_b)

size = 5      # number of rows in each chunk of the long data-frame
list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a),size)]

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].corr(df_b['Unit_Weight'])
    print(corr_e)

Output:

0.6797202605786716
nan
nan

What went wrong, and how can it be corrected? Thank you.

p.s.: these are the results when manually calculated:

0.6797202605786716
-0.5501914564062937
0.2653370297540246

   ID  Unit_Weight
0  a1          178
1  a2          153
2  a3          193
3  a4          195
4  a5          214
    ID  Unit_Weight
5   a6          157
6   a7          205
7   a8          212
8   a9          219
9  a10          166
     ID  Unit_Weight
10  a11          217
11  a12          186
12  a13          170
13  a14          207
14  a15          204

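Both Series need the same index, because Series.corr aligns the two Series on their index labels before computing. Only the first chunk shares the labels 0 to 4 with df_b; the other chunks have no labels in common with it, so the result is NaN. Reset the index of each chunk with Series.reset_index and drop=True:

for each in list_of_df_a:
    corr_e = each['Unit_Weight'].reset_index(drop=True).corr(df_b['Unit_Weight'])
    print(corr_e)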

0.6797202605786716
-0.5501914564062937
0.26533702975402457
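To see why the index matters, here is a minimal sketch using the second chunk from the question. The variable names (chunk, short, common, _) are made up for illustration; the data comes from the question.

```python
import pandas as pd

# Second chunk of the long frame: its index labels are 5..9.
chunk = pd.Series([157, 205, 212, 219, 166], index=range(5, 10))
# The short frame's column: default index labels 0..4.
short = pd.Series([128, 123, 123, 125, 204])

# Series.corr only uses labels present in both Series (an inner alignment).
common, _ = chunk.align(short, join="inner")
n_common = len(common)
print(n_common)  # no shared labels

# With nothing to compare, the correlation is NaN.
misaligned = chunk.corr(short)
print(misaligned)

# After resetting the index, both Series are labelled 0..4 and line up.
fixed = chunk.reset_index(drop=True).corr(short)
print(fixed)
```

The last value should match the manually calculated result for the second chunk.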

@jezrael has a very good answer, but another way would be to change:

list_of_df_a = [df_a.loc[i:i+size-1,:] for i in range(0, len(df_a),size)]

To:

list_of_df_a = [df_a.loc[i:i+size-1,:].reset_index(drop=True) for i in range(0, len(df_a),size)]

And now your results would be:

0.6797202605786716
-0.5501914564062937
0.26533702975402457

You can also use numpy.corrcoef, which compares the values by position and ignores the index, so the alignment problem does not arise:

import numpy as np

for each in list_of_df_a:
    corr_e = np.corrcoef(each['Unit_Weight'], df_b['Unit_Weight'])[0,1]
    print(corr_e)

0.6797202605786716
-0.5501914564062937
0.2653370297540246
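As a variation not taken from the answers above, the chunking and the calculation can be combined and the results collected in a list. This is only a sketch: it assumes the long frame's length is a multiple of size (np.corrcoef needs inputs of equal length), and chunk_ids and correlations are illustrative names.

```python
import numpy as np
import pandas as pd

df_a = pd.DataFrame({
    'ID': ["a%d" % n for n in range(1, 16)],
    'Unit_Weight': [178, 153, 193, 195, 214, 157, 205, 212, 219, 166,
                    217, 186, 170, 207, 204],
})
df_b = pd.DataFrame({
    'ID': ["b1", "b2", "b3", "b4", "b5"],
    'Unit_Weight': [128, 123, 123, 125, 204],
})

size = 5
# Row positions 0..14 become chunk numbers 0, 0, 0, 0, 0, 1, 1, ...
chunk_ids = np.arange(len(df_a)) // size

short_values = df_b['Unit_Weight'].to_numpy()
correlations = [
    # Plain arrays carry no index, so the values are compared by position.
    float(np.corrcoef(chunk['Unit_Weight'].to_numpy(), short_values)[0, 1])
    for _, chunk in df_a.groupby(chunk_ids)
]
print(correlations)
```

Grouping by position rather than by label keeps this working even if df_a does not have the default 0..n-1 index.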

Comments
  • good day to you, sir! thank you for the lightning speed and marvelous question-answering skills!
  • thank you for sharing! certainly this helps a lot of people!